Add quantize fused convbn bias pass #17348

Merged
JakeStevens merged 1 commit into pytorch:main from JakeStevens:export-D92733079
Feb 26, 2026

@JakeStevens

@JakeStevens JakeStevens commented Feb 10, 2026

Contributor

Summary:
When performing QAT with a model that has a conv layer with no bias followed by batch norm, the fusion process creates a bias. This is done after observers are attached so the resulting bias is kept as float.

This diff adds a pass which grabs the proper qparams and applies them to the non-quantized bias.
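
To see why a bias appears at all, here is a minimal sketch of batch-norm folding in plain Python. It is not the fusion code used by the quantization flow; the function name `fold_bn_into_conv` and the list-based tensor layout are assumptions chosen for illustration.

```python
import math


def fold_bn_into_conv(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm statistics into conv weights.

    weight: one flattened kernel (list of floats) per output channel.
    bias:   per-channel biases, or None when the conv has bias=False.
    Returns (fused_weight, fused_bias); fused_bias is always a list.
    """
    out_channels = len(weight)
    if bias is None:
        # A missing bias behaves like a vector of zeros.
        bias = [0.0] * out_channels
    fused_weight, fused_bias = [], []
    for c in range(out_channels):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        fused_weight.append([w * scale for w in weight[c]])
        # The BN shift (beta, mean) lands in the bias even if none existed.
        fused_bias.append((bias[c] - mean[c]) * scale + beta[c])
    return fused_weight, fused_bias


# One output channel, conv created with bias=False.
fused_w, fused_b = fold_bn_into_conv(
    weight=[[1.0, -2.0]], bias=None,
    gamma=[2.0], beta=[0.5], mean=[1.0], var=[4.0], eps=0.0,
)
```

Even though the conv started without a bias, the folded layer ends up with a float one (`fused_b`), which is the tensor the pass has to quantize.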

Differential Revision: D92733079

cc @robert-kalmar @digantdesai

@pytorch-bot

pytorch-bot Bot commented Feb 10, 2026


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17348

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Pending, 1 Unrelated Failure

As of commit 5855b25 with merge base 2ffe356:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Feb 10, 2026
@meta-codesync

meta-codesync Bot commented Feb 10, 2026

Contributor

@JakeStevens has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92733079.

@github-actions


This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@JakeStevens JakeStevens added the module: nxp label Feb 10, 2026
JakeStevens added a commit to JakeStevens/executorch that referenced this pull request Feb 10, 2026
Summary:

When performing QAT with a model that has a conv layer with no bias followed by batch norm, the fusion process creates a bias. This is done *after* observers are attached so the resulting bias is kept as float.

This diff adds a pass which grabs the proper qparams and applies them to the non-quantized bias.

Differential Revision: D92733079
@robert-kalmar

Collaborator

CC @StrycekSimon @roman-janik-nxp

@StrycekSimon StrycekSimon left a comment

Contributor

I tried running it with our conversion pipeline, but without success. It seems the bias is being added as another input of the model. Can you take a look at it? Or is there a postprocessing step I am missing?

Comment thread backends/transforms/quantize_fused_convbn_bias_pass.py Outdated
Comment thread backends/transforms/test/test_quantize_fused_convbn_bias_pass.py Outdated
JakeStevens added a commit to JakeStevens/executorch that referenced this pull request Feb 20, 2026
Summary:

When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. These passes find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias.

Two pass variants are provided:
  - QuantizeFusedConvBnBiasPass (ExportPass): operates on edge dialect graphs after to_edge()
  - QuantizeFusedConvBnBiasAtenPass (PassBase): operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes)
Differential Revision: D92733079
@JakeStevens

Contributor Author

@StrycekSimon the NXP changes and test are now here:

#17599

This diff is now a standalone pass, and the integration with your backend is in the PR above.

Comment thread backends/transforms/quantize_fused_convbn_bias_pass.py Outdated
JakeStevens added a commit to JakeStevens/executorch that referenced this pull request Feb 24, 2026
Summary:

When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes).
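
As an illustration of the scale computation, the sketch below assumes the common convention that the bias scale is the product of the input scale and the weight scale, with a zero point of 0 and an int32 target. The PR text only says the scale is computed from the input and weight dequantize nodes, so the formula, the helper names `quantize_bias` and `dequantize_bias`, and the list-based layout are assumptions rather than the pass's actual code.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def quantize_bias(bias, input_scale, weight_scales):
    """Quantize a float bias to int32, one scale per output channel.

    Assumed convention: bias_scale[c] = input_scale * weight_scales[c],
    zero point 0. For per-tensor weights, repeat the single weight scale.
    """
    bias_scales = [input_scale * ws for ws in weight_scales]
    q_bias = [
        max(INT32_MIN, min(INT32_MAX, round(b / s)))  # clamp to int32 range
        for b, s in zip(bias, bias_scales)
    ]
    return q_bias, bias_scales


def dequantize_bias(q_bias, bias_scales):
    """Map the int32 bias back to float, mirroring a dequantize node."""
    return [q * s for q, s in zip(q_bias, bias_scales)]


# Two output channels with per-channel weight scales.
q_bias, bias_scales = quantize_bias(
    bias=[-0.5, 0.75], input_scale=0.5, weight_scales=[0.25, 0.125]
)
restored = dequantize_bias(q_bias, bias_scales)
```

Tying the bias scale to the input and weight scales means the integer bias can be added directly to the int32 accumulator of the convolution, which is why the qparams are derived from the neighbouring nodes instead of being observed separately.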

Differential Revision: D92733079
JakeStevens added a commit to JakeStevens/executorch that referenced this pull request Feb 25, 2026
Summary:

When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes).

Reviewed By: larryliu0820

Differential Revision: D92733079

@larryliu0820 larryliu0820 left a comment

Contributor

Review automatically exported from Phabricator review in Meta.

@JakeStevens JakeStevens force-pushed the export-D92733079 branch 2 times, most recently from c3b60ae to d979975 Compare February 26, 2026 14:33
@JakeStevens JakeStevens merged commit 570d2e9 into pytorch:main Feb 26, 2026
156 of 160 checks passed
@JakeStevens JakeStevens deleted the export-D92733079 branch February 26, 2026 20:49

Labels

CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
fb-exported
meta-exported
module: nxp: Issues related to NXP Neutron NPU delegation and code under backends/nxp/

4 participants