Add quantize fused convbn bias pass by JakeStevens · Pull Request #17348 · pytorch/executorch

JakeStevens · 2026-02-10T18:57:18Z

Summary:
When performing QAT with a model that has a conv layer with no bias followed by batch norm, the fusion process creates a bias. This is done after observers are attached so the resulting bias is kept as float.
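
To illustrate why fusion yields a bias even when the conv layer has none, here is a minimal sketch assuming the standard batch-norm folding formula. The function name `fold_bn` and the per-channel list layout are illustrative and not taken from this PR.

```python
import math


def fold_bn(weight_rows, conv_bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm statistics into conv weights and bias.

    weight_rows: one flat list of weights per output channel.
    conv_bias:   per-channel bias list, or None when the conv has bias=False.
    """
    fused_w, fused_b = [], []
    for c, row in enumerate(weight_rows):
        # Per-channel multiplier applied by batch norm.
        s = gamma[c] / math.sqrt(var[c] + eps)
        # A missing conv bias behaves like a zero bias.
        b = conv_bias[c] if conv_bias is not None else 0.0
        fused_w.append([w * s for w in row])
        # The BN shift survives as a bias term even if the conv had none.
        fused_b.append((b - mean[c]) * s + beta[c])
    return fused_w, fused_b


fused_w, fused_b = fold_bn(
    weight_rows=[[1.0, -2.0], [0.5, 0.25]],
    conv_bias=None,  # conv created with bias=False
    gamma=[2.0, 1.0],
    beta=[0.1, -0.3],
    mean=[0.5, 0.0],
    var=[4.0, 1.0],
    eps=0.0,  # zero only to keep the arithmetic easy to follow
)
```

The fused layer ends up with a non-trivial `fused_b` although `conv_bias` was `None`, which is the float bias the pass has to handle.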

This diff adds a pass which grabs the proper qparams and applies them to the non-quantized bias.

Differential Revision: D92733079

cc @robert-kalmar @digantdesai

pytorch-bot · 2026-02-10T18:57:22Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17348

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Pending, 1 Unrelated Failure

As of commit 5855b25 with merge base 2ffe356:

NEW FAILURES - The following jobs have failed:

pull / test-samsung-models-linux / linux-job (gh)
RuntimeError: Command docker exec -t 5ef5ae2c455c47bd8860170183be2128280c58202f1969cee32e678e5fbcb351 /exec failed with exit code 1
pull / test-samsung-quantmodels-linux / linux-job (gh)
RuntimeError: Command docker exec -t 471e32f7613ce410231566d61b6184d152b656df7618963cf59d344dc4e64926 /exec failed with exit code 1

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / test-llama-runner-linux (fp32, xnnpack+custom+qe, linux.2xlarge, executorch-ubuntu-22.04-clang12) / linux-job (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-02-10T18:57:25Z

@JakeStevens has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92733079.

github-actions · 2026-02-10T18:58:34Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

robert-kalmar · 2026-02-11T09:05:59Z

CC @StrycekSimon @roman-janik-nxp

StrycekSimon

I tried running it with our conversion pipeline, but without success. It seems the bias is being added as another input of the model. Can you take a look at it? Or is there a postprocessing step I am missing?

Summary:
When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. These passes find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias.

Two pass variants are provided:
- QuantizeFusedConvBnBiasPass (ExportPass): operates on edge dialect graphs after to_edge()
- QuantizeFusedConvBnBiasAtenPass (PassBase): operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes)

Differential Revision: D92733079
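
The scale computation described above can be sketched as follows. This assumes the common convention that the bias scale is the product of the input and weight scales, with zero point 0 and an int32 range. The PR text only says the scale is computed from the input and weight dequantize nodes, so the helper names and the int32 range here are assumptions.

```python
def quantize_bias(bias, input_scale, weight_scale,
                  qmin=-2**31, qmax=2**31 - 1):
    """Quantize a float bias with scale = input_scale * weight_scale.

    Zero point is assumed to be 0. Python's round() rounds halves to even.
    """
    bias_scale = input_scale * weight_scale
    q = [max(qmin, min(qmax, round(b / bias_scale))) for b in bias]
    return q, bias_scale


def dequantize_bias(q, bias_scale):
    """Map integer bias values back to float."""
    return [v * bias_scale for v in q]


q_bias, bias_scale = quantize_bias(
    [-0.4, -0.3], input_scale=0.5, weight_scale=0.25
)
dq_bias = dequantize_bias(q_bias, bias_scale)
```

With these example scales the bias scale is 0.125, so each dequantized value differs from the original by at most half of that step.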

JakeStevens · 2026-02-20T21:30:37Z

@StrycekSimon the NXP changes and test are now here:

#17599

This diff is now a "standalone" pass, and the integration with your backend is in the PR linked above.

Summary:
When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes).

Differential Revision: D92733079
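
The two graph flavors mentioned above store the bias differently, which a pass has to account for when looking up the tensor. The sketch below is a hedged illustration only: `Node` is a hypothetical stand-in for `torch.fx.Node`, and `resolve_bias` is an invented helper. It mirrors the idea that a `get_attr` node reads an attribute from the module, while a `placeholder` node in an exported program is mapped to a parameter name through a signature table (in `torch.export` this role is played by `graph_signature.inputs_to_parameters`) and then looked up in the state dict.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for torch.fx.Node with only the fields used here."""
    op: str      # "get_attr" or "placeholder"
    target: str  # attribute path when op == "get_attr"
    name: str    # node name, used to look up placeholders


def resolve_bias(node, module_attrs, inputs_to_parameters, state_dict):
    """Return the bias value behind a node, or None if it is not a parameter."""
    if node.op == "get_attr":
        # Plain GraphModule: the tensor lives on the module itself.
        return module_attrs[node.target]
    if node.op == "placeholder":
        # Exported program: parameters are lifted to graph inputs.
        fqn = inputs_to_parameters.get(node.name)
        if fqn is None:
            return None  # a genuine user input, not a lifted parameter
        return state_dict[fqn]
    return None


attrs = {"conv_bias": [0.1]}
signature = {"p_conv_bias": "conv.bias"}
state = {"conv.bias": [0.2]}

from_attr = resolve_bias(Node("get_attr", "conv_bias", "conv_bias"),
                         attrs, signature, state)
from_placeholder = resolve_bias(Node("placeholder", "p_conv_bias", "p_conv_bias"),
                                attrs, signature, state)
user_input = resolve_bias(Node("placeholder", "x", "x"),
                          attrs, signature, state)
```

A placeholder with no entry in the signature table is treated as a real model input and left alone.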

Summary:
When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes).

Reviewed By: larryliu0820

Differential Revision: D92733079

larryliu0820

Review automatically exported from Phabricator review in Meta.

JakeStevens requested a review from kimishpatel as a code owner February 10, 2026 18:57

meta-cla Bot added the CLA Signed label Feb 10, 2026

meta-codesync Bot added fb-exported meta-exported labels Feb 10, 2026

JakeStevens requested review from AdrianLundell and robert-kalmar February 10, 2026 18:57

JakeStevens added the module: nxp label (issues related to NXP Neutron NPU delegation and code under backends/nxp/) Feb 10, 2026

JakeStevens force-pushed the export-D92733079 branch from b935be8 to 91bbac7 February 10, 2026 19:26

JakeStevens force-pushed the export-D92733079 branch from 91bbac7 to 6b825e9 February 10, 2026 19:55

StrycekSimon suggested changes Feb 16, 2026

Comment thread backends/transforms/quantize_fused_convbn_bias_pass.py Outdated

Comment thread backends/transforms/test/test_quantize_fused_convbn_bias_pass.py Outdated

StrycekSimon mentioned this pull request Feb 18, 2026

Draft: Integration of QuantizeFusedConvBnBiasPass to NXP conversion pipeline #17523

Closed

StrycekSimon reviewed Feb 23, 2026

Comment thread backends/transforms/quantize_fused_convbn_bias_pass.py Outdated

JakeStevens force-pushed the export-D92733079 branch from 6b825e9 to 242eed6 February 24, 2026 02:20

JakeStevens force-pushed the export-D92733079 branch from 242eed6 to 8c9a108 February 24, 2026 12:44

JakeStevens force-pushed the export-D92733079 branch from 8c9a108 to fe4b396 February 25, 2026 15:53

larryliu0820 approved these changes Feb 25, 2026

JakeStevens force-pushed the export-D92733079 branch from fe4b396 to 495f530 February 25, 2026 20:08

JakeStevens force-pushed the export-D92733079 branch from 495f530 to 8b278d5 February 25, 2026 20:12

JakeStevens force-pushed the export-D92733079 branch from 8b278d5 to cf1c09d February 25, 2026 21:53

JakeStevens force-pushed the export-D92733079 branch from cf1c09d to fc1704a February 25, 2026 21:56

JakeStevens force-pushed the export-D92733079 branch 2 times, most recently from c3b60ae to d979975 February 26, 2026 14:33

JakeStevens force-pushed the export-D92733079 branch from d979975 to 5855b25 February 26, 2026 19:26

JakeStevens merged commit 570d2e9 into pytorch:main Feb 26, 2026
156 of 160 checks passed

JakeStevens deleted the export-D92733079 branch February 26, 2026 20:49

zingo mentioned this pull request Jun 19, 2026

Arm backend: Add real implementation for TOSA dialect ops. #19936

Merged
