Add quantize fused convbn bias pass#17348
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17348
Note: Links to docs will display an error until the docs builds have been completed. ❌ 2 New Failures, 2 Pending, 1 Unrelated FailureAs of commit 5855b25 with merge base 2ffe356 ( NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@JakeStevens has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92733079. |
This PR needs a
|
b935be8 to
91bbac7
Compare
Summary: When performing QAT with a model that has a conv layer with no bias followed by batch norm, the fusion process creates a bias. This is done *after* observers are attached so the resulting bias is kept as float. This diff adds a pass which grabs the proper qparams and applies them to the non-quantized bias. Differential Revision: D92733079
Summary: When performing QAT with a model that has a conv layer with no bias followed by batch norm, the fusion process creates a bias. This is done *after* observers are attached so the resulting bias is kept as float. This diff adds a pass which grabs the proper qparams and applies them to the non-quantized bias. Differential Revision: D92733079
91bbac7 to
6b825e9
Compare
StrycekSimon
left a comment
There was a problem hiding this comment.
I tried running it with our conversion pipeline but not successfully. Seems like the bias is being added as another input of the model. Can you take a look at it? Or is there some postprocessing step needed I am missing?
Summary:
When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. These passes find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias.
Two pass variants are provided:
- QuantizeFusedConvBnBiasPass (ExportPass) — operates on edge dialect graphs after to_edge()
- QuantizeFusedConvBnBiasAtenPass (PassBase) — operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes)
Differential Revision: D92733079
|
@StrycekSimon the NXP changes and test are now here: This diff is now "standalone" pass and the integration with your backend in the above |
Summary:
When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. These passes find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias.
Two pass variants are provided:
- QuantizeFusedConvBnBiasPass (ExportPass) — operates on edge dialect graphs after to_edge()
- QuantizeFusedConvBnBiasAtenPass (PassBase) — operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes)
Differential Revision: D92733079
Summary: Pull Request resolved: pytorch#17348 When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. These passes find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. Two pass variants are provided: - QuantizeFusedConvBnBiasPass (ExportPass) — operates on edge dialect graphs after to_edge() - QuantizeFusedConvBnBiasAtenPass (PassBase) — operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Differential Revision: D92733079
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Differential Revision: D92733079
Summary: Pull Request resolved: pytorch#17348 When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Differential Revision: D92733079
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Differential Revision: D92733079
6b825e9 to
242eed6
Compare
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Differential Revision: D92733079
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Differential Revision: D92733079
242eed6 to
8c9a108
Compare
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Differential Revision: D92733079
8c9a108 to
fe4b396
Compare
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Reviewed By: larryliu0820 Differential Revision: D92733079
larryliu0820
left a comment
There was a problem hiding this comment.
Review automatically exported from Phabricator review in Meta.
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Reviewed By: larryliu0820 Differential Revision: D92733079
fe4b396 to
495f530
Compare
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Reviewed By: larryliu0820 Differential Revision: D92733079
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Reviewed By: larryliu0820 Differential Revision: D92733079
Summary: Pull Request resolved: pytorch#17348 When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Differential Revision: D92733079
Summary: Pull Request resolved: pytorch#17348 When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Reviewed By: larryliu0820 Differential Revision: D92733079
495f530 to
8b278d5
Compare
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Reviewed By: larryliu0820 Differential Revision: D92733079
8b278d5 to
cf1c09d
Compare
Summary: Pull Request resolved: pytorch#17348 When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Reviewed By: larryliu0820 Differential Revision: D92733079
cf1c09d to
fc1704a
Compare
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Reviewed By: larryliu0820 Differential Revision: D92733079
c3b60ae to
d979975
Compare
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Reviewed By: larryliu0820 Differential Revision: D92733079
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Reviewed By: larryliu0820 Differential Revision: D92733079
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Reviewed By: larryliu0820 Differential Revision: D92733079
d979975 to
5855b25
Compare
Summary: When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Reviewed By: larryliu0820 Differential Revision: D92733079
Summary: Pull Request resolved: pytorch#17348 When performing QAT with a conv layer (bias=False) followed by batch norm, the fusion process introduces a bias after observers are attached, so the bias remains unquantized. This PR introduces a new pass to find such biases, compute the correct scale from the input and weight dequantize nodes, and insert proper quantize/dequantize nodes for the bias. It operates on aten dialect graphs, supporting both plain GraphModules (get_attr nodes) and ExportedPrograms (placeholder nodes) Differential Revision: D92733079
Summary:
When performing QAT with a model that has a conv layer with no bias followed by batch norm, the fusion process creates a bias. This is done after observers are attached so the resulting bias is kept as float.
This diff adds a pass which grabs the proper qparams and applies them to the non-quantized bias.
Differential Revision: D92733079
cc @robert-kalmar @digantdesai