V2 quantizer: fix IO-boundary shared clusters left in float #20291

Merged
rascani merged 2 commits into pytorch:main from rascani:export-D108662081
Jun 17, 2026
@rascani rascani commented Jun 15, 2026

Contributor

Summary:
Shared-op clusters (e.g. `cat`, `view`, `reshape`) on the quantized IO boundary were silently left in float by the composable TOSA quantizer (`_TOSAQuantizerV2`), causing them to fall off the Ethos-U integer delegate onto CPU.

`SharedQspecQuantizer` propagates a qspec only from already-quantized neighbors. A cluster whose only quantized neighbors are a uint8 model input (intentionally skipped by `_skip_shared_qspec_from_io` to confine uint8 to the IO boundary) and/or an input-state placeholder with no `output_qspec` had no qspec to propagate, so it was rejected and remained in float.
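
A toy model of the failure mode described above, for illustration only. `Neighbor` and `collect_adjacent_qspecs` are hypothetical names that do not come from the ExecuTorch source, and qspecs are modelled as plain strings. It only shows how filtering out a uint8 model input and a placeholder without `output_qspec` can leave a cluster with nothing to inherit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Neighbor:
    """Hypothetical stand-in for an annotated node next to a shared-op cluster."""

    name: str
    is_uint8_model_input: bool = False
    # None models an input-state placeholder that carries no output_qspec.
    output_qspec: Optional[str] = None


def collect_adjacent_qspecs(neighbors: List[Neighbor]) -> List[str]:
    """Gather the qspecs a cluster could inherit from its quantized neighbors."""
    qspecs = []
    for node in neighbors:
        if node.is_uint8_model_input:
            # Assumed to mirror the intent of skipping IO qspecs:
            # uint8 stays confined to the IO boundary.
            continue
        if node.output_qspec is None:
            continue
        qspecs.append(node.output_qspec)
    return qspecs


# A cluster whose only neighbors are a uint8 input and a bare placeholder.
io_cluster = [
    Neighbor("x", is_uint8_model_input=True, output_qspec="uint8"),
    Neighbor("state"),
]
# A cluster fed by an ordinary quantized op.
inner_cluster = [Neighbor("conv", output_qspec="int8")]
```

In this sketch `collect_adjacent_qspecs(io_cluster)` comes back empty while the inner cluster still sees its `int8` neighbor, which corresponds to the case that used to be rejected.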

The fix adds `_is_quantized_io_boundary`, which detects annotated `placeholder`/`output` nodes that signal the cluster is on the quantized data path even when their qspec is filtered. `_get_shared_clique` now returns a `touches_quantized_io` flag alongside the usual results. When `_annotate_shared_cluster` finds an empty `adjacent_qspecs` but a boundary-touching cluster, it initiates quantization from the global config input-activation qspec instead of rejecting. `_TOSAQuantizerV2.set_global` now also propagates to `shared_qspec_quantizer.global_config` so the fallback is wired automatically.
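
The decision described above can be sketched roughly as follows. This is not the code from the PR: `choose_cluster_qspec` is a hypothetical helper, qspecs are plain strings, and both the rule of taking the first adjacent qspec and the rejection when no global config is set are assumptions made for the example.

```python
from typing import List, Optional


def choose_cluster_qspec(
    adjacent_qspecs: List[str],
    touches_quantized_io: bool,
    global_input_activation_qspec: Optional[str],
) -> Optional[str]:
    """Return the qspec to share across a cluster, or None to leave it in float."""
    if adjacent_qspecs:
        # Usual path: inherit from an already-quantized neighbor.
        return adjacent_qspecs[0]
    if touches_quantized_io and global_input_activation_qspec is not None:
        # Fallback: the cluster sits on the quantized IO boundary, so start
        # quantization from the global input-activation qspec.
        return global_input_activation_qspec
    # No neighbor qspec and no boundary signal: the cluster stays in float.
    return None
```

For example, `choose_cluster_qspec([], True, "int8")` yields `"int8"` in this sketch, whereas the same call with `touches_quantized_io=False` yields `None`, matching the pre-fix rejection.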

This restores the correctness fix from D107320847, which was abandoned because its other fix (parameter-operand weight misclassification) had already been resolved via the `is_weight` `PARAMETER_TARGETS` refactor.

This change was developed with assistance from Claude.

Differential Revision: D108662081

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

@rascani rascani requested a review from digantdesai as a code owner June 15, 2026 22:37
@pytorch-bot

pytorch-bot Bot commented Jun 15, 2026


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20291

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job

As of commit c0ac9b6 with merge base e257a71:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Jun 15, 2026
@meta-codesync

meta-codesync Bot commented Jun 15, 2026

Contributor

@rascani has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108662081.

@github-actions github-actions Bot added the ciflow/trunk and module: arm labels Jun 15, 2026
@rascani rascani requested a review from AdrianLundell June 15, 2026 22:38
@github-actions


This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@AdrianLundell AdrianLundell left a comment

Collaborator

Thanks for the fix!

For some context: ideally we would leave all nodes now handled by the SharedQspecQuantizer un-annotated and let dtype propagation handle them. The reason it is done this way is mainly to ensure we know at partition time which nodes are quantized and which are not. If we could do that in a more clever way, maybe we could avoid the SharedQspecQuantizer altogether.

@digantdesai digantdesai left a comment

Contributor

Thanks

Runs lintrunner -a on the two files flagged by the Lint check on pytorch#20291
(UFMT import ordering and signature wrapping, DOCFORMATTER docstrings).
Formatting only; no logic changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@meta-codesync

meta-codesync Bot commented Jun 16, 2026

Contributor

@rascani has imported this pull request. If you are a Meta employee, you can view this in D108662081.

@rascani rascani merged commit 218cc45 into pytorch:main Jun 17, 2026
488 of 493 checks passed

Labels

ciflow/trunk, CLA Signed, meta-exported, module: arm

3 participants