V2 quantizer: fix IO-boundary shared clusters left in float #20291
Summary:

Shared-op clusters (e.g. `cat`, `view`, `reshape`) on the quantized IO boundary were silently left in float by the composable TOSA quantizer (`_TOSAQuantizerV2`), causing them to fall off the Ethos-U integer delegate onto CPU.

`SharedQspecQuantizer` propagates a qspec only from already-quantized neighbors. A cluster whose only quantized neighbors are a uint8 model input (intentionally skipped by `_skip_shared_qspec_from_io` to confine uint8 to the IO boundary) and/or an input-state placeholder with no `output_qspec` had no qspec to propagate, so it was rejected and remained in float.

The fix adds `_is_quantized_io_boundary`, which detects annotated `placeholder`/`output` nodes that signal the cluster is on the quantized data path even when their qspec is filtered. `_get_shared_clique` now returns a `touches_quantized_io` flag alongside its usual results. When `_annotate_shared_cluster` finds that `adjacent_qspecs` is empty but the cluster touches the quantized IO boundary, it initiates quantization from the global config's input-activation qspec instead of rejecting the cluster. `_TOSAQuantizerV2.set_global` now also propagates to `shared_qspec_quantizer.global_config`, so the fallback is wired automatically.

This restores the correctness fix from D107320847, which was abandoned because its other fix (parameter-operand weight misclassification) had already been resolved via the `is_weight` `PARAMETER_TARGETS` refactor.

This change was developed with assistance from Claude.

Differential Revision: D108662081
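To make the fallback concrete, here is a minimal, self-contained Python sketch of the decision described in the summary. It does not use torch.fx or the real ExecuTorch/TOSA quantizer classes: `Node`, `QSpec`, `is_quantized_io_boundary`, `skip_qspec_from_io`, `get_shared_clique`, and `annotate_shared_cluster` are hypothetical, simplified stand-ins whose names only echo the helpers mentioned above, and the "skip uint8 IO qspecs" rule is an assumed approximation of `_skip_shared_qspec_from_io`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class QSpec:
    """Hypothetical stand-in for a quantization spec."""
    dtype: str
    qmin: int
    qmax: int


@dataclass(eq=False)  # identity-based equality/hash so nodes can live in a set
class Node:
    """Hypothetical stand-in for a graph node (not torch.fx.Node)."""
    name: str
    op: str                       # "placeholder", "output" or "call_function"
    shared: bool = False          # shared-qspec op such as cat/view/reshape
    annotated: bool = False       # node already carries a quantization annotation
    output_qspec: Optional[QSpec] = None
    neighbors: List["Node"] = field(default_factory=list)


def is_quantized_io_boundary(node: Node) -> bool:
    # An annotated placeholder/output marks the quantized data path, even if its
    # qspec is filtered out below or was never set (input-state placeholder).
    return node.op in ("placeholder", "output") and node.annotated


def skip_qspec_from_io(node: Node) -> bool:
    # Assumed approximation of _skip_shared_qspec_from_io: never propagate a
    # uint8 IO qspec into the cluster, so uint8 stays confined to the boundary.
    return (
        node.op in ("placeholder", "output")
        and node.output_qspec is not None
        and node.output_qspec.dtype == "uint8"
    )


def get_shared_clique(start: Node) -> Tuple[Set[Node], List[QSpec], bool]:
    """Collect connected shared ops, usable neighbor qspecs, and an IO-boundary flag."""
    clique: Set[Node] = set()
    adjacent_qspecs: List[QSpec] = []
    touches_quantized_io = False
    stack = [start]
    while stack:
        node = stack.pop()
        if node in clique:
            continue
        clique.add(node)
        for nb in node.neighbors:
            if nb.shared:
                stack.append(nb)  # same cluster: keep walking
                continue
            if is_quantized_io_boundary(nb):
                touches_quantized_io = True
            if nb.output_qspec is not None and not skip_qspec_from_io(nb):
                adjacent_qspecs.append(nb.output_qspec)
    return clique, adjacent_qspecs, touches_quantized_io


def annotate_shared_cluster(start: Node, global_input_qspec: Optional[QSpec]) -> bool:
    clique, adjacent_qspecs, touches_quantized_io = get_shared_clique(start)
    if adjacent_qspecs:
        qspec = adjacent_qspecs[0]      # normal case: inherit from a quantized neighbor
    elif touches_quantized_io and global_input_qspec is not None:
        qspec = global_input_qspec      # fallback: start from the global input qspec
    else:
        return False                    # rejected: cluster stays in float
    for node in clique:
        node.output_qspec = qspec
        node.annotated = True
    return True


int8 = QSpec("int8", -128, 127)
uint8 = QSpec("uint8", 0, 255)

# uint8 model input -> view -> cat <- input-state placeholder without a qspec
x = Node("x", "placeholder", annotated=True, output_qspec=uint8)
state = Node("state", "placeholder", annotated=True)
view = Node("view", "call_function", shared=True)
cat = Node("cat", "call_function", shared=True)
view.neighbors = [x, cat]
cat.neighbors = [view, state]

print(annotate_shared_cluster(view, global_input_qspec=None))  # False: nothing to propagate
print(annotate_shared_cluster(view, global_input_qspec=int8))  # True
print(cat.output_qspec)  # QSpec(dtype='int8', qmin=-128, qmax=127)
```

The sketch passes the fallback qspec explicitly; per the summary, the PR instead reaches it through `shared_qspec_quantizer.global_config`, which `_TOSAQuantizerV2.set_global` now populates. Taking the first adjacent qspec is also a simplification of whatever reconciliation the real quantizer performs.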
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20291
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 1 Cancelled Job
As of commit c0ac9b6 with merge base e257a71:
NEW FAILURES - The following jobs have failed:
CANCELLED JOB - The following job was cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@rascani has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108662081.
AdrianLundell left a comment:
Thanks for the fix!
For some context: ideally we would leave all nodes now handled by the SharedQspecQuantizer un-annotated and let dtype propagation handle them. The main reason it is done this way is to ensure we know at partition time which nodes are quantized and which are not. If we could do that in a more clever way, maybe we could avoid the SharedQspecQuantizer altogether.
Runs `lintrunner -a` on the two files flagged by the Lint check on pytorch#20291 (UFMT import ordering and signature wrapping, DOCFORMATTER docstrings). Formatting only; no logic changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rascani has imported this pull request. If you are a Meta employee, you can view this in D108662081.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell