[PyTorch][torch.compile] Replace TensorProto with make_empty_traceable#13
Open
kshitij12345 wants to merge 3 commits into
Conversation
bda5d6c to
8877da0
Compare
8877da0 to
3bd5ab5
Compare
Replace the 172-line TensorProto dataclass with a single function make_empty_traceable(quantizer, shape, dtype, device) that directly allocates traceable quantized tensors. The fake impls now return actual tensors (which become FakeTensors under register_fake) instead of intermediate descriptors.

The key insight: make_empty_traceable stashes _te_flat_names and _te_flat_ctx on the resulting tensor. Dynamo treats non-callable attributes on traceable wrapper subclasses as constant metadata, so forward_fn can read slot counts and reassembly info from these attributes without calling __tensor_flatten__ (which would cause a graph break since it returns non-Tensor Python objects).

This eliminates:
- TensorProto class and to_tensor_proto helper (tensor_proto.py deleted)
- _proto_view (converted tensor fields to TensorProto before fake impls)
- _tensor_field_names (identified fields for _proto_view)
- _proto_slot_count / _proto_reassemble (operated on TensorProto objects)
- TensorProto branch in _value_to_flat_tensors and _format_bwd_result

The fake impls in linear.py now use:
- isinstance(inp, QuantizedTensorStorage) instead of inp.is_quantized
- weight._quantizer instead of weight.quantizer (TensorProto field)
- make_empty_traceable(...) instead of TensorProto(...)
- Direct set_usage on quantizer instead of proto.update_usage()

Test Plan:
```
python -m pytest tests/pytorch/test_torch_compile.py -v -k 'not nvfp4'
```

Authored with Claude.
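For readers who want the stash pattern in isolation, here is a minimal, torch-free sketch. It is not the code in this PR: `ToyQuantizedTensor`, `make_empty_traceable_toy`, `slot_count` and `flat_slots` are hypothetical names, and plain lists stand in for inner tensors. Only the attribute names `_te_flat_names` and `_te_flat_ctx` and the `__tensor_flatten__` return shape (attribute names plus a context object) are taken from the description above.

```python
class ToyQuantizedTensor:
    """Hypothetical stand-in for a traceable wrapper subclass (not the TE class)."""

    def __init__(self, data, scale_inv):
        self._data = data            # plain list standing in for an inner tensor
        self._scale_inv = scale_inv  # plain list standing in for an inner tensor

    def __tensor_flatten__(self):
        # Wrapper-subclass protocol: (inner tensor attribute names, context).
        return ["_data", "_scale_inv"], {"shape": (len(self._data),)}


def make_empty_traceable_toy(numel):
    """Allocate a toy tensor and stash its flatten metadata once, up front."""
    t = ToyQuantizedTensor(data=[0] * numel, scale_inv=[1.0])
    names, ctx = t.__tensor_flatten__()
    t._te_flat_names = tuple(names)  # stashed as plain, non-callable attributes
    t._te_flat_ctx = ctx
    return t


def slot_count(t):
    # Reads the stashed attribute only; never calls __tensor_flatten__.
    return len(t._te_flat_names)


def flat_slots(t):
    # Collects the inner values in the stashed order.
    return [getattr(t, name) for name in t._te_flat_names]


t = make_empty_traceable_toy(4)
print(slot_count(t), flat_slots(t))
```

The point of the design is that the one call to `__tensor_flatten__` happens at allocation time, so later consumers only touch plain attributes.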
3bd5ab5 to
b24d259
Compare
kshitij12345
commented
Jul 2, 2026
# TODO: understand why Dynamo does not recognize the quantizer retrieved via
# t._quantizer as the same value-opaque type it would if captured from a
# closure. If that is fixed upstream, the stashed attributes become
# unnecessary and we could compute slot counts directly from the quantizer.
kshitij12345
commented
Jul 2, 2026
quantizer.optimize_for_gemm = self.optimize_for_gemm
quantizer.rht_matrix = self.rht_matrix
quantizer.rht_matrix_random_sign_mask_t = self.rht_matrix_random_sign_mask_t
if not torch.compiler.is_compiling():
Author
TODO: Understand this better
Author
This happens because NVFP4Quantizer is an OpaqueObject, yet we try to attach a Tensor to it.
…der is_compiling

Under Dynamo tracing, rht_matrix is a FakeTensor attached to an opaque script object. Accessing it in copy() triggers SourcelessBuilder which cannot wrap FakeTensor, causing an InternalTorchDynamoError.

The fake impl never runs real quantization, so rht_matrix is unnecessary during tracing. Guard the tensor field copies with torch.compiler.is_compiling(); the matrix will be rebuilt lazily via _rebuild_derived_state if the quantizer is later used outside tracing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
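A rough sketch of the guard described in this commit message, under stated assumptions: `ToyQuantizer` is a hypothetical stand-in for the real quantizer, the placeholder matrix is made up, and the `tracing` keyword exists only so the sketch can be exercised without a compiler. The names `copy`, `rht_matrix`, `_rebuild_derived_state` and `torch.compiler.is_compiling` come from the commit message; the import fallback is there only so the snippet runs without PyTorch installed.

```python
try:
    from torch.compiler import is_compiling
except ImportError:  # fallback so the sketch runs without PyTorch
    def is_compiling():
        return False


class ToyQuantizer:
    """Hypothetical stand-in; not the real quantizer class."""

    def __init__(self, with_rht=True):
        self.with_rht = with_rht
        self.rht_matrix = None  # derived tensor state, built lazily

    def _rebuild_derived_state(self):
        # Rebuild the derived matrix only if it is needed and missing.
        if self.with_rht and self.rht_matrix is None:
            self.rht_matrix = [[1, 1], [1, -1]]  # placeholder values

    def copy(self, tracing=None):
        # `tracing` is a test hook (assumption); by default ask the compiler.
        if tracing is None:
            tracing = is_compiling()
        new = ToyQuantizer(with_rht=self.with_rht)
        if not tracing:
            # Tensor-valued fields are copied only outside tracing.
            new.rht_matrix = self.rht_matrix
        return new


q = ToyQuantizer()
q._rebuild_derived_state()
traced_copy = q.copy(tracing=True)   # tensor field skipped
traced_copy._rebuild_derived_state() # rebuilt lazily on later eager use
```

Scalar configuration is always copied; only the tensor-valued field is skipped, which is what keeps the traced copy free of tensor attributes.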
f506cb1 to
c33cd00
Compare
The C++ quantize kernel requires with_post_rht_amax=True when with_rht is enabled. The test factory was creating an NVFP4Quantizer with with_rht=True but with_post_rht_amax defaulting to False, causing 'Pre-RHT amax is not supported yet' at quantize time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
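The constraint in this commit message can be illustrated with a small, torch-free sketch. Everything here is hypothetical (`ToyNVFP4Config`, `make_test_quantizer_config`, and the error text); it is not the real NVFP4Quantizer constructor, and the real check happens in the C++ kernel at quantize time rather than at construction. The sketch only shows a test factory tying the two flags together so the invalid combination cannot be built.

```python
from dataclasses import dataclass


@dataclass
class ToyNVFP4Config:
    """Hypothetical config object; flag names mirror the commit message."""

    with_rht: bool = False
    with_post_rht_amax: bool = False

    def __post_init__(self):
        # Assumed early check mirroring the kernel requirement.
        if self.with_rht and not self.with_post_rht_amax:
            raise ValueError("with_rht=True requires with_post_rht_amax=True")


def make_test_quantizer_config(with_rht):
    # Tie the two flags together so the factory never builds the bad combination.
    return ToyNVFP4Config(with_rht=with_rht, with_post_rht_amax=with_rht)


cfg = make_test_quantizer_config(with_rht=True)
```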