[PyTorch][torch.compile] Add TensorProto mechanism #8
Open
pggPL wants to merge 5 commits into
33e9d73 to
d341eeb
Compare
pggPL
added a commit
that referenced
this pull request
Jun 16, 2026
Squashed PR #8 (tensor_proto_mechanism) onto the rebased base. Adds TensorProto (pure-Python, torch.compile-traceable quantized-tensor allocation via Quantizer.alloc_tensors + storage __tensor_flatten__/__tensor_unflatten__), Linear fake fwd/bwd impls for the custom-op path, and tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
2cccc30 to
2e252f9
Compare
2e252f9 to
ba92f5b
Compare
ba92f5b to
b1273ea
Compare
kshitij12345
left a comment
Overall looks good
Would it be possible to reduce the duplication between _linear_forward_impl_fake and _linear_forward_impl?
e4a879b to
adc65f6
Compare
pggPL
added a commit
that referenced
this pull request
Jun 29, 2026
Squashed PR #8 (tensor_proto_mechanism) onto the rebased base. Adds TensorProto (pure-Python, torch.compile-traceable quantized-tensor allocation via Quantizer.alloc_tensors + storage __tensor_flatten__/__tensor_unflatten__), Linear fake fwd/bwd impls for the custom-op path, and tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
85355a6 to
c1e40b2
Compare
adc65f6 to
c7bbc83
Compare
c1e40b2 to
e760487
Compare
c7bbc83 to
f592cbb
Compare
e760487 to
50d5c21
Compare
f592cbb to
945f62d
Compare
50d5c21 to
da709e7
Compare
5131ebc to
77831be
Compare
77831be to
29e5245
Compare
29e5245 to
99c1377
Compare
99c1377 to
afa86ff
Compare
afa86ff to
9e78a6c
Compare
Squashed PR #8 (tensor_proto_mechanism) onto the rebased base. Adds TensorProto (pure-Python, torch.compile-traceable quantized-tensor allocation via Quantizer.alloc_tensors + storage __tensor_flatten__/__tensor_unflatten__), Linear fake fwd/bwd impls for the custom-op path, and tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
The cached FP8 weight is the same tensor returned as new_weight_workspace (cache miss) or passed in as weight_workspace (cache hit). A custom op may not return a tensor that aliases an input or another return, so mark those slots and reconstruct wt_save in _linear_setup_ctx instead of saving it twice. Mirrored in the fake impl so the saved-slot layout matches. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
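The aliasing constraint described in this commit can be illustrated without PyTorch. The following is a pure-Python sketch, not the PR's code: linear_forward, setup_ctx and the Slot markers are hypothetical names, and plain tuples stand in for tensors. It shows the general idea of recording which slot a saved value aliases and rebuilding it afterwards, rather than returning the same object twice.

```python
from enum import Enum


class Slot(Enum):
    """Hypothetical markers for a saved value that aliases another tensor."""
    FROM_INPUT = "weight_workspace"       # cache hit: aliases an input
    FROM_OUTPUT = "new_weight_workspace"  # cache miss: aliases a return value


def linear_forward(weight, weight_workspace=None):
    """Stand-in forward: returns (new_workspace, saved_slots) with no aliases."""
    if weight_workspace is None:
        # Cache miss: "quantize" the weight; the result is returned exactly once.
        new_workspace = ("fp8", weight)
        marker = Slot.FROM_OUTPUT
    else:
        # Cache hit: the cached weight is an input, so it is not returned again.
        new_workspace = None
        marker = Slot.FROM_INPUT
    return new_workspace, [marker]


def setup_ctx(inputs, outputs):
    """Rebuild the saved weight from its marker instead of a second copy."""
    new_workspace, saved = outputs
    (marker,) = saved
    if marker is Slot.FROM_INPUT:
        return inputs["weight_workspace"]
    return new_workspace


ws_in = ("fp8", "W")
hit = linear_forward("W", ws_in)
miss = linear_forward("W")
wt_hit = setup_ctx({"weight_workspace": ws_in}, hit)
wt_miss = setup_ctx({"weight_workspace": None}, miss)
```

In both branches the saved weight is recovered by identity (wt_hit is the input object, wt_miss is the returned object), while the op's return values never contain an object that also appears among its inputs or elsewhere in its outputs.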
NVFP4Quantizer._describe_buffers grouped each amax right after its scale (per-usage), diverging from NVFP4TensorStorage._FLATTEN_TENSOR_BUFFERS (amax buffers last). The order is functionally irrelevant (buffers are consumed by name in alloc_tensors and reordered in TensorProto.inner_names), but aligning it makes describe/flatten agree and fixes test_to_tensor_proto_quantized[nvfp4]. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
…upport - TensorProto.inner_names now raises if the quantizer describes buffer(s) absent from the storage's _FLATTEN_TENSOR_BUFFERS, instead of silently appending them. - Gate the nvfp4 proto-quantizer param on nvfp4_available so it skips on hardware without NVFP4 support rather than failing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
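The behaviour described in the last two commits (buffers consumed by name, reordered to the storage's declared order, and an error for undeclared buffers) can be sketched in plain Python. This is an illustrative stand-in under stated assumptions, not the PR's implementation: the free function inner_names, the ValueError type and the buffer names in the example are all hypothetical.

```python
def inner_names(described, flatten_buffers):
    """Order quantizer-described buffer names by the storage's declaration.

    described: buffer names reported by the quantizer, in any order.
    flatten_buffers: (attribute_name, constructor_kwarg) pairs declared
        by the storage class.
    """
    declared = [attr for attr, _ in flatten_buffers]
    unknown = [name for name in described if name not in declared]
    if unknown:
        # Fail loudly instead of silently appending undeclared buffers.
        raise ValueError(f"buffers not declared by the storage: {unknown}")
    wanted = set(described)
    return [attr for attr in declared if attr in wanted]


# Hypothetical declaration: data and scales first, amax buffers last.
FLATTEN = (
    ("_rowwise_data", "rowwise_data"),
    ("_rowwise_scale_inv", "rowwise_scale_inv"),
    ("_amax_rowwise", "amax_rowwise"),
)

# The quantizer may describe the amax right after the data; the result
# still follows the storage's order because lookup is by name.
ordered = inner_names(["_rowwise_data", "_amax_rowwise", "_rowwise_scale_inv"], FLATTEN)

try:
    inner_names(["_rowwise_data", "_not_declared"], FLATTEN)
    raised = False
except ValueError:
    raised = True
```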
9e78a6c to
50c11cd
Compare
…escribe_buffers Access NVFP4Quantizer @staticmethods (convert_shape_for_fp4, get_columnwise_shape) via the class instead of the instance. Under torch.compile, instance access of a @staticmethod on a value-opaque object crashes Dynamo guard generation with "'function' object has no attribute '__func__'" (pytorch/pytorch#182741). Temporary workaround until the PyTorch-side fix lands. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
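The shape of this workaround is simple to show in isolation. The sketch below uses a hypothetical ToyQuantizer with a made-up packed_shape helper (its halving rule is an assumption for illustration only, not taken from NVFP4Quantizer). In eager Python both access styles return the same result; per the commit message, only the instance form is reported to trip Dynamo guard generation.

```python
class ToyQuantizer:
    """Hypothetical stand-in for a quantizer with a static shape helper."""

    @staticmethod
    def packed_shape(shape):
        # Assumption for illustration: two 4-bit values share one byte,
        # so the last dimension is halved.
        return (*shape[:-1], shape[-1] // 2)

    def describe_via_instance(self, shape):
        # Instance access: the form the commit moves away from.
        return self.packed_shape(shape)

    def describe_via_class(self, shape):
        # Class access: the form the commit switches to.
        return ToyQuantizer.packed_shape(shape)


q = ToyQuantizer()
via_instance = q.describe_via_instance((128, 64))
via_class = q.describe_via_class((128, 64))
```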
Description
This PR introduces TensorProto, a data-free prototype of a tensor (or quantized tensor) that captures everything needed to reason about and rebuild a tensor without holding any storage: its logical shape/dtype and, for quantized tensors, the value-opaque quantizer defining the layout.

The key property is that TensorProto.create_tensor() materializes a quantized tensor purely in Python (via Quantizer.alloc_tensors + the storage's __tensor_unflatten__), so it traces under torch.compile(fullgraph=True) with no graph break, unlike make_empty, which goes through the opaque C++ tex.create_empty_quantized_tensor. This is the foundation for writing torch.library custom-op fake implementations of quantized ops.

This builds on the value-opaque quantizer work (so a TensorProto is itself safe to treat as a compile-time constant).

Type of change

Changes
- dynamo.py: Add TensorProto dataclass (shape, dtype, quantizer, requires_grad, device) with is_quantized, inner_names(), create_metadata() and create_tensor(), plus a to_tensor_proto() helper that builds a proto from a plain torch.Tensor or a QuantizedTensorStorage/QuantizedTensor.
- quantized_tensor.py:
  - Add __tensor_flatten__ / __tensor_unflatten__ to QuantizedTensorStorage, driven by a per-class _FLATTEN_TENSOR_BUFFERS declaration of (attribute_name, constructor_kwarg) pairs.
  - _STORAGE_REGISTRY (populated via __init_subclass__) so __tensor_unflatten__ can resolve a concrete storage/wrapper class from its qualname inside an FX graph.
  - Quantizer: alloc_tensors, create_metadata, and the opt-in overrides _describe_buffers, _storage_scalars, _resolve_storage_cls.
- Float8CurrentScalingQuantizer, MXFP8Quantizer and Float8BlockQuantizer.
- _FLATTEN_TENSOR_BUFFERS for Float8TensorStorage, MXFP8TensorStorage and Float8BlockwiseQTensorStorage.
- ops/basic/basic_linear.py: Add allocation-free _functional_forward_fake / _functional_backward_fake that operate on TensorProto and return output/gradient protos, as a basis for custom-op fake impls (single-device only; TP/SP shape effects not yet modeled).
- tests/pytorch/test_tensor_proto.py (CPU smoke tests for _describe_buffers / alloc_tensors / create_metadata, flatten round-trip, and to_tensor_proto) and torch.compile fullgraph tests in test_torch_compile.py.

Checklist:
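The flatten protocol and the storage registry from the Changes list can be sketched together in plain Python. This is a minimal illustration under stated assumptions, not the PR's code: Storage, ToyFp8Storage and the _scalars hook are hypothetical, and lists stand in for tensors. The method signatures follow PyTorch's traceable-subclass convention, where __tensor_flatten__ returns the inner attribute names plus a context object and __tensor_unflatten__ is a static method receiving the inner tensors as a dict.

```python
_STORAGE_REGISTRY = {}


class Storage:
    """Toy stand-in for a quantized-tensor storage base class."""

    # (attribute_name, constructor_kwarg) pairs, declared per subclass.
    _FLATTEN_TENSOR_BUFFERS = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every subclass under its qualified name.
        _STORAGE_REGISTRY[cls.__qualname__] = cls

    def _scalars(self):
        """Non-tensor constructor arguments; subclasses may override."""
        return {}

    def __tensor_flatten__(self):
        # Only buffers that are actually present are exposed as inner tensors.
        names = [attr for attr, _ in self._FLATTEN_TENSOR_BUFFERS
                 if getattr(self, attr) is not None]
        ctx = {"cls": type(self).__qualname__, "scalars": self._scalars()}
        return names, ctx

    @staticmethod
    def __tensor_unflatten__(inner, ctx, outer_size=None, outer_stride=None):
        # Resolve the concrete class by name, then map attributes to kwargs.
        cls = _STORAGE_REGISTRY[ctx["cls"]]
        kwargs = {kw: inner.get(attr) for attr, kw in cls._FLATTEN_TENSOR_BUFFERS}
        return cls(**kwargs, **ctx["scalars"])


class ToyFp8Storage(Storage):
    _FLATTEN_TENSOR_BUFFERS = (("_data", "data"), ("_scale_inv", "scale_inv"))

    def __init__(self, data=None, scale_inv=None, fp8_dtype="e4m3"):
        self._data = data
        self._scale_inv = scale_inv
        self._fp8_dtype = fp8_dtype

    def _scalars(self):
        return {"fp8_dtype": self._fp8_dtype}


original = ToyFp8Storage(data=[1, 2, 3], scale_inv=[0.5], fp8_dtype="e5m2")
names, ctx = original.__tensor_flatten__()
inner = {name: getattr(original, name) for name in names}
rebuilt = Storage.__tensor_unflatten__(inner, ctx)
```

Because the context carries only a class name and scalar values, it stays a plain constant, and the round trip rebuilds the concrete subclass through the registry rather than through a direct class reference.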