[PyTorch][torch.compile] Add TensorProto mechanism #8

Open
pggPL wants to merge 5 commits into make_qunatizers_opaque from tensor_proto_mechanism

@pggPL pggPL commented Jun 6, 2026

Owner

Description

This PR introduces TensorProto — a data-free prototype of a tensor (or quantized tensor) that captures everything needed to reason about and rebuild a tensor without holding any storage: its logical shape/dtype and, for quantized tensors, the value-opaque quantizer defining the layout.

The key property is that TensorProto.create_tensor() materializes a quantized tensor purely in Python (via Quantizer.alloc_tensors + the storage's __tensor_unflatten__), so it traces under torch.compile(fullgraph=True) with no graph break — unlike make_empty, which goes through the opaque C++ tex.create_empty_quantized_tensor. This is the foundation for writing torch.library custom-op fake implementations of quantized ops.

This builds on the value-opaque quantizer work (so a TensorProto is itself safe to treat as a compile-time constant).
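
To make the "data-free prototype" idea concrete, here is a minimal sketch in plain Python. It is not the PR's implementation: `ProtoSketch` and `numel` are hypothetical names, strings stand in for `torch.dtype` and for the quantizer, and only the field names (shape, dtype, quantizer, requires_grad, device) and `is_quantized` are taken from the description above. Freezing the dataclass gives value equality and hashing, which is one property that tends to make an object convenient to treat as a constant; whether the PR relies on exactly this is an assumption.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProtoSketch:
    """Hypothetical stand-in for TensorProto: metadata only, no storage."""

    shape: Tuple[int, ...]
    dtype: str                       # stand-in for a torch.dtype
    quantizer: Optional[str] = None  # stand-in for a value-opaque quantizer
    requires_grad: bool = False
    device: str = "cpu"

    @property
    def is_quantized(self) -> bool:
        # A proto is "quantized" exactly when it carries a quantizer.
        return self.quantizer is not None

    def numel(self) -> int:
        # The element count is derivable from the shape alone.
        return prod(self.shape)


a = ProtoSketch((128, 256), "bfloat16")
b = ProtoSketch((128, 256), "bfloat16")
q = ProtoSketch((128, 256), "bfloat16", quantizer="mxfp8")
```

Two protos with the same metadata compare equal and hash identically, and nothing in the object refers to tensor data.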

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • dynamo.py: Add TensorProto dataclass (shape, dtype, quantizer, requires_grad, device) with is_quantized, inner_names(), create_metadata() and create_tensor(), plus a to_tensor_proto() helper that builds a proto from a plain torch.Tensor or a QuantizedTensorStorage/QuantizedTensor.
  • quantized_tensor.py:
    • Add the PyTorch wrapper-subclass flatten protocol (__tensor_flatten__ / __tensor_unflatten__) to QuantizedTensorStorage, driven by a per-class _FLATTEN_TENSOR_BUFFERS declaration of (attribute_name, constructor_kwarg) pairs.
    • Add a _STORAGE_REGISTRY (populated via __init_subclass__) so __tensor_unflatten__ can resolve a concrete storage/wrapper class from its qualname inside an FX graph.
    • Add pure-Python, traceable allocation hooks to Quantizer: alloc_tensors, create_metadata, and the opt-in overrides _describe_buffers, _storage_scalars, _resolve_storage_cls.
  • Quantizers: Implement the allocation hooks for Float8CurrentScalingQuantizer, MXFP8Quantizer and Float8BlockQuantizer.
  • Storage classes: Declare _FLATTEN_TENSOR_BUFFERS for Float8TensorStorage, MXFP8TensorStorage and Float8BlockwiseQTensorStorage.
  • ops/basic/basic_linear.py: Add allocation-free _functional_forward_fake / _functional_backward_fake that operate on TensorProto and return output/gradient protos, as a basis for custom-op fake impls (single-device only; TP/SP shape effects not yet modeled).
  • Tests: Add tests/pytorch/test_tensor_proto.py (CPU smoke tests for _describe_buffers/alloc_tensors/create_metadata, flatten round-trip, and to_tensor_proto) and torch.compile fullgraph tests in test_torch_compile.py.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@pggPL pggPL force-pushed the make_qunatizers_opaque branch from 33e9d73 to d341eeb Compare June 16, 2026 15:21
@pggPL pggPL requested a review from cyanguwa as a code owner June 16, 2026 15:21
pggPL added a commit that referenced this pull request Jun 16, 2026
Squashed PR #8 (tensor_proto_mechanism) onto the rebased base. Adds TensorProto
(pure-Python, torch.compile-traceable quantized-tensor allocation via
Quantizer.alloc_tensors + storage __tensor_flatten__/__tensor_unflatten__),
Linear fake fwd/bwd impls for the custom-op path, and tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL pggPL force-pushed the tensor_proto_mechanism branch from 2cccc30 to 2e252f9 Compare June 16, 2026 15:31
pggPL added a commit that referenced this pull request Jun 16, 2026
@pggPL pggPL force-pushed the tensor_proto_mechanism branch from 2e252f9 to ba92f5b Compare June 16, 2026 16:05
pggPL added a commit that referenced this pull request Jun 16, 2026
@pggPL pggPL force-pushed the tensor_proto_mechanism branch from ba92f5b to b1273ea Compare June 16, 2026 16:12

@kshitij12345 kshitij12345 left a comment

Overall looks good

Would it be possible to reduce duplication between _linear_forward_impl_fake and _linear_forward_impl?
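
For context on what a fake implementation has to compute, the shape rules of a linear layer can be written without allocating anything. The sketch below is purely illustrative and not taken from the PR: `linear_output_shape`, `fake_linear_forward` and `fake_linear_backward` are hypothetical names, shapes are plain tuples rather than TensorProto objects, and only the single-device case is covered. A small helper of this kind is one place where a real and a fake path might share logic.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def linear_output_shape(input_shape: Shape, weight_shape: Shape) -> Shape:
    """Shape of x @ w.T for x of shape (..., in) and w of shape (out, in)."""
    out_features, in_features = weight_shape
    if input_shape[-1] != in_features:
        raise ValueError(
            f"expected last dim {in_features}, got {input_shape[-1]}"
        )
    return (*input_shape[:-1], out_features)


def fake_linear_forward(input_shape: Shape, weight_shape: Shape) -> Shape:
    # Forward only needs the output shape; no data is touched.
    return linear_output_shape(input_shape, weight_shape)


def fake_linear_backward(input_shape: Shape, weight_shape: Shape) -> Tuple[Shape, Shape]:
    # Each gradient has the shape of the tensor it is taken with respect to.
    return tuple(input_shape), tuple(weight_shape)


out_shape = fake_linear_forward((4, 8, 16), (32, 16))
grad_input_shape, grad_weight_shape = fake_linear_backward((4, 8, 16), (32, 16))
```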

Comment thread transformer_engine/pytorch/dynamo/tensor_proto.py Outdated
Comment thread tests/pytorch/test_torch_compile.py Outdated
@pggPL pggPL force-pushed the make_qunatizers_opaque branch from e4a879b to adc65f6 Compare June 29, 2026 07:14
pggPL added a commit that referenced this pull request Jun 29, 2026
@pggPL pggPL force-pushed the tensor_proto_mechanism branch from 85355a6 to c1e40b2 Compare June 29, 2026 07:16
@pggPL pggPL force-pushed the make_qunatizers_opaque branch from adc65f6 to c7bbc83 Compare June 29, 2026 07:33
pggPL added a commit that referenced this pull request Jun 29, 2026
@pggPL pggPL force-pushed the tensor_proto_mechanism branch from c1e40b2 to e760487 Compare June 29, 2026 07:34
@pggPL pggPL force-pushed the make_qunatizers_opaque branch from c7bbc83 to f592cbb Compare June 29, 2026 09:26
pggPL added a commit that referenced this pull request Jun 29, 2026
@pggPL pggPL force-pushed the tensor_proto_mechanism branch from e760487 to 50d5c21 Compare June 29, 2026 09:26
@pggPL pggPL force-pushed the make_qunatizers_opaque branch from f592cbb to 945f62d Compare June 29, 2026 09:34
pggPL added a commit that referenced this pull request Jun 29, 2026
@pggPL pggPL force-pushed the tensor_proto_mechanism branch from 50d5c21 to da709e7 Compare June 29, 2026 09:35
pggPL added a commit that referenced this pull request Jun 29, 2026
@pggPL pggPL force-pushed the tensor_proto_mechanism branch 3 times, most recently from 5131ebc to 77831be Compare June 29, 2026 10:24
pggPL added a commit that referenced this pull request Jun 29, 2026
@pggPL pggPL force-pushed the tensor_proto_mechanism branch from 77831be to 29e5245 Compare June 29, 2026 12:47
pggPL added a commit that referenced this pull request Jun 29, 2026
@pggPL pggPL force-pushed the tensor_proto_mechanism branch from 29e5245 to 99c1377 Compare June 29, 2026 13:10
pggPL added a commit that referenced this pull request Jun 29, 2026
@pggPL pggPL force-pushed the tensor_proto_mechanism branch from 99c1377 to afa86ff Compare June 29, 2026 13:29
pggPL added a commit that referenced this pull request Jun 29, 2026
@pggPL pggPL force-pushed the tensor_proto_mechanism branch from afa86ff to 9e78a6c Compare June 29, 2026 13:30
pggPL and others added 4 commits June 29, 2026 15:45
Squashed PR #8 (tensor_proto_mechanism) onto the rebased base. Adds TensorProto
(pure-Python, torch.compile-traceable quantized-tensor allocation via
Quantizer.alloc_tensors + storage __tensor_flatten__/__tensor_unflatten__),
Linear fake fwd/bwd impls for the custom-op path, and tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
The cached FP8 weight is the same tensor returned as new_weight_workspace (cache miss) or passed in as weight_workspace (cache hit). A custom op may not return a tensor that aliases an input or another return, so mark those slots and reconstruct wt_save in _linear_setup_ctx instead of saving it twice. Mirrored in the fake impl so the saved-slot layout matches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
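
The "mark those slots and reconstruct" idea from the commit message above can be sketched in plain Python. Everything here is hypothetical (`AliasOf`, `mark_aliases`, `restore_aliases`), generic objects stand in for tensors, and aliasing is approximated by object identity, which is a simplification of how tensor aliasing is actually determined.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


@dataclass(frozen=True)
class AliasOf:
    """Marker saying: this saved slot is the same object as an input/output."""

    kind: str   # "input" or "output"
    index: int


def _find_alias(obj: Any, inputs: Sequence[Any], outputs: Sequence[Any]) -> Optional[AliasOf]:
    if obj is None:
        return None
    for kind, pool in (("input", inputs), ("output", outputs)):
        for i, candidate in enumerate(pool):
            if candidate is obj:
                return AliasOf(kind, i)
    return None


def mark_aliases(saved: Sequence[Any], inputs: Sequence[Any], outputs: Sequence[Any]) -> List[Any]:
    # Replace each aliasing entry with a marker instead of saving it twice.
    marked = []
    for obj in saved:
        alias = _find_alias(obj, inputs, outputs)
        marked.append(obj if alias is None else alias)
    return marked


def restore_aliases(marked: Sequence[Any], inputs: Sequence[Any], outputs: Sequence[Any]) -> List[Any]:
    # Inverse step: swap each marker back for the object it points at.
    pools = {"input": inputs, "output": outputs}
    return [pools[m.kind][m.index] if isinstance(m, AliasOf) else m for m in marked]


activation = object()
weight_workspace = object()   # stand-in for the cached weight on a cache hit
extra = object()              # a saved value that aliases nothing
inputs = [activation, weight_workspace]
outputs = [object()]

marked = mark_aliases([weight_workspace, extra], inputs, outputs)
restored = restore_aliases(marked, inputs, outputs)
```

On a cache hit the saved slot becomes `AliasOf("input", 1)`, and the restore step recovers the very same object from the inputs.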
NVFP4Quantizer._describe_buffers grouped each amax right after its scale (per-usage), diverging from NVFP4TensorStorage._FLATTEN_TENSOR_BUFFERS (amax buffers last). The order is functionally irrelevant (buffers are consumed by name in alloc_tensors and reordered in TensorProto.inner_names), but aligning it makes describe/flatten agree and fixes test_to_tensor_proto_quantized[nvfp4].

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
…upport

- TensorProto.inner_names now raises if the quantizer describes buffer(s) absent
  from the storage's _FLATTEN_TENSOR_BUFFERS, instead of silently appending them.
- Gate the nvfp4 proto-quantizer param on nvfp4_available so it skips on hardware
  without NVFP4 support rather than failing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
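
The two commits above describe a name-based ordering with a strict check. A minimal sketch of that behavior, assuming a free function rather than a method and hypothetical buffer names, might look like this; the exception type (`ValueError`) is also an assumption, since the commit message only says that it raises.

```python
from typing import List, Sequence, Tuple


def inner_names(
    described: Sequence[str],
    flatten_buffers: Sequence[Tuple[str, str]],
) -> List[str]:
    """Order the described buffer names by the storage's flatten declaration."""
    declared = [attr for attr, _ in flatten_buffers]
    unknown = [name for name in described if name not in declared]
    if unknown:
        # Refuse to silently append names the storage does not declare.
        raise ValueError(f"buffers not declared for flattening: {unknown}")
    wanted = set(described)
    return [attr for attr in declared if attr in wanted]


# Hypothetical declaration with the amax buffer last.
FLATTEN = (
    ("_rowwise_data", "rowwise_data"),
    ("_rowwise_scale_inv", "rowwise_scale_inv"),
    ("_amax_rowwise", "amax_rowwise"),
)

# The quantizer describes amax right after the data; the result follows FLATTEN.
ordered = inner_names(["_rowwise_data", "_amax_rowwise", "_rowwise_scale_inv"], FLATTEN)
```

Since buffers are looked up by name, the description order does not affect the result, while an undeclared name fails loudly.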
@pggPL pggPL force-pushed the tensor_proto_mechanism branch from 9e78a6c to 50c11cd Compare June 29, 2026 13:46
…escribe_buffers

Access NVFP4Quantizer @staticmethods (convert_shape_for_fp4, get_columnwise_shape)
via the class instead of the instance. Under torch.compile, instance access of a
@staticmethod on a value-opaque object crashes Dynamo guard generation with
"'function' object has no attribute '__func__'" (pytorch/pytorch#182741).
Temporary workaround until the PyTorch-side fix lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
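
The difference between the two access styles named in this commit can be shown in plain Python. `QuantizerSketch` and `convert_shape` are hypothetical, and the halving rule is a toy. In eager Python both styles return the same result; the sketch does not reproduce the Dynamo crash, which per the commit message occurs only under torch.compile with a value-opaque object.

```python
class QuantizerSketch:
    """Hypothetical quantizer with a shape helper declared as a staticmethod."""

    @staticmethod
    def convert_shape(shape):
        # Toy rule: halve the last dimension.
        return (*shape[:-1], shape[-1] // 2)

    def describe_via_instance(self, shape):
        # Instance access: the pattern the commit moves away from.
        return self.convert_shape(shape)

    def describe_via_class(self, shape):
        # Class access: the workaround described in the commit.
        return QuantizerSketch.convert_shape(shape)


quantizer = QuantizerSketch()
via_instance = quantizer.describe_via_instance((4, 16))
via_class = quantizer.describe_via_class((4, 16))
```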

2 participants