
Manual merge of PRs #20394–#20397 (slice_copy + permute_copy) #20550

Merged
JulianCloudNTH merged 5 commits into pytorch:main from JulianCloudNTH:webgpu-slice-permute-manual-merge
Jun 26, 2026

@JulianCloudNTH JulianCloudNTH commented Jun 26, 2026

Summary

Manual merge of four WebGPU-delegate op PRs that landed internally but could not auto-merge
to main. These are stacked ghstack PRs: when the lower PRs in the stack merged, their head
branches were deleted, which orphaned the base branches of these four PRs, so the orig-PR
proposer failed with "422 base invalid". This PR re-lands the same four commits (identical
content to the originals, flat test layout) as a clean stack on top of current main:

  • #20394: Add slice_copy op (aten.slice_copy.Tensor)
  • #20395: slice_copy op test suite (cases.py op-test framework)
  • #20396: Add permute_copy + IntList graph support (aten.permute_copy.default)
  • #20397: permute_copy op test suite (cases.py op-test framework)

Test plan

Each op ships with its cases.py op-test suite (exported via VulkanPartitioner, compared
to a torch golden on Dawn) plus an export-delegation smoke test, exercised by the WebGPU
op-test CI (etvk-*). Verified internally; content is identical to the original four PRs.

@diff-train-skip-merge

Pull Request resolved: pytorch#20394

Adds `aten.slice_copy.Tensor` to the WebGPU delegate as a gather: each output element is mapped back to its source input element along the sliced dim via `start + coord * step`.

Composition (single compute dispatch):
- `runtime/ops/slice/Slice.cpp` — reads `args = [self, dim, start, end, step, out]` via `read_scalar` (static `Int`/`Null`-sentinel default; throws on dynamic `SymInt`); normalizes negative `dim`/`start`, clamps `start` to `[0, in_size]`; builds two `TensorMeta` UBOs + a `SliceParams{dim, start, step}` uniform; guards fp32; dispatches over `compute_1d_workgroup_count(out.numel)` with `override wg_size`; releases all uniforms after the bind group.
- `runtime/ops/slice/slice.wgsl` — delinearizes the output index over the contiguous output strides, maps the sliced-dim coordinate back to the input (`start + coord*step`), relinearizes over the input strides.
ghstack-source-id: 397026527
@exported-using-ghexport

Differential Revision: [D108793168](https://our.internmc.facebook.com/intern/diff/D108793168/)
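
The gather described in the `slice_copy` commit above can be modeled on the CPU. The following is a minimal pure-Python sketch, not the shader or `Slice.cpp` itself: `slice_copy_ref` and `contiguous_strides` are hypothetical names, the input is assumed contiguous and row-major, `step` is assumed positive, and the handling of `end` (negative wrap and clamping) is an assumption based on usual PyTorch slicing semantics, since the commit message only spells out the treatment of `dim` and `start`.

```python
from math import prod


def contiguous_strides(shape):
    """Row-major (contiguous) strides for a shape, in elements."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def slice_copy_ref(data, in_shape, dim, start=None, end=None, step=1):
    """Reference gather over a flat row-major list (hypothetical helper)."""
    ndim = len(in_shape)
    if dim < 0:
        dim += ndim
    in_size = in_shape[dim]
    # None stands in for the Null-sentinel defaults.
    start = 0 if start is None else start
    end = in_size if end is None else end
    if start < 0:
        start += in_size
    if end < 0:  # assumption: end wraps like start
        end += in_size
    start = min(max(start, 0), in_size)   # clamp start to [0, in_size]
    end = min(max(end, start), in_size)   # assumption: clamp end to [start, in_size]

    out_shape = list(in_shape)
    out_shape[dim] = (end - start + step - 1) // step  # ceil division
    in_strides = contiguous_strides(in_shape)
    out_strides = contiguous_strides(out_shape)

    out = []
    for idx in range(prod(out_shape)):
        rem, src = idx, 0
        for d in range(ndim):
            # Delinearize the output index over the output strides.
            coord, rem = divmod(rem, out_strides[d])
            if d == dim:
                coord = start + coord * step  # map back to the input coordinate
            src += coord * in_strides[d]      # relinearize over the input strides
        out.append(data[src])
    return out, out_shape
```

For a `[2, 8]` input holding `0..15`, `slice_copy_ref(list(range(16)), [2, 8], dim=1, start=0, end=8, step=2)` returns `([0, 2, 4, 6, 8, 10, 12, 14], [2, 4])`.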
…work)

Pull Request resolved: pytorch#20395

Registers `aten.slice_copy.Tensor` in the `cases.py` op-test framework: a `_slice_suite` of 4 configs (leading-dim slice `[:,1:5]`, last-dim slice `[...,1:3]`, step-2 `[:,0:8:2]`, negative-end `[:,1:-1]`) that `generate_op_tests` exports via `VulkanPartitioner` and compares to a torch golden on Dawn. Also adds `test/ops/slice/test_slice.py` (`SliceModule` + `CONFIGS` + export-delegation/eager smoke test) and the `aten.slice_copy.Tensor` partitioner-allowlist entry in `tester.py`.
ghstack-source-id: 397026537
@exported-using-ghexport

Differential Revision: [D108793151](https://our.internmc.facebook.com/intern/diff/D108793151/)
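
As a rough illustration of what the four `_slice_suite` configs exercise, the sketch below maps each indexing expression to a `(dim, slice)` pair and computes the expected output shape with Python's built-in `slice.indices`. The `(4, 8)` input shape, the `SLICE_CONFIGS` table and `expected_out_shape` are assumptions made for illustration; they are not taken from `cases.py`.

```python
# Hypothetical table: indexing expression -> (dim, Python slice object).
SLICE_CONFIGS = {
    "[:, 1:5]": (1, slice(1, 5)),
    "[..., 1:3]": (-1, slice(1, 3)),
    "[:, 0:8:2]": (1, slice(0, 8, 2)),
    "[:, 1:-1]": (1, slice(1, -1)),
}


def expected_out_shape(in_shape, dim, sl):
    """Output shape of slicing `in_shape` along `dim` with slice `sl`."""
    dim %= len(in_shape)  # normalize a negative dim
    # slice.indices resolves defaults and negative bounds for this length.
    start, stop, step = sl.indices(in_shape[dim])
    out = list(in_shape)
    out[dim] = len(range(start, stop, step))
    return out


IN_SHAPE = (4, 8)  # assumed input shape, for illustration only
SHAPES = {expr: expected_out_shape(IN_SHAPE, dim, sl)
          for expr, (dim, sl) in SLICE_CONFIGS.items()}
```

Under the assumed `(4, 8)` input, the negative-end case `[:, 1:-1]` resolves to `start=1, end=7` and yields shape `[4, 6]`, which is the normalization path the runtime has to handle for a static negative `end`.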
…ermute_copy.default)

Pull Request resolved: pytorch#20396

Adds `aten.permute_copy.default` (a coordinate-reorder gather) to the WebGPU delegate, and the `IntList` graph value type it needs to read its `dims` argument.

Composition:
- `runtime/WebGPUGraph.{h,cpp}` — adds `ValueType::IntList` backed by `std::vector<std::vector<int64_t>> int_lists_` + `get_int_list(int)`; `build()` deserializes `vkgraph::GraphTypes::IntList` via `value_as_IntList()->items()` (int64, matching the FlatBuffer `[long]`); mirrors the existing scalar value plumbing.
- `runtime/ops/permute/Permute.cpp` — reads the permutation via `get_int_list`, normalizes negative dims, validates it is a permutation of `[0, ndim)`, builds two `TensorMeta` UBOs + a `PermuteParams{perm: vec4<u32>}` uniform, guards fp32 + rank≤4, dispatches over `compute_1d_workgroup_count(out.numel)` with `override wg_size`; releases all uniforms after the bind group.
- `runtime/ops/permute/permute.wgsl` — delinearizes the output index over the contiguous output strides, reads `input` at `in.strides[perm[d]]` per dim (mirrors Vulkan `permute_buffer.glsl`).
- Registers both `aten.permute_copy.default` and `aten.permute.default` to the same handler.
ghstack-source-id: 397026548
@exported-using-ghexport

Differential Revision: [D108793162](https://our.internmc.facebook.com/intern/diff/D108793162/)
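
The coordinate-reorder gather in the `permute_copy` commit above can be sketched the same way. This is a pure-Python model under stated assumptions, not the contents of `Permute.cpp` or `permute.wgsl`: `permute_copy_ref` and `contiguous_strides` are hypothetical names, the input is assumed contiguous and row-major, and the rank limit of 4 simply mirrors the guard mentioned in the commit message.

```python
from math import prod


def contiguous_strides(shape):
    """Row-major (contiguous) strides for a shape, in elements."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def permute_copy_ref(data, in_shape, perm):
    """Reference permute over a flat row-major list (hypothetical helper)."""
    ndim = len(in_shape)
    if ndim > 4:
        raise ValueError("rank > 4 is not supported in this sketch")
    # Normalize negative dims, then check that perm is a permutation of [0, ndim).
    perm = [p + ndim if p < 0 else p for p in perm]
    if sorted(perm) != list(range(ndim)):
        raise ValueError(f"not a permutation of [0, {ndim}): {perm}")

    out_shape = [in_shape[p] for p in perm]
    in_strides = contiguous_strides(in_shape)
    out_strides = contiguous_strides(out_shape)

    out = []
    for idx in range(prod(out_shape)):
        rem, src = idx, 0
        for d in range(ndim):
            # Delinearize the output index over the output strides.
            coord, rem = divmod(rem, out_strides[d])
            # Output dim d reads input dim perm[d], so use that input stride.
            src += coord * in_strides[perm[d]]
        out.append(data[src])
    return out, out_shape
```

For a `[2, 3]` input holding `0..5`, `permute_copy_ref(list(range(6)), [2, 3], [1, 0])` returns `([0, 3, 1, 4, 2, 5], [3, 2])`, which is the 2D transpose.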
…mework)

Pull Request resolved: pytorch#20397

Registers `aten.permute_copy.default` in the `cases.py` op-test framework: a `_permute_suite` of 4 configs (3D rotation, 4D middle-dim transpose, 2D transpose, full 4D shuffle) that `generate_op_tests` exports via `VulkanPartitioner` and compares to a torch golden on Dawn. Also adds `test/ops/permute/test_permute.py` (`PermuteModule` + `CONFIGS` + `_op_delegated` smoke test) and the `aten.permute_copy.default` partitioner-allowlist entry in `tester.py`.
ghstack-source-id: 397026550
@exported-using-ghexport

Differential Revision: [D108793156](https://our.internmc.facebook.com/intern/diff/D108793156/)

pytorch-bot Bot commented Jun 26, 2026


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20550

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 5 Pending

As of commit dde991a with merge base b919db7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Jun 26, 2026
@github-actions


This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@psiddh psiddh left a comment

Approving this to unblock the diff train

@JulianCloudNTH JulianCloudNTH merged commit a03f97b into pytorch:main Jun 26, 2026
181 checks passed