[ExecuTorch][WebGPU] Add et_vk.prepack (constant-tensor packing) for E2E weight loading by pytorchbot · Pull Request #20429 · pytorch/executorch

pytorchbot · 2026-06-22T16:14:00Z

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #20265 by @JulianCloudNTH
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/27/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/27/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/29/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/27/orig

@diff-train-skip-merge

…E2E weight loading

Pull Request resolved: #20265

Adds the WebGPU backend handler for `et_vk.prepack.default`, the node the VulkanPartitioner wraps around every constant feeding a delegated op so the constant is materialized into its dedicated GPU buffer before inference.

For the WebGPU backend's buffer-flat/fp32 model, prepack is an identity layout (same dims, dtype, and bytes), so the handler runs no compute shader. It validates that `src` and `out` match (dims, `elem_size`, `nbytes`, non-null buffers; any failed check throws rather than failing silently) and records a one-time `src`->`out` buffer-to-buffer copy via the new `WebGPUGraph::add_prepack_copy`.

The recorded copies run once in a new `build()` Phase 4 (after the op-dispatch chain is recorded), mirroring the Vulkan delegate's separate `prepack()` init phase (distinct from per-inference `execute()`). Ordering is guaranteed by the WebGPU queue: the prepack submit precedes the first `execute()` submit on the same queue, so the copied data is visible without an explicit device poll (Dawn has no `wgpuDevicePoll`, and the backend relies on queue ordering plus the output-map wait elsewhere).

`src.elem_size` is the `WebGPUTensor` field added by the embedding op lower in this stack, so prepack stacks above it.

ghstack-source-id: 395549289
@exported-using-ghexport

Differential Revision: [D108428754](https://our.internmc.facebook.com/intern/diff/D108428754/)
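The validate-then-record flow described above can be modeled in a few lines. This is a minimal Python sketch, not the backend's C++ code: `FakeTensor` and `FakeGraph` are hypothetical stand-ins for `WebGPUTensor` and `WebGPUGraph`, a `bytearray` plays the role of a GPU buffer, and the exception type and messages are assumptions. Only the shape of the logic (check dims, `elem_size`, `nbytes`, and non-null buffers, record the copy, run it once during `build()`) comes from the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a WebGPUTensor; a bytearray plays the GPU buffer."""
    dims: Tuple[int, ...]
    elem_size: int
    nbytes: int
    buffer: Optional[bytearray]


@dataclass
class FakeGraph:
    """Hypothetical stand-in for WebGPUGraph: records copies, runs them once."""
    pending: List[Tuple[FakeTensor, FakeTensor]] = field(default_factory=list)

    def add_prepack_copy(self, src: FakeTensor, out: FakeTensor) -> None:
        # Record only; nothing is copied until build().
        self.pending.append((src, out))

    def build(self) -> None:
        # Earlier phases (including recording the op-dispatch chain) are omitted.
        # Final phase: run every recorded copy once, then forget them.
        for src, out in self.pending:
            out.buffer[:] = src.buffer
        self.pending.clear()


def prepack(graph: FakeGraph, src: FakeTensor, out: FakeTensor) -> None:
    """Validate that src and out describe the same bytes, then record the copy."""
    if src.buffer is None or out.buffer is None:
        raise ValueError("prepack: null buffer")
    if src.dims != out.dims:
        raise ValueError("prepack: dims mismatch")
    if src.elem_size != out.elem_size:
        raise ValueError("prepack: elem_size mismatch")
    if src.nbytes != out.nbytes:
        raise ValueError("prepack: nbytes mismatch")
    graph.add_prepack_copy(src, out)


# A 2x2 fp32 constant (16 bytes) and its zero-filled destination buffer.
weights = FakeTensor((2, 2), 4, 16, bytearray(range(16)))
dest = FakeTensor((2, 2), 4, 16, bytearray(16))

graph = FakeGraph()
prepack(graph, weights, dest)
copied_before_build = dest.buffer == weights.buffer  # copy is only recorded so far
graph.build()
copied_after_build = dest.buffer == weights.buffer   # copy has now run
```

Deferring the copy keeps the handler free of side effects at node-processing time, which is the property the description attributes to the separate prepack phase. The sketch does not model queue submission or ordering.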

pytorch-bot · 2026-06-22T16:14:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20429

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[ROCm] MI350 CI jobs will have longer queue times due to CI migration

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Pull Request resolved: #20292

Test suite for the `et_vk.prepack` constant-materialization op, split into its own diff (op below, tests above) per the per-op test-split convention.

The prepack op is how a serialized constant becomes a GPU tensor: the constant arrives as a CPU-side reference (sizes + a pointer into the .pte bytes), and the prepack node is the sole materialization: one CPU->GPU transfer straight into the consumer's buffer. The model `M(x) = x + w` (w a constant) routes `w` through a prepack node, so the delegate must run the materialization for the output to equal `x + w` rather than `x + 0`.

ghstack-source-id: 395555139
@exported-using-ghexport

Differential Revision: [D108678631](https://our.internmc.facebook.com/intern/diff/D108678631/)
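Why the test distinguishes `x + w` from `x + 0` can be shown with a small stdlib-only Python sketch. It is not the PR's test suite: `materialize` and `run_add` are hypothetical helpers, the 4-byte header and little-endian fp32 layout of the fake `.pte` blob are assumptions, and the consumer buffer is assumed to start zero-filled.

```python
import struct


def materialize(pte_bytes: bytes, offset: int, numel: int, consumer: bytearray) -> None:
    """Hypothetical prepack: copy numel fp32 values from the blob into the consumer buffer."""
    nbytes = numel * 4
    consumer[:nbytes] = pte_bytes[offset:offset + nbytes]


def run_add(x, consumer: bytearray):
    """Model of M(x) = x + w, reading w from the consumer's buffer."""
    w = struct.unpack(f"<{len(x)}f", bytes(consumer))
    return [a + b for a, b in zip(x, w)]


# Fake serialized program: a 4-byte header followed by the constant w (assumed layout).
pte = b"HDR!" + struct.pack("<3f", 0.5, 1.5, 2.5)
x = [1.0, 2.0, 3.0]

# Materialization skipped: the buffer is still all zeros, so the result is x + 0.
skipped = bytearray(12)
without_prepack = run_add(x, skipped)

# Materialization run: the buffer holds w, so the result is x + w.
packed = bytearray(12)
materialize(pte, 4, 3, packed)
with_prepack = run_add(x, packed)
```

With these inputs a delegate that never runs the copy returns `x` unchanged, so any nonzero `w` makes the omission visible in the output.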

pytorchbot requested review from kirklandsign and larryliu0820 as code owners June 22, 2026 16:14

meta-cla Bot added the CLA Signed label Jun 22, 2026

pytorchbot temporarily deployed to cadence June 22, 2026 16:14 — with GitHub Actions Inactive

JulianCloudNTH self-requested a review June 22, 2026 16:55

JulianCloudNTH approved these changes Jun 22, 2026

JulianCloudNTH temporarily deployed to cadence June 22, 2026 18:38 — with GitHub Actions Inactive

JulianCloudNTH merged commit 2bbe265 into gh/JulianCloudNTH/29/orig Jun 22, 2026
161 of 168 checks passed

JulianCloudNTH deleted the gh/JulianCloudNTH/27/orig branch June 22, 2026 18:38

JulianCloudNTH merged 2 commits into gh/JulianCloudNTH/29/orig from gh/JulianCloudNTH/27/orig

2 participants
