Reuse buffers in op handlers #20524
This PR makes the MLX backend's compound op emitters reuse buffers in place so that MLX can donate them. It rewrites the batch_norm normalize chain and the sample gumbel/top-p chain to thread results through a small set of temp slots (out==in) instead of allocating a fresh temp per step. Values that are used more than once keep their own separate slots.
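As a rough illustration of the slot threading described above, here is a minimal, self-contained sketch of a batch_norm normalize chain emitted two ways: one fresh temp per step, and results threaded through two temp slots with out==in. Everything in it (`Emitter`, `temp`, `emit`, `run`, the op names, and the slot names) is hypothetical and invented for this example; it is not the MLX backend's emitter API, and it models slots as plain Python floats rather than MLX arrays.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Emitter:
    """Hypothetical toy emitter: records (op, inputs, out) and hands out temp slots."""
    instrs: list = field(default_factory=list)
    n_temps: int = 0

    def temp(self) -> str:
        # Allocate a new named temp slot.
        self.n_temps += 1
        return f"t{self.n_temps - 1}"

    def emit(self, op, ins, out):
        # Record one instruction and return its output slot.
        self.instrs.append((op, tuple(ins), out))
        return out


def batch_norm_fresh(e, x, mean, var, w, b, eps, out):
    # One new temp per step: five temps for six ops.
    v = e.emit("add", [var, eps], e.temp())
    r = e.emit("rsqrt", [v], e.temp())
    c = e.emit("sub", [x, mean], e.temp())
    n = e.emit("mul", [c, r], e.temp())
    s = e.emit("mul", [n, w], e.temp())
    return e.emit("add", [s, b], out)


def batch_norm_reuse(e, x, mean, var, w, b, eps, out):
    # Thread results through two slots; each step after the first writes out == in.
    scale = e.temp()
    e.emit("add", [var, eps], scale)
    e.emit("rsqrt", [scale], scale)
    # `scale` is still live while `acc` is built, so it keeps its own slot.
    acc = e.temp()
    e.emit("sub", [x, mean], acc)
    e.emit("mul", [acc, scale], acc)
    e.emit("mul", [acc, w], acc)
    return e.emit("add", [acc, b], out)


OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "rsqrt": lambda a: 1.0 / math.sqrt(a),
}


def run(instrs, env):
    # Tiny interpreter over a slot -> value mapping, used only to compare results.
    env = dict(env)
    for op, ins, out in instrs:
        env[out] = OPS[op](*(env[i] for i in ins))
    return env
```

Running both instruction lists through `run` with the same inputs gives the same `out` value, while `Emitter.n_temps` drops from five to two in the reuse variant. In this toy model, overwriting a slot simply replaces its value; the assumption is that in the real runtime, writing back to the input slot is what leaves the old buffer free for MLX to donate.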
An audit of the remaining emitters/handlers (conv, pooling, gated-delta-rule, SDPA, quantized/gguf linear, and the C++ runtime handlers) confirmed they already reuse buffers or are inherently non-donating (views/fused kernels), so no further changes were needed.