Optimize FFT logic for complex strided input to avoid oversized memory allocation by vlad-perevezentsev · Pull Request #2939 · IntelPython/dpnp

vlad-perevezentsev · 2026-06-01T10:55:12Z

This PR fixes an issue discovered while implementing #2927 where several FFT tests started failing after adding validation for stride configurations with oversized memory footprints.

The problem was that FFT could allocate output arrays using the same strided layout as a non-contiguous input. For inputs such as a[::2, this resulted in oversized allocations followed by an additional copy to a contiguous array.

This fix checks whether the memory footprint implied by the input strides exceeds the number of elements in the array. If so, the input is copied to a contiguous layout before configuring the FFT descriptor allowing oneMKL FFT to produce a contiguous output directly

By avoiding oversized allocations and the extra copy this significantly improves the performance of all FFT operations in dpnp with complex strided inputs

Additionally, the cuFFT C-contiguous requirement is now enforced for all code paths. Previously it was only applied within batch_fft branch which was a latent bug for 1D non-batch inputs on CUDA.

Have you provided a meaningful PR description?
Have you added a test, reproducer or referred to an issue with a reproducer?
Have you tested your changes locally for CPU and GPU devices?
Have you made sure that new changes do not introduce compiler warnings?
Have you checked performance impact of proposed changes?
Have you added documentation for your changes, if necessary?
Have you added your changes to the changelog?

github-actions · 2026-06-01T11:52:34Z

View rendered docs @ https://intelpython.github.io/dpnp/pull/2939/index.html

coveralls · 2026-06-01T12:20:46Z

coverage: 78.095% (+0.003%) from 78.092% — fix_fft_logic into master

github-actions · 2026-06-04T12:42:21Z

Array API standard conformance tests for dpnp=0.21.0dev1=py313h509198e_17 ran successfully.
Passed: 1355
Failed: 5
Skipped: 16

antonwolfy · 2026-06-12T12:25:44Z

+            _shape = a.shape
+            # Max element displacement reachable by the strides.
+            # Negative strides are handled by _copy_array, so only
+            # positive strides are possible here


There is another regression also, which can be handled separately:

import dpnp as np a = np.arange(4, dtype='c8') b = np.broadcast_to(a, (3, 4)) np.fft.fft(b, axis=0) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[5], line 1 ----> 1 np.fft.fft(b, axis=0) File ~/code/dpnp/dpnp/fft/dpnp_iface_fft.py:122, in fft(a, n, axis, norm, out) 47 """ 48 Compute the one-dimensional discrete Fourier Transform. 49 (...) 118 119 """ 121 dpnp.check_supported_arrays_type(a) --> 122 return dpnp_fft( 123 a, forward=True, real=False, n=n, axis=axis, norm=norm, out=out 124 ) File ~/code/dpnp/dpnp/fft/dpnp_utils_fft.py:664, in dpnp_fft(a, forward, real, n, axis, norm, out) 658 if c2r: 659 # input array should be Hermitian for c2r FFT 660 a = _make_array_hermitian( 661 a, axis, dpnp.are_same_logical_tensors(a, a_orig) 662 ) --> 664 return _fft( 665 a, 666 norm=norm, 667 out=out, 668 forward=forward, 669 # TODO: currently in-place is only implemented for c2c, see SAT-7154 670 in_place=in_place and c2c, 671 c2c=c2c, 672 axes=axis, 673 batch_fft=a_ndim != 1, 674 ) File ~/code/dpnp/dpnp/fft/dpnp_utils_fft.py:447, in _fft(a, norm, out, forward, in_place, c2c, axes, batch_fft) 443 a_strides = _standardize_strides_to_nonzero(strides, a.shape) 444 dsc, out_strides = _commit_descriptor( 445 a, forward, in_place, c2c, a_strides, index, batch_fft 446 ) --> 447 res = _compute_result(dsc, a, out, forward, c2c, out_strides) 448 res = _scale_result(res, a.shape, norm, forward, index) 450 # Revert swapped axes File ~/code/dpnp/dpnp/fft/dpnp_utils_fft.py:239, in _compute_result(dsc, a, out, forward, c2c, out_strides) 231 result = dpnp_array( 232 out_shape, 233 dtype=out_dtype, (...) 236 sycl_queue=exec_q, 237 ) 238 res_usm = result.get_array() --> 239 ht_fft_event, fft_event = fi._fft_out_of_place( 240 dsc, a_usm, res_usm, forward, depends=dep_evs 241 ) 242 _manager.add_event_pair(ht_fft_event, fft_event) 244 if not isinstance(result, dpnp_array): ValueError: Memory addressed by the output array is not sufficiently ample.

That is a great catch!
I will address it in the following PR
Thank you!

antonwolfy

Thank you @vlad-perevezentsev, LGTM with small nits

antonwolfy · 2026-06-19T09:15:04Z

+            # for the output array and unnecessary copy to contiguous
+            # after oneMKL FFT
+            elem_strides = dpnp.get_usm_ndarray(a).strides
+            _shape = a.shape


I'd rename and move away above the if branch, since the same is used at lines 439, 444 and so on below

Suggested change

_shape = a.shape

a_shape = a.shape

antonwolfy · 2026-06-19T09:15:58Z

+            # If so, copy to contiguous to avoid oversized allocation
+            # for the output array and unnecessary copy to contiguous
+            # after oneMKL FFT
+            elem_strides = dpnp.get_usm_ndarray(a).strides


We have exactly the same call at line 443, it'd be better to move above the if-branch to call once.

vlad-perevezentsev added 2 commits June 1, 2026 03:26

Optimize FFT for strided input to avoid oversized allocation

3682370

Merge master into fix_fft_logic

881cf69

vlad-perevezentsev added this to the 0.21.0 release milestone Jun 1, 2026

vlad-perevezentsev self-assigned this Jun 1, 2026

vlad-perevezentsev requested review from antonwolfy and ndgrigorian as code owners June 1, 2026 10:55

vlad-perevezentsev changed the title ~~Optimize FFT logic for strided input to avoid oversized memory allocation~~ Optimize FFT logic for complex strided input to avoid oversized memory allocation Jun 1, 2026

Update changelog

fbb13bd

vlad-perevezentsev mentioned this pull request Jun 1, 2026

Fix missing strides validation in dpnp.tensor.usm_ndarray #2927

Draft

7 tasks

ndgrigorian reviewed Jun 2, 2026

View reviewed changes

Comment thread dpnp/fft/dpnp_utils_fft.py

vlad-perevezentsev added 2 commits June 4, 2026 04:24

Update comment to clarify negative strides handling

7d1d7a2

Merge remote-tracking branch 'origin/master' into fix_fft_logic

12d3739

vlad-perevezentsev requested a review from ndgrigorian June 4, 2026 11:26

antonwolfy reviewed Jun 12, 2026

View reviewed changes

vlad-perevezentsev added 3 commits June 17, 2026 04:07

Merge remote-tracking branch 'origin/master' into fix_fft_logic

1fbfbd6

Apply remarks

a9419ae

Merge remote-tracking branch 'origin/master' into fix_fft_logic

d0883bd

vlad-perevezentsev requested a review from antonwolfy June 18, 2026 13:48

antonwolfy reviewed Jun 18, 2026

View reviewed changes

Comment thread dpnp/tests/test_fft.py

antonwolfy approved these changes Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize FFT logic for complex strided input to avoid oversized memory allocation#2939

Optimize FFT logic for complex strided input to avoid oversized memory allocation#2939
vlad-perevezentsev wants to merge 8 commits into
masterfrom
fix_fft_logic

vlad-perevezentsev commented Jun 1, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

coveralls commented Jun 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

github-actions Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

antonwolfy Jun 12, 2026

Uh oh!

vlad-perevezentsev Jun 18, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

antonwolfy left a comment

Uh oh!

antonwolfy Jun 19, 2026

Uh oh!

antonwolfy Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

vlad-perevezentsev commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

coveralls commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

antonwolfy Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

vlad-perevezentsev Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

antonwolfy left a comment

Choose a reason for hiding this comment

Uh oh!

antonwolfy Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

antonwolfy Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

vlad-perevezentsev commented Jun 1, 2026 •

edited

Loading

coveralls commented Jun 1, 2026 •

edited

Loading

github-actions Bot commented Jun 4, 2026 •

edited

Loading