Request: subsetting with numpy arrays/masks #2053

@david-cortes-intel

Description

ref uxlfoundation/scikit-learn-intelex#2350

Currently, it's not possible to subset dpctl or dpnp arrays with a numpy array of integers:

import numpy as np
import dpnp
X = np.arange(16).reshape((4,4))
dpnp.array(X)[np.arange(2)]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[9], line 4
      2 import dpnp
      3 X = np.arange(16).reshape((4,4))
----> 4 dpnp.array(X)[np.arange(2)]

File /localdisk2/mkl/dcortes/miniforge3/envs/icxconda/lib/python3.12/site-packages/dpnp/dpnp_array.py:355, in dpnp_array.__getitem__(self, key)
    352 """Return ``self[key]``."""
    353 key = _get_unwrapped_index_key(key)
--> 355 item = self._array_obj.__getitem__(key)
    356 return dpnp_array._create_from_usm_ndarray(item)

File dpctl/tensor/_usmarray.pyx:937, in dpctl.tensor._usmarray.usm_ndarray.__getitem__()

File dpctl/tensor/_slicing.pxi:300, in dpctl.tensor._usmarray._basic_slice_meta()

IndexError: Only integers, slices (`:`), ellipsis (`...`), dpctl.tensor.newaxis (`None`) and integer and boolean arrays are valid indices.

Allowing this type of indexing, even if not very efficiently, would be very helpful for scikit-learn-intelex.

scikit-learn-intelex is meant to be compatible with scikit-learn: classes from the two libraries should be usable interchangeably, with scikit-learn-intelex additionally being able to work with dpctl arrays and offering algorithms that run on GPU.

Internally, scikit-learn (on which scikit-learn-intelex relies) has many routines that simply create arrays of indices to subset a larger array. These currently do not work with dpctl/dpnp inputs, because the integer indices are numpy arrays and scikit-learn is not sycl-aware.
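To illustrate the pattern, here is a minimal numpy-only sketch of a k-fold style split. The helper `kfold_indices` is hypothetical and is not scikit-learn's actual code; it only shows that the indices are built on the host as numpy arrays and then used to subset the data:

```python
import numpy as np

def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs as numpy integer arrays.

    Hypothetical helper, written only to mimic the kind of index
    arrays that cross-validation utilities produce.
    """
    indices = np.arange(n_samples)
    for test_idx in np.array_split(indices, n_splits):
        # Everything that is not in the test fold goes to the train fold.
        train_idx = np.setdiff1d(indices, test_idx)
        yield train_idx, test_idx

X = np.arange(16).reshape((4, 4))
for train_idx, test_idx in kfold_indices(X.shape[0], n_splits=2):
    # With a numpy X this works; with a dpnp/dpctl X this is the
    # step that raises the IndexError shown above.
    X_train, X_test = X[train_idx], X[test_idx]
```

If `X` were a dpnp or dpctl array, the `X[train_idx]` step is where the indexing would fail, since `train_idx` is a numpy array.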

Enabling subsetting of dpctl arrays with numpy integer and boolean arrays (as scikit-learn does internally) would immediately enable many useful GPU features in scikit-learn-intelex that are currently not possible, such as tuning the parameters of machine learning models on GPU and calculating cross-validated metrics, among many others.
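For reference, a caller-side stopgap could look roughly like the sketch below: convert the numpy index into an array from the same library as the data before indexing. The helper name `to_native_index` is hypothetical, and the dpnp usage in the trailing comment is an assumption about how it would be called, not something tested here. It does not help when the indexing happens inside scikit-learn itself, which is why support in dpctl/dpnp is being requested.

```python
import numpy as np

def to_native_index(key, xp, **kwargs):
    """Convert a numpy index array into an array from namespace `xp`.

    Hypothetical helper. `xp` is any module exposing `asarray`
    (numpy is used below as a stand-in); extra keyword arguments
    are forwarded to `xp.asarray`. Non-array keys (ints, slices,
    ...) are returned unchanged.
    """
    if isinstance(key, np.ndarray):
        return xp.asarray(key, **kwargs)
    return key

X = np.arange(16).reshape((4, 4))
idx = np.arange(2)                           # integer index array
mask = np.array([True, False, True, False])  # boolean mask

rows = X[to_native_index(idx, np)]
masked = X[to_native_index(mask, np)]

# Assumed usage with dpnp (not run here):
#   X_dev = dpnp.array(X)
#   X_dev[to_native_index(idx, dpnp, sycl_queue=X_dev.sycl_queue)]
```

This copies the index to the device on every call, so it is not efficient, but it matches the "even if not very efficiently" behaviour asked for above.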

Labels: enhancement
