Skip to content

GH-49677 [Python][C++][Compute] Add search sorted compute kernel#49679

Open
Alex-PLACET wants to merge 29 commits into
apache:mainfrom
Alex-PLACET:add_search_sorted_compute_kernel
Open

GH-49677 [Python][C++][Compute] Add search sorted compute kernel#49679
Alex-PLACET wants to merge 29 commits into
apache:mainfrom
Alex-PLACET:add_search_sorted_compute_kernel

Conversation

@Alex-PLACET
Copy link
Copy Markdown

@Alex-PLACET Alex-PLACET commented Apr 7, 2026

Rationale for this change

Add the implemenation of the search sorted compute kernel based on the numpy function: https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html

What changes are included in this PR?

Implementation of the C++ kernel + Python API.
Tests in C++ and Python

Are these changes tested?

Yes

Are there any user-facing changes?

No breaking change

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 7, 2026

⚠️ GitHub issue #49677 has been automatically assigned in GitHub to PR creator.

- Added a new benchmark file `vector_search_sorted_benchmark.cc` to evaluate the performance of the SearchSorted function for various data types including Int64, String, and Binary.
- Created a comprehensive test suite in `vector_search_sorted_test.cc` to validate the correctness of SearchSorted across different scenarios, including handling of null values, scalar needles, and run-end encoded arrays.
- Ensured that the benchmarks cover both left and right search options, as well as edge cases like empty arrays and arrays with leading/trailing nulls.
…lize ranges for leading/trailing null counts
@Alex-PLACET Alex-PLACET force-pushed the add_search_sorted_compute_kernel branch from 8e09ea3 to 4ec630e Compare April 8, 2026 08:15
Comment thread python/pyarrow/_compute_docstrings.py
@github-actions github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Apr 8, 2026
Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the implementation yet, but have reviewed the tests and benchmarks (which are quite comprehensive, thank you!).

One missing item is support for chunked arrays. Besides that, see comments below :)

Comment thread cpp/src/arrow/compute/kernels/vector_search_sorted_test.cc Outdated
SearchSortedOptions(SearchSortedOptions::Left)));
ASSERT_OK_AND_ASSIGN(auto right,
SearchSorted(Datum(values), Datum(needles),
SearchSortedOptions(SearchSortedOptions::Right)));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call ValidateFull on both results here too?

(also I'm curious, why not reuse CheckSimpleSearchSorted?)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok fixed too

Comment on lines +90 to +92
std::string scalar_needle_json;
uint64_t expected_scalar_left;
uint64_t expected_scalar_right;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you could also generate the scalar needle tests automatically by calling GetScalar on the array needles and the expected results. This would make this easier to maintain later.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok fixed

Comment thread cpp/src/arrow/compute/kernels/vector_search_sorted_test.cc
Comment thread cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc Outdated
Comment thread cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc Outdated
Comment thread cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc
Comment thread cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc Outdated
@pitrou
Copy link
Copy Markdown
Member

pitrou commented Apr 23, 2026

@Alex-PLACET Is this ready for review again?

@Alex-PLACET Alex-PLACET requested a review from pitrou April 23, 2026 12:02
@pitrou
Copy link
Copy Markdown
Member

pitrou commented Apr 23, 2026

@Alex-PLACET It looks like the benchmarks don't compile anymore?

/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc: In function 'void arrow::compute::BM_SearchSortedBinaryScalarNeedle(benchmark::State&, SearchSortedOptions::Side)':
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc:255:29: error: 'BuildSortedBinaryValues' was not declared in this scope; did you mean 'BuildSortedInt64Values'?
  255 |   const auto values_array = BuildSortedBinaryValues(state.range(0));
      |                             ^~~~~~~~~~~~~~~~~~~~~~~
      |                             BuildSortedInt64Values
In file included from /home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc:18:
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc: In lambda function:
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc:290:19: error: 'BM_SearchSortedInt64ScalarNeedle' was not declared in this scope; did you mean 'BM_SearchSortedBinaryScalarNeedle'?
  290 | BENCHMARK_CAPTURE(BM_SearchSortedInt64ScalarNeedle, left, SearchSortedOptions::Left)
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc: In lambda function:
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc:292:19: error: 'BM_SearchSortedInt64ScalarNeedle' was not declared in this scope; did you mean 'BM_SearchSortedBinaryScalarNeedle'?
  292 | BENCHMARK_CAPTURE(BM_SearchSortedInt64ScalarNeedle, right, SearchSortedOptions::Right)
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc: At global scope:
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc:253:13: warning: 'void arrow::compute::BM_SearchSortedBinaryScalarNeedle(benchmark::State&, SearchSortedOptions::Side)' defined but not used [-Wunused-function]
  253 | static void BM_SearchSortedBinaryScalarNeedle(benchmark::State& state,
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc:243:13: warning: 'void arrow::compute::BM_SearchSortedStringScalarNeedle(benchmark::State&, SearchSortedOptions::Side)' defined but not used [-Wunused-function]
  243 | static void BM_SearchSortedStringScalarNeedle(benchmark::State& state,
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I also took a look at the implementation and some aspects of it seem overly-complicated (but I may miss something). See comments below.

Comment on lines +113 to +114
"[1, 3, 3, 5]",
"[0, 3, 4, 6]",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the values and needles arrays different lengths?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok done

auto values = ArrayFromJSON(int16(), "[]");
auto needles = ArrayFromJSON(int16(), "[1, 2, 3]");

ASSERT_OK_AND_ASSIGN(auto result, SearchSorted(Datum(values), Datum(needles)));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not reuse CheckSimpleSearchSorted? It would validate the results and also stress scalar needles.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, now it's the case

auto values = ArrayFromJSON(int32(), "[null, 200, 300, 300]");
auto needles = ArrayFromJSON(int32(), "[50, 200, 250, 400]");

ASSERT_OK_AND_ASSIGN(auto left,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, let's re-use the testing helper CheckSimpleSearchSorted everywhere possible, otherwise we're forgetting some checks.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes done

Comment on lines +174 to +177
{"Float32", float32(), "[1.0, 3.0, 3.0, 5.0]", "[0.0, 3.0, 4.0, 6.0]",
"[0, 1, 3, 4]", "[0, 3, 3, 4]"},
{"Float64", float64(), "[1.0, 3.0, 3.0, 5.0]", "[0.0, 3.0, 4.0, 6.0]",
"[0, 1, 3, 4]", "[0, 3, 3, 4]"},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to test NaNs here? They will probably need special treatment, whether in the values or the needles.

For the record, sorting an array with Arrow always put NaNs between nulls and non-null values.
e.g. either [1.0, 3.0, 3.0, 5.0, NaN, NaN, null] or [null, NaN, NaN, 1.0, 3.0, 3.0, 5.0].

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely missed this case, I added some tests for that:
FloatValuesWithTrailingNaNsAndNulls and FloatValuesWithTrailingNaNsAndNulls

Comment on lines +334 to +339
ASSERT_OK_AND_ASSIGN(auto left,
SearchSorted(Datum(values), Datum(needles),
SearchSortedOptions(SearchSortedOptions::Left)));
ASSERT_OK_AND_ASSIGN(auto right,
SearchSorted(Datum(values), Datum(needles),
SearchSortedOptions(SearchSortedOptions::Right)));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either you write or reuse a helper here, or you need to call ValidateFull directly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And ideally we should also check scalar needles too.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok done

auto values_type = run_end_encoded(int32(), int32());
ASSERT_OK_AND_ASSIGN(auto ree_values,
REEFromJSON(values_type, "[0, 0, 1, 1, 1, 4, 4, 9]"));
auto sliced = ree_values->Slice(2, 5);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a slice that does not happily falls on run boundaries? (for example Slice(1, 5))

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok done

AssertArraysEqual(*ArrayFromJSON(uint64(), "[0, 0, 3, 3, 5]"), *result.make_array());
}

TEST(SearchSorted, BinaryValues) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already part of the smoke tests, right?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a left over your are right, it's fixed now

Comment on lines +232 to +235
ValueType Value(int64_t index) const {
const auto physical_index = span_.PhysicalIndex(index);
return GetViewType<ArrowType>::LogicalValue(values_.GetView(physical_index));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so we're processing each run-end logical value individually, and then finding it in the array of physical values? This is wasteful, as it's O(n_needles * log n_values), while each physical value will produce the same output after all.

What run-end-encoded values should do instead is:

  1. run search_sorted on the run values, giving "physical results"
  2. turn the "physical results" into "logical results" by looking up the run start for each physical result

In other words, given:

  • values = run_end_encoded([10,10,10,30,30]) (i.e. run values [10, 30] and run ends [3, 5])
  • needles = [10, 20, 30, 40]

we would first compute the physical result [0, 1, 1, 2] of searching the needles in [10, 30], then turn it into the logical results [0, 3, 3, 5].

This would be O(n_needles * log n_value_runs) and therefore more efficient. It should also generate less template specializations, because you don't need to template this algorithm per (physical value type, run end type) pair.

const auto& needle_data = needles.array();
if (needle_data->type->id() == Type::RUN_END_ENCODED) {
RunEndEncodedArray ree(needle_data);
return DispatchRunEndEncodedByRunEndType<Status>(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to what I pointed above about run-end-encoded values, you don't need complex indexing for run-end-encoded needles. It's actually even simpler as you can just search each physical needle and run-end-decode the (needle run ends, physical results).

Comment on lines +462 to +468
const int64_t leading_null_count = CountLeadingNulls(
values.length, [&](int64_t index) { return values.IsNull(index); });
const int64_t trailing_null_count =
leading_null_count > 0 ? 0 : CountTrailingNulls(values.length, [&](int64_t index) {
return values.IsNull(index);
});

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is... complicated. We know there are null_count nulls, and they are either clustered at the start or at the end. We just have to look up the first validity bit (values.IsNull(0)) and then we know how many leading/trailing nulls there are.

And we can do the exact same thing for chunked arrays.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I simplified and refactored it

Alex-PLACET and others added 10 commits April 24, 2026 09:38
Co-authored-by: Copilot <copilot@github.com>
…ckSearchSorted function

Co-authored-by: Copilot <copilot@github.com>
…ns for clarity and reusability

Co-authored-by: Copilot <copilot@github.com>
…tency, remove redundant tests

Co-authored-by: Copilot <copilot@github.com>
…ases. Remove RunEndCType template from RunEndEncodedValuesAccessor
…sts for edge cases

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
@Alex-PLACET Alex-PLACET requested a review from pitrou April 27, 2026 13:07
Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! This is getting better, though I think there are still opportunities for making the implementation simpler.

ArrayVector{empty_chunk, null_chunk, empty_chunk, last_null_chunk});
auto needles = ArrayFromJSON(int32(), "[1, 4, null]");

CheckSearchSorted(Datum(values), Datum(needles), "[3, 3, null]", "[3, 3, null]");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one raises an interesting question: what is the intended null placement of the sorted values? At end or at start? If nulls are at end, we should return 0 (before the nulls), if nulls are at start, we should return 3 (after the nulls).

The default null placement for sorting in Arrow C++ is NullPlacement::AtEnd so I think this should return 0, not 3.

(we can make it configurable later if there's some demand)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok done

Comment thread cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc
const auto values_length = values.length();
const auto needles_length = needles.length();
state.counters["values_length"] = static_cast<double>(values_length);
state.counters["needles_length"] = static_cast<double>(needles_length);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's perhaps also add the null percentage of the values:

Suggested change
state.counters["needles_length"] = static_cast<double>(needles_length);
state.counters["values_null_percentage"] = values.null_count() * 100.0 / values_length;
state.counters["needles_length"] = static_cast<double>(needles_length);

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc Outdated
VISIT(UInt16Type) \
VISIT(UInt32Type) \
VISIT(UInt64Type) \
VISIT(FloatType) \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably add HalfFloatType?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but can you also add a test for it?

Comment on lines +1019 to +1021
RunEndEncodedValuesAccessor<ArrowType> non_null_values(values_accessor.array(),
non_null_values_range.offset,
non_null_values_range.length);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that you are assuming that nulls in a REE array can be anywhere? But actually the nulls are stored in the REE values (a REE array does not have a top-level null bitmap).

So you don't need all the complication that seems to be related to computing physical offsets depending on the logical null range of the REE array. Just use the physical null range of the REE values.

You should therefore be able to simplify a lot of stuff away, and perhaps even all the REE Accessor classes. You might have to keep the Accessor / ChunkedAccessor distinction, however.

/// This preserves the cheapest materialization strategy: write repeated
/// insertion indices directly into the preallocated result buffer, while still
/// sharing the same EmitInsertionIndices traversal used by the nullable path.
class PreallocatedInsertionIndexOutput {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like to have this optimized insertion strategy, you might want to use a TypedBufferBuilder<uint64_t> for both insertion strategies and a TypedBufferBuilder<bool> for the null bitmap (perhaps with a bool template parameter to say whether nulls should be emitted or not).

Something like (rough sketch):

template <bool kEmitNulls>
class InsertionIndexBuilder {
 public:
  explicit InsertionIndexBuilder(MemoryPool* pool)
      : indices_builder_(pool), bitmap_builder_pool_(pool) {}

  Status Init(int64_t length) {
    RETURN_NOT_OK(indices_builder_.Reserve(length));
    if constexpr (kEmitNulls) {
      RETURN_NOT_OK(bitmap_builder_.Reserve(length));
    }
    return Status::OK();
  }

  void AppendIndices(uint64_t value, int64_t run_length) {
    indices_builder_.UnsafeAppend(run_length, value);
    if constexpr (kEmitNulls) {
      bitmap_builder_.UnsafeAppend(run_length, /*value=*/true);
    }
  }

  void AppendNulls(int64_t run_length) {
    if constexpr (kEmitNulls) {
      bitmap_builder_.UnsafeAppend(run_length, /*value=*/false);
    }
  }

 private:
  TypedBufferBuilder<uint64_t> indices_builder_;
  TypedBufferBuilder<uint64_t> bitmap_builder_;
};

}

TEST(SearchSorted, FloatValuesWithLeadingNullsAndTrailingNaNs) {
CheckSimpleSearchSortedAndScalar(float64(), "[null, 1.0, 3.0, 3.0, 5.0, NaN, NaN]",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a valid sorted array (nulls and NaNs should be at the same end, so we can remove this test).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

"[0, 0, 1, 3]", "[0, 1, 1, 3]");
}

TEST(SearchSorted, FloatValuesWithTrailingNaNsAndNulls) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also have a leading NaNs and nulls test?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Comment on lines +314 to +315
"[0.0, 3.0, 4.0, NaN]", "[0, 1, 3, 4]",
"[0, 3, 3, 6]");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a null in the needles?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Alex-PLACET and others added 3 commits May 4, 2026 14:00
…or improved null bitmap management and streamline test cases for consistency in expected results.

Co-authored-by: Copilot <copilot@github.com>
…d encoded values, ensuring null runs outside the slice are ignored.

Co-authored-by: Copilot <copilot@github.com>
@Alex-PLACET Alex-PLACET requested a review from pitrou May 19, 2026 07:54
@pitrou
Copy link
Copy Markdown
Member

pitrou commented Jun 1, 2026

I took a look at the compiled code size here and we're really generating too much code:

$ ls -lh /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/*.o
-rw-rw-r-- 1 antoine antoine 7,8M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_basic_avx2.cc.o
-rw-rw-r-- 1 antoine antoine 7,8M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_basic_avx512.cc.o
-rw-rw-r-- 1 antoine antoine  18M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_basic.cc.o
-rw-rw-r-- 1 antoine antoine 6,3M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_mode.cc.o
-rw-rw-r-- 1 antoine antoine 1,7M juin   1 14:51 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_pivot.cc.o
-rw-rw-r-- 1 antoine antoine 5,4M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_quantile.cc.o
-rw-rw-r-- 1 antoine antoine 2,4M juin   1 14:51 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_tdigest.cc.o
-rw-rw-r-- 1 antoine antoine 2,5M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_var_std.cc.o
-rw-rw-r-- 1 antoine antoine  17M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/hash_aggregate.cc.o
-rw-rw-r-- 1 antoine antoine  16M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/hash_aggregate_numeric.cc.o
-rw-rw-r-- 1 antoine antoine 2,7M juin   1 14:51 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/hash_aggregate_pivot.cc.o
-rw-rw-r-- 1 antoine antoine 1,7M juin   1 14:51 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/pivot_internal.cc.o
-rw-rw-r-- 1 antoine antoine 749K juin   1 14:51 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/ree_util_internal.cc.o
-rw-rw-r-- 1 antoine antoine  18M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_arithmetic.cc.o
-rw-rw-r-- 1 antoine antoine 1,9M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_boolean.cc.o
-rw-rw-r-- 1 antoine antoine  11M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_compare.cc.o
-rw-rw-r-- 1 antoine antoine 6,3M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_if_else.cc.o
-rw-rw-r-- 1 antoine antoine 7,0M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_nested.cc.o
-rw-rw-r-- 1 antoine antoine 1,1M juin   1 14:51 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_random.cc.o
-rw-rw-r-- 1 antoine antoine  22M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_round.cc.o
-rw-rw-r-- 1 antoine antoine 4,6M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_set_lookup.cc.o
-rw-rw-r-- 1 antoine antoine  12M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_string_ascii.cc.o
-rw-rw-r-- 1 antoine antoine 4,8M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_string_utf8.cc.o
-rw-rw-r-- 1 antoine antoine  19M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_temporal_binary.cc.o
-rw-rw-r-- 1 antoine antoine  19M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_temporal_unary.cc.o
-rw-rw-r-- 1 antoine antoine 2,1M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_validity.cc.o
-rw-rw-r-- 1 antoine antoine 627K juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/util_internal.cc.o
-rw-rw-r-- 1 antoine antoine 7,8M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_array_sort.cc.o
-rw-rw-r-- 1 antoine antoine 8,3M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_cumulative_ops.cc.o
-rw-rw-r-- 1 antoine antoine 2,3M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_nested.cc.o
-rw-rw-r-- 1 antoine antoine 1,7M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_pairwise.cc.o
-rw-rw-r-- 1 antoine antoine 2,7M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_rank.cc.o
-rw-rw-r-- 1 antoine antoine 5,7M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_replace.cc.o
-rw-rw-r-- 1 antoine antoine 6,4M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_run_end_encode.cc.o
-rw-rw-r-- 1 antoine antoine  48M juin   1 14:54 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_search_sorted.cc.o
-rw-rw-r-- 1 antoine antoine  16M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_select_k.cc.o
-rw-rw-r-- 1 antoine antoine  13M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_sort.cc.o
-rw-rw-r-- 1 antoine antoine 3,5M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_statistics.cc.o

48M for the search_sorted kernels alone is almost 20% of the total size of compute kernels (256M for libarrow_compute):

$ ls -lh /build/build-release/relwithdebinfo/libarrow*.so.2400.*
-rwxrwxr-x 1 antoine antoine 256M juin   1 14:54 /build/build-release/relwithdebinfo/libarrow_compute.so.2400.0.0
-rwxrwxr-x 1 antoine antoine 268M juin   1 14:53 /build/build-release/relwithdebinfo/libarrow.so.2400.0.0
-rwxrwxr-x 1 antoine antoine  24M juin   1 14:53 /build/build-release/relwithdebinfo/libarrow_testing.so.2400.0.0

This is also confirmed by code sizes (the text column below), ignoring debug information:

$ size /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/*.o
   text    data     bss     dec     hex filename
 335103   13600       0  348703   5521f /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_basic_avx2.cc.o
 338903   13600       0  352503   560f7 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_basic_avx512.cc.o
 762469   23136    1888  787493   c0425 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_basic.cc.o
 227705     752     192  228649   37d29 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_mode.cc.o
  43190     976     224   44390    ad66 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_pivot.cc.o
 211898     752     224  212874   33f8a /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_quantile.cc.o
  71202    1712     384   73298   11e52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_tdigest.cc.o
 131967    2000     704  134671   20e0f /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_var_std.cc.o
 691424   10464    1920  703808   abd40 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/hash_aggregate.cc.o
 727769   20160    1376  749305   b6ef9 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/hash_aggregate_numeric.cc.o
  71421    1280     224   72925   11cdd /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/hash_aggregate_pivot.cc.o
  39497     808       0   40305    9d71 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/pivot_internal.cc.o
  20989     208       0   21197    52cd /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/ree_util_internal.cc.o
1135819    1128    7296 1144243  1175b3 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_arithmetic.cc.o
  72828     776    1024   74628   12384 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_boolean.cc.o
 580496    7008    1120  588624   8fb50 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_compare.cc.o
 346737    2648     544  349929   556e9 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_if_else.cc.o
 308651    1664     928  311243   4bfcb /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_nested.cc.o
  18833     688     256   19777    4d41 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_random.cc.o
1428882    2816     928 1432626  15dc32 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_round.cc.o
 186180    3240     512  189932   2e5ec /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_set_lookup.cc.o
 608678    5336    8352  622366   97f1e /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_string_ascii.cc.o
 259069    2808    4168  266045   40f3d /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_string_utf8.cc.o
 962615     744    1736  965095   eb9e7 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_temporal_binary.cc.o
 941534    2640    4072  948246   e7816 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_temporal_unary.cc.o
  70576     696     832   72104   119a8 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_validity.cc.o
   5279      72       0    5351    14e7 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/util_internal.cc.o
 396050    2288     296  398634   6152a /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_array_sort.cc.o
 465965    3592    1376  470933   72f95 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_cumulative_ops.cc.o
  69255    1704     320   71279   1166f /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_nested.cc.o
  39943     800     320   41063    a067 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_pairwise.cc.o
  93968    2880     608   97456   17cb0 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_rank.cc.o
 257344    1576     384  259304   3f4e8 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_replace.cc.o
 288222     704     256  289182   4699e /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_run_end_encode.cc.o
2058224    2424     168 2060816  1f7210 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_search_sorted.cc.o
 709493    8712     208  718413   af64d /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_select_k.cc.o
 676708    5800     208  682716   a6adc /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_sort.cc.o
 107434     888     192  108514   1a7e2 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_statistics.cc.o

@pitrou
Copy link
Copy Markdown
Member

pitrou commented Jun 1, 2026

To alleviate this combinatorial explosion, it seems several axes of code generation can be decoupled:

  • the EmitNulls templating (is it useful at all?)
  • physical-to-logical conversion for REE values: this can be done after the physical values search, and needn't be templated on the ArrowType

Also, one important improvement is to use GetPhysicalType to avoid generating separate code paths for e.g. int32 vs. date32, int64 vs. timestamp, binary vs. string, etc.

Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments below. Most important though is the combinatorial explosion I mentioned in the comments.

VISIT(UInt16Type) \
VISIT(UInt32Type) \
VISIT(UInt64Type) \
VISIT(FloatType) \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but can you also add a test for it?


private:
int64_t LogicalRunEnd(int64_t physical_index) const {
const int64_t logical_run_end = std::max<int64_t>(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The std::max is needed because the logical offset can fall in the middle of a physical run, right? Can you add a comment about that?

GetRunEndValue(::arrow::ree_util::RunEndsArray(array_span_), physical_index) -
array_.offset(),
0);
return std::min(logical_run_end, array_.length());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for the same reason we need std::min here, right?

SearchSortedOptions::Side side, uint64_t insertion_offset,
Output* output) {
auto emit_search_result = [&](const VisitedNeedle<ArrowType, EmitNulls>& needle,
int64_t run_length) -> Status {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_length is always 1 in all call sites, right?

Comment on lines +908 to +911
template <typename ArrowType, typename ValuesAccessor>
Result<Datum> ComputeRunEndEncodedNeedleInsertionIndices(
const ValuesAccessor& sorted_values, const RunEndEncodedArray& needles,
SearchSortedOptions::Side side, uint64_t insertion_offset, ExecContext* ctx) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be templated on ArrowType or ValuesAccessor. Running search_sorted on the needles should be an opaque operation. This function is mostly interested in:

  1. unpacking the physical needles from the REE needles
  2. running search_sorted on the physical needles
  3. re-packing the search_sorted results with the original run-ends

Copilot AI review requested due to automatic review settings June 5, 2026 08:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new search_sorted compute function across C++ and Python, including options, docs, tests, and benchmarks.

Changes:

  • Introduces C++ SearchSorted API + kernel registration and implementation, plus benchmarks.
  • Adds Python bindings for SearchSortedOptions and Python-level tests/docstrings for pc.search_sorted.
  • Updates C++ user docs to list search_sorted and document its constraints/behavior.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
python/pyarrow/tests/test_compute.py Adds Python tests validating pc.search_sorted behavior (left/right, nulls, REE, errors).
python/pyarrow/includes/libarrow.pxd Exposes SearchSortedOptions and side enum to Cython.
python/pyarrow/compute.py Exports SearchSortedOptions in the public Python compute namespace.
python/pyarrow/_compute_docstrings.py Adds Python docstring examples for search_sorted.
python/pyarrow/_compute.pyx Adds SearchSortedOptions Python option class + side parsing.
docs/source/cpp/compute.rst Documents search_sorted in the C++ compute function table and notes constraints.
cpp/src/arrow/meson.build Adds the new kernel source to Meson build.
cpp/src/arrow/compute/registry_internal.h Declares RegisterVectorSearchSorted.
cpp/src/arrow/compute/kernels/vector_search_sorted_test.cc Adds comprehensive C++ kernel tests.
cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc Adds microbenchmarks for SearchSorted.
cpp/src/arrow/compute/kernels/vector_search_sorted.cc Implements the search_sorted meta-function/kernel (arrays, chunked, REE, null handling).
cpp/src/arrow/compute/kernels/meson.build Adds the new benchmark target to Meson.
cpp/src/arrow/compute/kernels/CMakeLists.txt Adds the new C++ test and benchmark targets.
cpp/src/arrow/compute/initialize.cc Registers the kernel at startup.
cpp/src/arrow/compute/api_vector.h Adds SearchSortedOptions and the public SearchSorted API declaration/docs.
cpp/src/arrow/compute/api_vector.cc Registers options type and wires SearchSorted to CallFunction.
cpp/src/arrow/CMakeLists.txt Adds the new kernel source to CMake build.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread python/pyarrow/tests/test_compute.py
Comment thread cpp/src/arrow/compute/api_vector.h Outdated
Comment on lines +535 to +537
/// The `values` datum must be a plain array or run-end encoded array sorted in
/// ascending order. `needles` may be a scalar, plain array, or run-end encoded
/// array whose logical value type matches `values`.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

#include <memory>
#include <numeric>
#include <optional>
#include <ranges>
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apache Arrow targets C++20 now

Comment on lines +532 to +534
const auto it = std::ranges::find_if(
values.chunks(),
[](const std::shared_ptr<Array>& chunk) { return chunk->length() != 0; });
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apache Arrow targets C++20 now

Comment on lines +553 to +556
auto chunk_null_counts = values.chunks() | std::views::transform([](const auto& chunk) {
return GetLogicalNullCount(*chunk->data());
});
return std::reduce(chunk_null_counts.begin(), chunk_null_counts.end(), int64_t{0});
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apache Arrow targets C++20 now

std::static_pointer_cast<Int64Array>(rand.Int64(length, 0, max_value, 0.0));
std::vector<int64_t> data(values->raw_values(),
values->raw_values() + values->length());
std::ranges::sort(data);
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apache Arrow targets C++20 now

for (int64_t index = 0; index < values->length(); ++index) {
data.push_back(values->GetString(index));
}
std::ranges::sort(data);
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apache Arrow targets C++20 now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants