GH-49677 [Python][C++][Compute] Add search sorted compute kernel by Alex-PLACET · Pull Request #49679 · apache/arrow

Alex-PLACET · 2026-04-07T09:41:02Z

Rationale for this change

Add an implementation of the search_sorted compute kernel, based on the NumPy function: https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html
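As a reference for the NumPy semantics, here is a minimal standard-C++ sketch. It is not the PR's kernel: `Side` and `SearchSortedSketch` are hypothetical names, and the sketch ignores nulls, NaNs, chunking and run-end encoding. With `left`, a needle equal to existing values is placed before them (`std::lower_bound`). With `right`, it is placed after them (`std::upper_bound`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: which end of a run of equal values to report.
enum class Side { kLeft, kRight };

// For each needle, the index at which it could be inserted into
// `sorted_values` while keeping it sorted (NumPy searchsorted semantics).
std::vector<uint64_t> SearchSortedSketch(const std::vector<int64_t>& sorted_values,
                                         const std::vector<int64_t>& needles,
                                         Side side) {
  std::vector<uint64_t> out;
  out.reserve(needles.size());
  for (int64_t needle : needles) {
    // kLeft: first position not less than needle.
    // kRight: first position greater than needle.
    const auto it =
        side == Side::kLeft
            ? std::lower_bound(sorted_values.begin(), sorted_values.end(), needle)
            : std::upper_bound(sorted_values.begin(), sorted_values.end(), needle);
    out.push_back(static_cast<uint64_t>(it - sorted_values.begin()));
  }
  return out;
}
```

With values `[1, 3, 3, 5]` and needles `[0, 3, 4, 6]`, this gives `[0, 1, 3, 4]` for left and `[0, 3, 3, 4]` for right. These match the expectations used in the PR's numeric test cases.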

What changes are included in this PR?

Implementation of the C++ kernel + Python API.
Tests in C++ and Python

Are these changes tested?

Yes

Are there any user-facing changes?

No breaking change

GitHub Issue: [C++][Python] Implement search_sorted kernel for all primitive types and run-end encoded arrays #49677

github-actions · 2026-04-07T09:41:36Z

⚠️ GitHub issue #49677 has been automatically assigned in GitHub to PR creator.

- Added a new benchmark file `vector_search_sorted_benchmark.cc` to evaluate the performance of the SearchSorted function for various data types, including Int64, String, and Binary.
- Created a comprehensive test suite in `vector_search_sorted_test.cc` to validate the correctness of SearchSorted across different scenarios, including handling of null values, scalar needles, and run-end encoded arrays.
- Ensured that the benchmarks cover both left and right search options, as well as edge cases such as empty arrays and arrays with leading/trailing nulls.

…rks for needles with null runs

…tation overview and flow

…lize ranges for leading/trailing null counts

…ensive tests for supported types

…xcept to length method

pitrou

I haven't looked at the implementation yet, but have reviewed the tests and benchmarks (which are quite comprehensive, thank you!).

One missing item is support for chunked arrays. Besides that, see comments below :)

pitrou · 2026-04-15T13:40:10Z

+                                    SearchSortedOptions(SearchSortedOptions::Left)));
+  ASSERT_OK_AND_ASSIGN(auto right,
+                       SearchSorted(Datum(values), Datum(needles),
+                                    SearchSortedOptions(SearchSortedOptions::Right)));


Let's call ValidateFull on both results here too?

(also I'm curious, why not reuse CheckSimpleSearchSorted?)

OK, fixed too.

pitrou · 2026-04-15T13:43:12Z

+  std::string scalar_needle_json;
+  uint64_t expected_scalar_left;
+  uint64_t expected_scalar_right;


Note that you could also generate the scalar needle tests automatically by calling GetScalar on the array needles and the expected results. This would make this easier to maintain later.
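The idea can be sketched with plain C++ containers. The names `ArrayCase`, `SearchLeftOne` and `ScalarNeedlesAgree` are hypothetical. In the real tests, each needle would presumably be extracted with `Array::GetScalar(i)` and passed to the kernel. Each array test case then doubles as a set of scalar test cases, because needle `i` searched on its own must produce `expected[i]`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shape of an array-needle test case (left side only).
struct ArrayCase {
  std::vector<int64_t> values;
  std::vector<int64_t> needles;
  std::vector<uint64_t> expected_left;
};

// Stand-in for calling the kernel with a single scalar needle.
uint64_t SearchLeftOne(const std::vector<int64_t>& values, int64_t needle) {
  return static_cast<uint64_t>(
      std::lower_bound(values.begin(), values.end(), needle) - values.begin());
}

// Derives the scalar-needle checks from the array case: needle i searched
// alone must give expected_left[i], so no scalar expectations are hand-written.
template <typename SearchOne>
bool ScalarNeedlesAgree(const ArrayCase& c, SearchOne search_one) {
  for (std::size_t i = 0; i < c.needles.size(); ++i) {
    if (search_one(c.values, c.needles[i]) != c.expected_left[i]) return false;
  }
  return true;
}
```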

…and add tests for chunked values and needles

…expected results

pitrou · 2026-04-23T09:09:04Z

@Alex-PLACET Is this ready for review again?

…ctions

pitrou · 2026-04-23T14:01:46Z

@Alex-PLACET It looks like the benchmarks don't compile anymore?

/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc: In function 'void arrow::compute::BM_SearchSortedBinaryScalarNeedle(benchmark::State&, SearchSortedOptions::Side)':
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc:255:29: error: 'BuildSortedBinaryValues' was not declared in this scope; did you mean 'BuildSortedInt64Values'?
  255 |   const auto values_array = BuildSortedBinaryValues(state.range(0));
      |                             ^~~~~~~~~~~~~~~~~~~~~~~
      |                             BuildSortedInt64Values
In file included from /home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc:18:
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc: In lambda function:
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc:290:19: error: 'BM_SearchSortedInt64ScalarNeedle' was not declared in this scope; did you mean 'BM_SearchSortedBinaryScalarNeedle'?
  290 | BENCHMARK_CAPTURE(BM_SearchSortedInt64ScalarNeedle, left, SearchSortedOptions::Left)
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc: In lambda function:
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc:292:19: error: 'BM_SearchSortedInt64ScalarNeedle' was not declared in this scope; did you mean 'BM_SearchSortedBinaryScalarNeedle'?
  292 | BENCHMARK_CAPTURE(BM_SearchSortedInt64ScalarNeedle, right, SearchSortedOptions::Right)
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc: At global scope:
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc:253:13: warning: 'void arrow::compute::BM_SearchSortedBinaryScalarNeedle(benchmark::State&, SearchSortedOptions::Side)' defined but not used [-Wunused-function]
  253 | static void BM_SearchSortedBinaryScalarNeedle(benchmark::State& state,
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/antoine/arrow/dev/cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc:243:13: warning: 'void arrow::compute::BM_SearchSortedStringScalarNeedle(benchmark::State&, SearchSortedOptions::Side)' defined but not used [-Wunused-function]
  243 | static void BM_SearchSortedStringScalarNeedle(benchmark::State& state,
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

pitrou

Now I have also taken a look at the implementation, and some aspects of it seem overly complicated (but I may be missing something). See comments below.

pitrou · 2026-04-23T14:06:33Z

+          "[1, 3, 3, 5]",
+          "[0, 3, 4, 6]",


Can we make the values and needles arrays different lengths?

pitrou · 2026-04-23T14:08:09Z

+  auto values = ArrayFromJSON(int16(), "[]");
+  auto needles = ArrayFromJSON(int16(), "[1, 2, 3]");
+
+  ASSERT_OK_AND_ASSIGN(auto result, SearchSorted(Datum(values), Datum(needles)));


Why not reuse CheckSimpleSearchSorted? It would validate the results and also stress scalar needles.

You're right; it does now.

pitrou · 2026-04-23T14:09:30Z

+  auto values = ArrayFromJSON(int32(), "[null, 200, 300, 300]");
+  auto needles = ArrayFromJSON(int32(), "[50, 200, 250, 400]");
+
+  ASSERT_OK_AND_ASSIGN(auto left,


Again, let's re-use the testing helper CheckSimpleSearchSorted everywhere possible, otherwise we're forgetting some checks.

pitrou · 2026-04-23T14:11:46Z

+      {"Float32", float32(), "[1.0, 3.0, 3.0, 5.0]", "[0.0, 3.0, 4.0, 6.0]",
+       "[0, 1, 3, 4]", "[0, 3, 3, 4]"},
+      {"Float64", float64(), "[1.0, 3.0, 3.0, 5.0]", "[0.0, 3.0, 4.0, 6.0]",
+       "[0, 1, 3, 4]", "[0, 3, 3, 4]"},


Do we want to test NaNs here? They will probably need special treatment, whether in the values or the needles.

For the record, sorting an array with Arrow always puts NaNs between nulls and non-null values.
e.g. either [1.0, 3.0, 3.0, 5.0, NaN, NaN, null] or [null, NaN, NaN, 1.0, 3.0, 3.0, 5.0].
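This is a hypothetical sketch, not the PR's implementation. It assumes NaNs are placed after all numbers, as in the `[1.0, 3.0, 3.0, 5.0, NaN, NaN, null]` layout, with the nulls excluded from the search range. Under that assumption, NaN values and NaN needles can be handled with a comparator under which NaN is greater than every number and equivalent to every other NaN. The `[null, NaN, NaN, 1.0, ...]` layout would need the mirrored comparator.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Strict weak ordering: every number < NaN, and all NaNs are equivalent.
bool NaNLastLess(double a, double b) {
  if (std::isnan(a)) return false;  // NaN is never less than anything
  if (std::isnan(b)) return true;   // every number is less than NaN
  return a < b;
}

// `sorted` holds the non-null values in ascending order, NaNs last.
uint64_t SearchLeftNaNLast(const std::vector<double>& sorted, double needle) {
  return static_cast<uint64_t>(
      std::lower_bound(sorted.begin(), sorted.end(), needle, NaNLastLess) -
      sorted.begin());
}

uint64_t SearchRightNaNLast(const std::vector<double>& sorted, double needle) {
  return static_cast<uint64_t>(
      std::upper_bound(sorted.begin(), sorted.end(), needle, NaNLastLess) -
      sorted.begin());
}
```

With this ordering, a NaN needle lands before the first NaN on the left side and after the last NaN on the right side.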

I completely missed this case; I added some tests for it:
FloatValuesWithTrailingNaNsAndNulls and FloatValuesWithTrailingNaNsAndNulls

pitrou · 2026-04-23T14:14:51Z

+  ASSERT_OK_AND_ASSIGN(auto left,
+                       SearchSorted(Datum(values), Datum(needles),
+                                    SearchSortedOptions(SearchSortedOptions::Left)));
+  ASSERT_OK_AND_ASSIGN(auto right,
+                       SearchSorted(Datum(values), Datum(needles),
+                                    SearchSortedOptions(SearchSortedOptions::Right)));


Either you write or reuse a helper here, or you need to call ValidateFull directly.

And ideally we should also check scalar needles too.

pitrou · 2026-04-23T14:22:50Z

+  auto values_type = run_end_encoded(int32(), int32());
+  ASSERT_OK_AND_ASSIGN(auto ree_values,
+                       REEFromJSON(values_type, "[0, 0, 1, 1, 1, 4, 4, 9]"));
+  auto sliced = ree_values->Slice(2, 5);


Can we use a slice that does not conveniently fall on run boundaries? (for example Slice(1, 5))

pitrou · 2026-04-23T14:23:35Z

+  AssertArraysEqual(*ArrayFromJSON(uint64(), "[0, 0, 3, 3, 5]"), *result.make_array());
+}
+
+TEST(SearchSorted, BinaryValues) {


This is already part of the smoke tests, right?

It was a leftover; you're right, it's fixed now.

pitrou · 2026-04-23T14:40:52Z

+  ValueType Value(int64_t index) const {
+    const auto physical_index = span_.PhysicalIndex(index);
+    return GetViewType<ArrowType>::LogicalValue(values_.GetView(physical_index));
+  }


Hmm, so we're processing each run-end logical value individually, and then finding it in the array of physical values? This is wasteful, as it's O(n_needles * log n_values), while each physical value will produce the same output after all.

What run-end-encoded values should do instead is:

1. Run search_sorted on the run values, giving "physical results".
2. Turn the "physical results" into "logical results" by looking up the run start for each physical result.

In other words, given:

values = run_end_encoded([10,10,10,30,30]) (i.e. run values [10, 30] and run ends [3, 5])

needles = [10, 20, 30, 40]

we would first compute the physical result [0, 1, 1, 2] of searching the needles in [10, 30], then turn it into the logical results [0, 3, 3, 5].

This would be O(n_needles * log n_value_runs) and therefore more efficient. It should also generate fewer template specializations, because you don't need to template this algorithm per (physical value type, run end type) pair.
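The two-step approach can be sketched with plain vectors. The names are hypothetical, and the REE array is assumed to be unsliced, with run values in ascending order. A physical insertion index `p` means "before run `p`". Its logical counterpart is therefore the logical start of run `p`: `run_ends[p - 1]`, or 0 when `p == 0`. When `p` equals the number of runs, this yields the total logical length. The same mapping works for both sides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint64_t> SearchSortedRee(const std::vector<int64_t>& run_values,
                                      const std::vector<int64_t>& run_ends,
                                      const std::vector<int64_t>& needles,
                                      bool right_side) {
  std::vector<uint64_t> logical;
  logical.reserve(needles.size());
  for (int64_t needle : needles) {
    // Step 1: search the run values only, O(log n_value_runs) per needle.
    const auto it =
        right_side ? std::upper_bound(run_values.begin(), run_values.end(), needle)
                   : std::lower_bound(run_values.begin(), run_values.end(), needle);
    const auto p = static_cast<std::size_t>(it - run_values.begin());
    // Step 2: physical index p -> logical start of run p.
    logical.push_back(p == 0 ? 0 : static_cast<uint64_t>(run_ends[p - 1]));
  }
  return logical;
}
```

For needles `[10, 20, 30, 40]`, run values `[10, 30]`, and run ends `[3, 5]`, the left side gives physical `[0, 1, 1, 2]` and logical `[0, 3, 3, 5]`. The right side gives logical `[3, 3, 5, 5]`.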

pitrou · 2026-04-23T14:43:16Z

+  const auto& needle_data = needles.array();
+  if (needle_data->type->id() == Type::RUN_END_ENCODED) {
+    RunEndEncodedArray ree(needle_data);
+    return DispatchRunEndEncodedByRunEndType<Status>(


Similarly to what I pointed above about run-end-encoded values, you don't need complex indexing for run-end-encoded needles. It's actually even simpler as you can just search each physical needle and run-end-decode the (needle run ends, physical results).
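This sketches the REE needles case under the same plain-vector assumptions (left side only). `ReeColumn`, `SearchReeNeedles` and `Decode` are hypothetical names. Each physical needle is searched once, and the results reuse the needle run ends unchanged. Run-end decoding is only needed if a flat output is wanted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical REE column: one value per run plus exclusive logical run ends.
struct ReeColumn {
  std::vector<uint64_t> values;
  std::vector<int64_t> run_ends;
};

// Searches each physical needle once; the result keeps the needle run ends.
ReeColumn SearchReeNeedles(const std::vector<int64_t>& sorted_values,
                           const std::vector<int64_t>& needle_run_values,
                           const std::vector<int64_t>& needle_run_ends) {
  ReeColumn result{{}, needle_run_ends};
  for (int64_t needle : needle_run_values) {
    result.values.push_back(static_cast<uint64_t>(
        std::lower_bound(sorted_values.begin(), sorted_values.end(), needle) -
        sorted_values.begin()));
  }
  return result;
}

// Run-end decodes (run ends, physical results) into a flat result.
std::vector<uint64_t> Decode(const ReeColumn& col) {
  std::vector<uint64_t> flat;
  int64_t run_start = 0;
  for (std::size_t i = 0; i < col.values.size(); ++i) {
    flat.insert(flat.end(), static_cast<std::size_t>(col.run_ends[i] - run_start),
                col.values[i]);
    run_start = col.run_ends[i];
  }
  return flat;
}
```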

pitrou · 2026-04-23T14:55:07Z

+  const int64_t leading_null_count = CountLeadingNulls(
+      values.length, [&](int64_t index) { return values.IsNull(index); });
+  const int64_t trailing_null_count =
+      leading_null_count > 0 ? 0 : CountTrailingNulls(values.length, [&](int64_t index) {
+        return values.IsNull(index);
+      });
+


This is... complicated. We know there are null_count nulls, and they are either clustered at the start or at the end. We just have to look up the first validity bit (values.IsNull(0)) and then we know how many leading/trailing nulls there are.

And we can do the exact same thing for chunked arrays.
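Here is a sketch of that check with plain types. The names are hypothetical; a real implementation would read `null_count` and the first validity bit from the array. Because the nulls of a sorted array all sit at one end, the first slot tells which end. The all-null case is ambiguous. Here it is treated as trailing nulls, matching Arrow's default `NullPlacement::AtEnd`. For a chunked array, the first slot is the first slot of the first non-empty chunk, and `null_count` is the total over all chunks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct NonNullRange {
  int64_t offset;  // first non-null logical index
  int64_t length;  // number of non-null values
};

NonNullRange FindNonNullRange(int64_t length, int64_t null_count, bool first_is_null) {
  if (null_count == 0) return {0, length};
  // Some but not all values are null and slot 0 is null: the nulls are leading.
  if (null_count < length && first_is_null) return {null_count, length - null_count};
  // Otherwise the nulls are trailing (the all-null case is assumed AtEnd).
  return {0, length - null_count};
}

// Chunked case: each chunk is modeled as a validity vector (true = valid).
bool FirstSlotIsNull(const std::vector<std::vector<bool>>& chunks) {
  for (const auto& chunk : chunks) {
    if (!chunk.empty()) return !chunk[0];
  }
  return false;  // no values at all
}
```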

You're right; I simplified and refactored it.

Co-authored-by: Copilot <copilot@github.com>

…ckSearchSorted function Co-authored-by: Copilot <copilot@github.com>

…ns for clarity and reusability Co-authored-by: Copilot <copilot@github.com>

…tency, remove redundant tests Co-authored-by: Copilot <copilot@github.com>

…ases. Remove RunEndCType template from RunEndEncodedValuesAccessor

…Values with ChunkedNeedles

Co-authored-by: Copilot <copilot@github.com>

…sts for edge cases Co-authored-by: Copilot <copilot@github.com>

Co-authored-by: Copilot <copilot@github.com>

pitrou

Thanks for the update! This is getting better, though I think there are still opportunities for making the implementation simpler.

pitrou · 2026-04-30T13:47:02Z

+      ArrayVector{empty_chunk, null_chunk, empty_chunk, last_null_chunk});
+  auto needles = ArrayFromJSON(int32(), "[1, 4, null]");
+
+  CheckSearchSorted(Datum(values), Datum(needles), "[3, 3, null]", "[3, 3, null]");


This one raises an interesting question: what is the intended null placement of the sorted values? At end or at start? If nulls are at end, we should return 0 (before the nulls); if nulls are at start, we should return 3 (after the nulls).

The default null placement for sorting in Arrow C++ is NullPlacement::AtEnd so I think this should return 0, not 3.

(we can make it configurable later if there's some demand)
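The following hypothetical sketch (left side only) shows why the placement matters. `Placement` mirrors Arrow's `NullPlacement` but is not Arrow's type. The insertion index is the position within the non-null values, shifted by the null count when the nulls come first. Null needles produce null results, as in the test above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for Arrow's NullPlacement (AtStart / AtEnd).
enum class Placement { kAtStart, kAtEnd };

// `non_null_values`: the non-null values in ascending order. `null_count`
// nulls are assumed to sit before them (kAtStart) or after them (kAtEnd).
std::vector<std::optional<uint64_t>> SearchLeftWithNulls(
    const std::vector<int64_t>& non_null_values, int64_t null_count,
    Placement placement, const std::vector<std::optional<int64_t>>& needles) {
  const uint64_t offset =
      placement == Placement::kAtStart ? static_cast<uint64_t>(null_count) : 0;
  std::vector<std::optional<uint64_t>> results;
  for (const auto& needle : needles) {
    if (!needle.has_value()) {
      results.push_back(std::nullopt);  // null needle -> null result
      continue;
    }
    const auto it =
        std::lower_bound(non_null_values.begin(), non_null_values.end(), *needle);
    results.push_back(offset + static_cast<uint64_t>(it - non_null_values.begin()));
  }
  return results;
}
```

Consider an all-null `values` of length 3 with needles `[1, 4, null]`. Here `kAtEnd` gives `[0, 0, null]` and `kAtStart` gives `[3, 3, null]`.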

pitrou · 2026-04-30T13:53:11Z

+  const auto values_length = values.length();
+  const auto needles_length = needles.length();
+  state.counters["values_length"] = static_cast<double>(values_length);
+  state.counters["needles_length"] = static_cast<double>(needles_length);


Let's perhaps also add the null percentage of the values:

Suggested change:

-  state.counters["needles_length"] = static_cast<double>(needles_length);
+  state.counters["values_null_percentage"] = values.null_count() * 100.0 / values_length;
+  state.counters["needles_length"] = static_cast<double>(needles_length);

pitrou · 2026-04-30T14:02:49Z

+  VISIT(UInt16Type)                      \
+  VISIT(UInt32Type)                      \
+  VISIT(UInt64Type)                      \
+  VISIT(FloatType)                       \


We can probably add HalfFloatType?

Thanks, but can you also add a test for it?

pitrou · 2026-04-30T14:24:15Z

+  RunEndEncodedValuesAccessor<ArrowType> non_null_values(values_accessor.array(),
+                                                         non_null_values_range.offset,
+                                                         non_null_values_range.length);


It seems that you are assuming that nulls in a REE array can be anywhere? But actually the nulls are stored in the REE values (a REE array does not have a top-level null bitmap).

So you don't need all the complication that seems to be related to computing physical offsets depending on the logical null range of the REE array. Just use the physical null range of the REE values.

You should therefore be able to simplify a lot of stuff away, and perhaps even all the REE Accessor classes. You might have to keep the Accessor / ChunkedAccessor distinction, however.
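This plain-vector sketch searches REE values whose nulls are physical null runs. The names are hypothetical, and it assumes an unsliced array, the left side, and nulls clustered at one end. The non-null range is found on the run values alone. The physical result is then mapped to a logical index through the run ends. An all-null input returns 0, following the `NullPlacement::AtEnd` default.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Run values may contain null runs (std::nullopt) at one end only.
uint64_t SearchLeftReeWithNulls(const std::vector<std::optional<int64_t>>& run_values,
                                const std::vector<int64_t>& run_ends, int64_t needle) {
  // Physical non-null range: skip null runs at either end.
  std::size_t begin = 0;
  std::size_t end = run_values.size();
  while (begin < end && !run_values[begin]) ++begin;
  while (end > begin && !run_values[end - 1]) --end;
  if (begin == end) return 0;  // all null (or empty): assume nulls at end
  // Binary search over the non-null physical runs only.
  const auto first = run_values.begin() + static_cast<std::ptrdiff_t>(begin);
  const auto last = run_values.begin() + static_cast<std::ptrdiff_t>(end);
  const auto it = std::lower_bound(
      first, last, needle,
      [](const std::optional<int64_t>& run, int64_t x) { return *run < x; });
  const auto p = static_cast<std::size_t>(it - run_values.begin());
  // Physical index -> logical index (logical start of run p).
  return p == 0 ? 0 : static_cast<uint64_t>(run_ends[p - 1]);
}
```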

pitrou · 2026-04-30T14:37:24Z

+/// This preserves the cheapest materialization strategy: write repeated
+/// insertion indices directly into the preallocated result buffer, while still
+/// sharing the same EmitInsertionIndices traversal used by the nullable path.
+class PreallocatedInsertionIndexOutput {


If you'd like to keep this optimized insertion strategy, you might want to use a TypedBufferBuilder<uint64_t> for both insertion strategies and a TypedBufferBuilder<bool> for the null bitmap (perhaps with a bool template parameter to say whether nulls should be emitted or not).

Something like (rough sketch):

template <bool kEmitNulls>
class InsertionIndexBuilder {
 public:
  explicit InsertionIndexBuilder(MemoryPool* pool)
      : indices_builder_(pool), bitmap_builder_(pool) {}

  Status Init(int64_t length) {
    RETURN_NOT_OK(indices_builder_.Reserve(length));
    if constexpr (kEmitNulls) {
      RETURN_NOT_OK(bitmap_builder_.Reserve(length));
    }
    return Status::OK();
  }

  void AppendIndices(uint64_t value, int64_t run_length) {
    indices_builder_.UnsafeAppend(run_length, value);
    if constexpr (kEmitNulls) {
      bitmap_builder_.UnsafeAppend(run_length, /*value=*/true);
    }
  }

  void AppendNulls(int64_t run_length) {
    // Null slots still need an (arbitrary, here zero) entry in the indices buffer.
    indices_builder_.UnsafeAppend(run_length, uint64_t{0});
    if constexpr (kEmitNulls) {
      bitmap_builder_.UnsafeAppend(run_length, /*value=*/false);
    }
  }

 private:
  TypedBufferBuilder<uint64_t> indices_builder_;
  TypedBufferBuilder<bool> bitmap_builder_;  // validity bitmap
};

pitrou · 2026-04-30T14:41:42Z

+}
+
+TEST(SearchSorted, FloatValuesWithLeadingNullsAndTrailingNaNs) {
+  CheckSimpleSearchSortedAndScalar(float64(), "[null, 1.0, 3.0, 3.0, 5.0, NaN, NaN]",


This is not a valid sorted array (nulls and NaNs should be at the same end), so we can remove this test.

pitrou · 2026-04-30T14:41:57Z

+                          "[0, 0, 1, 3]", "[0, 1, 1, 3]");
+}
+
+TEST(SearchSorted, FloatValuesWithTrailingNaNsAndNulls) {


Can we also have a leading NaNs and nulls test?

pitrou · 2026-04-30T14:42:21Z

+                                   "[0.0, 3.0, 4.0, NaN]", "[0, 1, 3, 4]",
+                                   "[0, 3, 3, 6]");


Can you also add a null in the needles?

…or improved null bitmap management and streamline test cases for consistency in expected results. Co-authored-by: Copilot <copilot@github.com>

…d encoded values, ensuring null runs outside the slice are ignored. Co-authored-by: Copilot <copilot@github.com>

pitrou · 2026-06-01T14:50:58Z

I took a look at the compiled code size here and we're really generating too much code:

$ ls -lh /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/*.o
-rw-rw-r-- 1 antoine antoine 7,8M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_basic_avx2.cc.o
-rw-rw-r-- 1 antoine antoine 7,8M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_basic_avx512.cc.o
-rw-rw-r-- 1 antoine antoine  18M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_basic.cc.o
-rw-rw-r-- 1 antoine antoine 6,3M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_mode.cc.o
-rw-rw-r-- 1 antoine antoine 1,7M juin   1 14:51 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_pivot.cc.o
-rw-rw-r-- 1 antoine antoine 5,4M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_quantile.cc.o
-rw-rw-r-- 1 antoine antoine 2,4M juin   1 14:51 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_tdigest.cc.o
-rw-rw-r-- 1 antoine antoine 2,5M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_var_std.cc.o
-rw-rw-r-- 1 antoine antoine  17M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/hash_aggregate.cc.o
-rw-rw-r-- 1 antoine antoine  16M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/hash_aggregate_numeric.cc.o
-rw-rw-r-- 1 antoine antoine 2,7M juin   1 14:51 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/hash_aggregate_pivot.cc.o
-rw-rw-r-- 1 antoine antoine 1,7M juin   1 14:51 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/pivot_internal.cc.o
-rw-rw-r-- 1 antoine antoine 749K juin   1 14:51 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/ree_util_internal.cc.o
-rw-rw-r-- 1 antoine antoine  18M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_arithmetic.cc.o
-rw-rw-r-- 1 antoine antoine 1,9M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_boolean.cc.o
-rw-rw-r-- 1 antoine antoine  11M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_compare.cc.o
-rw-rw-r-- 1 antoine antoine 6,3M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_if_else.cc.o
-rw-rw-r-- 1 antoine antoine 7,0M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_nested.cc.o
-rw-rw-r-- 1 antoine antoine 1,1M juin   1 14:51 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_random.cc.o
-rw-rw-r-- 1 antoine antoine  22M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_round.cc.o
-rw-rw-r-- 1 antoine antoine 4,6M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_set_lookup.cc.o
-rw-rw-r-- 1 antoine antoine  12M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_string_ascii.cc.o
-rw-rw-r-- 1 antoine antoine 4,8M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_string_utf8.cc.o
-rw-rw-r-- 1 antoine antoine  19M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_temporal_binary.cc.o
-rw-rw-r-- 1 antoine antoine  19M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_temporal_unary.cc.o
-rw-rw-r-- 1 antoine antoine 2,1M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_validity.cc.o
-rw-rw-r-- 1 antoine antoine 627K juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/util_internal.cc.o
-rw-rw-r-- 1 antoine antoine 7,8M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_array_sort.cc.o
-rw-rw-r-- 1 antoine antoine 8,3M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_cumulative_ops.cc.o
-rw-rw-r-- 1 antoine antoine 2,3M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_nested.cc.o
-rw-rw-r-- 1 antoine antoine 1,7M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_pairwise.cc.o
-rw-rw-r-- 1 antoine antoine 2,7M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_rank.cc.o
-rw-rw-r-- 1 antoine antoine 5,7M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_replace.cc.o
-rw-rw-r-- 1 antoine antoine 6,4M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_run_end_encode.cc.o
-rw-rw-r-- 1 antoine antoine  48M juin   1 14:54 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_search_sorted.cc.o
-rw-rw-r-- 1 antoine antoine  16M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_select_k.cc.o
-rw-rw-r-- 1 antoine antoine  13M juin   1 14:53 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_sort.cc.o
-rw-rw-r-- 1 antoine antoine 3,5M juin   1 14:52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_statistics.cc.o

48M for the search_sorted kernels alone is almost 20% of the total size of compute kernels (256M for libarrow_compute):

$ ls -lh /build/build-release/relwithdebinfo/libarrow*.so.2400.*
-rwxrwxr-x 1 antoine antoine 256M juin   1 14:54 /build/build-release/relwithdebinfo/libarrow_compute.so.2400.0.0
-rwxrwxr-x 1 antoine antoine 268M juin   1 14:53 /build/build-release/relwithdebinfo/libarrow.so.2400.0.0
-rwxrwxr-x 1 antoine antoine  24M juin   1 14:53 /build/build-release/relwithdebinfo/libarrow_testing.so.2400.0.0

This is also confirmed by code sizes (the text column below), ignoring debug information:

$ size /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/*.o
   text    data     bss     dec     hex filename
 335103   13600       0  348703   5521f /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_basic_avx2.cc.o
 338903   13600       0  352503   560f7 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_basic_avx512.cc.o
 762469   23136    1888  787493   c0425 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_basic.cc.o
 227705     752     192  228649   37d29 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_mode.cc.o
  43190     976     224   44390    ad66 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_pivot.cc.o
 211898     752     224  212874   33f8a /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_quantile.cc.o
  71202    1712     384   73298   11e52 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_tdigest.cc.o
 131967    2000     704  134671   20e0f /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/aggregate_var_std.cc.o
 691424   10464    1920  703808   abd40 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/hash_aggregate.cc.o
 727769   20160    1376  749305   b6ef9 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/hash_aggregate_numeric.cc.o
  71421    1280     224   72925   11cdd /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/hash_aggregate_pivot.cc.o
  39497     808       0   40305    9d71 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/pivot_internal.cc.o
  20989     208       0   21197    52cd /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/ree_util_internal.cc.o
1135819    1128    7296 1144243  1175b3 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_arithmetic.cc.o
  72828     776    1024   74628   12384 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_boolean.cc.o
 580496    7008    1120  588624   8fb50 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_compare.cc.o
 346737    2648     544  349929   556e9 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_if_else.cc.o
 308651    1664     928  311243   4bfcb /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_nested.cc.o
  18833     688     256   19777    4d41 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_random.cc.o
1428882    2816     928 1432626  15dc32 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_round.cc.o
 186180    3240     512  189932   2e5ec /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_set_lookup.cc.o
 608678    5336    8352  622366   97f1e /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_string_ascii.cc.o
 259069    2808    4168  266045   40f3d /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_string_utf8.cc.o
 962615     744    1736  965095   eb9e7 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_temporal_binary.cc.o
 941534    2640    4072  948246   e7816 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_temporal_unary.cc.o
  70576     696     832   72104   119a8 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/scalar_validity.cc.o
   5279      72       0    5351    14e7 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/util_internal.cc.o
 396050    2288     296  398634   6152a /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_array_sort.cc.o
 465965    3592    1376  470933   72f95 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_cumulative_ops.cc.o
  69255    1704     320   71279   1166f /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_nested.cc.o
  39943     800     320   41063    a067 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_pairwise.cc.o
  93968    2880     608   97456   17cb0 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_rank.cc.o
 257344    1576     384  259304   3f4e8 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_replace.cc.o
 288222     704     256  289182   4699e /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_run_end_encode.cc.o
2058224    2424     168 2060816  1f7210 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_search_sorted.cc.o
 709493    8712     208  718413   af64d /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_select_k.cc.o
 676708    5800     208  682716   a6adc /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_sort.cc.o
 107434     888     192  108514   1a7e2 /build/build-release/src/arrow/CMakeFiles/arrow_compute_objlib.dir/compute/kernels/vector_statistics.cc.o

pitrou · 2026-06-01T15:03:30Z

To alleviate this combinatorial explosion, it seems several axes of code generation can be decoupled:

- the EmitNulls templating (is it useful at all?)
- physical-to-logical conversion for REE values: this can be done after the physical values search, and needn't be templated on the ArrowType

Also, one important improvement is to use GetPhysicalType to avoid generating separate code paths for e.g. int32 vs. date32, int64 vs. timestamp, binary vs. string, etc.
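A minimal sketch of the physical-type idea in plain C++ follows. The `LogicalType` enum and the function names are hypothetical, and this does not reproduce Arrow's `GetPhysicalType`. The search is instantiated once per physical C type. The dispatcher maps several logical types onto the same instantiation, so date32 or timestamp support adds no new template code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical logical type ids; several share one physical representation.
enum class LogicalType { kInt32, kDate32, kInt64, kTimestamp };

// One instantiation per *physical* C type.
template <typename CType>
uint64_t SearchLeftImpl(const void* data, int64_t length, const void* needle) {
  const auto* values = static_cast<const CType*>(data);
  const CType key = *static_cast<const CType*>(needle);
  return static_cast<uint64_t>(std::lower_bound(values, values + length, key) - values);
}

// Logical types collapse onto physical ones: date32 reuses the int32 code,
// timestamp reuses the int64 code.
uint64_t SearchLeftDispatch(LogicalType type, const void* data, int64_t length,
                            const void* needle) {
  switch (type) {
    case LogicalType::kInt32:
    case LogicalType::kDate32:
      return SearchLeftImpl<int32_t>(data, length, needle);
    case LogicalType::kInt64:
    case LogicalType::kTimestamp:
      return SearchLeftImpl<int64_t>(data, length, needle);
  }
  throw std::invalid_argument("unsupported type");
}
```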

pitrou

Some more comments below. Most important though is the combinatorial explosion I mentioned in the comments.

pitrou · 2026-06-01T14:23:07Z

+  VISIT(UInt16Type)                      \
+  VISIT(UInt32Type)                      \
+  VISIT(UInt64Type)                      \
+  VISIT(FloatType)                       \


Thanks, but can you also add a test for it?

pitrou · 2026-06-01T14:23:52Z

+
+ private:
+  int64_t LogicalRunEnd(int64_t physical_index) const {
+    const int64_t logical_run_end = std::max<int64_t>(


The std::max is needed because the logical offset can fall in the middle of a physical run, right? Can you add a comment about that?

pitrou · 2026-06-01T14:24:10Z

+        GetRunEndValue(::arrow::ree_util::RunEndsArray(array_span_), physical_index) -
+            array_.offset(),
+        0);
+    return std::min(logical_run_end, array_.length());


And for the same reason we need std::min here, right?
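Here is a standalone sketch of that clamping. The names are hypothetical. `run_ends` are the parent's physical run ends, and the slice covers `length` logical values starting at `offset`. Subtracting the offset can go negative for a run that ends before the slice starts. The last run touched by the slice can also extend past the slice's end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Logical end (relative to the slice) of physical run i.
int64_t SlicedLogicalRunEnd(const std::vector<int64_t>& run_ends, int64_t offset,
                            int64_t length, std::size_t i) {
  // A run ending before the slice starts would give a negative value: clamp to 0.
  const int64_t relative_end = std::max<int64_t>(run_ends[i] - offset, 0);
  // A run extending past the end of the slice: clamp to the slice length.
  return std::min(relative_end, length);
}
```

For the REE array `[0, 0, 1, 1, 1, 4, 4, 9]` (run ends `[2, 5, 7, 8]`) sliced with `Slice(1, 5)`, this yields logical run ends 1, 4, 5, 5.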

pitrou · 2026-06-01T15:03:52Z

+                            SearchSortedOptions::Side side, uint64_t insertion_offset,
+                            Output* output) {
+  auto emit_search_result = [&](const VisitedNeedle<ArrowType, EmitNulls>& needle,
+                                int64_t run_length) -> Status {


run_length is always 1 in all call sites, right?

pitrou · 2026-06-01T15:07:14Z

+template <typename ArrowType, typename ValuesAccessor>
+Result<Datum> ComputeRunEndEncodedNeedleInsertionIndices(
+    const ValuesAccessor& sorted_values, const RunEndEncodedArray& needles,
+    SearchSortedOptions::Side side, uint64_t insertion_offset, ExecContext* ctx) {


I don't think this needs to be templated on ArrowType or ValuesAccessor. Running search_sorted on the needles should be an opaque operation. This function is mostly interested in:

1. unpacking the physical needles from the REE needles
2. running search_sorted on the physical needles
3. re-packing the search_sorted results with the original run-ends

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new search_sorted compute function across C++ and Python, including options, docs, tests, and benchmarks.

Changes:

Introduces C++ SearchSorted API + kernel registration and implementation, plus benchmarks.
Adds Python bindings for SearchSortedOptions and Python-level tests/docstrings for pc.search_sorted.
Updates C++ user docs to list search_sorted and document its constraints/behavior.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.


File	Description
python/pyarrow/tests/test_compute.py	Adds Python tests validating `pc.search_sorted` behavior (left/right, nulls, REE, errors).
python/pyarrow/includes/libarrow.pxd	Exposes `SearchSortedOptions` and side enum to Cython.
python/pyarrow/compute.py	Exports `SearchSortedOptions` in the public Python compute namespace.
python/pyarrow/_compute_docstrings.py	Adds Python docstring examples for `search_sorted`.
python/pyarrow/_compute.pyx	Adds `SearchSortedOptions` Python option class + side parsing.
docs/source/cpp/compute.rst	Documents `search_sorted` in the C++ compute function table and notes constraints.
cpp/src/arrow/meson.build	Adds the new kernel source to Meson build.
cpp/src/arrow/compute/registry_internal.h	Declares `RegisterVectorSearchSorted`.
cpp/src/arrow/compute/kernels/vector_search_sorted_test.cc	Adds comprehensive C++ kernel tests.
cpp/src/arrow/compute/kernels/vector_search_sorted_benchmark.cc	Adds microbenchmarks for `SearchSorted`.
cpp/src/arrow/compute/kernels/vector_search_sorted.cc	Implements the `search_sorted` meta-function/kernel (arrays, chunked, REE, null handling).
cpp/src/arrow/compute/kernels/meson.build	Adds the new benchmark target to Meson.
cpp/src/arrow/compute/kernels/CMakeLists.txt	Adds the new C++ test and benchmark targets.
cpp/src/arrow/compute/initialize.cc	Registers the kernel at startup.
cpp/src/arrow/compute/api_vector.h	Adds `SearchSortedOptions` and the public `SearchSorted` API declaration/docs.
cpp/src/arrow/compute/api_vector.cc	Registers options type and wires `SearchSorted` to `CallFunction`.
cpp/src/arrow/CMakeLists.txt	Adds the new kernel source to CMake build.


Alex-PLACET · 2026-06-05T12:47:43Z

+/// The `values` datum must be a plain array or run-end encoded array sorted in
+/// ascending order. `needles` may be a scalar, plain array, or run-end encoded
+/// array whose logical value type matches `values`.


Alex-PLACET · 2026-06-05T12:25:03Z

+#include <memory>
+#include <numeric>
+#include <optional>
+#include <ranges>