Move to nanobind by evertlammerts · Pull Request #522 · duckdb/duckdb-python

evertlammerts · 2026-06-30T17:49:56Z

No description provided.

…ilding) Build-system integration WORKS (CMake configure passes): find_package(Python)+nanobind, nanobind_build_library(nanobind-static) feeding the object libs, nanobind_add_module NB_STATIC; pyproject build dep pybind11->nanobind. Umbrella (pybind_wrapper.hpp) + enum caster macro + identifier caster ported to nanobind from_python/from_cpp API; mechanical renames applied (NB_MODULE, python_error, borrow/steal, def_prop_ro, namespace py = nanobind). First build surfaced 254 errors; keystone fixes bring it to 224, cascade cleared. Remaining work concentrated: numpy nb::ndarray port (~122), arrow_array_stream (59), py:: API diffs in python_objects/relation/result/connection headers (~60), object wrappers in dataframe.hpp (12), optional/pyconnection_default casters, register_exception, py::options, init_implicit, 81 .none().

Cleared categorically: identifier+enum casters, object wrappers (borrow_t ctors, handle_type_name blocks removed), module_::import_, py::module_, namespace py = nanobind. Build system still green (configure passes). Remaining concentrated in: numpy nb::ndarray port (py::dtype has no nanobind equiv -> reroute via numpy.empty + nb::ndarray; touches callers, not just the facade), ~150 scattered py:: API diffs (py::str->string, handle/object nuances) across connection/relation/result/expression, optional/pyconnection_default casters, register_exception->nb::exception, init_implicit, py::options.

numpy DONE: NumpyArray facade ported off py::array/py::dtype (cold-path ctypes.data buffer access, dtype-as-string Allocate via numpy.empty, in-place resize) -- move-faithful, no copies. Converted 15 .cast<>() method calls -> py::cast<>(), py::ssize_t->Py_ssize_t, py::function->py::callable, dropped py::options. numpy_array.hpp + arrow_array_stream.hpp now compile. Remaining: per-site py:: tail (~25 functional-cast string(obj)->py::cast, ~36 missing-member, move/ref bindings) across 12 files + pybind_wrapper.cpp impl + pyconnection_default caster.

…ls crash cascade) Add a custom type_caster<shared_ptr<DuckDBPyExpression>> (mirrors the DuckDBPyType one): keep cast_flags::convert so the registered implicit conversions (str->column, scalar->constant) fire for shared_ptr args, and when the inner caster yields no instance, construct through the registered Python ctor (None->NULL constant) -- a real owned object, no dangling -- with PyErr_Clear() on failure. Allow None on the Expression object-ctor (py::arg.none()). The PyErr_Clear is what eliminates the stale-PyErr segfault CASCADE: the full fast suite now runs clean in parallel (0 crashes, was unmeasurable). Failures 86 -> 66; expression/spark Expression cluster resolved (spark 6->3). Belt-and-suspenders None guard in CreateCompareExpression/Coalesce.
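The fallback order described above can be modeled in plain Python. This is only a control-flow sketch, not nanobind code: `Expression`, `implicit_convert`, `construct_from_object`, and `load_expression` are hypothetical names, and the `except TypeError` branch stands in for the `PyErr_Clear()` call that keeps a failed attempt from leaving a stale error behind.

```python
class Expression:
    """Hypothetical stand-in for DuckDBPyExpression."""

    def __init__(self, kind, value):
        self.kind = kind
        self.value = value


def implicit_convert(obj):
    # Models the registered implicit conversions: str -> column, scalar -> constant.
    if isinstance(obj, str):
        return Expression("column", obj)
    if isinstance(obj, (bool, int, float)):
        return Expression("constant", obj)
    raise TypeError("no implicit conversion for " + type(obj).__name__)


def construct_from_object(obj):
    # Models the registered Python constructor path: None -> NULL constant.
    if obj is None:
        return Expression("constant", None)
    raise TypeError("constructor rejected " + type(obj).__name__)


def load_expression(obj):
    """Return an Expression, or None when the argument cannot be converted."""
    if isinstance(obj, Expression):
        return obj  # already an instance, nothing to convert
    for attempt in (implicit_convert, construct_from_object):
        try:
            return attempt(obj)
        except TypeError:
            # Swallow the failure so no error state leaks into the next attempt
            # or into the caller (the role PyErr_Clear() plays in the C++ caster).
            continue
    return None
```

Under this model, `load_expression("a")` yields a column, `load_expression(1)` a constant, `load_expression(None)` a NULL constant, and an unsupported object yields `None` without raising.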

The NumpyArray facade read the buffer pointer via numpy's `ctypes.data` attribute chain and allocated via `numpy.empty(count, dtype_string)`. For a top-level column that runs once per 2048-row chunk (amortized), but the LIST/ARRAY per-element converter allocates a fresh array per row, so at 200k rows it became ~600k ctypes-object allocations: df()/fetchnumpy() of a LIST column ran ~6x slower than the pybind11 baseline (829ms vs 136ms). Read the buffer pointer directly from numpy's PyArrayObject C struct (a plain field read, as pybind11's array.data() did), gated by a PyObject_TypeCheck against numpy.ndarray so non-ndarray wrappers are never reinterpreted. Cache the numpy.empty callable and per-dtype np.dtype objects, and skip the no-op resize-to-current-length on the per-element path. Output is byte-identical (lists, nested, nulls, empty, masked, large-N); the row and arrow paths and the int/double/struct columnar paths are unaffected. LIST df()/fetchnumpy() now match-or-beat the pybind11 baseline (69ms).
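The figures quoted above can be checked with a few lines of arithmetic. This sketch only reuses the numbers from the commit message; the "ctypes objects per row" ratio is implied by those numbers rather than stated, and reading 69ms as the post-fix LIST timing is an interpretation.

```python
import math

ROWS = 200_000           # row count quoted in the commit message
CHUNK_ROWS = 2048        # rows per chunk, as quoted
CTYPES_ALLOCS = 600_000  # approximate ctypes-object allocations quoted

# A top-level column allocates once per chunk, so the cost is amortized.
top_level_allocs = math.ceil(ROWS / CHUNK_ROWS)

# The LIST/ARRAY per-element converter allocates a fresh array per row.
per_row_arrays = ROWS

# Implied by the quoted totals, not stated in the source.
ctypes_objects_per_row = CTYPES_ALLOCS / ROWS

# Quoted timings for df()/fetchnumpy() of a LIST column, in milliseconds.
before_ms, baseline_ms, after_ms = 829, 136, 69

slowdown_vs_baseline = before_ms / baseline_ms  # the "~6x slower" figure
speedup_from_fix = before_ms / after_ms         # assumes 69ms is the post-fix time
after_vs_baseline = baseline_ms / after_ms      # > 1 means faster than pybind11

print(top_level_allocs, per_row_arrays, ctypes_objects_per_row)
print(round(slowdown_vs_baseline, 1), round(speedup_from_fix, 1), round(after_vs_baseline, 1))
```

The gap between roughly a hundred per-chunk allocations and 200k per-row allocations is why the attribute-chain lookup only showed up as a regression on LIST columns.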

evertlammerts added 30 commits June 26, 2026 20:29

nanobind: fix class_ holders, pybind11:: stragglers, pyconnection_default caster retirement, bulk str/int/type-of/cast conversions

c189ee8

nanobind: conversions, iteration ref->value, tuple/list building, capsule, None-as-dict, exceptions

cd76893

nanobind: module-init macros, args/kwargs annotation rules, init->new_, PyTokenize/UDF tuple building

f4f818d

nanobind: Value(py::str) -> cast, int_ explicit casts, .none() on connection args, remaining str/iteration fixes

033ef66

nanobind: register_exception shim, exception translator, tuple builds, type_object, capsule.data, len, more conversions

20be65a

nanobind: more str/bytes/Identifier conversions, tuple iteration, type_object

f09ca7e

nanobind: capsule.data, py::args binding fixes (Project/FunctionExpression), dict/list builds, bytes; numpy buffer-pointer caching (perf)

78b1cc0

nanobind: .none() on return_type bound-type arg

965e81a

nanobind: fix null py::str()/py::int_() default-construction in exceptions (crash on import)

f80cb9e

nanobind: fix accessor->wrapper reinterpret crashes (Series/Index/list type-punning) in dataframe/scan/bind/map/udf

c9f99f2

nanobind: fix FrameLocalsProxy (PEP 667) replacement-scan bad_cast; rebind __exit__ via lambda

262c70a

nanobind: __exit__ pointer-self + .none() args

1b115b6

nanobind: enum-instance acceptance in STRING_INT caster; .none() on join other_rel

0a23723

nanobind: TransformPyConfigDict str-ify values; filesystem timestamp float cast

9286f8c

nanobind: DuckDBPyType::TryConvert helper to restore implicit type conversion (shared_ptr caster strips convert)

fe4fb74

nanobind: UDF signature mappingproxy->dict; TryConvert for UDF parameter types

82c6ebf

nanobind: numpy __version__ string->tuple conversion in UDF path

06bb706

nanobind: custom shared_ptr<DuckDBPyType> caster (keep convert flag for implicit conversions); guard numpy ctypes eager-compute

eb9f87a

nanobind: simplify DuckDBPyType from_cpp (no type_hook)

d35c15c

nanobind: UDF kind via enum .name; TryConvert clears PyErr

a65adc0

nanobind: fix py::str(accessor) reinterpret bug (wrap in py::object so PyObject_Str runs) across numpy/pandas/udf/replacement paths

5cb74bd

nanobind: .none() on ConstantExpression value (no-default py::object accepting None)

1475be6

fix cmakelists

c72040e

Fix smart pointer issues

976dc5b

fix pandas

0f57ddf

long tail fixes

18692be

evertlammerts added 11 commits June 29, 2026 14:40

remove Py 3.10 support

fc677e2

fix for msvc

208082a

fix deployment target for python 3.11

c551c77

fix None on expressions

6b87a2e

weakrefs work again

f436f65

fix asan issues

cd43e38

tuple field assignment wrapper

6324191

reorg of files and PyUtil extraction

1231037

rename

5c67d68

bulk cleanup

8ae5c46

fix format

2983c92

evertlammerts force-pushed the prototype/nanobind-cutover branch from b9929c1 to 2983c92 Compare June 30, 2026 18:36

evertlammerts added 4 commits July 1, 2026 07:21

fix regressions and get on par with main

3c4528f

pre-commit fixes

38eaa4c

trim

40c1b32

evertlammerts marked this pull request as draft July 1, 2026 08:57

evertlammerts marked this pull request as ready for review July 1, 2026 08:57

evertlammerts added 5 commits July 1, 2026 11:07

Merge main (duckdb#519); superseded by the nanobind re-implementation

3cc4286

ruff fixes

501a3fc

improve allocation for numpy

a875747

review fixes

75a6489

bump submodule and fix drift

91bdd3a

evertlammerts merged commit d7e138f into duckdb:main Jul 1, 2026
16 checks passed

evertlammerts merged 50 commits into duckdb:main from evertlammerts:prototype/nanobind-cutover
