feat: add Iceberg v3 type definitions #752
Introduce the Iceberg v3 types (variant, geometry, geography), including their schema/JSON serialization and type-system integration (visitors, schema projection, etc.). Reading and writing data of these types is not implemented yet: conversion to/from Arrow, Avro, and Parquet returns an error, as do identity transform binding and scalar validation for them.
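To illustrate the split described above, the sketch below shows a conversion entry point that recognizes the three new types but returns a `NotSupported` error for them until data conversion exists. `TypeId`, `Status`, `IsNewV3Type`, and `ConvertTypeSketch` are simplified, hypothetical stand-ins and are not the iceberg-cpp API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins; the real iceberg-cpp TypeId and
// Status types are richer and are not reproduced here.
enum class TypeId { kInt, kString, kVariant, kGeometry, kGeography };

struct Status {
  bool ok = true;
  std::string message;

  static Status OK() { return {}; }
  static Status NotSupported(std::string msg) {
    return {false, std::move(msg)};
  }
};

// The three types this PR adds for format version 3.
bool IsNewV3Type(TypeId id) {
  return id == TypeId::kVariant || id == TypeId::kGeometry ||
         id == TypeId::kGeography;
}

// Sketch of a conversion entry point (e.g. Iceberg -> Arrow): the new v3
// types exist in the type system but are rejected here until reading and
// writing their data is implemented.
Status ConvertTypeSketch(TypeId id) {
  if (IsNewV3Type(id)) {
    return Status::NotSupported("conversion of v3 types is not implemented yet");
  }
  // Conversion of already-supported types would happen here.
  return Status::OK();
}
```

Keeping the rejection inside the conversion layer means schemas with these types can still be parsed, printed, and projected; only code paths that touch actual data fail.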
```cpp
  explicit GeometryType(std::string crs);
  ~GeometryType() override = default;

  [[nodiscard]] std::string_view crs() const;
```
Suggested change:

```diff
-  [[nodiscard]] std::string_view crs() const;
+  std::string_view crs() const;
```
Done. I removed all remaining `[[nodiscard]]` annotations in `type.h`.
```cpp
  bool Equals(const Type& other) const override;

 private:
  std::optional<std::string> crs_;
```
An empty string is enough to represent a missing CRS.
Agreed; changed it to a plain `std::string`.
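A minimal sketch of the agreed design: a plain `std::string` holds the CRS, and an empty string stands for "no CRS given". `GeometryTypeSketch` and the `ToString` output format are assumptions for illustration, not the class added in this PR.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical, simplified geometry type following the review outcome:
// no std::optional, the empty string means "missing CRS".
class GeometryTypeSketch {
 public:
  GeometryTypeSketch() = default;  // no CRS specified
  explicit GeometryTypeSketch(std::string crs) : crs_(std::move(crs)) {}

  // Returns an empty view when no CRS was specified.
  std::string_view crs() const { return crs_; }

  // Assumed formatting: "geometry" without a CRS, "geometry(<crs>)" with one.
  std::string ToString() const {
    return crs_.empty() ? "geometry" : "geometry(" + crs_ + ")";
  }

  bool operator==(const GeometryTypeSketch& other) const {
    return crs_ == other.crs_;
  }

 private:
  std::string crs_;  // empty == missing CRS
};
```

Compared with `std::optional<std::string>`, this removes a second "absent" state (an engaged but empty optional) that equality checks and serialization would otherwise have to treat specially.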
Pull request overview
This PR introduces Iceberg v3 type-system support by adding the new types variant, geometry, and geography (plus EdgeAlgorithm for geography), and wiring them through the existing visitor/type utilities, schema/JSON parsing & serialization, and compatibility checks. Data read/write support for these types is explicitly not implemented yet (Arrow/Avro/Parquet conversions and identity transform binding return errors).
Changes:
- Add v3 `TypeId`s (`kVariant`, `kGeometry`, `kGeography`) and corresponding `Type` implementations (including CRS/edge algorithm handling and stringification).
- Integrate v3 types into visitors, schema projection/utilities, transforms, and format-version gating; return `NotSupported` for unsupported IO/conversions.
- Extend and adjust unit tests to cover v3 type parsing/printing and “unsupported” behavior in conversions/transforms.
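The PR adds string conversion and parsing for `EdgeAlgorithm`, with tests for case and spacing. Below is a hypothetical sketch of that round trip using the five edge-interpolation algorithms named in the Iceberg v3 spec (`spherical`, `vincenty`, `thomas`, `andoyer`, `karney`). The enum and function names are illustrative, and trimming surrounding whitespace is an assumption based on the spacing tests mentioned in the per-file summary.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <initializer_list>
#include <optional>
#include <string>
#include <string_view>

// Illustrative enum; not necessarily the EdgeAlgorithm in type_fwd.h.
enum class EdgeAlgorithmSketch { kSpherical, kVincenty, kThomas, kAndoyer, kKarney };

// Canonical lower-case name, as used in a type string such as
// "geography(OGC:CRS84, karney)".
std::string_view ToString(EdgeAlgorithmSketch algo) {
  switch (algo) {
    case EdgeAlgorithmSketch::kSpherical: return "spherical";
    case EdgeAlgorithmSketch::kVincenty: return "vincenty";
    case EdgeAlgorithmSketch::kThomas: return "thomas";
    case EdgeAlgorithmSketch::kAndoyer: return "andoyer";
    case EdgeAlgorithmSketch::kKarney: return "karney";
  }
  return "unknown";  // unreachable for valid enum values
}

// Strips leading/trailing ASCII whitespace, e.g. after splitting on ','.
std::string_view TrimAscii(std::string_view s) {
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.front()))) {
    s.remove_prefix(1);
  }
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {
    s.remove_suffix(1);
  }
  return s;
}

// Case-insensitive parse; returns std::nullopt for unknown names.
std::optional<EdgeAlgorithmSketch> EdgeAlgorithmFromString(std::string_view text) {
  std::string lower(TrimAscii(text));
  std::transform(lower.begin(), lower.end(), lower.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  for (auto algo : {EdgeAlgorithmSketch::kSpherical, EdgeAlgorithmSketch::kVincenty,
                    EdgeAlgorithmSketch::kThomas, EdgeAlgorithmSketch::kAndoyer,
                    EdgeAlgorithmSketch::kKarney}) {
    if (ToString(algo) == std::string_view(lower)) return algo;
  }
  return std::nullopt;
}
```

Always emitting the lower-case name while accepting any case on input keeps serialized type strings stable across round trips.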
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/iceberg/util/visitor_generate.h | Extends generated visitor action lists and adds explicit dispatch for variant in the “primitive default” switch. |
| src/iceberg/util/visit_type.h | Updates categorical visitor docs to include a fifth category for variant. |
| src/iceberg/util/type_util.h | Adds VariantType overloads to schema/type utility visitors. |
| src/iceberg/util/type_util.cc | Implements VariantType visitor handling and adjusts projection logic to treat non-nested leaf types consistently. |
| src/iceberg/util/struct_like_set.cc | Returns NotSupported for scalar validation of v3 types. |
| src/iceberg/update/update_schema.cc | Adds VisitVariant handling in schema-update visitor. |
| src/iceberg/type.h | Adds VariantType, GeometryType, GeographyType, factories, and edge-algorithm APIs; updates type factory group docs. |
| src/iceberg/type.cc | Implements v3 type behavior, factories, TypeId/EdgeAlgorithm string conversions and parsing. |
| src/iceberg/type_fwd.h | Adds new TypeIds, EdgeAlgorithm, and forward declarations for new types. |
| src/iceberg/transform.cc | Disables identity transform for geometry/geography. |
| src/iceberg/transform_function.cc | Enforces identity-transform input-type restrictions for geometry/geography. |
| src/iceberg/test/visit_type_test.cc | Extends type test cases to include v3 types and updates nested-vs-non-nested expectations. |
| src/iceberg/test/type_test.cc | Extends type test cases, adjusts nested checks, and adds geography default/algorithm equality tests. |
| src/iceberg/test/transform_test.cc | Adds coverage ensuring identity transform rejects v3 types. |
| src/iceberg/test/schema_test.cc | Adds schema projection test coverage for variant fields. |
| src/iceberg/test/schema_json_test.cc | Adds JSON round-trip and invalid-input tests for v3 type strings (case/spacing/algorithms). |
| src/iceberg/test/rest_json_serde_test.cc | Updates expected error message to match new “Cannot parse type string” behavior. |
| src/iceberg/test/arrow_test.cc | Adds test asserting Arrow conversion rejects v3 types. |
| src/iceberg/table_metadata.h | Gates v3 types behind Iceberg format version >= 3. |
| src/iceberg/schema_internal.cc | Refactors Arrow schema conversion to return Status, improves error reporting with type paths, and rejects v3 types explicitly. |
| src/iceberg/parquet/parquet_writer.cc | Adds VisitVariant to metrics collector visitor. |
| src/iceberg/parquet/parquet_schema_util.cc | Rejects reading v3 types from Parquet schema evolution validation. |
| src/iceberg/parquet/parquet_metrics.cc | Adds VisitVariant to metrics visitor. |
| src/iceberg/metrics_config.cc | Treats variant as a non-nested leaf for metrics field-id limiting. |
| src/iceberg/json_serde.cc | Adds JSON serialization and parsing for v3 types (including CRS and edge algorithm); normalizes primitive parsing to be case-insensitive. |
| src/iceberg/delete_file_index.cc | Adjusts equality-delete bound conversion to skip any non-primitive types (avoids mis-casting variant). |
| src/iceberg/avro/avro_schema_util.cc | Rejects writing/reading v3 types to/from Avro with NotSupported. |
| src/iceberg/avro/avro_schema_util_internal.h | Declares Avro visitor overloads for v3 types. |
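The `table_metadata.h` change gates the v3 types behind format version 3. Here is a minimal sketch of such a check over a flat list of leaf types. It assumes only `variant`, `geometry`, and `geography` are checked (nested-type traversal and any other version-gated types are omitted), and `FindUnsupportedType` is a hypothetical helper, not the function added in this PR.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Illustrative subset of type ids; the real TypeId enum has more entries.
enum class TypeId { kInt, kString, kVariant, kGeometry, kGeography };

// Minimum table format version for the types added in this PR.
constexpr int kMinFormatVersionForNewV3Types = 3;

bool IsNewV3Type(TypeId id) {
  return id == TypeId::kVariant || id == TypeId::kGeometry ||
         id == TypeId::kGeography;
}

// Returns the first leaf type that `format_version` cannot hold, or
// std::nullopt if every type is allowed.
std::optional<TypeId> FindUnsupportedType(const std::vector<TypeId>& leaf_types,
                                          int format_version) {
  if (format_version >= kMinFormatVersionForNewV3Types) return std::nullopt;
  for (TypeId id : leaf_types) {
    if (IsNewV3Type(id)) return id;
  }
  return std::nullopt;
}
```

A caller would likely turn a non-empty result into an error that names the offending field, so that metadata for a v1 or v2 table never contains types older readers cannot interpret.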
```diff
   /// \brief Get the type ID.
-  [[nodiscard]] virtual TypeId type_id() const = 0;
+  virtual TypeId type_id() const = 0;

   /// \brief Is this a primitive type (may not have child fields)?
-  [[nodiscard]] virtual bool is_primitive() const = 0;
+  virtual bool is_primitive() const = 0;

   /// \brief Is this a nested type (may have child fields)?
-  [[nodiscard]] virtual bool is_nested() const = 0;
+  virtual bool is_nested() const = 0;

  protected:
   /// \brief Compare two types for equality.
-  [[nodiscard]] virtual bool Equals(const Type& other) const = 0;
+  virtual bool Equals(const Type& other) const = 0;

   /// \brief Get a view of the child fields.
-  [[nodiscard]] virtual std::span<const SchemaField> fields() const = 0;
+  virtual std::span<const SchemaField> fields() const = 0;

   /// \brief Get a field by name (case-sensitive).
-  [[nodiscard]] Result<std::optional<SchemaFieldConstRef>> GetFieldByName(
-      std::string_view name) const;
+  Result<std::optional<SchemaFieldConstRef>> GetFieldByName(std::string_view name) const;

   /// \brief Get the precision (the number of decimal digits).
-  [[nodiscard]] int32_t precision() const;
+  int32_t precision() const;

   /// \brief Get the scale (essentially, the number of decimal digits after
   /// the decimal point; precisely, the value is scaled by $$10^{-s}$$.).
-  [[nodiscard]] int32_t scale() const;
+  int32_t scale() const;

   /// \brief Is this type zoned or naive?
-  [[nodiscard]] virtual bool is_zoned() const = 0;
+  virtual bool is_zoned() const = 0;

   /// \brief The time resolution.
-  [[nodiscard]] virtual TimeUnit time_unit() const = 0;
+  virtual TimeUnit time_unit() const = 0;

   /// \brief The length (the number of bytes to store).
-  [[nodiscard]] int32_t length() const;
+  int32_t length() const;
```