fix: Appending null values to element array builders of StructBuilder for null row in a StructArray #78
Merged
sunchao
approved these changes
Feb 21, 2024
sunchao
left a comment
Member
LGTM. Already reviewed internally.
Member
Author
Merged. Thanks.
schenksj
added a commit
to schenksj/datafusion-comet
that referenced
this pull request
Jun 7, 2026
…che#78 schema-change-since-analysis)
Regression from the kernel-schema-shipping work: DeltaColumnMappingSuite "column mapping batch scan should detect physical name changes" read the current data instead of null-filling, because the driver fed `ScanBuilder::with_schema` the LIVE snapshot schema. Kernel resolves physical names from the schema passed to `with_schema` (StateInfo::try_new -> StructField::make_physical), so it must be fed the schema the query was PLANNED with -- then kernel's field-id matching null-fills any column whose id changed since analysis (Delta's schema-on-read escape hatch), which a pure-kernel engine handles itself with no fallback.
- The JVM ships the analysis-time read schema as Delta schema JSON (`StructType.json` from DeltaScanRule's stashed reference schema, carrying `delta.columnMapping.physicalName` + id at every level). The driver parses it (`serde_json` -> kernel `StructType`, the same format kernel reads from the log) and feeds it straight to `with_schema`; it falls back to projecting the live snapshot by column name only when no analysis-time schema is available.
- The analysis-time JSON and the Arrow-IPC names are mutually exclusive on the wire (ship the JSON when present, else the names) -- no redundant double-ship.
- Fix `planDeltaReadSchemas` to build kernel schemas when EITHER carrier is present (it previously gated on the IPC only, so a JSON-only payload silently produced no schemas).
Red->green guard: CometDeltaSchemaChangeReproSuite (Comet returned data, Spark null-filled; now both null-fill). Own-suite "physical name changes" goes green; all contrib Delta suites stay green. (DeltaColumnMappingSuite "explicit id matching" -- a contrived manual field-id repoint -- is a separate kernel id-vs-name matching nuance, tracked in apache#79.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
schenksj
added a commit
to schenksj/datafusion-comet
that referenced
this pull request
Jun 7, 2026
…op the Arrow-IPC path)
The kernel-read path previously shipped the data-read schema two ways -- Arrow-IPC column names (projected against the snapshot driver-side) and, for apache#78, the analysis-time schema as Delta JSON. The JSON carrier subsumes both: it's the query's data columns drawn from the analysis-time schema (falling back to the snapshot schema), carrying delta.columnMapping.physicalName + id at every nesting level -- the same Delta-JSON format kernel deserializes for the log schema. So drop the IPC carrier entirely:
- `dataReadSchemaJson` now sources annotations from analyzedSchema.orElse(snapshot schema), using each required data column's annotated field (or the required field as-is for non-column-mapping tables). Removed `dataReadSchemaIpc`.
- Removed the `projectedSchemaIpc` arg from `planDeltaScan` / `planDeltaReadSchemas` (JNI + Native.scala) and the `projected_columns` / `build_read_schema` snapshot-projection fallback from scan.rs; the driver parses one JSON via `read_schema_from_json`.
One wire carrier, no Arrow-IPC marshalling, no driver-side snapshot re-projection. All contrib Delta suites green (incl. the apache#78 schema-change repro).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
schenksj
added a commit
to schenksj/datafusion-comet
that referenced
this pull request
Jun 9, 2026
…pache#85)
The design docs had drifted from the kernel-read refactor (apache#76/apache#77/apache#80/apache#81/apache#82/apache#84/#2/apache#78/apache#86). Audited all 13 docs against current code and corrected:
- Removed the deleted ParquetSource + DV-sweep + DeltaSyntheticColumnsExec read stack as the "current" path everywhere; it is now kernel-read only (apache#50/apache#82), with DeltaKernelScanExec doing in-worker synthesis. The old stack is kept only as clearly-labeled history / rejected alternatives.
- delta_scan.rs is a ~72-line shim delegating to comet_contrib_delta::planner (apache#77); column-mapping physicalisation dropped, kernel ships the schemas (apache#76).
- CDF (readChangeFeed) is kernel-native via TableChanges -> CometDeltaCdfScanExec, split multi-partition (apache#84/#2) -- corrected docs that called it unsupported, declined, or a synthetic-columns fallback.
- 08-known-limitations.md: removed all of Part B (B1-B9 were development-time regressions, all now fixed + guarded) and A3 (path-based CDF now engages native, apache#84); kept only genuine current limitations (A1 DPP residual, A2e credential residual, A4 VARIANT, A5 decline gates, A6 INT96 kernel gap, A7 CM-id repoint). 466 -> 230 lines.
- Fixed config keys, build/module layout, JNI symbols, file paths, CI workflow references, and supported-feature lists (added CDF, _metadata, INT96) across the build / README / user-guide docs.
Every claim verified against code; markdown passes prettier.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Which issue does this PR close?
Closes #79.
Rationale for this change
When encountering a null row, besides appending a null value to the `StructBuilder` by calling its `append_null`, we also need to append null values to all of its element array builders, so that their lengths stay the same.
What changes are included in this PR?
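The length invariant described in the rationale can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyFieldBuilder`, `ToyStructBuilder`, and `build_example` are hypothetical names invented for illustration and are not the arrow-rs API or the code changed in this PR. The point it shows is that a null struct row must still push one (null) slot into every child builder, otherwise the children end up shorter than the struct's validity buffer.

```rust
// Toy model only: hypothetical types, NOT the arrow-rs StructBuilder API.

/// A child (element) builder holding nullable i32 values.
#[derive(Debug, Default)]
pub struct ToyFieldBuilder {
    values: Vec<Option<i32>>,
}

impl ToyFieldBuilder {
    pub fn append_value(&mut self, v: i32) {
        self.values.push(Some(v));
    }
    pub fn append_null(&mut self) {
        self.values.push(None);
    }
    pub fn len(&self) -> usize {
        self.values.len()
    }
}

/// A struct builder: one child builder per field plus a row validity buffer.
#[derive(Debug)]
pub struct ToyStructBuilder {
    fields: Vec<ToyFieldBuilder>,
    validity: Vec<bool>,
}

impl ToyStructBuilder {
    pub fn new(num_fields: usize) -> Self {
        Self {
            fields: (0..num_fields).map(|_| ToyFieldBuilder::default()).collect(),
            validity: Vec::new(),
        }
    }

    /// Append a non-null struct row; individual fields may still be null.
    pub fn append_row(&mut self, row: &[Option<i32>]) {
        assert_eq!(row.len(), self.fields.len(), "one value per field");
        for (field, value) in self.fields.iter_mut().zip(row) {
            match value {
                Some(v) => field.append_value(*v),
                None => field.append_null(),
            }
        }
        self.validity.push(true);
    }

    /// Append a null struct row. Every child also receives a null slot,
    /// so all child lengths stay equal to the number of struct rows.
    pub fn append_null_row(&mut self) {
        for field in self.fields.iter_mut() {
            field.append_null();
        }
        self.validity.push(false);
    }

    pub fn len(&self) -> usize {
        self.validity.len()
    }

    pub fn field_len(&self, i: usize) -> usize {
        self.fields[i].len()
    }

    /// True when every child builder has exactly one slot per struct row.
    pub fn lengths_consistent(&self) -> bool {
        self.fields.iter().all(|f| f.len() == self.validity.len())
    }
}

/// Builds three rows: a valid row, a null row, and a valid row with a null field.
pub fn build_example() -> ToyStructBuilder {
    let mut b = ToyStructBuilder::new(2);
    b.append_row(&[Some(1), Some(10)]);
    b.append_null_row();
    b.append_row(&[Some(3), None]);
    b
}
```

If `append_null_row` only pushed `false` into `validity` and skipped the loop over `fields`, `lengths_consistent` would return `false` after the second row, which is the kind of mismatch the rationale describes.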
How are these changes tested?