test: Add TPC-DS test results#77
Merged
viirya
approved these changes
Feb 21, 2024
Member
Author
Will need to add a separate GitHub job to check the TPC-DS query results. Will do it later.
Member
Author
Thanks, merged.
schenksj
added a commit
to schenksj/datafusion-comet
that referenced
this pull request
May 13, 2026
Closes the streaming + MERGE-with-DV gap (apache#77).

Previously the pre-materialised FileIndex code path declined Comet whenever any AddFile carried a DeletionVectorDescriptor, forcing fallback to Spark+Delta. Now we materialise the DV on the driver via Delta's HadoopFileSystemDVStore (reflection, no compile-time dep) and feed the resulting row-index list through the proto's existing deleted_row_indexes field; the native planner already wraps DV'd file groups in DeltaDvFilterExec.

ExtractedAddFile gains a dvDescriptor: AnyRef field; the convert path materialises indexes for any AddFile that carries one, falling back if reflection or the DV read fails (silently dropping a DV would be a correctness violation).

Verified against DeletionVectorsSuite (293/293 passed) and the MergeIntoDVsSuite metrics tests (DV write + subsequent DV-aware read both go through native).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
schenksj
added a commit
to schenksj/datafusion-comet
that referenced
this pull request
Jun 9, 2026
…che#77)

Core's native/core/.../delta_scan.rs held ~250 lines of Delta-specific scan planning (kernel schema selection, KernelScanFile mapping, storage-config/S3-bucket resolution, final_output_indices reorder, the kernel_read gate). Move all of it into comet_contrib_delta::planner::plan_delta_scan, so core stays free of Delta planning logic (cleaner for upstreaming apache#4366 -- reviewers see core untouched by Delta).

Core's delta_scan.rs is now a thin shim: it computes the requested + partition Arrow schemas (core owns the proto->arrow `to_arrow_datatype` converter, used across the planner, and the contrib crate can't depend on core -- that would cycle), calls the contrib planner, and wraps the returned ExecutionPlan in a SparkPlan.

No behaviour change. Verified: full contrib package 152/0 under Spark 4.1.1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
schenksj
added a commit
to schenksj/datafusion-comet
that referenced
this pull request
Jun 9, 2026
…pache#85)

The design docs had drifted from the kernel-read refactor (apache#76/apache#77/apache#80/apache#81/apache#82/apache#84/#2/apache#78/apache#86). Audited all 13 docs against current code and corrected:

- Removed the deleted ParquetSource + DV-sweep + DeltaSyntheticColumnsExec read stack as the "current" path everywhere; it is now kernel-read only (apache#50/apache#82), with DeltaKernelScanExec doing in-worker synthesis. The old stack is kept only as clearly-labeled history / rejected alternatives.
- delta_scan.rs is a ~72-line shim delegating to comet_contrib_delta::planner (apache#77); column-mapping physicalisation dropped, kernel ships the schemas (apache#76).
- CDF (readChangeFeed) is kernel-native via TableChanges -> CometDeltaCdfScanExec, split multi-partition (apache#84/#2) -- corrected docs that called it unsupported, declined, or a synthetic-columns fallback.
- 08-known-limitations.md: removed all of Part B (B1-B9 were development-time regressions, all now fixed + guarded) and A3 (path-based CDF now engages native, apache#84); kept only genuine current limitations (A1 DPP residual, A2e credential residual, A4 VARIANT, A5 decline gates, A6 INT96 kernel gap, A7 CM-id repoint). 466 -> 230 lines.
- Fixed config keys, build/module layout, JNI symbols, file paths, CI workflow references, and supported-feature lists (added CDF, _metadata, INT96) across the build / README / user-guide docs.

Every claim verified against code; markdown passes prettier.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Which issue does this PR close?
Closes #.
Rationale for this change
This adds back TPC-DS query results for CometTPCDSQuerySuite, which serves as an extra test suite to ensure Comet produces the same results as Spark does.
What changes are included in this PR?
Adds back TPC-DS query results golden files to the repo. These files were generated using CometTPCDSQuerySuite with Comet turned off, to make sure they are the same as Spark's results.
How are these changes tested?