fix: skip only null partition value rows in ParseDataFile#756
Merged
Conversation
The inner row loop in ParseDataFile() used break to skip null partition values, which aborted processing of all subsequent rows for that partition column. Use continue so only the current null row is skipped. Add a regression test that places a null partition value before non-null ones and verifies the later rows are still parsed.
wgtmac
approved these changes
Jun 18, 2026
There was a problem hiding this comment.
Pull request overview
This PR addresses a parsing control-flow bug in ParseDataFile() (manifest reading) where encountering a null partition value could incorrectly stop parsing subsequent rows for that partition column, and adds a regression test to ensure later rows still get parsed.
Changes:
- Update manifest partition parsing to avoid prematurely terminating the per-row loop when a null partition value is encountered.
- Add a regression test covering a null partition value followed by non-null values.
- Ensure behavior is validated across manifest format versions via the parameterized test suite.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/iceberg/manifest/manifest_reader.cc |
Adjusts partition-value row-loop behavior when encountering nulls during manifest entry parsing. |
src/iceberg/test/manifest_reader_test.cc |
Adds a regression test ensuring null partition values do not prevent parsing later rows. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ParseDataFile()inmanifest_reader.cciterated over rows for each partition child field and usedbreakto skip null partition values.breakexits the entire inner row loop, so once a null value was encountered, all subsequent rows for that partition column were skipped and never had their partition values parsed.The fix changes
breaktocontinue, so only the current null row is skipped while the remaining rows are still processed.Changes
src/iceberg/manifest/manifest_reader.cc:break→continuein the partition-value row loop.src/iceberg/test/manifest_reader_test.cc: add regression testNullPartitionValueDoesNotSkipSubsequentRowsthat writes a manifest with a null partition value before non-null ones and verifies the later rows are parsed correctly.Testing
break(partition.num_fields()is 0 instead of 1 for rows after the null one).manifest_testsuite passes (171 tests).