Skip to content

fix: skip only null partition value rows in ParseDataFile#756

Merged
wgtmac merged 2 commits into
apache:mainfrom
WZhuo:fix_manifest_reader
Jun 18, 2026
Merged

fix: skip only null partition value rows in ParseDataFile#756
wgtmac merged 2 commits into
apache:mainfrom
WZhuo:fix_manifest_reader

Conversation

@WZhuo

@WZhuo WZhuo commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

ParseDataFile() in manifest_reader.cc iterated over rows for each partition child field and used break to skip null partition values. break exits the entire inner row loop, so once a null value was encountered, all subsequent rows for that partition column were skipped and never had their partition values parsed.

The fix changes break to continue, so only the current null row is skipped while the remaining rows are still processed.

Changes

  • src/iceberg/manifest/manifest_reader.cc: breakcontinue in the partition-value row loop.
  • src/iceberg/test/manifest_reader_test.cc: add regression test NullPartitionValueDoesNotSkipSubsequentRows that writes a manifest with a null partition value before non-null ones and verifies the later rows are parsed correctly.

Testing

  • New test passes for manifest versions V1/V2/V3.
  • Confirmed the test fails with the original break (partition.num_fields() is 0 instead of 1 for rows after the null one).
  • Full manifest_test suite passes (171 tests).

The inner row loop in ParseDataFile() used break to skip null partition
values, which aborted processing of all subsequent rows for that
partition column. Use continue so only the current null row is skipped.

Add a regression test that places a null partition value before non-null
ones and verifies the later rows are still parsed.
@wgtmac wgtmac requested a review from Copilot June 18, 2026 16:00

@zhjwpku zhjwpku left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a parsing control-flow bug in ParseDataFile() (manifest reading) where encountering a null partition value could incorrectly stop parsing subsequent rows for that partition column, and adds a regression test to ensure later rows still get parsed.

Changes:

  • Update manifest partition parsing to avoid prematurely terminating the per-row loop when a null partition value is encountered.
  • Add a regression test covering a null partition value followed by non-null values.
  • Ensure behavior is validated across manifest format versions via the parameterized test suite.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/iceberg/manifest/manifest_reader.cc Adjusts partition-value row-loop behavior when encountering nulls during manifest entry parsing.
src/iceberg/test/manifest_reader_test.cc Adds a regression test ensuring null partition values do not prevent parsing later rows.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/iceberg/manifest/manifest_reader.cc Outdated
Comment thread src/iceberg/test/manifest_reader_test.cc Outdated
@WZhuo WZhuo requested a review from wgtmac June 18, 2026 17:06
@wgtmac wgtmac merged commit 9a2e21d into apache:main Jun 18, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants