Skip to content

fix(workflow-operator): emit only filled reservoir slots in onFinish#5411

Closed
Ma77Ball wants to merge 5 commits into
apache:mainfrom
Ma77Ball:fix/5409-reservoir-null-padding
Closed

fix(workflow-operator): emit only filled reservoir slots in onFinish#5411
Ma77Ball wants to merge 5 commits into
apache:mainfrom
Ma77Ball:fix/5409-reservoir-null-padding

Conversation

@Ma77Ball

@Ma77Ball Ma77Ball commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

What

ReservoirSamplingOpExec.onFinish emitted null tuples when the input size was less than k. Fixed by emitting only the n populated reservoir slots via reservoir.iterator.take(n).

Why

The reservoir is a fixed-size Array[Tuple](k). With fewer than k inputs the trailing slots stay null and were returned by onFinish. Today they are silently dropped by the null-check in DataProcessor.outputOneTuple (DataProcessor.scala:157), so there is no crash, but it violates the operator contract and any other consumer of the output would observe the nulls.

Test

Regression test asserts no null padding when input size is less than k. Verified it fails on the old reservoir.iterator and passes with take(n).

Notes

Stacked on #5384 (the ReservoirSamplingOpExec spec) until that merges, so the diff temporarily shows that PR's spec additions. After #5384 merges to main this rebases cleanly.

Closes #5409

@codecov-commenter

codecov-commenter commented Jun 6, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 51.67%. Comparing base (8a4cb29) to head (31d726d).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5411      +/-   ##
============================================
+ Coverage     51.62%   51.67%   +0.04%     
- Complexity     2420     2430      +10     
============================================
  Files          1056     1056              
  Lines         41046    41047       +1     
  Branches       4407     4407              
============================================
+ Hits          21190    21211      +21     
+ Misses        18617    18594      -23     
- Partials       1239     1242       +3     
Flag Coverage Δ *Carryforward flag
access-control-service 41.89% <ø> (ø)
agent-service 33.76% <ø> (ø) Carriedforward from cf6beaa
amber 52.36% <100.00%> (+0.12%) ⬆️
computing-unit-managing-service 1.38% <ø> (ø)
config-service 55.38% <ø> (+0.69%) ⬆️
file-service 38.42% <ø> (ø)
frontend 46.32% <ø> (ø) Carriedforward from cf6beaa
pyamber 90.69% <ø> (ø) Carriedforward from cf6beaa
python 90.84% <ø> (ø) Carriedforward from cf6beaa
workflow-compiling-service 58.39% <ø> (ø)

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Yicong-Huang

Copy link
Copy Markdown
Contributor

@Ma77Ball please make sure to follow our PR template. closed for now. please feel free to reopen it once you update the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ReservoirSamplingOpExec emits null padding tuples

3 participants