Skip to content

test(workflow-operator): add ReservoirSamplingOpExec spec [release/v1.2 backport]#5642

Merged
Yicong-Huang merged 1 commit into
apache:release/v1.2from
xuang7:backport/5384-reservoir-sampling-spec-v1.2
Jun 12, 2026
Merged

test(workflow-operator): add ReservoirSamplingOpExec spec [release/v1.2 backport]#5642
Yicong-Huang merged 1 commit into
apache:release/v1.2from
xuang7:backport/5384-reservoir-sampling-spec-v1.2

Conversation

@xuang7

@xuang7 xuang7 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

Backport of #5384 to release/v1.2: adds ReservoirSamplingOpExecSpec.scala,
the unit test spec for the reservoir sampling operator. Clean
git cherry-pick -x of the original squash commit (38f5ac5fb), purely
additive, byte-identical to main.

#5606 modifies this spec file, so it must exist on release/v1.2 for the
#5606 auto-backport to apply.

Any related issues, documentation, discussions?

How was this PR tested?

Covered by CI; the spec already passed on main.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Fable 5)

### What changes were proposed in this PR?
- Add `ReservoirSamplingOpExecSpec`, the first dedicated spec for the
streaming reservoir-sampling executor.
- Cover `processTuple` buffering (emits nothing per tuple), and
`onFinish` returning all input tuples in order when input size equals k.
- Cover keeping exactly k unique, input-drawn samples when input exceeds
k, determinism of the seeded RNG (with a check that replacement actually
happens), multi-worker partitioning of k via `equallyPartitionGoal`
(k=10 across 3 workers yields 4,3,3), and `open` resetting state for
executor reuse.
- Characterize the input-size-below-k edge case, where the unfilled
fixed-size reservoir emits null padding on finish (flagged in the spec
as a likely bug for a follow-up fix).
### Any related issues, documentation, or discussions?
Closes: apache#5383
### How was this PR tested?
- Run `sbt "WorkflowOperator/testOnly *ReservoirSamplingOpExecSpec"` and
expect all 7 examples to pass.
- This is a test-only PR (no production code changed), so the spec
itself is the verification; the input-below-k example intentionally
asserts the current null-padding behavior, so a green run confirms that
characterization rather than a fix.
### Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Opus 4.8 in compliance with ASF

(cherry picked from commit 38f5ac5)
@codecov-commenter

codecov-commenter commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 51.92%. Comparing base (7780a56) to head (d804975).
⚠️ Report is 1 commits behind head on release/v1.2.

Additional details and impacted files
@@                Coverage Diff                 @@
##             release/v1.2    #5642      +/-   ##
==================================================
+ Coverage           51.86%   51.92%   +0.05%     
- Complexity           2458     2470      +12     
==================================================
  Files                1065     1065              
  Lines               41238    41238              
  Branches             4417     4417              
==================================================
+ Hits                21389    21413      +24     
+ Misses              18589    18566      -23     
+ Partials             1260     1259       -1     
Flag Coverage Δ *Carryforward flag
access-control-service 64.61% <ø> (ø)
agent-service 34.36% <ø> (ø) Carriedforward from 7780a56
amber 52.17% <ø> (+0.15%) ⬆️
computing-unit-managing-service 1.65% <ø> (ø)
config-service 56.06% <ø> (ø)
file-service 57.06% <ø> (ø)
frontend 46.31% <ø> (ø) Carriedforward from 7780a56
pyamber 90.67% <ø> (ø) Carriedforward from 7780a56
python 90.75% <ø> (ø) Carriedforward from 7780a56
workflow-compiling-service 58.69% <ø> (ø)

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xuang7

xuang7 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

Requested review from the previous reviewer. Thanks! @aglinxinyuan @Yicong-Huang

@Yicong-Huang Yicong-Huang changed the title test(workflow-operator): add ReservoirSamplingOpExec spec (#5384) [release/v1.2 backport] test(workflow-operator): add ReservoirSamplingOpExec spec [release/v1.2 backport] Jun 12, 2026
@Yicong-Huang Yicong-Huang merged commit 045328e into apache:release/v1.2 Jun 12, 2026
23 checks passed
@xuang7 xuang7 deleted the backport/5384-reservoir-sampling-spec-v1.2 branch July 1, 2026 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants