Skip to content

test(workflow-operator): add unit test coverage for operator support utilities#6077

Merged
aglinxinyuan merged 2 commits into
apache:mainfrom
aglinxinyuan:test-operator-factory-helpers
Jul 3, 2026
Merged

test(workflow-operator): add unit test coverage for operator support utilities#6077
aglinxinyuan merged 2 commits into
apache:mainfrom
aglinxinyuan:test-operator-factory-helpers

Conversation

@aglinxinyuan

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

Add unit test coverage for three workflow-operator support utilities, selected from the Codecov report. No production-code changes.

File Codecov before Missed lines targeted
TestOperators.scala 26.3% 56 — every desc-factory helper (CSV/JSONL scans, hash-join, keyword-search, aggregate + group-by, AsterixDB source, python & python-source UDFs); asserts the returned descriptors' fields
SpecialPhysicalOpFactory.scala 0% 17 — newSourcePhysicalOp: decoded-port identity derivation, layer-name interpolation (incl. dash→underscore), empty input ports, output port + schema propagation, and the no-port-id failure
FileScanUtils.scala 62.9% 11 — the extract == true archive branch of createTuplesFromFile: zip-entry iteration, __MACOSX filtering, single-string vs per-line attribute paths

TestOperators is itself a shared test-helper factory (in src/main) used by many specs, but nothing asserted on its output, so its bodies read as uncovered. All tests run in-memory or against local temp files / committed test resources.

Any related issues, documentation, discussions?

Follow-up to the review feedback on #6043: prioritize tests that fill uncovered code paths.

How was this PR tested?

  • sbt "WorkflowOperator/testOnly *TestOperatorsSpec *SpecialPhysicalOpFactorySpec *FileScanUtilsSpec" — 16 tests, all green
  • sbt "WorkflowOperator/Test/scalafmtCheck" and sbt "WorkflowOperator/scalafixAll --check" — clean

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8 [1M context])

…utilities

Targets Codecov-missed lines: TestOperators (56, the shared desc-factory helper),
SpecialPhysicalOpFactory (17, source physical-op wiring), and the extract branch
of FileScanUtils.createTuplesFromFile (11). All in-memory or local temp files.
Copilot AI review requested due to automatic review settings July 3, 2026 07:03
@github-actions github-actions Bot added the common label Jul 3, 2026
@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • No candidates found from git blame history.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds new ScalaTest unit specs in common/workflow-operator to increase coverage of previously-uncovered support utilities (no production-code changes).

Changes:

  • Add descriptor-construction assertions for TestOperators helpers (CSV/JSONL scans, join, keyword search, aggregate/group-by, AsterixDB, Python UDFs).
  • Add coverage for SpecialPhysicalOpFactory.newSourcePhysicalOp, including layer-name derivation and missing-port-id failure behavior.
  • Add coverage for FileScanUtils.createTuplesFromFile’s extract = true zip-archive branch.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/TestOperatorsSpec.scala New unit spec asserting key fields on operator descriptors produced by TestOperators helpers.
common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/SpecialPhysicalOpFactorySpec.scala New unit spec validating newSourcePhysicalOp identity/port/schema wiring and error behavior.
common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/source/scan/file/FileScanUtilsSpec.scala New unit spec exercising zip extraction paths for single-string and per-line tuple creation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter

codecov-commenter commented Jul 3, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.39%. Comparing base (a53d95a) to head (21a2ba9).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6077      +/-   ##
============================================
+ Coverage     57.21%   57.39%   +0.18%     
+ Complexity     3103     3100       -3     
============================================
  Files          1130     1130              
  Lines         43825    43825              
  Branches       4747     4747              
============================================
+ Hits          25075    25154      +79     
+ Misses        17313    17232      -81     
- Partials       1437     1439       +2     
Flag Coverage Δ *Carryforward flag
access-control-service 70.00% <ø> (ø)
agent-service 44.59% <ø> (ø) Carriedforward from fa09f73
amber 59.95% <ø> (+0.47%) ⬆️
computing-unit-managing-service 0.00% <ø> (ø) Carriedforward from fa09f73
config-service 52.30% <ø> (ø)
file-service 62.81% <ø> (ø)
frontend 50.12% <ø> (ø) Carriedforward from fa09f73
notebook-migration-service 78.57% <ø> (ø)
pyamber 91.15% <ø> (ø) Carriedforward from fa09f73
python 90.69% <ø> (ø) Carriedforward from fa09f73
workflow-compiling-service 55.14% <ø> (ø)

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

⚠️ Benchmark changes need a look

🟢 2 better · 🔴 5 worse · ⚪ 8 noise (<±5%) · 0 without baseline

Compared against main a53d95a benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

config throughput MB/s latency max Δ latest / 7d
🔴 bs=10 sw=10 sl=64 360 0.22 26,582/38,577/38,577 us 🔴 +11.3% / 🔴 +152.4%
🟢 bs=100 sw=10 sl=64 813 0.496 123,011/137,755/137,755 us 🟢 -9.0% / 🔴 +27.1%
bs=1000 sw=10 sl=64 913 0.557 1,099,530/1,157,884/1,157,884 us ⚪ within ±5% / 🔴 +11.7%
Baseline details

Latest main a53d95a from same runner

config metric PR latest main 7d avg Δ latest Δ 7d
bs=10 sw=10 sl=64 throughput 360 tuples/sec 398 tuples/sec 770.95 tuples/sec -9.5% -53.3%
bs=10 sw=10 sl=64 MB/s 0.22 MB/s 0.243 MB/s 0.471 MB/s -9.5% -53.2%
bs=10 sw=10 sl=64 p50 26,582 us 24,637 us 12,775 us +7.9% +108.1%
bs=10 sw=10 sl=64 p95 38,577 us 34,659 us 15,286 us +11.3% +152.4%
bs=10 sw=10 sl=64 p99 38,577 us 34,659 us 18,795 us +11.3% +105.3%
bs=100 sw=10 sl=64 throughput 813 tuples/sec 806 tuples/sec 976.93 tuples/sec +0.9% -16.8%
bs=100 sw=10 sl=64 MB/s 0.496 MB/s 0.492 MB/s 0.596 MB/s +0.8% -16.8%
bs=100 sw=10 sl=64 p50 123,011 us 121,438 us 102,557 us +1.3% +19.9%
bs=100 sw=10 sl=64 p95 137,755 us 151,388 us 108,383 us -9.0% +27.1%
bs=100 sw=10 sl=64 p99 137,755 us 151,388 us 115,249 us -9.0% +19.5%
bs=1000 sw=10 sl=64 throughput 913 tuples/sec 928 tuples/sec 1,009 tuples/sec -1.6% -9.5%
bs=1000 sw=10 sl=64 MB/s 0.557 MB/s 0.566 MB/s 0.616 MB/s -1.6% -9.6%
bs=1000 sw=10 sl=64 p50 1,099,530 us 1,080,227 us 997,695 us +1.8% +10.2%
bs=1000 sw=10 sl=64 p95 1,157,884 us 1,103,798 us 1,036,731 us +4.9% +11.7%
bs=1000 sw=10 sl=64 p99 1,157,884 us 1,103,798 us 1,069,334 us +4.9% +8.3%
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,555.15,200,128000,360,0.220,26581.96,38577.39,38577.39
1,100,10,64,20,2460.03,2000,1280000,813,0.496,123010.99,137754.73,137754.73
2,1000,10,64,20,21913.51,20000,12800000,913,0.557,1099530.11,1157884.33,1157884.33

- assert the actual decoded entry contents (not just tuple counts) for the
  extract, __MACOSX-filter, and per-line flat-map cases
- fix a stray apostrophe in the SpecialPhysicalOpFactory layerName comment
@aglinxinyuan aglinxinyuan requested a review from mengw15 July 3, 2026 07:41

@mengw15 mengw15 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aglinxinyuan aglinxinyuan added this pull request to the merge queue Jul 3, 2026
Merged via the queue into apache:main with commit ad6ea02 Jul 3, 2026
38 of 40 checks passed
@aglinxinyuan aglinxinyuan deleted the test-operator-factory-helpers branch July 3, 2026 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants