test(workflow-operator): fill uncovered branches in filter, reader, and dual-port UDF classes#6057
Conversation
…nd dual-port UDF classes Targets Codecov-missed lines: FilterPredicate (26), BufferedBlockReader (51, previously 0%), DualInputPortsPythonUDFOpDescV2 (27). Extends the existing FilterPredicateSpec and adds BufferedBlockReaderSpec and DualInputPortsPythonUDFOpDescV2Spec.
Automated Reviewer SuggestionsBased on the
|
There was a problem hiding this comment.
Pull request overview
This PR increases unit-test coverage in common/workflow-operator by adding/expanding specs to exercise previously uncovered branches in filter evaluation, buffered line parsing, and dual-input Python UDF physical-op/schema logic, without changing production code.
Changes:
- Extend
FilterPredicateSpecto cover remainingevaluatebranches across comparison types and attribute types, plusequalsedge cases. - Add
BufferedBlockReaderSpecto cover delimiter splitting, EOF flushing, CR/CRLF handling artifacts, buffer-boundary behavior, kept-field filtering,hasNext, andclose. - Add
DualInputPortsPythonUDFOpDescV2Specto covergetPhysicalOpwiring/preconditions/env-name resolution and schema propagation behaviors.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/udf/python/DualInputPortsPythonUDFOpDescV2Spec.scala | New spec covering getPhysicalOp validation/wiring and schema propagation (incl. duplicate-column rejection) for the dual-input Python UDF operator. |
| common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/source/BufferedBlockReaderSpec.scala | New spec exercising BufferedBlockReader parsing and boundary conditions, including a chunked-stream stub for buffer-edge paths. |
| common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/filter/FilterPredicateSpec.scala | Expanded spec covering remaining FilterPredicate.evaluate branches across operators/types and equals edge cases. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #6057 +/- ##
============================================
+ Coverage 56.37% 56.62% +0.25%
- Complexity 2989 3035 +46
============================================
Files 1129 1129
Lines 43794 43802 +8
Branches 4743 4743
============================================
+ Hits 24689 24804 +115
+ Misses 17654 17547 -107
Partials 1451 1451
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
| config | throughput | MB/s | latency | max Δ latest / 7d | |
|---|---|---|---|---|---|
| 🔴 | bs=10 sw=10 sl=64 | 372 | 0.227 | 27,057/35,335/35,335 us | 🔴 +21.8% / 🔴 +139.0% |
| ⚪ | bs=100 sw=10 sl=64 | 790 | 0.482 | 123,543/147,472/147,472 us | ⚪ within ±5% / 🔴 +38.0% |
| ⚪ | bs=1000 sw=10 sl=64 | 930 | 0.568 | 1,067,832/1,109,928/1,109,928 us | ⚪ within ±5% / 🔴 -9.1% |
Baseline details
Latest main 878eb8a from same runner
| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 372 tuples/sec | 403 tuples/sec | 786.27 tuples/sec | -7.7% | -52.7% |
| bs=10 sw=10 sl=64 | MB/s | 0.227 MB/s | 0.246 MB/s | 0.48 MB/s | -7.7% | -52.7% |
| bs=10 sw=10 sl=64 | p50 | 27,057 us | 22,211 us | 12,495 us | +21.8% | +116.5% |
| bs=10 sw=10 sl=64 | p95 | 35,335 us | 39,849 us | 14,784 us | -11.3% | +139.0% |
| bs=10 sw=10 sl=64 | p99 | 35,335 us | 39,849 us | 18,468 us | -11.3% | +91.3% |
| bs=100 sw=10 sl=64 | throughput | 790 tuples/sec | 819 tuples/sec | 991.49 tuples/sec | -3.5% | -20.3% |
| bs=100 sw=10 sl=64 | MB/s | 0.482 MB/s | 0.5 MB/s | 0.605 MB/s | -3.6% | -20.4% |
| bs=100 sw=10 sl=64 | p50 | 123,543 us | 120,624 us | 100,929 us | +2.4% | +22.4% |
| bs=100 sw=10 sl=64 | p95 | 147,472 us | 146,642 us | 106,894 us | +0.6% | +38.0% |
| bs=100 sw=10 sl=64 | p99 | 147,472 us | 146,642 us | 114,085 us | +0.6% | +29.3% |
| bs=1000 sw=10 sl=64 | throughput | 930 tuples/sec | 943 tuples/sec | 1,023 tuples/sec | -1.4% | -9.1% |
| bs=1000 sw=10 sl=64 | MB/s | 0.568 MB/s | 0.576 MB/s | 0.624 MB/s | -1.4% | -9.0% |
| bs=1000 sw=10 sl=64 | p50 | 1,067,832 us | 1,057,576 us | 983,835 us | +1.0% | +8.5% |
| bs=1000 sw=10 sl=64 | p95 | 1,109,928 us | 1,100,863 us | 1,023,777 us | +0.8% | +8.4% |
| bs=1000 sw=10 sl=64 | p99 | 1,109,928 us | 1,100,863 us | 1,053,883 us | +0.8% | +5.3% |
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,538.04,200,128000,372,0.227,27057.19,35334.64,35334.64
1,100,10,64,20,2532.40,2000,1280000,790,0.482,123543.27,147472.25,147472.25
2,1000,10,64,20,21494.73,20000,12800000,930,0.568,1067831.60,1109927.91,1109927.91- bound ChunkedInputStream.read by the buffer length, carrying the chunk remainder into the next call per the InputStream contract - drop the ambiguous "(default)" qualifier from the schema-propagation test name (the annotation defaultValue disagrees with the field initializer)
What changes were proposed in this PR?
Fill uncovered code paths in three
workflow-operatorclasses, selected from the Codecov report. No production-code changes.FilterPredicate.javaevaluatebranches: GE/LT operators, null-field short-circuit, IS_NULL both ways, boolean (case-insensitive), double, long, and timestamp evaluators, the numeric-string fast path, and theequalsidentity/null/other-type branchesBufferedBlockReader.javahasNextblock-boundary semantics,closeDualInputPortsPythonUDFOpDescV2.scalagetPhysicalOp: worker precondition, env-name resolution (default/blank-reject/trim), single vs parallel wiring, unknown-partition derivation, and the port-1-drives-output schema propagation incl. the duplicate-column rejectionExtends the existing
FilterPredicateSpecand addsBufferedBlockReaderSpec(with a chunked-stream stub to force buffer-boundary paths) andDualInputPortsPythonUDFOpDescV2Spec.Any related issues, documentation, discussions?
Follow-up to the review feedback on #6043: prioritize tests that fill uncovered code paths.
How was this PR tested?
sbt "WorkflowOperator/testOnly *FilterPredicateSpec *BufferedBlockReaderSpec *DualInputPortsPythonUDFOpDescV2Spec"— 39 tests, all greensbt "WorkflowOperator/Test/scalafmtCheck"andsbt "WorkflowOperator/scalafixAll --check"— cleanWas this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8 [1M context])