test(producer): add stream duration parity check to regression harness#1652
Conversation
80c7663 to
5c610bc
Compare
vanceingalls
left a comment
There was a problem hiding this comment.
Verdict: NIT
Clean regression net for #1648 (ffmpeg -shortest style mux truncation). Threshold (0.5s) sits ~18× over the observed baseline drift (0.027s), the unit test exercises the canonical failure mode with a real ffmpeg-built fixture, and the gate is wired into shard-8 with a small (10s, 3-track-audio) composition. One naming/semantics nit worth flagging, plus a couple of small observations.
What I verified
- Base:
main(standalone — no Graphite stack despite thegt-pasteable-stackURL). - Production touch: the
-3isextractMediaMetadataimport widened to also pullextractAudioMetadata(1 line), andresult.passedgets&& parityPassedANDed in. No prod-behavior change slipped in. - Function placement:
checkStreamDurationParity(regression-harness.ts:817) is correctly exported; the fail-block inserted intosaveFailureDetailscloses cleanly before the new// ── Stream Duration Parity ──section. Layout fine. - Gating: runs only when
!isPngSequenceANDparity !== null(null when!meta.hasAudio). Pure-video fixtures and png-sequence fixtures correctly skip.result.passed = … && (parity?.passed ?? true)makes the parity check non-gating for skipped cases. Correct. - #1648 catch shape: for a
-shortest-truncated mux (video stream cut to 1.4s, audio runs to 180s, container = 180s),videoStreamDurationSeconds = 1.4sandextractAudioMetadata.durationSeconds = 180s(container), so drift ≈ 178.6s, well above 0.5s. Regression net works. - Unit test:
streamDurationParity.test.tsbuilds its own truncated-mux fixture via realffmpeg(not a mock-the-primitive setup) — the test would fail ifcheckStreamDurationParitywere a no-op or if the threshold drifted up. Good against pattern 5. - Threshold sanity: baseline drift 0.027s (per PR body), threshold 0.5s, room for AAC priming/PTS rounding without flake. The
<0.5sbar is generous but not so tight as to flake. - CI: baseline checks green; regression shards + Windows render + Test/Typecheck still in-flight at posting time. Required-set looks healthy so far; no failures.
Findings
-
NIT — "audio duration" is actually container duration.
extractAudioMetadata(engineffprobe.ts:347-373) readsoutput.format.durationand returns it asdurationSeconds, NOT the audio stream's owndurationfield. SoaudioDurationSecondsin this check is really the container's reported duration. For the canonical-shortestshape this is fine (container =max(video stream, audio stream)), and the field name onAudioMetadatais in factdurationSeconds(so the consumer isn't lying about what it asked for) — but the parity framing is technically "video-stream duration vs container duration," not "video-stream duration vs audio-stream duration." Edge case it misses: a future bug that truncates both streams in sync (container also truncates) would slip through with drift = 0. Unlikely shape, but worth either (a) probing the audio stream's owndurationfield explicitly insidecheckStreamDurationParityfor true parity, or (b) leaving a code comment that this is intentionally container-duration as a proxy. Not a blocker for #1648 — the named regression is asymmetric truncation, which this nets correctly. -
NIT — third unit-test case is decorative. "returns null for video-only files" asserts behavior of the
meta.hasAudiogate inside the new function. That's fine for documenting intent, buthasAudiolives in engine code that isn't changing here, and the production callsite already gates onparity !== null, so a regression inhasAudiowouldn't be caught by this test. Low value; harmless. -
NIT — meta.json carries
minAudioCorrelation: 0.9/maxAudioLagWindows: 120on a fixture whose audio is three identical sine-tone bursts. Three back-to-back identical waveforms can correlate well against any same-shape baseline, so the audio-envelope check on this fixture isn't doing much load-bearing work beyond "audio is present and roughly the same length." That's fine — the parity check IS the new signal — but recognize the fixture is leaning on stream parity, not on RMS-envelope content. No action.
Continuity with HF #1615
This is a different regression net from the audioStream.start_time < 0.001 ask in r-1615 (start-time vs end-time invariants), not a supersede. Both are reasonable post-#1648 hardening.
CI state at post
SUCCESS: Format, Lint, Fallow audit, Build, SDK, runtime contract, Studio load smoke, Semantic PR title, player-perf, preview-regression, CodeQL (actions/python). IN_PROGRESS: CodeQL (js-ts), Typecheck, Test, regression-shards 1–8, Windows render + Tests, CLI smoke, Smoke: global install. No failures. Two-minute build window from push; checks should converge shortly. Re-check before stamp.
Review by Via
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Reviewed at: 5c610bcc (HEAD at review time). Layering on Via's review (posted ~minutes ago) — we converged on the same parity-source observation independently; I'm pinning it slightly harder, plus adding a coverage matrix + cross-fixture smoke-run suggestion.
Net: clean, well-scoped regression PR; the harness check + fabricated-truncation unit test together would have caught #1648 deterministically. Nothing blocking.
Verdict matrix
| Verdict | Item |
|---|---|
| ✅ | Reproduces and pins #1648 — streamDurationParity.test.ts:53-71 (muxWithTruncatedVideo) builds the exact -shortest-style 2s-video + 10s-audio mux, runs through the same checkStreamDurationParity the harness uses, and asserts passed=false. Strong "would have caught it" signal per feedback_defensive_bug_pinning_tests_fp_suppression. |
| ✅ | Coverage scope is broad — regression-harness.ts:1064 fires on every non-png-sequence fixture with audio, which means the check rides every in-process / distributed-simulated / lambda-local render that produces a single muxed file. AAC priming-delay drift (0.027s baseline) is well within the 0.5s threshold, so this won't false-positive against the HF#1615-fixed start-time path. |
| ✅ | Failure-path diagnostics are useful per feedback_observability_pr_failure_path_coverage — saveFailureDetails writes stream-parity-failure.json with both stream durations + drift, and the logPretty line emits the same numbers on stdout. Oncall can read the drift direction (videoaudio) directly. |
| 🟡 | Container-vs-stream duration asymmetry → latent reflexivity risk (also flagged by Via as a NIT — I'd pin it slightly harder). checkStreamDurationParity reads meta.videoStreamDurationSeconds (which is videoStream.duration and falls back to format.duration per ffprobe.ts:317-318) on one side, and audioMeta.durationSeconds (which is output.format.duration per ffprobe.ts:365, not the audio stream's stream.duration) on the other. For the canonical #1648 asymmetric-truncation case the comparison reads as video-stream-vs-container, which catches it cleanly (Via's analysis here is right). The future-bug risk per feedback_parity_test_reflexivity_tautology is: (a) if a future bug truncates BOTH streams in sync the container also truncates and drift = 0 (Via's edge case), and (b) if any future container variant lacks videoStream.duration, videoStreamDurationSeconds falls back to format.duration and both sides become the same value silently. (a) is unlikely; (b) is one mux-config change away. Durable fix: add audioStreamDurationSeconds to AudioMetadata reading audioStream.duration, compare stream-to-stream, and treat the fallback case as "indeterminate / log warning" rather than "passed". Cheap, future-proofs the check. |
| 🟡 | Unit test does not exercise the fallback path above — both muxWithMatchingDurations and muxWithTruncatedVideo emit libx264 mp4 with reliable stream.duration. Adding a third case where the input lacks a stream-level video duration (e.g. an fMP4 produced with -movflags +frag_keyframe+empty_moov) would make the silent-pass risk discoverable in CI rather than in prod. Not blocking — the canonical #1648 reproducer is covered. |
| 🟢 | MAX_STREAM_DRIFT_SECONDS = 0.5 reads conservative given the 0.027s baseline floor (~18× headroom); leaves room for legitimate per-encoder rounding without masking truncation-class bugs (where drift is seconds-scale). Per feedback_inverse_operation_tolerance_asymmetry, no add/remove inverse exists here — single-direction read, no symmetry constraint to honor. |
| 🟢 | Shard-8 addition (regression.yml:82) is the right slot — balances out the existing audio-light fixtures and the 10s composition keeps wall time modest. |
| 🟢 | LFS pointer for output/output.mp4 (321645 bytes, sha256 7c7b88b…) is correctly structured. meta.json shape mirrors sibling audio fixtures. Via's nit #3 about the three-identical-sine-tones not making the minAudioCorrelation: 0.9 check load-bearing is fair — audio-mux-parity is genuinely a stream-parity fixture, not an envelope-content fixture. No action needed; recognize the fixture's load-bearing signal IS the new check. |
Coverage assessment (encoders × shapes × edges)
| Axis | Status |
|---|---|
| h.264 / mp4 (in-process) | ✅ via audio-mux-parity fixture |
| VP9 / WebM (in-process) | ✅ shares same checkStreamDurationParity call, no codec-specific branch |
| Distributed N-chunk concat | ✅ goes through same single-mux output path before runTestSuite reaches the check |
| Lambda-local | ✅ same code path |
| Pad-path (audio extends past video) | ✅ exactly the #1648 shape, exercised |
| Trim-path (video extends past audio) | -shortest regression ever inverts (audio truncated to video), the check fails symmetrically — fine to leave implicit |
| Video-only (no audio) | ✅ streamDurationParity.test.ts:131-142 covers null return (Via's nit #2 — fair, this exercises the hasAudio gate which lives in engine code unchanged here; cheap intent-documenting test) |
| png-sequence | ✅ skipped at regression-harness.ts:1064 — correct, no mux to probe |
| MOV / ProRes shard | 🟡 hdr-regression/mov-prores fixtures on shard-1 would now exercise the check; if any currently produce a MOV where the video stream omits stream.duration, the fallback-collapse silently passes (see 🟡 #1 above). Worth a smoke-run before merge to confirm none of the existing mov/prores baselines trip the new check. |
Class-of-bug memo candidate
The duration-source asymmetry (stream-level on video, container-level on audio) is a generalizable parity-test trap: when a parity check pulls one quantity from a stream-level field and the paired quantity from a container-level field, the stream.duration ?? format.duration fallback can silently collapse them into the same value. Worth pinning as a memo for future harness PRs in this stack — the failure mode is invisible until a new codec/container combo lands.
What I did not verify
- Did not run the regression locally (LFS pull + ffmpeg). The unit test logic is verifiable from the diff.
- Did not enumerate every fixture currently on shards 1-8 to confirm none currently produce a stream-duration-less mux (the smoke-run before merge would settle this).
Review by Rames D Jusso
jrusso1020
left a comment
There was a problem hiding this comment.
Approving at 5c610bcc — Rames D Jusso + Via both reviewed independently, converging on no blockers (3 white-check / 2 yellow-circle / 3 green). The PR adds a real regression net for the #1648 prod bug (pad-path mux duration drift), with real ffmpeg, LFS fixture, sensible CI shard, generous documented threshold.
Two carry-forward observations worth absorbing in follow-up (non-blocking, per both reviewers):
format.durationfallback parity collapse — the check reads one side fromvideoStream.durationand the other viastream.duration ?? format.duration. If a future codec/container lacks stream-level duration, both sides collapse intoformat.durationand pass tautologically on a truly-broken mux. Via flagged as NIT, Rames flagged as yellow-circle. Latent design wart, not a current-state bug — worth a follow-up to either assert stream-duration presence at probe time, or to make the assertion symmetric.- MOV/ProRes coverage — yellow per Rames; worth a shard-1 smoke pre-merge to confirm no current baselines produce a stream-duration-less mux that the new parity check would falsely pass.
— Jerrai
Probes the rendered output for video and audio stream durations after render and fails the test if they differ by more than 0.5s. Catches mux-level truncation regressions like the ffmpeg -shortest bug (#1648) where one stream gets silently cut short. Runs on all non-png-sequence fixtures with audio — no new meta.json field needed since this is a universal invariant, not a per-fixture threshold.
5c610bc to
e8f8b5e
Compare
|
Addressed the review feedback from Via and Rames: Container-vs-stream duration asymmetry (both flagged this): the parity check now reads the audio stream's own This closes the tautology risk where both sides collapse into All tests pass (ffprobe 14/14, parity 3/3, engine 79/79). |
Summary
Adds a stream duration parity check and a dedicated regression fixture to catch audio/video mux truncation regressions like #1648.
What's included
audio-mux-parityregression fixture — a 10s composition with 3 audio tracks and 3 scene clips. Baseline rendered and stored in LFS. Added to shard-8 in CI.checkStreamDurationParity()— probes rendered output for video and audio stream durations, fails if they differ by >0.5s. Runs on all non-png-sequence fixtures with audio.streamDurationParity.test.tscreates a 2s-video + 10s-audio mux (simulating the exact render: video stream truncated when composition has <audio> (audio full-length, video stops early) #1648 truncation) and asserts the check catches it.Baseline probe
Drift: 0.027s (well under 0.5s threshold).
How it catches #1648
If someone reintroduces
-shortestin the mux command, ffmpeg 6.0 would truncate the video stream to ~1.4s while audio stays at 10s. The parity check seesdrift=8.6s > 0.5sand fails the regression.Test plan
Closes #1648