Skip to content

test(producer): add stream duration parity check to regression harness#1652

Merged
miguel-heygen merged 1 commit into
mainfrom
fix/1648-regression-parity-check
Jun 22, 2026
Merged

test(producer): add stream duration parity check to regression harness#1652
miguel-heygen merged 1 commit into
mainfrom
fix/1648-regression-parity-check

Conversation

@miguel-heygen

@miguel-heygen miguel-heygen commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds a stream duration parity check and a dedicated regression fixture to catch audio/video mux truncation regressions like #1648.

What's included

  1. audio-mux-parity regression fixture — a 10s composition with 3 audio tracks and 3 scene clips. Baseline rendered and stored in LFS. Added to shard-8 in CI.
  2. checkStreamDurationParity() — probes rendered output for video and audio stream durations, fails if they differ by >0.5s. Runs on all non-png-sequence fixtures with audio.
  3. Unit test with fabricated truncated videostreamDurationParity.test.ts creates a 2s-video + 10s-audio mux (simulating the exact render: video stream truncated when composition has <audio> (audio full-length, video stops early) #1648 truncation) and asserts the check catches it.

Baseline probe

codec_type=video  duration=10.000000  nb_frames=300
codec_type=audio  duration=10.026667  nb_frames=470

Drift: 0.027s (well under 0.5s threshold).

How it catches #1648

If someone reintroduces -shortest in the mux command, ffmpeg 6.0 would truncate the video stream to ~1.4s while audio stays at 10s. The parity check sees drift=8.6s > 0.5s and fails the regression.

Test plan

  • Baseline video verified — 10s video, 10s audio, drift 0.027s
  • Regression test passes against baseline (visual, audio, parity all green)
  • Unit test: 3 cases pass (matching durations, truncated video, video-only)
  • Build passes
  • Added to regression shard-8

Closes #1648

@miguel-heygen miguel-heygen force-pushed the fix/1648-regression-parity-check branch 2 times, most recently from 80c7663 to 5c610bc Compare June 22, 2026 21:53

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: NIT

Clean regression net for #1648 (ffmpeg -shortest style mux truncation). Threshold (0.5s) sits ~18× over the observed baseline drift (0.027s), the unit test exercises the canonical failure mode with a real ffmpeg-built fixture, and the gate is wired into shard-8 with a small (10s, 3-track-audio) composition. One naming/semantics nit worth flagging, plus a couple of small observations.

What I verified

  • Base: main (standalone — no Graphite stack despite the gt-pasteable-stack URL).
  • Production touch: the -3 is extractMediaMetadata import widened to also pull extractAudioMetadata (1 line), and result.passed gets && parityPassed ANDed in. No prod-behavior change slipped in.
  • Function placement: checkStreamDurationParity (regression-harness.ts:817) is correctly exported; the fail-block inserted into saveFailureDetails closes cleanly before the new // ── Stream Duration Parity ── section. Layout fine.
  • Gating: runs only when !isPngSequence AND parity !== null (null when !meta.hasAudio). Pure-video fixtures and png-sequence fixtures correctly skip. result.passed = … && (parity?.passed ?? true) makes the parity check non-gating for skipped cases. Correct.
  • #1648 catch shape: for a -shortest-truncated mux (video stream cut to 1.4s, audio runs to 180s, container = 180s), videoStreamDurationSeconds = 1.4s and extractAudioMetadata.durationSeconds = 180s (container), so drift ≈ 178.6s, well above 0.5s. Regression net works.
  • Unit test: streamDurationParity.test.ts builds its own truncated-mux fixture via real ffmpeg (not a mock-the-primitive setup) — the test would fail if checkStreamDurationParity were a no-op or if the threshold drifted up. Good against pattern 5.
  • Threshold sanity: baseline drift 0.027s (per PR body), threshold 0.5s, room for AAC priming/PTS rounding without flake. The <0.5s bar is generous but not so tight as to flake.
  • CI: baseline checks green; regression shards + Windows render + Test/Typecheck still in-flight at posting time. Required-set looks healthy so far; no failures.

Findings

  1. NIT — "audio duration" is actually container duration. extractAudioMetadata (engine ffprobe.ts:347-373) reads output.format.duration and returns it as durationSeconds, NOT the audio stream's own duration field. So audioDurationSeconds in this check is really the container's reported duration. For the canonical -shortest shape this is fine (container = max(video stream, audio stream)), and the field name on AudioMetadata is in fact durationSeconds (so the consumer isn't lying about what it asked for) — but the parity framing is technically "video-stream duration vs container duration," not "video-stream duration vs audio-stream duration." Edge case it misses: a future bug that truncates both streams in sync (container also truncates) would slip through with drift = 0. Unlikely shape, but worth either (a) probing the audio stream's own duration field explicitly inside checkStreamDurationParity for true parity, or (b) leaving a code comment that this is intentionally container-duration as a proxy. Not a blocker for #1648 — the named regression is asymmetric truncation, which this nets correctly.

  2. NIT — third unit-test case is decorative. "returns null for video-only files" asserts behavior of the meta.hasAudio gate inside the new function. That's fine for documenting intent, but hasAudio lives in engine code that isn't changing here, and the production callsite already gates on parity !== null, so a regression in hasAudio wouldn't be caught by this test. Low value; harmless.

  3. NIT — meta.json carries minAudioCorrelation: 0.9 / maxAudioLagWindows: 120 on a fixture whose audio is three identical sine-tone bursts. Three back-to-back identical waveforms can correlate well against any same-shape baseline, so the audio-envelope check on this fixture isn't doing much load-bearing work beyond "audio is present and roughly the same length." That's fine — the parity check IS the new signal — but recognize the fixture is leaning on stream parity, not on RMS-envelope content. No action.

Continuity with HF #1615

This is a different regression net from the audioStream.start_time < 0.001 ask in r-1615 (start-time vs end-time invariants), not a supersede. Both are reasonable post-#1648 hardening.

CI state at post

SUCCESS: Format, Lint, Fallow audit, Build, SDK, runtime contract, Studio load smoke, Semantic PR title, player-perf, preview-regression, CodeQL (actions/python). IN_PROGRESS: CodeQL (js-ts), Typecheck, Test, regression-shards 1–8, Windows render + Tests, CLI smoke, Smoke: global install. No failures. Two-minute build window from push; checks should converge shortly. Re-check before stamp.

Review by Via

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at: 5c610bcc (HEAD at review time). Layering on Via's review (posted ~minutes ago) — we converged on the same parity-source observation independently; I'm pinning it slightly harder, plus adding a coverage matrix + cross-fixture smoke-run suggestion.

Net: clean, well-scoped regression PR; the harness check + fabricated-truncation unit test together would have caught #1648 deterministically. Nothing blocking.

Verdict matrix

Verdict Item
Reproduces and pins #1648streamDurationParity.test.ts:53-71 (muxWithTruncatedVideo) builds the exact -shortest-style 2s-video + 10s-audio mux, runs through the same checkStreamDurationParity the harness uses, and asserts passed=false. Strong "would have caught it" signal per feedback_defensive_bug_pinning_tests_fp_suppression.
Coverage scope is broad — regression-harness.ts:1064 fires on every non-png-sequence fixture with audio, which means the check rides every in-process / distributed-simulated / lambda-local render that produces a single muxed file. AAC priming-delay drift (0.027s baseline) is well within the 0.5s threshold, so this won't false-positive against the HF#1615-fixed start-time path.
Failure-path diagnostics are useful per feedback_observability_pr_failure_path_coveragesaveFailureDetails writes stream-parity-failure.json with both stream durations + drift, and the logPretty line emits the same numbers on stdout. Oncall can read the drift direction (videoaudio) directly.
🟡 Container-vs-stream duration asymmetry → latent reflexivity risk (also flagged by Via as a NIT — I'd pin it slightly harder). checkStreamDurationParity reads meta.videoStreamDurationSeconds (which is videoStream.duration and falls back to format.duration per ffprobe.ts:317-318) on one side, and audioMeta.durationSeconds (which is output.format.duration per ffprobe.ts:365, not the audio stream's stream.duration) on the other. For the canonical #1648 asymmetric-truncation case the comparison reads as video-stream-vs-container, which catches it cleanly (Via's analysis here is right). The future-bug risk per feedback_parity_test_reflexivity_tautology is: (a) if a future bug truncates BOTH streams in sync the container also truncates and drift = 0 (Via's edge case), and (b) if any future container variant lacks videoStream.duration, videoStreamDurationSeconds falls back to format.duration and both sides become the same value silently. (a) is unlikely; (b) is one mux-config change away. Durable fix: add audioStreamDurationSeconds to AudioMetadata reading audioStream.duration, compare stream-to-stream, and treat the fallback case as "indeterminate / log warning" rather than "passed". Cheap, future-proofs the check.
🟡 Unit test does not exercise the fallback path above — both muxWithMatchingDurations and muxWithTruncatedVideo emit libx264 mp4 with reliable stream.duration. Adding a third case where the input lacks a stream-level video duration (e.g. an fMP4 produced with -movflags +frag_keyframe+empty_moov) would make the silent-pass risk discoverable in CI rather than in prod. Not blocking — the canonical #1648 reproducer is covered.
🟢 MAX_STREAM_DRIFT_SECONDS = 0.5 reads conservative given the 0.027s baseline floor (~18× headroom); leaves room for legitimate per-encoder rounding without masking truncation-class bugs (where drift is seconds-scale). Per feedback_inverse_operation_tolerance_asymmetry, no add/remove inverse exists here — single-direction read, no symmetry constraint to honor.
🟢 Shard-8 addition (regression.yml:82) is the right slot — balances out the existing audio-light fixtures and the 10s composition keeps wall time modest.
🟢 LFS pointer for output/output.mp4 (321645 bytes, sha256 7c7b88b…) is correctly structured. meta.json shape mirrors sibling audio fixtures. Via's nit #3 about the three-identical-sine-tones not making the minAudioCorrelation: 0.9 check load-bearing is fair — audio-mux-parity is genuinely a stream-parity fixture, not an envelope-content fixture. No action needed; recognize the fixture's load-bearing signal IS the new check.

Coverage assessment (encoders × shapes × edges)

Axis Status
h.264 / mp4 (in-process) ✅ via audio-mux-parity fixture
VP9 / WebM (in-process) ✅ shares same checkStreamDurationParity call, no codec-specific branch
Distributed N-chunk concat ✅ goes through same single-mux output path before runTestSuite reaches the check
Lambda-local ✅ same code path
Pad-path (audio extends past video) ✅ exactly the #1648 shape, exercised
Trim-path (video extends past audio) ⚠️ logically covered by absolute drift but no explicit fixture/unit case; if -shortest regression ever inverts (audio truncated to video), the check fails symmetrically — fine to leave implicit
Video-only (no audio) streamDurationParity.test.ts:131-142 covers null return (Via's nit #2 — fair, this exercises the hasAudio gate which lives in engine code unchanged here; cheap intent-documenting test)
png-sequence ✅ skipped at regression-harness.ts:1064 — correct, no mux to probe
MOV / ProRes shard 🟡 hdr-regression/mov-prores fixtures on shard-1 would now exercise the check; if any currently produce a MOV where the video stream omits stream.duration, the fallback-collapse silently passes (see 🟡 #1 above). Worth a smoke-run before merge to confirm none of the existing mov/prores baselines trip the new check.

Class-of-bug memo candidate

The duration-source asymmetry (stream-level on video, container-level on audio) is a generalizable parity-test trap: when a parity check pulls one quantity from a stream-level field and the paired quantity from a container-level field, the stream.duration ?? format.duration fallback can silently collapse them into the same value. Worth pinning as a memo for future harness PRs in this stack — the failure mode is invisible until a new codec/container combo lands.

What I did not verify

  • Did not run the regression locally (LFS pull + ffmpeg). The unit test logic is verifiable from the diff.
  • Did not enumerate every fixture currently on shards 1-8 to confirm none currently produce a stream-duration-less mux (the smoke-run before merge would settle this).

Review by Rames D Jusso

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving at 5c610bcc — Rames D Jusso + Via both reviewed independently, converging on no blockers (3 white-check / 2 yellow-circle / 3 green). The PR adds a real regression net for the #1648 prod bug (pad-path mux duration drift), with real ffmpeg, LFS fixture, sensible CI shard, generous documented threshold.

Two carry-forward observations worth absorbing in follow-up (non-blocking, per both reviewers):

  • format.duration fallback parity collapse — the check reads one side from videoStream.duration and the other via stream.duration ?? format.duration. If a future codec/container lacks stream-level duration, both sides collapse into format.duration and pass tautologically on a truly-broken mux. Via flagged as NIT, Rames flagged as yellow-circle. Latent design wart, not a current-state bug — worth a follow-up to either assert stream-duration presence at probe time, or to make the assertion symmetric.
  • MOV/ProRes coverage — yellow per Rames; worth a shard-1 smoke pre-merge to confirm no current baselines produce a stream-duration-less mux that the new parity check would falsely pass.

— Jerrai

Probes the rendered output for video and audio stream durations after
render and fails the test if they differ by more than 0.5s. Catches
mux-level truncation regressions like the ffmpeg -shortest bug (#1648)
where one stream gets silently cut short.

Runs on all non-png-sequence fixtures with audio — no new meta.json
field needed since this is a universal invariant, not a per-fixture
threshold.
@miguel-heygen miguel-heygen force-pushed the fix/1648-regression-parity-check branch from 5c610bc to e8f8b5e Compare June 22, 2026 22:08
@miguel-heygen

Copy link
Copy Markdown
Collaborator Author

Addressed the review feedback from Via and Rames:

Container-vs-stream duration asymmetry (both flagged this): the parity check now reads the audio stream's own stream.duration instead of format.duration. Added streamDurationSeconds to AudioMetadata (ffprobe.ts) — populated from audioStream.duration, falls back to undefined when the stream field is absent. The parity check prefers streamDurationSeconds over durationSeconds, falling back to container duration only when the stream field isn't available.

This closes the tautology risk where both sides collapse into format.duration on a broken mux.

All tests pass (ffprobe 14/14, parity 3/3, engine 79/79).

@miguel-heygen miguel-heygen merged commit 60d3eeb into main Jun 22, 2026
44 checks passed
@miguel-heygen miguel-heygen deleted the fix/1648-regression-parity-check branch June 22, 2026 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

render: video stream truncated when composition has <audio> (audio full-length, video stops early)

4 participants