test(producer): add stream duration parity check to regression harness by miguel-heygen · Pull Request #1652 · heygen-com/hyperframes

miguel-heygen · 2026-06-22T21:26:09Z

Summary

Adds a stream duration parity check and a dedicated regression fixture to catch audio/video mux truncation regressions like #1648.

What's included

audio-mux-parity regression fixture — a 10s composition with 3 audio tracks and 3 scene clips. Baseline rendered and stored in LFS. Added to shard-8 in CI.
checkStreamDurationParity() — probes rendered output for video and audio stream durations, fails if they differ by >0.5s. Runs on all non-png-sequence fixtures with audio.
Unit test with fabricated truncated video — streamDurationParity.test.ts creates a 2s-video + 10s-audio mux (simulating the exact render: video stream truncated when composition has <audio> (audio full-length, video stops early) #1648 truncation) and asserts the check catches it.

Baseline probe

codec_type=video  duration=10.000000  nb_frames=300
codec_type=audio  duration=10.026667  nb_frames=470

Drift: 0.027s (well under 0.5s threshold).

How it catches #1648

If someone reintroduces -shortest in the mux command, ffmpeg 6.0 would truncate the video stream to ~1.4s while audio stays at 10s. The parity check sees drift=8.6s > 0.5s and fails the regression.

Test plan

Baseline video verified — 10s video, 10s audio, drift 0.027s
Regression test passes against baseline (visual, audio, parity all green)
Unit test: 3 cases pass (matching durations, truncated video, video-only)
Build passes
Added to regression shard-8

Closes #1648

vanceingalls

Verdict: NIT

Clean regression net for #1648 (ffmpeg -shortest style mux truncation). Threshold (0.5s) sits ~18× over the observed baseline drift (0.027s), the unit test exercises the canonical failure mode with a real ffmpeg-built fixture, and the gate is wired into shard-8 with a small (10s, 3-track-audio) composition. One naming/semantics nit worth flagging, plus a couple of small observations.

What I verified

Base: main (standalone — no Graphite stack despite the gt-pasteable-stack URL).
Production touch: the -3 is extractMediaMetadata import widened to also pull extractAudioMetadata (1 line), and result.passed gets && parityPassed ANDed in. No prod-behavior change slipped in.
Function placement: checkStreamDurationParity (regression-harness.ts:817) is correctly exported; the fail-block inserted into saveFailureDetails closes cleanly before the new // ── Stream Duration Parity ── section. Layout fine.
Gating: runs only when !isPngSequence AND parity !== null (null when !meta.hasAudio). Pure-video fixtures and png-sequence fixtures correctly skip. result.passed = … && (parity?.passed ?? true) makes the parity check non-gating for skipped cases. Correct.
#1648 catch shape: for a -shortest-truncated mux (video stream cut to 1.4s, audio runs to 180s, container = 180s), videoStreamDurationSeconds = 1.4s and extractAudioMetadata.durationSeconds = 180s (container), so drift ≈ 178.6s, well above 0.5s. Regression net works.
Unit test: streamDurationParity.test.ts builds its own truncated-mux fixture via real ffmpeg (not a mock-the-primitive setup) — the test would fail if checkStreamDurationParity were a no-op or if the threshold drifted up. Good against pattern 5.
Threshold sanity: baseline drift 0.027s (per PR body), threshold 0.5s, room for AAC priming/PTS rounding without flake. The <0.5s bar is generous but not so tight as to flake.
CI: baseline checks green; regression shards + Windows render + Test/Typecheck still in-flight at posting time. Required-set looks healthy so far; no failures.

Findings

NIT — "audio duration" is actually container duration. extractAudioMetadata (engine ffprobe.ts:347-373) reads output.format.duration and returns it as durationSeconds, NOT the audio stream's own duration field. So audioDurationSeconds in this check is really the container's reported duration. For the canonical -shortest shape this is fine (container = max(video stream, audio stream)), and the field name on AudioMetadata is in fact durationSeconds (so the consumer isn't lying about what it asked for) — but the parity framing is technically "video-stream duration vs container duration," not "video-stream duration vs audio-stream duration." Edge case it misses: a future bug that truncates both streams in sync (container also truncates) would slip through with drift = 0. Unlikely shape, but worth either (a) probing the audio stream's own duration field explicitly inside checkStreamDurationParity for true parity, or (b) leaving a code comment that this is intentionally container-duration as a proxy. Not a blocker for #1648 — the named regression is asymmetric truncation, which this nets correctly.
NIT — third unit-test case is decorative. "returns null for video-only files" asserts behavior of the meta.hasAudio gate inside the new function. That's fine for documenting intent, but hasAudio lives in engine code that isn't changing here, and the production callsite already gates on parity !== null, so a regression in hasAudio wouldn't be caught by this test. Low value; harmless.
NIT — meta.json carries minAudioCorrelation: 0.9 / maxAudioLagWindows: 120 on a fixture whose audio is three identical sine-tone bursts. Three back-to-back identical waveforms can correlate well against any same-shape baseline, so the audio-envelope check on this fixture isn't doing much load-bearing work beyond "audio is present and roughly the same length." That's fine — the parity check IS the new signal — but recognize the fixture is leaning on stream parity, not on RMS-envelope content. No action.

Continuity with HF #1615

This is a different regression net from the audioStream.start_time < 0.001 ask in r-1615 (start-time vs end-time invariants), not a supersede. Both are reasonable post-#1648 hardening.

CI state at post

SUCCESS: Format, Lint, Fallow audit, Build, SDK, runtime contract, Studio load smoke, Semantic PR title, player-perf, preview-regression, CodeQL (actions/python). IN_PROGRESS: CodeQL (js-ts), Typecheck, Test, regression-shards 1–8, Windows render + Tests, CLI smoke, Smoke: global install. No failures. Two-minute build window from push; checks should converge shortly. Re-check before stamp.

Review by Via

james-russo-rames-d-jusso

Reviewed at: 5c610bcc (HEAD at review time). Layering on Via's review (posted ~minutes ago) — we converged on the same parity-source observation independently; I'm pinning it slightly harder, plus adding a coverage matrix + cross-fixture smoke-run suggestion.

Net: clean, well-scoped regression PR; the harness check + fabricated-truncation unit test together would have caught #1648 deterministically. Nothing blocking.

Verdict matrix

Verdict	Item
✅	Reproduces and pins #1648 — `streamDurationParity.test.ts:53-71` (`muxWithTruncatedVideo`) builds the exact `-shortest`-style 2s-video + 10s-audio mux, runs through the same `checkStreamDurationParity` the harness uses, and asserts `passed=false`. Strong "would have caught it" signal per `feedback_defensive_bug_pinning_tests_fp_suppression`.
✅	Coverage scope is broad — `regression-harness.ts:1064` fires on every non-`png-sequence` fixture with audio, which means the check rides every in-process / `distributed-simulated` / `lambda-local` render that produces a single muxed file. AAC priming-delay drift (0.027s baseline) is well within the 0.5s threshold, so this won't false-positive against the HF#1615-fixed start-time path.
✅	Failure-path diagnostics are useful per `feedback_observability_pr_failure_path_coverage` — `saveFailureDetails` writes `stream-parity-failure.json` with both stream durations + drift, and the `logPretty` line emits the same numbers on stdout. Oncall can read the drift direction (videoaudio) directly.
🟡	Container-vs-stream duration asymmetry → latent reflexivity risk (also flagged by Via as a NIT — I'd pin it slightly harder). `checkStreamDurationParity` reads `meta.videoStreamDurationSeconds` (which is `videoStream.duration` and falls back to `format.duration` per `ffprobe.ts:317-318`) on one side, and `audioMeta.durationSeconds` (which is `output.format.duration` per `ffprobe.ts:365`, not the audio stream's `stream.duration`) on the other. For the canonical #1648 asymmetric-truncation case the comparison reads as video-stream-vs-container, which catches it cleanly (Via's analysis here is right). The future-bug risk per `feedback_parity_test_reflexivity_tautology` is: (a) if a future bug truncates BOTH streams in sync the container also truncates and drift = 0 (Via's edge case), and (b) if any future container variant lacks `videoStream.duration`, `videoStreamDurationSeconds` falls back to `format.duration` and both sides become the same value silently. (a) is unlikely; (b) is one mux-config change away. Durable fix: add `audioStreamDurationSeconds` to `AudioMetadata` reading `audioStream.duration`, compare stream-to-stream, and treat the fallback case as "indeterminate / log warning" rather than "passed". Cheap, future-proofs the check.
🟡	Unit test does not exercise the fallback path above — both `muxWithMatchingDurations` and `muxWithTruncatedVideo` emit libx264 mp4 with reliable `stream.duration`. Adding a third case where the input lacks a stream-level video duration (e.g. an fMP4 produced with `-movflags +frag_keyframe+empty_moov`) would make the silent-pass risk discoverable in CI rather than in prod. Not blocking — the canonical #1648 reproducer is covered.
🟢	`MAX_STREAM_DRIFT_SECONDS = 0.5` reads conservative given the 0.027s baseline floor (~18× headroom); leaves room for legitimate per-encoder rounding without masking truncation-class bugs (where drift is seconds-scale). Per `feedback_inverse_operation_tolerance_asymmetry`, no add/remove inverse exists here — single-direction read, no symmetry constraint to honor.
🟢	Shard-8 addition (`regression.yml:82`) is the right slot — balances out the existing audio-light fixtures and the 10s composition keeps wall time modest.
🟢	LFS pointer for `output/output.mp4` (321645 bytes, sha256 `7c7b88b…`) is correctly structured. `meta.json` shape mirrors sibling audio fixtures. Via's nit #3 about the three-identical-sine-tones not making the `minAudioCorrelation: 0.9` check load-bearing is fair — `audio-mux-parity` is genuinely a stream-parity fixture, not an envelope-content fixture. No action needed; recognize the fixture's load-bearing signal IS the new check.

Coverage assessment (encoders × shapes × edges)

Axis	Status
h.264 / mp4 (in-process)	✅ via `audio-mux-parity` fixture
VP9 / WebM (in-process)	✅ shares same `checkStreamDurationParity` call, no codec-specific branch
Distributed N-chunk concat	✅ goes through same single-mux output path before `runTestSuite` reaches the check
Lambda-local	✅ same code path
Pad-path (audio extends past video)	✅ exactly the #1648 shape, exercised
Trim-path (video extends past audio)	⚠️ logically covered by absolute drift but no explicit fixture/unit case; if `-shortest` regression ever inverts (audio truncated to video), the check fails symmetrically — fine to leave implicit
Video-only (no audio)	✅ `streamDurationParity.test.ts:131-142` covers `null` return (Via's nit #2 — fair, this exercises the `hasAudio` gate which lives in engine code unchanged here; cheap intent-documenting test)
png-sequence	✅ skipped at `regression-harness.ts:1064` — correct, no mux to probe
MOV / ProRes shard	🟡 `hdr-regression`/`mov-prores` fixtures on shard-1 would now exercise the check; if any currently produce a MOV where the video stream omits `stream.duration`, the fallback-collapse silently passes (see 🟡 #1 above). Worth a smoke-run before merge to confirm none of the existing mov/prores baselines trip the new check.

Class-of-bug memo candidate

The duration-source asymmetry (stream-level on video, container-level on audio) is a generalizable parity-test trap: when a parity check pulls one quantity from a stream-level field and the paired quantity from a container-level field, the stream.duration ?? format.duration fallback can silently collapse them into the same value. Worth pinning as a memo for future harness PRs in this stack — the failure mode is invisible until a new codec/container combo lands.

What I did not verify

Did not run the regression locally (LFS pull + ffmpeg). The unit test logic is verifiable from the diff.
Did not enumerate every fixture currently on shards 1-8 to confirm none currently produce a stream-duration-less mux (the smoke-run before merge would settle this).

Review by Rames D Jusso

jrusso1020

Approving at 5c610bcc — Rames D Jusso + Via both reviewed independently, converging on no blockers (3 white-check / 2 yellow-circle / 3 green). The PR adds a real regression net for the #1648 prod bug (pad-path mux duration drift), with real ffmpeg, LFS fixture, sensible CI shard, generous documented threshold.

Two carry-forward observations worth absorbing in follow-up (non-blocking, per both reviewers):

format.duration fallback parity collapse — the check reads one side from videoStream.duration and the other via stream.duration ?? format.duration. If a future codec/container lacks stream-level duration, both sides collapse into format.duration and pass tautologically on a truly-broken mux. Via flagged as NIT, Rames flagged as yellow-circle. Latent design wart, not a current-state bug — worth a follow-up to either assert stream-duration presence at probe time, or to make the assertion symmetric.
MOV/ProRes coverage — yellow per Rames; worth a shard-1 smoke pre-merge to confirm no current baselines produce a stream-duration-less mux that the new parity check would falsely pass.

— Jerrai

Probes the rendered output for video and audio stream durations after render and fails the test if they differ by more than 0.5s. Catches mux-level truncation regressions like the ffmpeg -shortest bug (#1648) where one stream gets silently cut short. Runs on all non-png-sequence fixtures with audio — no new meta.json field needed since this is a universal invariant, not a per-fixture threshold.

miguel-heygen · 2026-06-22T22:08:56Z

Addressed the review feedback from Via and Rames:

Container-vs-stream duration asymmetry (both flagged this): the parity check now reads the audio stream's own stream.duration instead of format.duration. Added streamDurationSeconds to AudioMetadata (ffprobe.ts) — populated from audioStream.duration, falls back to undefined when the stream field is absent. The parity check prefers streamDurationSeconds over durationSeconds, falling back to container duration only when the stream field isn't available.

This closes the tautology risk where both sides collapse into format.duration on a broken mux.

All tests pass (ffprobe 14/14, parity 3/3, engine 79/79).

miguel-heygen force-pushed the fix/1648-regression-parity-check branch 2 times, most recently from 80c7663 to 5c610bc Compare June 22, 2026 21:53

vanceingalls reviewed Jun 22, 2026

View reviewed changes

james-russo-rames-d-jusso reviewed Jun 22, 2026

View reviewed changes

jrusso1020 approved these changes Jun 22, 2026

View reviewed changes

miguel-heygen force-pushed the fix/1648-regression-parity-check branch from 5c610bc to e8f8b5e Compare June 22, 2026 22:08

miguel-heygen merged commit 60d3eeb into main Jun 22, 2026
44 checks passed

miguel-heygen deleted the fix/1648-regression-parity-check branch June 22, 2026 22:33

vanceingalls mentioned this pull request Jun 23, 2026

fix(producer): derive duration from sub-composition timing #1680

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(producer): add stream duration parity check to regression harness#1652

test(producer): add stream duration parity check to regression harness#1652
miguel-heygen merged 1 commit into
mainfrom
fix/1648-regression-parity-check

miguel-heygen commented Jun 22, 2026 •

edited

Loading

Uh oh!

vanceingalls left a comment

Uh oh!

james-russo-rames-d-jusso left a comment

Uh oh!

jrusso1020 left a comment

Uh oh!

miguel-heygen commented Jun 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

miguel-heygen commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's included

Baseline probe

How it catches #1648

Test plan

Uh oh!

vanceingalls left a comment

Choose a reason for hiding this comment

Verdict: NIT

What I verified

Findings

Continuity with HF #1615

CI state at post

Uh oh!

james-russo-rames-d-jusso left a comment

Choose a reason for hiding this comment

Verdict matrix

Coverage assessment (encoders × shapes × edges)

Class-of-bug memo candidate

What I did not verify

Uh oh!

jrusso1020 left a comment

Choose a reason for hiding this comment

Uh oh!

miguel-heygen commented Jun 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

miguel-heygen commented Jun 22, 2026 •

edited

Loading