Skip to content

fix(engine): defend macOS regular Chrome screenshots#1703

Merged
miguel-heygen merged 2 commits into
mainfrom
codex/fix-system-chrome-screenshot-band
Jun 24, 2026
Merged

fix(engine): defend macOS regular Chrome screenshots#1703
miguel-heygen merged 2 commits into
mainfrom
codex/fix-system-chrome-screenshot-band

Conversation

@miguel-heygen

@miguel-heygen miguel-heygen commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

What

Fixes macOS regular-Chrome renders where the bottom of the frame can show the page background instead of composition content.

Why

Affected macOS/Chrome builds can return a viewport-bound screenshot surface shorter than the requested clip. The fallback page background then leaks into the bottom of the captured frame.

How

Regular Chrome on macOS now defaults to the beyond-viewport screenshot path. Headless Chrome and explicit caller overrides keep the viewport-bound path. Producer observability now reports the engine-resolved value, and a producer MP4 fixture covers bottom-edge output against a red background leak.

This intentionally trades macOS regular-Chrome DOM-only render speed for correctness; Linux/headless-shell paths keep the faster viewport-bound capture.

Fixes #1699

Test plan

  • Unit tests added/updated
  • Manual testing performed
  • Documentation updated (not applicable)

Commands run:

  • bun run --cwd packages/engine test src/services/screenshotService.test.ts src/services/frameCapture.captureOptions.test.ts
  • bun run --cwd packages/engine typecheck
  • bun run --cwd packages/producer typecheck
  • bun run --cwd packages/cli dev lint ../../packages/producer/tests/chrome-screenshot-bottom-edge/src
  • bun run --cwd packages/cli dev validate ../../packages/producer/tests/chrome-screenshot-bottom-edge/src --timeout 10000
  • bunx oxlint packages/engine/src/services/frameCapture.ts packages/engine/src/services/frameCapture.captureOptions.test.ts packages/engine/src/services/screenshotService.ts packages/engine/src/services/screenshotService.test.ts packages/engine/src/types.ts packages/producer/src/services/renderOrchestrator.ts packages/producer/src/services/render/stages/captureStage.ts packages/producer/src/services/render/stages/captureStreamingStage.ts packages/producer/src/services/distributed/renderChunk.ts
  • bunx oxfmt --check packages/engine/src/services/frameCapture.ts packages/engine/src/services/frameCapture.captureOptions.test.ts packages/engine/src/services/screenshotService.ts packages/engine/src/services/screenshotService.test.ts packages/engine/src/types.ts packages/producer/src/services/renderOrchestrator.ts packages/producer/src/services/render/stages/captureStage.ts packages/producer/src/services/render/stages/captureStreamingStage.ts packages/producer/src/services/distributed/renderChunk.ts
  • Linux x86_64 Docker devbox: docker build -f Dockerfile.test -t hyperframes-producer:test .
  • Linux x86_64 Docker devbox: docker run --rm --security-opt seccomp=unconfined --shm-size=2g -v ./packages/producer/tests:/app/packages/producer/tests hyperframes-producer:test --update chrome-screenshot-bottom-edge
  • Linux x86_64 Docker devbox: docker run --rm --security-opt seccomp=unconfined --shm-size=2g -v ./packages/producer/tests:/app/packages/producer/tests hyperframes-producer:test chrome-screenshot-bottom-edge

@miguel-heygen miguel-heygen force-pushed the codex/fix-system-chrome-screenshot-band branch from 636803b to cfabb8b Compare June 24, 2026 21:41
@miguel-heygen miguel-heygen force-pushed the codex/fix-system-chrome-screenshot-band branch from cfabb8b to 20f9b53 Compare June 24, 2026 22:01

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #1703 ports issue #1699's "bottom-band leak on system Chrome" RCA into a small, surgical engine-side default: when the resolved browser is regular macOS Chrome (Chrome/..., not HeadlessChrome/...), createCaptureSession flips the captureBeyondViewport default from false to true, so Page.captureScreenshot no longer backfills a short compositor surface with the page background. The production-code delta is ~28 lines across 5 files; the other ~715 lines are the new chrome-screenshot-bottom-edge producer regression fixture (compiled.html + MP4 golden + meta.json + index.html). Detection is platform + UA-prefix based, opt-in for HeadlessChrome (keeps the fast viewport-bound path) and respects any explicit caller override.

Verdict: LGTM

Findings (numbered, with severity tag):

  1. nitpackages/engine/src/services/screenshotService.ts:30browserVersion.startsWith("Chrome/") is the right cohort (Puppeteer's browser.version() returns "Chrome/149.0.7827.155" for regular Chrome and "HeadlessChrome/<x>" for headless-shell, so the prefix check is unambiguous). One genuinely-future case worth a comment-line: if Chrome/Chromium ever changes the user-agent of the macOS regular build (or someone routes chromium here), the prefix could go stale — but that's also a clear monitor failure (the new regression fixture would catch it on macOS hosts) and not actionable today. Mention only because the issue reporter did ask whether doctor could warn — that's a future enhancement, not part of this PR's scope.

  2. nitpackages/producer/src/services/renderOrchestrator.ts:1216 and packages/producer/src/services/distributed/renderChunk.ts:509 — the conditional-spread idiom (...(composition.videos.length > 0 ? { captureBeyondViewport: true } : {})) is the load-bearing change here: it lets the engine-side default apply when no videos are present (previously false was passed unconditionally). Correct, and matches exactOptionalPropertyTypes semantics. The updateCaptureObservability({ captureBeyondViewport: captureOptions.captureBeyondViewport ?? false }) line right below (renderOrchestrator.ts:1218) will now report false in observability for any producer render where the engine ends up flipping the default to true (i.e. local macOS producer renders). Producer almost exclusively runs on Linux servers, so the observability gap is minor — but if you want the field to mean "what the engine actually does," consider piping the resolved value back from createCaptureSession or just deleting the field from the producer observability schema. Author discretion.

Verified:

  • Cohort detection: shouldDefaultCaptureBeyondViewport("Chrome/149...", "darwin") === true, ("HeadlessChrome/148...", "darwin") === false, ("Chrome/149...", "linux") === false — all three lock in by the unit tests at screenshotService.test.ts:124-138. The headless-shell on Linux producer path keeps the fast viewport-bound capture; the headless-shell on macOS dev path also keeps it.
  • vs #1618 (page-side compositor handshake): the handshake at frameCapture.ts:1291 is gated by session.captureMode !== "beginframe". On macOS, captureMode resolves to "screenshot" at frameCapture.ts:357 (the isLinux && headlessShell gate fails), so the handshake fires on the macOS path before and after this PR. The new beyond-viewport default does not bypass it.
  • vs #1656 (captureForceScreenshot flip): that gate flips preMode to "screenshot" on producer renders. Same conclusion — macOS already lands on "screenshot" mode, so the interaction is a no-op for the reporter's cohort. The flag continues to do its work on Linux.
  • vs #1670 (fast screenshot path for viewport captures): the headless-shell path (Linux + headless-shell) keeps captureBeyondViewport=false via the HeadlessChrome prefix check, so the fast viewport-bound path is preserved for the perf-sensitive cohort. macOS regular Chrome is dev-only; the perf regression there is acceptable.
  • vs #1702 (in-flow clip anchoring): orthogonal layer (runtime layout vs engine capture). No interaction; verified by file-path disjointness.
  • Sticky session.options propagation: pageScreenshotCapture is called with session.options (line 1523, 1704), which is now the sessionOptions with the resolved default. The per-frame call site at pageScreenshotCapture reads options.captureBeyondViewport ?? false (screenshotService.ts:152), so the macOS render flows the true default all the way to the CDP call.
  • No silent black-frame / catch-its-own-throw: defense is a single CDP flag change, no try/catch wrapper, no retry loop, no fallback that could mask a real failure. If Page.captureScreenshot({captureBeyondViewport:true}) throws, the existing screenshot-path error handling at frameCapture.ts:1730+ raises loudly.
  • Cohort scope: not "all macOS", not "all Chrome", not "all non-headless-shell" — exactly darwin and Chrome/ prefix. False-positive risk: another browser that reports Chrome/... on darwin (e.g. Edge, Brave, Opera) — those also exhibit the same Blink screenshot-compositor behavior, so opting them into the safer path is a feature, not a bug. False-negative risk: a future Chrome with a different version-string shape — unlikely; Chromium's browser.version() format is stable.
  • Regression fixture: chrome-screenshot-bottom-edge paints a red page background under a green opaque foreground; the goldens are reproducible via the --update Docker incantation already documented in the PR description test plan. minPsnr: 30, maxFrameFailures: 0 collapses if even a single frame regresses to the band-leak shape.
  • Title vs scope: "defend macOS regular Chrome screenshots" matches the diff scope precisely. No Linux path changes, no headless-shell behavior changes, no Windows code touched, no scope creep.
  • Band-aid bar: no duplicate guards, no contradictory rules across paths, no decorative gate (the populate path is the single createCaptureSession entry point that every render goes through), no silent black-frame swallow, no comment/code drift.

Issue #1699 reach: Yes — directly. The reporter ran npx hyperframes render . on macOS 26.3 with system Chrome 149.0.7827.155 (no headless-shell). That path is cliproducer.executeRenderJobengine.createCaptureSessionpageScreenshotCapture. The new default at frameCapture.ts:421-425 resolves Chrome/149... + darwincaptureBeyondViewport=true, which is exactly what the reporter's "Questions" section #2 proposed as a fix. Their setDefaultBackgroundColorOverride alternative is not taken; captureBeyondViewport: true is the more general defense and matches what the alpha capture paths already use.

CI state at HEAD 20f9b53b: Healthy. CodeQL (actions, python) success, Detect-changes / Format / Lint / Typecheck / Test / SDK / Studio-load-smoke / Fallow-audit / preview-regression / player-perf / Semantic-PR-title all green on the latest workflow run. Eight regression shards in progress, plus regression-shards / Render on windows-latest / Tests on windows-latest / CLI smoke / Smoke: global install running — none failed. The earlier workflow run shows CANCELLED on several jobs as expected (superseded by the later push). Re-check after CI settles, but no failures visible at write-time.

Prior reviewer state: None — reviews: [], comments: [] at HEAD. No Rames / Xiaye / Genesis / James posts to integrate.

Review by Via

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at HEAD 20f9b53b. Lane: canonical rubric + cross-cutting (perf, observability, test-fixture quality, recent-history check). Vance posted first with thorough cohort + path verification — this layers on what they didn't cover. Codex-authored from codex/fix-system-chrome-screenshot-band, applying the Claude/Codex review lens.

Verdict: 🟡 LGTM with two concerns worth addressing before merge.

Recent history context — load-bearing:

screenshotService.ts was touched ~26 hours ago by #1670 (Miguel, merged 2026-06-23 19:13Z), which deliberately reverted #1094's "CBV=true everywhere" default to restore the fast viewport-bound path for perf. This PR is a coherent follow-up: the perf restoration left macOS-Chrome-without-headless-shell on the now-default-false path, exposing the compositor short-surface bug that #1094's blanket fix was incidentally hiding. The chain — #1094#1670#1703 — reads as intentional and the cohort gate (darwin + Chrome/ prefix) is the narrowest plausible defense. Not a silent revert.

Concerns:

  1. packages/producer/tests/chrome-screenshot-bottom-edge/ — the fixture does NOT actually exercise the new code path under CI. Dockerfile.test:61 installs chrome-headless-shell@148.0.7778.167, the runner is Linux, so resolveRequestedCaptureMode (browserManager.ts:266-276) returns "beginframe" — which uses HeadlessExperimental.beginFrame, not Page.captureScreenshot, and never consults captureBeyondViewport or shouldDefaultCaptureBeyondViewport. The fixture is a happy-path render correctness test, not a regression guard for #1699: if you reverted shouldDefaultCaptureBeyondViewport to () => false and re-ran the Docker harness, the fixture would still pass. Either (a) add a unit test that asserts the resolved session.options.captureBeyondViewport === true when browserVersion="Chrome/149..." + platform="darwin" is plumbed through createCaptureSession (covers the integration the bug actually lives at), or (b) re-label the fixture as a screenshot-mode general-quality guard rather than #1699 regression. The three unit tests at screenshotService.test.ts:125-138 correctly cover the heuristic itself — that part is solid.

  2. Observability capture_strategy checkpoint emits a false valuerenderOrchestrator.ts:1445:

    observability.checkpoint("capture_strategy", "resolved", {
      ...
      captureBeyondViewport: captureOptions.captureBeyondViewport ?? false,
    

    Plus renderOrchestrator.ts:1218-1220 and the schema field at render/observability.ts:36. After this PR, on macOS regular Chrome + DOM-only renders, the producer-side captureOptions.captureBeyondViewport is undefined but the engine resolves it to true. The checkpoint will report false — the producer says "we're on the fast path" while the engine actually used the slow path. This is exactly the observability signal #1670 added to track the cohort split; #1703 invalidates its truth on the new branch. Vance flagged the renderOrchestrator:1218 site; the capture_strategy checkpoint at line 1445 is the more user-visible one (it's the structured render-strategy log). Cheapest fix: have createCaptureSession return the resolved value (or expose shouldDefaultCaptureBeyondViewport(browserVersion) to the orchestrator so it can echo what the engine will pick). Producer renders run on Linux servers in prod so the gap is dev-only — but it's a freshly-added observability field whose meaning would silently drift on a single cohort.

Nits / questions:

  • shouldDefaultCaptureBeyondViewport heuristic and puppeteer headless-modebrowser.version() returns Chrome/x.y.z when puppeteer launches a regular Chrome binary with headless: true (which on modern Chrome is --headless=new), so the prefix check correctly catches the reporter's cohort. But worth a verifying log of what browser.version() actually returned on the reporter's machine (Chrome 149.0.7827.155, macOS 26.3) before merge — the bug report says Source: system but doesn't quote the version string. browserManager.ts:384-386 already logs it, so a single repro on macOS regular Chrome confirming the prefix matches Chrome/ would close the loop. Low risk: even if some macOS-darwin regular Chrome variant reports HeadlessChrome/, that's the current bug → no regression vs main, just no fix.

  • Performance regression unflagged — macOS regular Chrome DOM-only renders previously hit the fast viewport-bound path (captureBeyondViewport: false explicitly set by producer); after this PR they hit the slower beyond-viewport compositor path. Per #1670's reproduction, that's ~13.2s → ~17.3s on a 90-frame fixture (~30% slowdown), and #1670's whole point was avoiding that cost. macOS Chrome is dev-only for HeyGen-internal use (CI/prod is Linux), so it's acceptable — but the PR body doesn't mention the perf tradeoff, and the previous PR was written specifically to avoid it. Consider documenting "macOS regular Chrome perf reverts to pre-#1670 levels by design" in the body so the cohort split is intentional-and-recorded.

  • Issue #1699 Q1 ("would hyperframes doctor warn?") — not addressed. Confirmed packages/cli/src/commands/doctor.ts and packages/cli/src/commands/browser.ts are not in the diff. The fix makes the warning less critical, but advising users on the dev path that they're falling back to slower capture is still useful UX. Follow-up, not blocking.

What I verified:

  • Diff scope: gh pr diff matches PR body — 5 production-code files (~28 LOC) + 1 unit test file (+17) + 1 fixture set (~700 LOC). No scope creep.
  • shouldDefaultCaptureBeyondViewport flow: screenshotService.ts:22-31 (heuristic) → frameCapture.ts:421-425 (session-level resolution via sessionOptions) → frameCapture.ts:455 (options: sessionOptions stored on session) → frameCapture.ts:1654 (const { page, options } = session) → frameCapture.ts:1704 (pageScreenshotCapture(page, options)) — value flows through. Also the dedup-verify path at frameCapture.ts:1523 uses session.options directly. Both call sites get the resolved default.
  • Other CDP screenshot paths are immune to the bug regardless of this PR: captureScreenshotWithAlpha (screenshotService.ts:186) and captureAlphaPng (screenshotService.ts:252) both hardcode captureBeyondViewport: true from #1094's HDR guard. Sibling-asymmetry check: ✅ no other path needs the fix.
  • Recent file history on frameCapture.ts and screenshotService.ts via gh api commits?path=: most recent touches are #1670 (screenshotService, 2026-06-23) and #1688 (frameCapture, 2026-06-24, retry-probe-on-transient-browser-errors). No silent revert. No conflicting in-flight intent.
  • Cross-repo consumers of CaptureOptions.captureBeyondViewport: the field type docstring widened to "leave undefined for the safe browser-specific default" — no public-API break since the field stays optional and the behavior change is opt-in. No external SDK / CLI / wrapper updates required.
  • CI: green on 2026-06-24T22:01:57Z run for typecheck, lint, test, SDK, studio-smoke, codeQL python/actions, preview-regression, player-perf. Regression shards 1-8 + Windows render + global install smoke still in flight at write time — no failures visible.

What I didn't verify:

  • Whether the reporter's local Chrome (149.0.7827.155 macOS arm64) actually reports Chrome/ (not HeadlessChrome/) from browser.version() — no macOS host available. The CHANGELOG suggests it does, but explicit confirmation would be cheap.
  • Beyond-viewport interactions with position: fixed / sticky / viewport-% units on the macOS path — beyond-viewport documentedly re-lays-out for the capture; if any composition uses vh units that depend on the real viewport, there's a theoretical layout shift. Existing alpha capture paths already use beyond-viewport in prod without reports, so risk is low.

Stamp posture: Defer to Miguel + Vance on the perf-regression-vs-correctness tradeoff (the call has Miguel's #1670 context; he's the right judge of whether the macOS-Chrome perf cost is acceptable). The two concerns above are addressable in the same PR or as immediate follow-ups; neither blocks the correctness fix.

— Rames D Jusso

@miguel-heygen

Copy link
Copy Markdown
Collaborator Author

Addressed the review-body concerns in ebc08dc1.

the fixture does NOT actually exercise the new code path under CI

Fixed: added frameCapture.captureOptions.test.ts around the session-option resolver now used directly by createCaptureSession, so Chrome/... + darwin resolving to captureBeyondViewport: true is covered outside the Linux/headless Docker fixture. The MP4 fixture stays as a bottom-edge golden, not the only cohort-specific guard.

Observability capture_strategy checkpoint emits a false value

Fixed: producer capture stages now return the engine-resolved captureBeyondViewport value from the consumed session, and the orchestrator feeds that into render observability. The early capture_strategy checkpoint uses the resolved probe/caller value when available, otherwise null instead of incorrectly reporting false.

Performance regression unflagged

Documented in the PR body: macOS regular Chrome intentionally trades the faster viewport-bound path for correctness; Linux/headless-shell keeps the fast path.

@miguel-heygen miguel-heygen merged commit 89ff299 into main Jun 24, 2026
50 of 67 checks passed
@miguel-heygen miguel-heygen deleted the codex/fix-system-chrome-screenshot-band branch June 24, 2026 23:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Render fills the bottom ~85px of every frame with the page background color when falling back to system Chrome (preview is fine)

3 participants