fix(engine): defend macOS regular Chrome screenshots#1703
Conversation
636803b to
cfabb8b
Compare
cfabb8b to
20f9b53
Compare
vanceingalls
left a comment
There was a problem hiding this comment.
PR #1703 ports issue #1699's "bottom-band leak on system Chrome" RCA into a small, surgical engine-side default: when the resolved browser is regular macOS Chrome (Chrome/..., not HeadlessChrome/...), createCaptureSession flips the captureBeyondViewport default from false to true, so Page.captureScreenshot no longer backfills a short compositor surface with the page background. The production-code delta is ~28 lines across 5 files; the other ~715 lines are the new chrome-screenshot-bottom-edge producer regression fixture (compiled.html + MP4 golden + meta.json + index.html). Detection is platform + UA-prefix based, opt-in for HeadlessChrome (keeps the fast viewport-bound path) and respects any explicit caller override.
Verdict: LGTM
Findings (numbered, with severity tag):
-
nit —
packages/engine/src/services/screenshotService.ts:30—browserVersion.startsWith("Chrome/")is the right cohort (Puppeteer'sbrowser.version()returns"Chrome/149.0.7827.155"for regular Chrome and"HeadlessChrome/<x>"for headless-shell, so the prefix check is unambiguous). One genuinely-future case worth a comment-line: if Chrome/Chromium ever changes the user-agent of the macOS regular build (or someone routeschromiumhere), the prefix could go stale — but that's also a clear monitor failure (the new regression fixture would catch it on macOS hosts) and not actionable today. Mention only because the issue reporter did ask whetherdoctorcould warn — that's a future enhancement, not part of this PR's scope. -
nit —
packages/producer/src/services/renderOrchestrator.ts:1216andpackages/producer/src/services/distributed/renderChunk.ts:509— the conditional-spread idiom (...(composition.videos.length > 0 ? { captureBeyondViewport: true } : {})) is the load-bearing change here: it lets the engine-side default apply when no videos are present (previouslyfalsewas passed unconditionally). Correct, and matchesexactOptionalPropertyTypessemantics. TheupdateCaptureObservability({ captureBeyondViewport: captureOptions.captureBeyondViewport ?? false })line right below (renderOrchestrator.ts:1218) will now reportfalsein observability for any producer render where the engine ends up flipping the default totrue(i.e. local macOS producer renders). Producer almost exclusively runs on Linux servers, so the observability gap is minor — but if you want the field to mean "what the engine actually does," consider piping the resolved value back fromcreateCaptureSessionor just deleting the field from the producer observability schema. Author discretion.
Verified:
- Cohort detection:
shouldDefaultCaptureBeyondViewport("Chrome/149...", "darwin") === true,("HeadlessChrome/148...", "darwin") === false,("Chrome/149...", "linux") === false— all three lock in by the unit tests atscreenshotService.test.ts:124-138. The headless-shell on Linux producer path keeps the fast viewport-bound capture; the headless-shell on macOS dev path also keeps it. - vs #1618 (page-side compositor handshake): the handshake at
frameCapture.ts:1291is gated bysession.captureMode !== "beginframe". On macOS,captureModeresolves to"screenshot"atframeCapture.ts:357(theisLinux && headlessShellgate fails), so the handshake fires on the macOS path before and after this PR. The new beyond-viewport default does not bypass it. - vs #1656 (
captureForceScreenshotflip): that gate flipspreModeto"screenshot"on producer renders. Same conclusion — macOS already lands on"screenshot"mode, so the interaction is a no-op for the reporter's cohort. The flag continues to do its work on Linux. - vs #1670 (fast screenshot path for viewport captures): the headless-shell path (Linux + headless-shell) keeps
captureBeyondViewport=falsevia theHeadlessChromeprefix check, so the fast viewport-bound path is preserved for the perf-sensitive cohort. macOS regular Chrome is dev-only; the perf regression there is acceptable. - vs #1702 (in-flow clip anchoring): orthogonal layer (runtime layout vs engine capture). No interaction; verified by file-path disjointness.
- Sticky session.options propagation:
pageScreenshotCaptureis called withsession.options(line 1523, 1704), which is now thesessionOptionswith the resolved default. The per-frame call site atpageScreenshotCapturereadsoptions.captureBeyondViewport ?? false(screenshotService.ts:152), so the macOS render flows thetruedefault all the way to the CDP call. - No silent black-frame / catch-its-own-throw: defense is a single CDP flag change, no try/catch wrapper, no retry loop, no fallback that could mask a real failure. If
Page.captureScreenshot({captureBeyondViewport:true})throws, the existing screenshot-path error handling atframeCapture.ts:1730+raises loudly. - Cohort scope: not "all macOS", not "all Chrome", not "all non-headless-shell" — exactly
darwinandChrome/prefix. False-positive risk: another browser that reportsChrome/...on darwin (e.g. Edge, Brave, Opera) — those also exhibit the same Blink screenshot-compositor behavior, so opting them into the safer path is a feature, not a bug. False-negative risk: a future Chrome with a different version-string shape — unlikely; Chromium'sbrowser.version()format is stable. - Regression fixture:
chrome-screenshot-bottom-edgepaints a red page background under a green opaque foreground; the goldens are reproducible via the--updateDocker incantation already documented in the PR description test plan.minPsnr: 30, maxFrameFailures: 0collapses if even a single frame regresses to the band-leak shape. - Title vs scope: "defend macOS regular Chrome screenshots" matches the diff scope precisely. No Linux path changes, no headless-shell behavior changes, no Windows code touched, no scope creep.
- Band-aid bar: no duplicate guards, no contradictory rules across paths, no decorative gate (the populate path is the single
createCaptureSessionentry point that every render goes through), no silent black-frame swallow, no comment/code drift.
Issue #1699 reach: Yes — directly. The reporter ran npx hyperframes render . on macOS 26.3 with system Chrome 149.0.7827.155 (no headless-shell). That path is cli → producer.executeRenderJob → engine.createCaptureSession → pageScreenshotCapture. The new default at frameCapture.ts:421-425 resolves Chrome/149... + darwin → captureBeyondViewport=true, which is exactly what the reporter's "Questions" section #2 proposed as a fix. Their setDefaultBackgroundColorOverride alternative is not taken; captureBeyondViewport: true is the more general defense and matches what the alpha capture paths already use.
CI state at HEAD 20f9b53b: Healthy. CodeQL (actions, python) success, Detect-changes / Format / Lint / Typecheck / Test / SDK / Studio-load-smoke / Fallow-audit / preview-regression / player-perf / Semantic-PR-title all green on the latest workflow run. Eight regression shards in progress, plus regression-shards / Render on windows-latest / Tests on windows-latest / CLI smoke / Smoke: global install running — none failed. The earlier workflow run shows CANCELLED on several jobs as expected (superseded by the later push). Re-check after CI settles, but no failures visible at write-time.
Prior reviewer state: None — reviews: [], comments: [] at HEAD. No Rames / Xiaye / Genesis / James posts to integrate.
Review by Via
There was a problem hiding this comment.
Reviewed at HEAD 20f9b53b. Lane: canonical rubric + cross-cutting (perf, observability, test-fixture quality, recent-history check). Vance posted first with thorough cohort + path verification — this layers on what they didn't cover. Codex-authored from codex/fix-system-chrome-screenshot-band, applying the Claude/Codex review lens.
Verdict: 🟡 LGTM with two concerns worth addressing before merge.
Recent history context — load-bearing:
screenshotService.ts was touched ~26 hours ago by #1670 (Miguel, merged 2026-06-23 19:13Z), which deliberately reverted #1094's "CBV=true everywhere" default to restore the fast viewport-bound path for perf. This PR is a coherent follow-up: the perf restoration left macOS-Chrome-without-headless-shell on the now-default-false path, exposing the compositor short-surface bug that #1094's blanket fix was incidentally hiding. The chain — #1094 → #1670 → #1703 — reads as intentional and the cohort gate (darwin + Chrome/ prefix) is the narrowest plausible defense. Not a silent revert.
Concerns:
-
packages/producer/tests/chrome-screenshot-bottom-edge/— the fixture does NOT actually exercise the new code path under CI.Dockerfile.test:61installschrome-headless-shell@148.0.7778.167, the runner is Linux, soresolveRequestedCaptureMode(browserManager.ts:266-276) returns"beginframe"— which usesHeadlessExperimental.beginFrame, notPage.captureScreenshot, and never consultscaptureBeyondViewportorshouldDefaultCaptureBeyondViewport. The fixture is a happy-path render correctness test, not a regression guard for #1699: if you revertedshouldDefaultCaptureBeyondViewportto() => falseand re-ran the Docker harness, the fixture would still pass. Either (a) add a unit test that asserts the resolvedsession.options.captureBeyondViewport === truewhenbrowserVersion="Chrome/149..." + platform="darwin"is plumbed throughcreateCaptureSession(covers the integration the bug actually lives at), or (b) re-label the fixture as a screenshot-mode general-quality guard rather than #1699 regression. The three unit tests atscreenshotService.test.ts:125-138correctly cover the heuristic itself — that part is solid. -
Observability
capture_strategycheckpoint emits a false value —renderOrchestrator.ts:1445:observability.checkpoint("capture_strategy", "resolved", { ... captureBeyondViewport: captureOptions.captureBeyondViewport ?? false,Plus
renderOrchestrator.ts:1218-1220and the schema field atrender/observability.ts:36. After this PR, on macOS regular Chrome + DOM-only renders, the producer-sidecaptureOptions.captureBeyondViewportisundefinedbut the engine resolves it totrue. The checkpoint will reportfalse— the producer says "we're on the fast path" while the engine actually used the slow path. This is exactly the observability signal #1670 added to track the cohort split; #1703 invalidates its truth on the new branch. Vance flagged the renderOrchestrator:1218 site; thecapture_strategycheckpoint at line 1445 is the more user-visible one (it's the structured render-strategy log). Cheapest fix: havecreateCaptureSessionreturn the resolved value (or exposeshouldDefaultCaptureBeyondViewport(browserVersion)to the orchestrator so it can echo what the engine will pick). Producer renders run on Linux servers in prod so the gap is dev-only — but it's a freshly-added observability field whose meaning would silently drift on a single cohort.
Nits / questions:
-
shouldDefaultCaptureBeyondViewportheuristic and puppeteer headless-mode —browser.version()returnsChrome/x.y.zwhen puppeteer launches a regular Chrome binary withheadless: true(which on modern Chrome is--headless=new), so the prefix check correctly catches the reporter's cohort. But worth a verifying log of whatbrowser.version()actually returned on the reporter's machine (Chrome 149.0.7827.155, macOS 26.3) before merge — the bug report saysSource: systembut doesn't quote the version string.browserManager.ts:384-386already logs it, so a single repro on macOS regular Chrome confirming the prefix matchesChrome/would close the loop. Low risk: even if some macOS-darwin regular Chrome variant reportsHeadlessChrome/, that's the current bug → no regression vs main, just no fix. -
Performance regression unflagged — macOS regular Chrome DOM-only renders previously hit the fast viewport-bound path (
captureBeyondViewport: falseexplicitly set by producer); after this PR they hit the slower beyond-viewport compositor path. Per #1670's reproduction, that's ~13.2s → ~17.3s on a 90-frame fixture (~30% slowdown), and #1670's whole point was avoiding that cost. macOS Chrome is dev-only for HeyGen-internal use (CI/prod is Linux), so it's acceptable — but the PR body doesn't mention the perf tradeoff, and the previous PR was written specifically to avoid it. Consider documenting "macOS regular Chrome perf reverts to pre-#1670 levels by design" in the body so the cohort split is intentional-and-recorded. -
Issue #1699 Q1 ("would
hyperframes doctorwarn?") — not addressed. Confirmedpackages/cli/src/commands/doctor.tsandpackages/cli/src/commands/browser.tsare not in the diff. The fix makes the warning less critical, but advising users on the dev path that they're falling back to slower capture is still useful UX. Follow-up, not blocking.
What I verified:
- Diff scope:
gh pr diffmatches PR body — 5 production-code files (~28 LOC) + 1 unit test file (+17) + 1 fixture set (~700 LOC). No scope creep. shouldDefaultCaptureBeyondViewportflow:screenshotService.ts:22-31(heuristic) →frameCapture.ts:421-425(session-level resolution viasessionOptions) →frameCapture.ts:455(options: sessionOptionsstored on session) →frameCapture.ts:1654(const { page, options } = session) →frameCapture.ts:1704(pageScreenshotCapture(page, options)) — value flows through. Also the dedup-verify path atframeCapture.ts:1523usessession.optionsdirectly. Both call sites get the resolved default.- Other CDP screenshot paths are immune to the bug regardless of this PR:
captureScreenshotWithAlpha(screenshotService.ts:186) andcaptureAlphaPng(screenshotService.ts:252) both hardcodecaptureBeyondViewport: truefrom #1094's HDR guard. Sibling-asymmetry check: ✅ no other path needs the fix. - Recent file history on
frameCapture.tsandscreenshotService.tsviagh api commits?path=: most recent touches are #1670 (screenshotService, 2026-06-23) and #1688 (frameCapture, 2026-06-24, retry-probe-on-transient-browser-errors). No silent revert. No conflicting in-flight intent. - Cross-repo consumers of
CaptureOptions.captureBeyondViewport: the field type docstring widened to "leave undefined for the safe browser-specific default" — no public-API break since the field stays optional and the behavior change is opt-in. No external SDK / CLI / wrapper updates required. - CI: green on 2026-06-24T22:01:57Z run for typecheck, lint, test, SDK, studio-smoke, codeQL python/actions, preview-regression, player-perf. Regression shards 1-8 + Windows render + global install smoke still in flight at write time — no failures visible.
What I didn't verify:
- Whether the reporter's local Chrome (149.0.7827.155 macOS arm64) actually reports
Chrome/(notHeadlessChrome/) frombrowser.version()— no macOS host available. The CHANGELOG suggests it does, but explicit confirmation would be cheap. - Beyond-viewport interactions with
position: fixed/ sticky / viewport-% units on the macOS path — beyond-viewport documentedly re-lays-out for the capture; if any composition usesvhunits that depend on the real viewport, there's a theoretical layout shift. Existing alpha capture paths already use beyond-viewport in prod without reports, so risk is low.
Stamp posture: Defer to Miguel + Vance on the perf-regression-vs-correctness tradeoff (the call has Miguel's #1670 context; he's the right judge of whether the macOS-Chrome perf cost is acceptable). The two concerns above are addressable in the same PR or as immediate follow-ups; neither blocks the correctness fix.
— Rames D Jusso
|
Addressed the review-body concerns in
Fixed: added
Fixed: producer capture stages now return the engine-resolved
Documented in the PR body: macOS regular Chrome intentionally trades the faster viewport-bound path for correctness; Linux/headless-shell keeps the fast path. |
What
Fixes macOS regular-Chrome renders where the bottom of the frame can show the page background instead of composition content.
Why
Affected macOS/Chrome builds can return a viewport-bound screenshot surface shorter than the requested clip. The fallback page background then leaks into the bottom of the captured frame.
How
Regular Chrome on macOS now defaults to the beyond-viewport screenshot path. Headless Chrome and explicit caller overrides keep the viewport-bound path. Producer observability now reports the engine-resolved value, and a producer MP4 fixture covers bottom-edge output against a red background leak.
This intentionally trades macOS regular-Chrome DOM-only render speed for correctness; Linux/headless-shell paths keep the faster viewport-bound capture.
Fixes #1699
Test plan
Commands run:
bun run --cwd packages/engine test src/services/screenshotService.test.ts src/services/frameCapture.captureOptions.test.tsbun run --cwd packages/engine typecheckbun run --cwd packages/producer typecheckbun run --cwd packages/cli dev lint ../../packages/producer/tests/chrome-screenshot-bottom-edge/srcbun run --cwd packages/cli dev validate ../../packages/producer/tests/chrome-screenshot-bottom-edge/src --timeout 10000bunx oxlint packages/engine/src/services/frameCapture.ts packages/engine/src/services/frameCapture.captureOptions.test.ts packages/engine/src/services/screenshotService.ts packages/engine/src/services/screenshotService.test.ts packages/engine/src/types.ts packages/producer/src/services/renderOrchestrator.ts packages/producer/src/services/render/stages/captureStage.ts packages/producer/src/services/render/stages/captureStreamingStage.ts packages/producer/src/services/distributed/renderChunk.tsbunx oxfmt --check packages/engine/src/services/frameCapture.ts packages/engine/src/services/frameCapture.captureOptions.test.ts packages/engine/src/services/screenshotService.ts packages/engine/src/services/screenshotService.test.ts packages/engine/src/types.ts packages/producer/src/services/renderOrchestrator.ts packages/producer/src/services/render/stages/captureStage.ts packages/producer/src/services/render/stages/captureStreamingStage.ts packages/producer/src/services/distributed/renderChunk.tsdocker build -f Dockerfile.test -t hyperframes-producer:test .docker run --rm --security-opt seccomp=unconfined --shm-size=2g -v ./packages/producer/tests:/app/packages/producer/tests hyperframes-producer:test --update chrome-screenshot-bottom-edgedocker run --rm --security-opt seccomp=unconfined --shm-size=2g -v ./packages/producer/tests:/app/packages/producer/tests hyperframes-producer:test chrome-screenshot-bottom-edge