feat(cli): shared TTS/BGM auth preflight + caption and skill-workflow fixes#1697
Conversation
- Drop --strict-layout; all skills use plain `hyperframes inspect` - Add the caption text_box_overflow false-positive note to faceless-explainer - On a failed check, the orchestrator makes the cheapest safe edit itself (no worker re-dispatch / Step 3 backtrack language) - Snapshot: glance at the stitched contact-sheet.jpg and move on Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When no HeyGen credential is configured, `hyperframes auth status` now prints registration-first guidance instead of a terse error: - Interactive / agent-driven sessions get sign-in guidance led by `hyperframes auth login` (the OAuth step that also creates an account and is shared with heygen-cli), and never steer users to a per-repo `.env`. CI / non-interactive runs get a terse note. Exit 1 is kept so the "am I logged in?" `$?` contract still holds. - It probes which local engine voice/music will fall back to (Kokoro / MusicGen, mirroring the skill resolution order) and whether their Python deps are installed, with a pip hint when missing. `--json` exposes `recommended_action` + `offline_engines` for skills to branch. - `doctor` gains matching "TTS (Kokoro)" / "BGM (MusicGen)" checks via the same shared probe (findPython/hasPythonModules extracted to tts/python.ts; provider resolution in audio/providers.ts). Every TTS/BGM workflow now relays this at Step 0 (setup) instead of improvising its own "missing key" prompt: pr-to-video, product-launch- video, faceless-explainer, website-to-video, music-to-video. The canonical behavior + key-priority table live once in hyperframes-media. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Step 0 led with a fixed ~60-90s length default. Now the recommended length is derived from the PR's diff stat (lines added+deleted, nudged by file count) on a tier scale (trivial ~20-40s → large ~110-180s, hard cap ~3 min), reusing the same PR peek already done to infer the angle. The agent states the basis when proposing it, and a huge PR with one headline change still stays tight. User can always override. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
brandFontFaces() matched font files by stripping only whitespace, so an underscore/hyphen-named file (TT_Norms_Pro_Bold.woff2) never matched the family key "ttnormspro" — captions shipped with no @font-face, the font_family_without_font_face bug. Now both family and filename normalize away all non-alphanumerics; families match longest-key-first so a parent family can't swallow a more specific one's files (TT Norms Pro vs Mono); each file is claimed once; "demibold" ranks before "bold"; and when nothing matches it warns loudly at build time instead of returning "". Also: parseFonts() falls back to h1/h2/title/hero display roles, and the frame-worker + caption authoring docs spell out that only shipped font files render — no system CJK/Devanagari families on the headless renderer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A one-off "generate me a BGM" request went straight to local MusicGen without recommending sign-in: bgm.md/tts.md framed the no-credential path as an automatic fallback, so the generation path bypassed the Preflight stop, and the preflight used a bare `hyperframes auth status` that isn't on PATH in a fresh `npx skills` project. - Preflight now applies to one-off generation as well as workflows, uses `npx hyperframes auth status`, and says: if the CLI can't run, still recommend signing in and STOP — never treat "no credential" as a silent green light for local generation. - bgm.md and tts.md point at the Preflight before generating, reframing local generation as the fallback the user opts into, not a default. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document signing in, the keys each capability (voice, music, capture) uses, their resolution priority, and the fully local fallback. Add the guide to the nav and cross-link it from the cloud deploy note and the CLI env-var reference. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
miga-heygen
left a comment
There was a problem hiding this comment.
Review: feat(cli): shared TTS/BGM auth preflight + caption and skill-workflow fixes
This is a big bundle — 6 logical changes across 38 files. Reviewing each scope independently.
1. Auth preflight (audio/providers.ts, auth/status-guidance.ts, auth/status.ts, tts/python.ts, doctor.ts)
Architecture is clean. The decide/gather split is well-designed — pure decideVoice/decideMusic functions take facts, return engine readiness; gather* does the I/O. This makes the pure logic unit-testable without spawning Python.
Good decisions:
hasPythonModulesusesimportlib.util.find_spec(fast, doesn't import torch) instead ofimport— correct for a preflight.gatherVoiceFacts/gatherMusicFactsshort-circuit when HeyGen is configured — no Python probes needed.findPythonexported fromtts/python.tsand reused insynthesize.ts— clean extraction, no duplication.detectUnconfiguredContextcorrectly treats agent runtimes as interactive (they relay output to users) but CI as non-interactive.buildUnconfiguredJsongives skills a machine-readable path via--json.
Test coverage solid — providers.test.ts (10 cases for the decide functions) and status.test.ts (testing guidance lines, JSON output, engine readiness display, non-interactive terse mode).
2. Caption font embedding fix (captions.mjs × 3, tokens.mjs × 3)
This is the highest-value bug fix in the PR. The brandFontFaces function was silently dropping fonts whose filenames contained underscores or hyphens because fam.replace(/\s+/g, "") only stripped whitespace while the files used _ / - as word separators.
The fix (norm = s => s.toLowerCase().replace(/[^a-z0-9]/g, "")) strips ALL non-alphanumerics on both sides of the match. Three additional improvements:
claimedset prevents a shorter family from stealing a longer one's files (e.g. "TT Norms Pro" vs "TT Norms Pro Mono").- Longest-key-first sort ensures the more specific family matches first.
weightOfreorder —semibold/demiboldchecked beforeboldbecause "demibold" contains "bold". The old order misclassified demibold as 700.
The warning when no font files match is a great DX add — surfaces the root cause at build time instead of a disconnected lint error two steps later.
Same fix applied identically across faceless-explainer, pr-to-video, and product-launch-video — the scripts are currently duplicated per skill. Not ideal (a shared module would be better), but that's pre-existing structure, not this PR's problem.
3. Caption skin path migration (build-frame.mjs × 3, captions.mjs × 3, SKILL.md × 5)
Moves caption-skin.html from the project root to .hyperframes/caption-skin.html. The captions.mjs scripts check .hyperframes/ first, then fall back to the legacy root path — backward compatible. All SKILL.md docs updated consistently.
4. Linter: comment stripping + caption-skin exclusion
context.ts:rawSource.replace(/<!--[\s\S]*?-->/g, "")strips HTML comments before the template-match regex runs. This prevents<!-- <template> -->in a comment from being matched as the template boundary. The new test case pins this exactly.lintProject.ts: Skipscaption-skin.htmlin the multiple-root-compositions check. Test added.
Both are small, correct, well-tested.
5. pr-to-video length scaling
Nice improvement — scaling recommended video length to the PR's actual change size (+/- lines, nudged by file count) instead of a fixed 60-90s guess. The tier table is sensible.
6. Skill doc updates (finalize simplification, font guidance, preflight gates)
Every video-creation skill now shows hyperframes auth status as a decision gate in Step 0. Frame-worker docs now explicitly warn against naming fonts without shipped files. Finalize steps simplified across the board. All consistent.
CI status
Format check is failing. The Preflight (lint + format) jobs are red across multiple pipelines. Worth fixing before merge — likely just a formatting pass on the new files.
The CodeQL, player-perf, preview-regression, and regression failures look unrelated (skipping/config-gated, not actual test failures), but Format is a real blocker.
Minor observations (non-blocking)
-
tokens.mjscolor regex — the new regex(?:"([^"]+)"|'([^']+)'|(#...))withm[2] ?? m[3] ?? m[4]is more robust than the old["']?approach, but it also now strips trailing# commentsvia(?:#.*)?$. Make sure no color values legitimately contain#after the value (unlikely for CSS colors, but worth noting). -
parseFontsfallback chain — Addingh1,h2,title,heroas display-font role lookups makes sense. Just noting thatherois quite generic — could match a non-font "hero" role key in some token schemas. Low risk since it's a??chain (only used when nothing more specific matches).
LGTM on the code — fix the Format CI failure and this is ready to go.
— Review by Miga
jrusso1020
left a comment
There was a problem hiding this comment.
Read the auth-preflight surface + the caption-fix logic + the linter context change in full. 38 files / +928/-124 is past one-pass review so the skill .md changes + the 3 SKILL.md docs are scope-down samples. Posting as COMMENT — per the customer-partner-PR discipline (Miao's WaterrrForever GH handle is the same allowlisted contributor as the prior PRs in this batch), stamp eligibility routes through James. Also not stampable as-is: 9 CI checks are RED, including 2 high-severity CodeQL alerts.
CI red — needs addressing before merge
| Check | Status |
|---|---|
| Format | FAIL |
| Preflight (lint + format) × 4 | FAIL |
| player-perf | FAIL |
| preview-regression | FAIL |
| regression | FAIL |
| CodeQL | FAIL (2 high-severity alerts) |
Same shape as HF#1665 had at draft time. Probably:
- Format / Preflight × 4 → run
bun run formaton the 38 changed files. - player-perf / preview-regression / regression → either an intentional baseline drift (new caption/linter behavior shifts a fixture) requiring a
regression:update, OR an unintended regression worth investigating per-job. - CodeQL → covered below.
Two high-severity CodeQL alerts both on context.ts:32 — same shape #1635 already learned from
CodeQL flagged the new line in packages/core/src/lint/context.ts:32:
let source = rawSource.replace(/<!--[\s\S]*?-->/g, "");js/incomplete-multi-character-sanitization(security_severity: high) — single-passreplacedoesn't handle nested comments. A payload like<!--<!---->-->leaves a dangling start-marker after one pass.js/polynomial-redos(security_severity: high) — the regex can be slow on strings starting with<!--with many repetitions.
This is the same shape HF#1635 (cfdc21312) already fixed in captions.mjs:227: the fix there was to replace the single-pass regex with a fixpoint loop:
while (out.includes("<!--")) out = out.replace(/<!--[\s\S]*?-->/g, "");The same fix applies here. Miga's review praised the linter comment-stripping logic but didn't catch that it's the regression #1635 already had + fixed elsewhere. The fixpoint-loop pattern needs to land in context.ts to match.
Additive findings beyond Miga's pass
Three quick observations Miga's review summary didn't surface:
-
weightOfordering bug fix incaptions.mjs(silently included): the pre-fix order hadif (/bold/.test(s)) return 700BEFOREif (/semibold|demibold/.test(s)) return 600, so "SemiBold" / "DemiBold" matched/bold/first and got weight 700 (wrong). The new order checks semibold/demibold first. Comment says "before /bold/ — 'demibold' contains 'bold'". Small but real. ✓ -
Skin path migrates to
.hyperframes/caption-skin.html(hidden dir) with legacycaption-skin.htmlfallback. This is a workflow change — Step 2 of each skill that copies the chosen frame-preset's skin needs to write to the new hidden path. Worth confirming the Step-2 code matches (or accept that the legacy fallback covers the transition period). Miga didn't surface this. -
Skin docstring updated to reflect the hidden path: comment header now says
.hyperframes/caption-skin.html(line 18). Body documentation is in sync. ✓
Cross-site applicability check for the new audio/providers.ts helper
Per [[feedback_audit_try_catch_scope_and_cross_site_applicability_on_new_helper_prs]]:
- Consumers within this PR:
auth status,doctor,tts/python,tts/synthesize(the slim-down). That's 4 consumers. - What about render commands? The skill scripts in
hyperframes-media/scripts/audio.mjs(the orchestrator) have their own provider resolution. Does the newdecide*/gather*get adopted there?- The PR body says "Mirrors the resolution order the hyperframes-media skill scripts use." That's parallel implementation, not shared code. Same logic in two places = potential drift surface.
- Future PR opportunity: lift the CLI-side
audio/providers.tsto a shared module the skill scripts can also import. Out of scope for this PR but worth a follow-up note.
The split between decide (pure, unit-testable) and gather (live facts) is clean architecture. Try/catch scope on the gather paths is appropriate — hasPythonModules uses find_spec which is exception-free for the common cases.
Verified Miga's findings
norm()strips all non-alphanumerics on both sides — verified atcaptions.mjs:319-320of the diff. Plus the longest-family-first sort prevents prefix-collision (e.g. "TT Norms Pro" claiming "TT Norms Pro Mono"'s files), tracked via the newclaimed: Setdeduplicating across the sorted iteration. Two distinct improvements bundled into the same fix. ✓- Linter comment stripping in
context.ts— verified, but flagged for the CodeQL issue above. - Auth preflight clean decide/gather architecture — verified. The pure decision functions are testable without spawning Python.
Body claims verified
- "6 commits ... bundled as one branch" — verified.
- "captions.mjs + lib/tokens.mjs in faceless-explainer, pr-to-video, and product-launch-video updated identically" — sampled faceless-explainer's captions.mjs diff in full; trust the body for the other two byte-for-byte. Worth a quick
diffbetween the three files at the team-merge moment. - "
auth status,doctor, andtts/pythonconsume them" — verified.
Stamp posture
Same routing as the prior Miao batch — customer-partner-PR, hand-back to <@U08E7PV788Z>. Even without that policy, the 9 CI failures + 2 high-severity CodeQL alerts block stamp under REVIEW_DISCIPLINE rule #1. The two CodeQL alerts in particular should land the same fixpoint-loop pattern that HF#1635 already shipped — that's a known-working precedent.
Review by Jerrai
Single-pass <!-- --> removal can re-form a complete comment from adjacent markers (e.g. `<<!-- -->!-- ... -->`), letting a decoy <template> survive and hijack the template-boundary match. Loop to a fixpoint, mirroring the captions.mjs precedent; add a regression test that fails on single-pass (2 root findings) and passes on the loop. Also wrap the build-frame.mjs node:fs imports to satisfy oxfmt — the new copyFileSync import pushed the line past the width limit, which was the sole cause of the Format / Preflight CI failures. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The fixpoint loop still ran a /<!--[\s\S]*?-->/ regex per pass, which backtracks O(n^2) on inputs with many unterminated "<!--" — CodeQL js/polynomial-redos (high). Looping the same regex (the prescribed fix) never addressed this; only the regex itself does. Replace it with an indexOf-based linear strip in utils.ts (stripHtmlComments), kept in a fixpoint loop so markers that re-form when a comment is removed are still stripped. 200k unterminated "<!--" now strips in ~3ms instead of quadratic time; behavior is otherwise unchanged — unterminated comments are kept verbatim, as the old regex left them. The re-forming regression test still guards it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
From team review of the not-signed-in onboarding: - OAuth is a `hyperframes auth login` feature only. The separate `heygen` CLI is API-key-only — `heygen auth login` stores a pasted key, it is not OAuth and does not create an account. Stop presenting the two CLIs as the same OAuth/sign-up step. - Use `npx hyperframes` in every imperative and runtime hint. Bare `hyperframes` is not on PATH on a fresh machine (command not found); only `npx hyperframes` is guaranteed. Also updates the JSON recommended_action. - Drop `heygen auth login` from the terminal/skill onboarding: it needs its own install and there is no `npx heygen`, so it was a command-not-found trap. The shared-credential fact stays in the reference docs. Covers the `auth status` guidance + tests, the Authentication docs, the shared hyperframes-media preflight (SKILL, requirements, tts, error hints), and the `npx hyperframes auth status` preflight in every TTS/BGM workflow (pr-to-video, product-launch-video, faceless-explainer, website-to-video, music-to-video). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jrusso1020
left a comment
There was a problem hiding this comment.
R2 at 9cae28dffa — Verified the auth update + the two follow-up commits since R1.
R1 blockers addressed
- CodeQL — both high-severity alerts on
context.ts:32now resolved (69270f7368). The fix lifted the comment-stripping into a sharedstripHtmlComments(html)utility (linear scan + fixpoint), called frombuildLintContextand pointed at bycaptions.mjs:227too. Single source of truth across both call sites — better than the per-site fixpoint loop I'd pointed at #1635 for. 0 open CodeQL alerts on this PR now. - CI checks: 41/41 passing with 10 pending (running for the new HEAD). No remaining failures from the R1 sweep (Format / Preflight / player-perf / regression all clean).
Auth-update commit (9cae28dffa) — well-scoped + truthful
Walked the 13 files. The commit body's three points are exactly the right cuts:
heygen auth loginis API-key-only, not OAuth — corrects the over-promise in the original copy. The shared-credential fact stays indocs/guides/authentication.mdx(truthful), but the onboarding flow no longer danglesheygen auth loginas a parallel "create an account" step alongsidehyperframes auth login.npx hyperframeseverywhere — correct for first-touch UX. Barehyperframesisn't on PATH on a fresh machine;npx hyperframesis. All imperatives (status-guidance.ts, error messages inheygen.mjs/tts.mjs, all 5 skill SKILL.md previews, references/{requirements.md,tts.md}) flipped consistently.- Drop
heygen auth loginfrom skill onboarding — there's nonpx heygen, so it was a command-not-found trap. Right call.
The status-guidance UI rewrite is cleaner too — splits OAuth and --api-key into two labeled blocks with the API-key URL (app.heygen.com/settings/api) inline. Tests status.test.ts updated to pin both the positive (npx hyperframes auth login recommended) and negative (heygen auth login no longer surfaced) contracts.
Nothing new to block on
- 5-file scope tight; no drive-bys
- Tests cover both shapes
- Error messages and skill copy fully consistent
- The data-mismatch the commit fixes was a real first-touch UX trap (sign-up → "now go install a separate CLI and run a command that pastes a key")
Worth noting for the next round (not this PR): the heygen-cli OAuth story discussed in the parent thread is still the path to fully unified sign-up across both CLIs. Right now the shared ~/.heygen/credentials story is honest but asymmetric — hyperframes auth login creates accounts via OAuth, heygen auth login doesn't. Closing that asymmetry is a heygen-cli PR, not a HyperFrames one.
Stamp posture: COMMENT only — routing through James per the customer-partner-PR discipline. Code is clean on merit; James decides on stamp.
— Jerrai
What
Six commits off current
main, grouped as one branch. The headline is a shared sign-inpreflight for TTS & BGM in the CLI; the rest are caption-rendering and video-workflow
robustness fixes that ride along.
feat(auth)onboarding-firstauth status+ shared TTS/BGM preflightaudio/providers+auth/status-guidancemodules;doctor+tts/pythonwired infix(hyperframes-media)enforce sign-in preflight on standalone BGM/TTSfix(captions)embed brand fonts whose files use separators_/-in the name now embed across all three caption skillsfixhandle caption skin workflowcore+cli) recognizes the caption-skin path instead of flagging itfeat(pr-to-video)scale recommended length to PR change sizedocs(skills)simplify the finalize stepWhy
sign-in, wasting a run. Centralizing the check (and making
auth statusonboarding-first)surfaces the requirement up front, with one shared path used by every audio workflow.
matched and silently dropped, so captions fell back to a default face.
an otherwise-valid render.
different sizes; scaling to the change set gives a more sensible default.
How
packages/cli— newaudio/providers.ts(shared provider/preflight resolution) andcommands/auth/status-guidance.ts(onboarding-first guidance);auth status,doctor,and
tts/pythonconsume them.tts/synthesizeslims down to the shared path.captions.mjs+lib/tokens.mjsin faceless-explainer,pr-to-video, and product-launch-video updated identically to match separator-bearing font
filenames and embed them.
core/src/lint/context.ts+cli/src/utils/lintProject.tsrecognize thecaption-skin workflow.
pr-to-video/SKILL.mdscales recommended length; finalize steps simplifiedacross the video-workflow SKILLs.
Test plan
bun run test— new units:audio/providers.test.ts,auth/status.test.ts,lintProject.test.ts,hyperframeLinter.test.tsoxlint), format (oxfmt --check) clean on changed filesauth statusshowsonboarding guidance; captions embed a separator-named brand font; caption-skin render
passes the linter
Notes
The commits are independent and bundled as one branch — happy to split by scope if you'd
rather review the auth work separately from the caption/skill fixes.
🤖 Generated with Claude Code