Skip to content

feat(cli): shared TTS/BGM auth preflight + caption and skill-workflow fixes#1697

Merged
WaterrrForever merged 10 commits into
mainfrom
feat/media-auth-preflight-and-caption-fixes
Jun 24, 2026
Merged

feat(cli): shared TTS/BGM auth preflight + caption and skill-workflow fixes#1697
WaterrrForever merged 10 commits into
mainfrom
feat/media-auth-preflight-and-caption-fixes

Conversation

@WaterrrForever

@WaterrrForever WaterrrForever commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

What

Six commits off current main, grouped as one branch. The headline is a shared sign-in
preflight for TTS & BGM
in the CLI; the rest are caption-rendering and video-workflow
robustness fixes that ride along.

Commit Scope
feat(auth) onboarding-first auth status + shared TTS/BGM preflight new audio/providers + auth/status-guidance modules; doctor + tts/python wired in
fix(hyperframes-media) enforce sign-in preflight on standalone BGM/TTS standalone BGM/TTS now gate on sign-in before running
fix(captions) embed brand fonts whose files use separators font files with spaces/_/- in the name now embed across all three caption skills
fix handle caption skin workflow linter (core + cli) recognizes the caption-skin path instead of flagging it
feat(pr-to-video) scale recommended length to PR change size recommended video length now tracks the size of the PR's diff
docs(skills) simplify the finalize step trimmed the finalize step across the video workflows

Why

  • Auth preflight — standalone BGM/TTS could start work before failing on a missing
    sign-in, wasting a run. Centralizing the check (and making auth status onboarding-first)
    surfaces the requirement up front, with one shared path used by every audio workflow.
  • Caption font embedding — brand fonts whose filenames contain separators were not
    matched and silently dropped, so captions fell back to a default face.
  • Caption-skin workflow — the linter flagged the caption-skin path as invalid, blocking
    an otherwise-valid render.
  • pr-to-video length — a fixed recommended length over/under-shot for PRs of very
    different sizes; scaling to the change set gives a more sensible default.

How

  • packages/cli — new audio/providers.ts (shared provider/preflight resolution) and
    commands/auth/status-guidance.ts (onboarding-first guidance); auth status, doctor,
    and tts/python consume them. tts/synthesize slims down to the shared path.
  • Caption scriptscaptions.mjs + lib/tokens.mjs in faceless-explainer,
    pr-to-video, and product-launch-video updated identically to match separator-bearing font
    filenames and embed them.
  • Lintercore/src/lint/context.ts + cli/src/utils/lintProject.ts recognize the
    caption-skin workflow.
  • Skillspr-to-video/SKILL.md scales recommended length; finalize steps simplified
    across the video-workflow SKILLs.

Test plan

  • bun run test — new units: audio/providers.test.ts, auth/status.test.ts,
    lintProject.test.ts, hyperframeLinter.test.ts
  • Build, lint (oxlint), format (oxfmt --check) clean on changed files
  • Manual: standalone BGM/TTS without sign-in stops at preflight; auth status shows
    onboarding guidance; captions embed a separator-named brand font; caption-skin render
    passes the linter

Notes

The commits are independent and bundled as one branch — happy to split by scope if you'd
rather review the auth work separately from the caption/skill fixes.

🤖 Generated with Claude Code

WaterrrForever and others added 6 commits June 24, 2026 13:58
- Drop --strict-layout; all skills use plain `hyperframes inspect`
- Add the caption text_box_overflow false-positive note to faceless-explainer
- On a failed check, the orchestrator makes the cheapest safe edit itself
  (no worker re-dispatch / Step 3 backtrack language)
- Snapshot: glance at the stitched contact-sheet.jpg and move on

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When no HeyGen credential is configured, `hyperframes auth status` now
prints registration-first guidance instead of a terse error:

- Interactive / agent-driven sessions get sign-in guidance led by
  `hyperframes auth login` (the OAuth step that also creates an account
  and is shared with heygen-cli), and never steer users to a per-repo
  `.env`. CI / non-interactive runs get a terse note. Exit 1 is kept so
  the "am I logged in?" `$?` contract still holds.
- It probes which local engine voice/music will fall back to (Kokoro /
  MusicGen, mirroring the skill resolution order) and whether their
  Python deps are installed, with a pip hint when missing. `--json`
  exposes `recommended_action` + `offline_engines` for skills to branch.
- `doctor` gains matching "TTS (Kokoro)" / "BGM (MusicGen)" checks via
  the same shared probe (findPython/hasPythonModules extracted to
  tts/python.ts; provider resolution in audio/providers.ts).

Every TTS/BGM workflow now relays this at Step 0 (setup) instead of
improvising its own "missing key" prompt: pr-to-video, product-launch-
video, faceless-explainer, website-to-video, music-to-video. The
canonical behavior + key-priority table live once in hyperframes-media.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Step 0 led with a fixed ~60-90s length default. Now the recommended
length is derived from the PR's diff stat (lines added+deleted, nudged
by file count) on a tier scale (trivial ~20-40s → large ~110-180s, hard
cap ~3 min), reusing the same PR peek already done to infer the angle.
The agent states the basis when proposing it, and a huge PR with one
headline change still stays tight. User can always override.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
brandFontFaces() matched font files by stripping only whitespace, so an
underscore/hyphen-named file (TT_Norms_Pro_Bold.woff2) never matched the
family key "ttnormspro" — captions shipped with no @font-face, the
font_family_without_font_face bug. Now both family and filename normalize
away all non-alphanumerics; families match longest-key-first so a parent
family can't swallow a more specific one's files (TT Norms Pro vs Mono);
each file is claimed once; "demibold" ranks before "bold"; and when
nothing matches it warns loudly at build time instead of returning "".

Also: parseFonts() falls back to h1/h2/title/hero display roles, and the
frame-worker + caption authoring docs spell out that only shipped font
files render — no system CJK/Devanagari families on the headless renderer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A one-off "generate me a BGM" request went straight to local MusicGen
without recommending sign-in: bgm.md/tts.md framed the no-credential path
as an automatic fallback, so the generation path bypassed the Preflight
stop, and the preflight used a bare `hyperframes auth status` that isn't
on PATH in a fresh `npx skills` project.

- Preflight now applies to one-off generation as well as workflows, uses
  `npx hyperframes auth status`, and says: if the CLI can't run, still
  recommend signing in and STOP — never treat "no credential" as a silent
  green light for local generation.
- bgm.md and tts.md point at the Preflight before generating, reframing
  local generation as the fallback the user opts into, not a default.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread packages/core/src/lint/context.ts Fixed
Comment thread packages/core/src/lint/context.ts Fixed
@WaterrrForever WaterrrForever changed the title feat: shared TTS/BGM auth preflight + caption and video-workflow fixes feat(cli): shared TTS/BGM auth preflight + caption and skill-workflow fixes Jun 24, 2026
Document signing in, the keys each capability (voice, music, capture)
uses, their resolution priority, and the fully local fallback. Add the
guide to the nav and cross-link it from the cloud deploy note and the
CLI env-var reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mintlify

mintlify Bot commented Jun 24, 2026

Copy link
Copy Markdown

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
hyperframes 🟢 Ready View Preview Jun 24, 2026, 10:58 AM

💡 Tip: Enable Workflows to automatically generate PRs for you.

@miga-heygen miga-heygen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat(cli): shared TTS/BGM auth preflight + caption and skill-workflow fixes

This is a big bundle — 6 logical changes across 38 files. Reviewing each scope independently.


1. Auth preflight (audio/providers.ts, auth/status-guidance.ts, auth/status.ts, tts/python.ts, doctor.ts)

Architecture is clean. The decide/gather split is well-designed — pure decideVoice/decideMusic functions take facts, return engine readiness; gather* does the I/O. This makes the pure logic unit-testable without spawning Python.

Good decisions:

  • hasPythonModules uses importlib.util.find_spec (fast, doesn't import torch) instead of import — correct for a preflight.
  • gatherVoiceFacts / gatherMusicFacts short-circuit when HeyGen is configured — no Python probes needed.
  • findPython exported from tts/python.ts and reused in synthesize.ts — clean extraction, no duplication.
  • detectUnconfiguredContext correctly treats agent runtimes as interactive (they relay output to users) but CI as non-interactive.
  • buildUnconfiguredJson gives skills a machine-readable path via --json.

Test coverage solidproviders.test.ts (10 cases for the decide functions) and status.test.ts (testing guidance lines, JSON output, engine readiness display, non-interactive terse mode).

2. Caption font embedding fix (captions.mjs × 3, tokens.mjs × 3)

This is the highest-value bug fix in the PR. The brandFontFaces function was silently dropping fonts whose filenames contained underscores or hyphens because fam.replace(/\s+/g, "") only stripped whitespace while the files used _ / - as word separators.

The fix (norm = s => s.toLowerCase().replace(/[^a-z0-9]/g, "")) strips ALL non-alphanumerics on both sides of the match. Three additional improvements:

  1. claimed set prevents a shorter family from stealing a longer one's files (e.g. "TT Norms Pro" vs "TT Norms Pro Mono").
  2. Longest-key-first sort ensures the more specific family matches first.
  3. weightOf reordersemibold/demibold checked before bold because "demibold" contains "bold". The old order misclassified demibold as 700.

The warning when no font files match is a great DX add — surfaces the root cause at build time instead of a disconnected lint error two steps later.

Same fix applied identically across faceless-explainer, pr-to-video, and product-launch-video — the scripts are currently duplicated per skill. Not ideal (a shared module would be better), but that's pre-existing structure, not this PR's problem.

3. Caption skin path migration (build-frame.mjs × 3, captions.mjs × 3, SKILL.md × 5)

Moves caption-skin.html from the project root to .hyperframes/caption-skin.html. The captions.mjs scripts check .hyperframes/ first, then fall back to the legacy root path — backward compatible. All SKILL.md docs updated consistently.

4. Linter: comment stripping + caption-skin exclusion

  • context.ts: rawSource.replace(/<!--[\s\S]*?-->/g, "") strips HTML comments before the template-match regex runs. This prevents <!-- <template> --> in a comment from being matched as the template boundary. The new test case pins this exactly.
  • lintProject.ts: Skips caption-skin.html in the multiple-root-compositions check. Test added.

Both are small, correct, well-tested.

5. pr-to-video length scaling

Nice improvement — scaling recommended video length to the PR's actual change size (+/- lines, nudged by file count) instead of a fixed 60-90s guess. The tier table is sensible.

6. Skill doc updates (finalize simplification, font guidance, preflight gates)

Every video-creation skill now shows hyperframes auth status as a decision gate in Step 0. Frame-worker docs now explicitly warn against naming fonts without shipped files. Finalize steps simplified across the board. All consistent.


CI status

Format check is failing. The Preflight (lint + format) jobs are red across multiple pipelines. Worth fixing before merge — likely just a formatting pass on the new files.

The CodeQL, player-perf, preview-regression, and regression failures look unrelated (skipping/config-gated, not actual test failures), but Format is a real blocker.

Minor observations (non-blocking)

  1. tokens.mjs color regex — the new regex (?:"([^"]+)"|'([^']+)'|(#...)) with m[2] ?? m[3] ?? m[4] is more robust than the old ["']? approach, but it also now strips trailing # comments via (?:#.*)?$. Make sure no color values legitimately contain # after the value (unlikely for CSS colors, but worth noting).

  2. parseFonts fallback chain — Adding h1, h2, title, hero as display-font role lookups makes sense. Just noting that hero is quite generic — could match a non-font "hero" role key in some token schemas. Low risk since it's a ?? chain (only used when nothing more specific matches).

LGTM on the code — fix the Format CI failure and this is ready to go.

Review by Miga

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read the auth-preflight surface + the caption-fix logic + the linter context change in full. 38 files / +928/-124 is past one-pass review so the skill .md changes + the 3 SKILL.md docs are scope-down samples. Posting as COMMENT — per the customer-partner-PR discipline (Miao's WaterrrForever GH handle is the same allowlisted contributor as the prior PRs in this batch), stamp eligibility routes through James. Also not stampable as-is: 9 CI checks are RED, including 2 high-severity CodeQL alerts.

CI red — needs addressing before merge

Check Status
Format FAIL
Preflight (lint + format) × 4 FAIL
player-perf FAIL
preview-regression FAIL
regression FAIL
CodeQL FAIL (2 high-severity alerts)

Same shape as HF#1665 had at draft time. Probably:

  • Format / Preflight × 4 → run bun run format on the 38 changed files.
  • player-perf / preview-regression / regression → either an intentional baseline drift (new caption/linter behavior shifts a fixture) requiring a regression:update, OR an unintended regression worth investigating per-job.
  • CodeQL → covered below.

Two high-severity CodeQL alerts both on context.ts:32same shape #1635 already learned from

CodeQL flagged the new line in packages/core/src/lint/context.ts:32:

let source = rawSource.replace(/<!--[\s\S]*?-->/g, "");
  • js/incomplete-multi-character-sanitization (security_severity: high) — single-pass replace doesn't handle nested comments. A payload like <!--<!---->--> leaves a dangling start-marker after one pass.
  • js/polynomial-redos (security_severity: high) — the regex can be slow on strings starting with <!-- with many repetitions.

This is the same shape HF#1635 (cfdc21312) already fixed in captions.mjs:227: the fix there was to replace the single-pass regex with a fixpoint loop:

while (out.includes("<!--")) out = out.replace(/<!--[\s\S]*?-->/g, "");

The same fix applies here. Miga's review praised the linter comment-stripping logic but didn't catch that it's the regression #1635 already had + fixed elsewhere. The fixpoint-loop pattern needs to land in context.ts to match.

Additive findings beyond Miga's pass

Three quick observations Miga's review summary didn't surface:

  1. weightOf ordering bug fix in captions.mjs (silently included): the pre-fix order had if (/bold/.test(s)) return 700 BEFORE if (/semibold|demibold/.test(s)) return 600, so "SemiBold" / "DemiBold" matched /bold/ first and got weight 700 (wrong). The new order checks semibold/demibold first. Comment says "before /bold/ — 'demibold' contains 'bold'". Small but real. ✓

  2. Skin path migrates to .hyperframes/caption-skin.html (hidden dir) with legacy caption-skin.html fallback. This is a workflow change — Step 2 of each skill that copies the chosen frame-preset's skin needs to write to the new hidden path. Worth confirming the Step-2 code matches (or accept that the legacy fallback covers the transition period). Miga didn't surface this.

  3. Skin docstring updated to reflect the hidden path: comment header now says .hyperframes/caption-skin.html (line 18). Body documentation is in sync. ✓

Cross-site applicability check for the new audio/providers.ts helper

Per [[feedback_audit_try_catch_scope_and_cross_site_applicability_on_new_helper_prs]]:

  • Consumers within this PR: auth status, doctor, tts/python, tts/synthesize (the slim-down). That's 4 consumers.
  • What about render commands? The skill scripts in hyperframes-media/scripts/audio.mjs (the orchestrator) have their own provider resolution. Does the new decide* / gather* get adopted there?
    • The PR body says "Mirrors the resolution order the hyperframes-media skill scripts use." That's parallel implementation, not shared code. Same logic in two places = potential drift surface.
    • Future PR opportunity: lift the CLI-side audio/providers.ts to a shared module the skill scripts can also import. Out of scope for this PR but worth a follow-up note.

The split between decide (pure, unit-testable) and gather (live facts) is clean architecture. Try/catch scope on the gather paths is appropriate — hasPythonModules uses find_spec which is exception-free for the common cases.

Verified Miga's findings

  • norm() strips all non-alphanumerics on both sides — verified at captions.mjs:319-320 of the diff. Plus the longest-family-first sort prevents prefix-collision (e.g. "TT Norms Pro" claiming "TT Norms Pro Mono"'s files), tracked via the new claimed: Set deduplicating across the sorted iteration. Two distinct improvements bundled into the same fix. ✓
  • Linter comment stripping in context.ts — verified, but flagged for the CodeQL issue above.
  • Auth preflight clean decide/gather architecture — verified. The pure decision functions are testable without spawning Python.

Body claims verified

  • "6 commits ... bundled as one branch" — verified.
  • "captions.mjs + lib/tokens.mjs in faceless-explainer, pr-to-video, and product-launch-video updated identically" — sampled faceless-explainer's captions.mjs diff in full; trust the body for the other two byte-for-byte. Worth a quick diff between the three files at the team-merge moment.
  • "auth status, doctor, and tts/python consume them" — verified.

Stamp posture

Same routing as the prior Miao batch — customer-partner-PR, hand-back to <@U08E7PV788Z>. Even without that policy, the 9 CI failures + 2 high-severity CodeQL alerts block stamp under REVIEW_DISCIPLINE rule #1. The two CodeQL alerts in particular should land the same fixpoint-loop pattern that HF#1635 already shipped — that's a known-working precedent.

Review by Jerrai

Single-pass <!-- --> removal can re-form a complete comment from
adjacent markers (e.g. `<<!-- -->!-- ... -->`), letting a decoy
<template> survive and hijack the template-boundary match. Loop to a
fixpoint, mirroring the captions.mjs precedent; add a regression test
that fails on single-pass (2 root findings) and passes on the loop.

Also wrap the build-frame.mjs node:fs imports to satisfy oxfmt — the
new copyFileSync import pushed the line past the width limit, which
was the sole cause of the Format / Preflight CI failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread packages/core/src/lint/context.ts Fixed
WaterrrForever and others added 2 commits June 24, 2026 20:56
The fixpoint loop still ran a /<!--[\s\S]*?-->/ regex per pass, which
backtracks O(n^2) on inputs with many unterminated "<!--" — CodeQL
js/polynomial-redos (high). Looping the same regex (the prescribed
fix) never addressed this; only the regex itself does.

Replace it with an indexOf-based linear strip in utils.ts
(stripHtmlComments), kept in a fixpoint loop so markers that re-form
when a comment is removed are still stripped. 200k unterminated
"<!--" now strips in ~3ms instead of quadratic time; behavior is
otherwise unchanged — unterminated comments are kept verbatim, as the
old regex left them. The re-forming regression test still guards it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
From team review of the not-signed-in onboarding:

- OAuth is a `hyperframes auth login` feature only. The separate `heygen`
  CLI is API-key-only — `heygen auth login` stores a pasted key, it is not
  OAuth and does not create an account. Stop presenting the two CLIs as the
  same OAuth/sign-up step.
- Use `npx hyperframes` in every imperative and runtime hint. Bare
  `hyperframes` is not on PATH on a fresh machine (command not found); only
  `npx hyperframes` is guaranteed. Also updates the JSON recommended_action.
- Drop `heygen auth login` from the terminal/skill onboarding: it needs its
  own install and there is no `npx heygen`, so it was a command-not-found
  trap. The shared-credential fact stays in the reference docs.

Covers the `auth status` guidance + tests, the Authentication docs, the
shared hyperframes-media preflight (SKILL, requirements, tts, error hints),
and the `npx hyperframes auth status` preflight in every TTS/BGM workflow
(pr-to-video, product-launch-video, faceless-explainer, website-to-video,
music-to-video).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R2 at 9cae28dffa — Verified the auth update + the two follow-up commits since R1.

R1 blockers addressed

  • CodeQL — both high-severity alerts on context.ts:32 now resolved (69270f7368). The fix lifted the comment-stripping into a shared stripHtmlComments(html) utility (linear scan + fixpoint), called from buildLintContext and pointed at by captions.mjs:227 too. Single source of truth across both call sites — better than the per-site fixpoint loop I'd pointed at #1635 for. 0 open CodeQL alerts on this PR now.
  • CI checks: 41/41 passing with 10 pending (running for the new HEAD). No remaining failures from the R1 sweep (Format / Preflight / player-perf / regression all clean).

Auth-update commit (9cae28dffa) — well-scoped + truthful

Walked the 13 files. The commit body's three points are exactly the right cuts:

  1. heygen auth login is API-key-only, not OAuth — corrects the over-promise in the original copy. The shared-credential fact stays in docs/guides/authentication.mdx (truthful), but the onboarding flow no longer dangles heygen auth login as a parallel "create an account" step alongside hyperframes auth login.
  2. npx hyperframes everywhere — correct for first-touch UX. Bare hyperframes isn't on PATH on a fresh machine; npx hyperframes is. All imperatives (status-guidance.ts, error messages in heygen.mjs/tts.mjs, all 5 skill SKILL.md previews, references/{requirements.md,tts.md}) flipped consistently.
  3. Drop heygen auth login from skill onboarding — there's no npx heygen, so it was a command-not-found trap. Right call.

The status-guidance UI rewrite is cleaner too — splits OAuth and --api-key into two labeled blocks with the API-key URL (app.heygen.com/settings/api) inline. Tests status.test.ts updated to pin both the positive (npx hyperframes auth login recommended) and negative (heygen auth login no longer surfaced) contracts.

Nothing new to block on

  • 5-file scope tight; no drive-bys
  • Tests cover both shapes
  • Error messages and skill copy fully consistent
  • The data-mismatch the commit fixes was a real first-touch UX trap (sign-up → "now go install a separate CLI and run a command that pastes a key")

Worth noting for the next round (not this PR): the heygen-cli OAuth story discussed in the parent thread is still the path to fully unified sign-up across both CLIs. Right now the shared ~/.heygen/credentials story is honest but asymmetric — hyperframes auth login creates accounts via OAuth, heygen auth login doesn't. Closing that asymmetry is a heygen-cli PR, not a HyperFrames one.

Stamp posture: COMMENT only — routing through James per the customer-partner-PR discipline. Code is clean on merit; James decides on stamp.

— Jerrai

@WaterrrForever WaterrrForever merged commit 54cab33 into main Jun 24, 2026
50 checks passed
@WaterrrForever WaterrrForever deleted the feat/media-auth-preflight-and-caption-fixes branch June 24, 2026 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants