docs(hcg-runbook): refresh rollout runbook v0.1→v0.2 after Phase D close (standards#100) #207
Merged
…ards#91 / #100)

Refreshes docs/integration/hcg-tier2-rollout-runbook.md from v0.1 (draft, 2026-05-20, pre Phase-D) to v0.2 reflecting the current state of the single-lane channel rooted at standards#91:

- §1.1 Phase D deliverables: tick D-1..D-3 + D-4 bootstrap with http-capability-gateway PR refs (#12 / #14 / #22 / #26 / #30) and the boj-server D-1 load-profile (#168) that joint-closed standards#99 on 2026-06-01. The one remaining open item is the owner-driven perf-rebaseline workflow dispatch + `_status: scaffold-placeholder -> active` flip; called out explicitly rather than left as a stale unchecked checkbox.
- §1.4 BoJ-side prereqs: tick the three loopback-bind layers (#130 / #131 / #132), the Phase C TrustPolicy clause (#106), the NetworkPolicy (#173), and the SSE-route policy coverage (#165). The Trustfile `tier_2_gateway.status: PENDING` line stays intentionally unchecked - it's the §6.4 last-action target.
- §1.5 Gateway-side prereqs: tick the new `container/gateway-deploy.k9.ncl` from http-capability-gateway#38 (2026-06-03), record what stays PLACEHOLDER until cerro-torre signing runs, and expand the smoke-test entry with the concrete allow/deny sequence boj-server#165 deferred.
- Header banner: replace the stale "Phase D has merged the scaffold only" Phase-D-dependency note with a current-state summary, bump version 0.1 -> 0.2, date 2026-05-20 -> 2026-06-08.
- CHANGELOG.md: Documentation entry under [Unreleased] summarising the refresh.

No code, infrastructure, or runtime behaviour changes. The runbook is the operator-facing source of truth for what's gating the next Phase E owner action; the drift it had was making "what's still open" harder to read at a glance.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#100

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
🔍 Hypatia Security Scan
Findings: 272 issues detected
[
{
"reason": "Stale AI session file -- delete",
"type": "stale",
"file": "GEMINI.md",
"action": "delete",
"rule_module": "root_hygiene",
"severity": "medium"
},
{
"reason": "Action if: always()\n uses: actions/upload-artifact@ea165f8 needs attention",
"type": "unpinned_action",
"file": "e2e.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Action perpolymath/standards/.github/workflows/governance-reusable.yml@main\n needs attention",
"type": "unpinned_action",
"file": "governance.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in abi-drift.yml",
"type": "missing_timeout_minutes",
"file": "abi-drift.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in codeql.yml",
"type": "missing_timeout_minutes",
"file": "codeql.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in container-publish.yml",
"type": "missing_timeout_minutes",
"file": "container-publish.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in dogfood-gate.yml",
"type": "missing_timeout_minutes",
"file": "dogfood-gate.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in dogfood-gate.yml",
"type": "missing_timeout_minutes",
"file": "dogfood-gate.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in dogfood-gate.yml",
"type": "missing_timeout_minutes",
"file": "dogfood-gate.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in dogfood-gate.yml",
"type": "missing_timeout_minutes",
"file": "dogfood-gate.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
}
]
Powered by Hypatia Neurosymbolic CI/CD Intelligence
hyperpolymath added a commit that referenced this pull request on Jun 20, 2026
…229)

## Summary

Wires `scripts/hcg-surface-drift-check.sh` (landed in boj-server#228, merged 2026-06-19) into GitHub Actions, so the surface⊆policy invariant the ADR calls its largest declared risk is re-proven on every PR rather than relying on the manual re-verification stamp in `config/gateway-policy-boj.yaml`'s header.

PR #228 explicitly flagged this CI wiring as the follow-up step: "a CI wiring PR should follow [the always-trigger + changes-job] pattern. Out of scope here." This is that follow-up; the script, the router, and the policy are unchanged.

## What lands

A single new file: `.github/workflows/hcg-surface-drift.yml`. The workflow follows the boj-server "always-trigger + changes-job" pattern documented in `docs/wikis/CI-and-Required-Checks.adoc` and `.claude/CLAUDE.md` §"CI / Required Status Checks":

- **No `on.*.paths`**: the check is always created. A path-filtered required workflow that never fires is the failure mode that stranded #213/#215 until #216 fixed it; this gate is built to never re-introduce it, regardless of whether it later joins `required_status_checks`.
- **Lightweight `changes` job** recomputes relevance via `git diff origin/<base>...HEAD` against the four paths this gate cares about: router (`elixir/lib/boj_rest/router.ex`), live policy (`config/gateway-policy-boj.yaml`), the drift script (`scripts/hcg-surface-drift-check.sh`), and the workflow file itself. Fail-safe to `run=true` on any diff failure.
- **Heavy `check` job** is `needs: changes` + `if: needs.changes.outputs.run == 'true'`. A skipped `if:` reports SUCCESS to any future required-context list, so unrelated PRs never pay for it and can never be blocked by it.
- **Pinned action**, **timeout-minutes**, **concurrency group**, **`permissions: contents: read`**, **SPDX header**: matches the canonical pattern in `.github/workflows/abi-drift.yml`.
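
The relevance decision made by the `changes` job can be illustrated with a minimal Python sketch. This is not the workflow's actual implementation (which, per the description above, runs `git diff` inside GitHub Actions); the function name `should_run` and the exact regex are assumptions made for illustration, and only the four watched paths come from the PR text.

```python
import re

# The four watched paths named in the PR description.
# The anchored-alternation regex form is an assumption for this sketch.
WATCHED = re.compile(
    r"^(elixir/lib/boj_rest/router\.ex"
    r"|config/gateway-policy-boj\.yaml"
    r"|scripts/hcg-surface-drift-check\.sh"
    r"|\.github/workflows/hcg-surface-drift\.yml)$"
)


def should_run(changed_files):
    """Decide whether the heavy `check` job should run (hypothetical helper).

    changed_files is the list of paths reported by the diff, or None when
    the diff itself failed. A failed diff fails safe to True, mirroring the
    "fail-safe to run=true" behaviour described above.
    """
    if changed_files is None:
        return True
    return any(WATCHED.match(path) for path in changed_files)
```

With this shape, a PR that touches only unrelated files yields `False` (the `check` job is skipped and still reports success), while a diff failure never silently skips the gate.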
The `check` job invokes the script with `bash scripts/hcg-surface-drift-check.sh -v` (matching the test plan in #228) so it works regardless of the script's file mode; #228 committed the script as 0644.

## What this PR does NOT do

- **Does NOT** modify the runbook §1.5 ("Gateway-side prerequisites"). Adoption of the CI gate into the §1.5 checklist is a one-line owner-driven runbook update; the PR #228 deliberate boundary stays in place.
- **Does NOT** add the new check to `.github/settings.yml`'s `required_status_checks` list (currently `hypatia-scan` + `codeql`). Promotion to required is a settings change for the owner to make once the gate has run green on a few PRs.
- **Does NOT** modify the live policy, the example policy, the router, the script, or any other Phase E artefact. The change is wholly within `.github/workflows/`.
- **Does NOT** pre-empt the §6.4 Trustfile flip (`tier_2_gateway.status` stays `PENDING`), the staging soak (§3.3), or cerro-torre `.ctp` signing, all of which remain owner-driven per the channel doctrine reaffirmed in #207 / #224.
- Per the single-lane HCG channel discipline (pattern set in `http-capability-gateway` PRs #14, #22, #26, #30, #38 and `boj-server` PRs #168, #173, #224, #226, #228): joint-close is owner-only. **This PR refs but does not close `standards#100`.**

## Channel state note

This session could not read `hyperpolymath/standards#91` / `#100` (the session's MCP repo scope is restricted to `http-capability-gateway` and `boj-server`), so the brief's instructed status comment on `standards#91` could not be posted. State was reconstructed from the canonical sources in this repo (ADR-0004, the integration plan, the audit, the rollout runbook, the live policy, `docs/wikis/CI-and-Required-Checks.adoc`) plus the merged-PR history of both in-scope repos.
Analysis: Phase A/B/C/D closed; Phase E (`standards#100`) is the only open phase; #228 (2026-06-19) is the most recent advance and explicitly named this CI wiring as the next step.

## Test plan

- [ ] **Required**: the `changes` job runs and emits `run=true` (because `.github/workflows/hcg-surface-drift.yml` matches the path regex), so the `check` job is gated through, not skipped, on this PR.
- [ ] **Required**: the `check` job runs `bash scripts/hcg-surface-drift-check.sh -v` and exits 0 with the OK message; current `main` (64a70c5) has 7 wired routes, 28 policy rules, no drift; locally re-verified on this branch.
- [ ] **Synthetic skip**: on a follow-up PR that touches none of the four watched paths, `changes.outputs.run` is `false` and `check` reports `skipped` (which counts as success for any required-context list).
- [ ] **Synthetic drift**: a temporary PR adding `get "/__drift_test__"` to `elixir/lib/boj_rest/router.ex` without a matching policy rule fires `run=true`, `check` exits 1 with the route listed under `DRIFT:`, and the PR is blocked from merge if/when this gate is promoted to required.
- [ ] No `actionlint` / Hypatia / SPDX gate fires on the new workflow file.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#100

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

_Generated by [Claude Code](https://claude.ai/code/session_019cKmxx6AkNjzhXT6ZoxGfx)_

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
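
The synthetic-drift case in the test plan above can be illustrated with a small Python sketch of the wired ⊆ policy check. It is not the contents of `scripts/hcg-surface-drift-check.sh`. It assumes routes appear in the router source as `verb "/path"` lines (as in the `get "/__drift_test__"` example) and that the policy is already available as a set of literal `(VERB, path)` pairs; regex policy rules are deliberately left out, and the names `wired_routes` and `drift` are hypothetical.

```python
import re

# Assumed shape of a route declaration line: an HTTP verb macro followed by
# a quoted path, e.g.  get "/health" do
ROUTE = re.compile(r'^\s*(get|post|put|delete|patch|head|options)\s+"([^"]+)"')


def wired_routes(router_source):
    """Extract (VERB, path) pairs from router source text (hypothetical helper)."""
    routes = []
    for line in router_source.splitlines():
        match = ROUTE.match(line)
        if match:
            routes.append((match.group(1).upper(), match.group(2)))
    return routes


def drift(router_source, policy_pairs):
    """Return wired routes that have no literal (VERB, path) entry in the policy."""
    return [route for route in wired_routes(router_source)
            if route not in policy_pairs]
```

A non-empty result corresponds to the failing case described in the test plan: the route is wired but no policy rule names it.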
hyperpolymath added a commit that referenced this pull request on Jun 21, 2026
## Summary

Adds `scripts/hcg-spec-coverage-check.sh`: a static, source-only audit that asserts every HTTP route declared in `docs/specification/openapi.yaml` is covered by at least one rule in the HCG live Verb Governance Spec (`config/gateway-policy-boj.yaml`).

Companion / complement to PR #228's `hcg-surface-drift-check.sh`. The two scripts bracket the contract §8 declared-surface invariant from both directions:

| Script | Invariant | Catches |
|---|---|---|
| `hcg-surface-drift-check.sh` (#228) | wired (router.ex) ⊆ policy | policy lag behind wiring |
| `hcg-spec-coverage-check.sh` (this PR) | declared (openapi.yaml) ⊆ policy | policy lag behind the spec |

Contract §8 (`docs/integration/http-capability-gateway-boj-contract.md`) is explicit: "the Verb Governance Spec governs the **declared** surface (openapi.yaml), not only the currently-wired subset. Declared-but-unimplemented routes are still classified in the policy so that when the gnosis handler grows them they are governed from day one rather than silently exposed." The live policy header carries the cross-check statement (*"Surface source: docs/specification/openapi.yaml, cross-checked against elixir/lib/boj_rest/router.ex"*); PR #228 made the router half machine-checkable, this PR makes the openapi half machine-checkable. Together they make the entire §1.5 re-verification stamp executable.

Without this check the risk is concrete: someone adds a new path to `openapi.yaml` without a corresponding policy rule. The surface-drift check does not catch it (the route is not yet wired in `router.ex`). The day the route is wired, the surface-drift gate fires, but by then the operator has to either (a) ship the wiring with a default-deny in production for a route that should be live or (b) hold the wiring PR until the policy catches up. Catching the gap at spec-edit time avoids both, with no procedural cost above running the existing CI gate.

### What the script does

1. Extracts `(verb, path-template)` tuples from the `paths:` section of `docs/specification/openapi.yaml`: path entries at exactly 2-space indent, HTTP operations (get/post/put/delete/patch/head/options) at exactly 4-space indent under each path. Other keys at 4-space indent (parameters/summary/description/tags/...) are metadata, not operations, and are skipped.
2. Extracts `(verb, path-pattern)` tuples from `config/gateway-policy-boj.yaml` using the identical extraction block that `hcg-surface-drift-check.sh` uses, so the two scripts cannot drift in how they read the policy.
3. For each declared route, concretises `{name}`-style placeholders with a known probe segment (`probe`, shared with the smoke + surface-drift scripts so a future regex tightening fails all three in lock-step) and asserts at least one policy rule covers it: literal equality for non-regex paths; ERE `grep -E` match against the concrete URL for `^…` regex paths. The declared verb must be in the policy rule's verb list.
4. Exit `0` on no gap, `1` on gap detected, `64` on bad usage.

### What this PR does NOT do

- Does **not** modify the rollout runbook §1.5 or the contract §8. Adoption as the §1.5 declared-surface check is a separate, owner-driven PR; this PR lands the artefact only so the runbook update is a one-line wiring change. Matches the #228-then-runbook split.
- Does **not** wire the script into CI. Boj-server's CI discipline (`docs/wikis/CI-and-Required-Checks.adoc` / `.claude/CLAUDE.md`) requires path-filtered required checks to use the "always-trigger + changes job" pattern; a CI wiring PR should follow that pattern, matching the #228 → #229 split. Out of scope here.
- Does **not** modify the openapi.yaml or the policy. On this branch the script reports OK against today's surface: every one of the 26 `(verb, path)` pairs declared in openapi.yaml has a matching rule among the 28 `(verb, path)` rules in the live policy. The 2-rule surplus is the policy's coverage of routes the openapi.yaml does not declare (notably `/.well-known/boj-node-pubkey`, which the router wires but the spec does not yet enumerate); the script intentionally does not penalise that direction (see the script's `Limitations` header).
- Does **not** pre-empt the §6.4 Trustfile flip (`tier_2_gateway.status` stays `PENDING`).
- Per single-lane HCG channel discipline (pattern set in `http-capability-gateway` PRs #10, #11, #12, #14, #22, #26, #30, #38 and `boj-server` PRs #78, #90, #106, #168, #173, #207, #208, #210, #215, #222, #224, #226, #228, #229): joint-close is owner-only. **This PR refs but does not close `standards#100`.**

### Channel state note

This session could not read `hyperpolymath/standards#91` / `#100` (the session's repository scope is restricted to `http-capability-gateway` and `boj-server`), so the brief's instructed status comment on `standards#91` could not be posted. State was reconstructed from the canonical sources in this repo (ADR-0004, the integration plan, the audit, the rollout runbook, the live policy, the openapi spec, and the merged-PR commit history) plus the current `main` of both in-scope repos.

The analysis: Phase A/B/C/D are closed (artefacts merged, runbook §1.2 and the Phase-D status note in the runbook header confirm); Phase E (`standards#100`) is the only open phase; all remaining §1 checklist items are owner-driven (`!OWNER:` placeholders, D-4 rebaseline `workflow_dispatch`, cerro-torre `.ctp` signing, the §6.4 Trustfile flip). This PR advances Phase E §1.5 ("Gateway-side prerequisites") by converting one half of the declared-surface invariant into an executable artefact, mirroring exactly the script-first split of #228.

## Test plan

- [ ] Run the script on this branch's working tree: `bash scripts/hcg-spec-coverage-check.sh`; expect exit `0`, "OK: every openapi-declared route is covered by at least one policy rule." with `Declared (openapi) routes: 26` and `Policy (verb,path) rules: 28`.
- [ ] Run `bash scripts/hcg-spec-coverage-check.sh -v`; expect the same exit `0` plus a `Matched:` block listing each of the 26 declared routes against its policy rule (literal `/health` → literal rule; `/cartridge/{name}/invoke` → `^/cartridge/[A-Za-z0-9_.-]+/invoke$` regex; `/grpc/{service}/{method}` → two-segment regex; `/umoja/peers` matches both `GET` and `POST` rules; etc.).
- [ ] Synthetic gap test: build a temporary openapi.yaml containing a single declared path with no policy rule and run `OPENAPI_FILE=... bash scripts/hcg-spec-coverage-check.sh`; expect exit `1` with the route listed under `GAP:`. (Verified locally on this branch.)
- [ ] Confirm `shellcheck scripts/hcg-spec-coverage-check.sh` produces only the same `SC1001` info note that `scripts/hcg-surface-drift-check.sh` produces today (the `\^` escape inside a `case` pattern is intentional and matches the sibling script's posture exactly).
- [ ] Confirm SPDX header + Owner copyright match the canonical estate format (matches `scripts/hcg-surface-drift-check.sh`'s header shape).
- [ ] Verify `scripts/check-shebang-first.sh` is still green with the new file present.
- [ ] Verify no Hypatia / governance / spdx gates fire on the new script file.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#100

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

_Generated by [Claude Code](https://claude.ai/code/session_013VLPKSTEMFnPYQdx6rD91b)_

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
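
The coverage rule described under "What the script does" (concretise placeholders with `probe`, then literal equality or regex match, with the verb in the rule's verb list) can be sketched in Python as follows. This is an illustration under stated assumptions, not the bash script itself: the rule shape `{"verbs": [...], "path": "..."}` and the function names are hypothetical, and Python's `re.search` stands in for `grep -E`, which agrees with ERE for simple patterns like the one quoted in the test plan but is not identical in general.

```python
import re

PROBE = "probe"  # probe segment named in the PR description


def concretise(path_template):
    """Replace each {name}-style placeholder with the probe segment."""
    return re.sub(r"\{[^}]+\}", PROBE, path_template)


def rule_covers(rule, verb, path_template):
    """Check one policy rule against one declared route.

    rule is a hypothetical dict: {"verbs": ["GET", ...], "path": "..."}.
    Paths starting with "^" are treated as regexes; all others as literals.
    """
    if verb.upper() not in rule["verbs"]:
        return False
    url = concretise(path_template)
    if rule["path"].startswith("^"):
        return re.search(rule["path"], url) is not None
    return rule["path"] == url


def gaps(declared, rules):
    """Return declared (verb, path-template) pairs that no rule covers."""
    return [(verb, path) for verb, path in declared
            if not any(rule_covers(rule, verb, path) for rule in rules)]
```

An empty result from `gaps` corresponds to the script's exit `0`; any entry corresponds to a route that would be listed under `GAP:` with exit `1`. Policy rules with no matching declared route are ignored, matching the one-directional check the PR describes.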
Summary
Refreshes docs/integration/hcg-tier2-rollout-runbook.md from v0.1 (draft, 2026-05-20, pre Phase-D) to v0.2 reflecting the current state of the single-lane HCG tier-2 channel rooted at standards#91. Documentation-only PR: no code, no infrastructure, no behaviour change. The runbook had visible drift across three sub-sections of §1 (Prerequisites) since Phase D (standards#99) closed on 2026-06-01 and gateway E1 (deploy spec) landed on 2026-06-03; this PR ticks the boxes that have evidence and calls out exactly what's still open.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#100

Channel position
What changed
docs/integration/hcg-tier2-rollout-runbook.md

Header banner.
- `0.1 (draft, Phase E first cut)` → `0.2 (post Phase-D close, Phase E in-progress)`.
- `2026-05-20` → `2026-06-08 (rev. from 2026-05-20)`.
- … `_status` flip). The old admonition still claimed "Phase D has merged the scaffold only" - six weeks stale.

§1.1 Phase D deliverables landed.
- `[ ]` D-2, `[ ]` D-3, `[ ]` D-4 → `[x]` lines per phase with PR refs and dates:
  - … `workflow_dispatch` rebaseline on `ubuntu-latest`) - http-capability-gateway#26 (2026-05-30).
  - … standards#99.
- `[ ]` line for the remaining open item only: owner-driven dispatch of `Perf Rebaseline` + maintainer-merge of the generated `perf: rebaseline (standards#99)` PR + `_status: scaffold-placeholder → active` flip. Until this lands the gate runs in non-blocking scaffold mode.

§1.4 BoJ-side prerequisites.
- `[ ]` Loopback bind → `[x]` enumerating the three layers that landed: Elixir Cowboy bind tightening (boj-server#130), k8s Service ClusterIP (boj-server#131), Zig-adapter `APP_HOST=127.0.0.1` across `stapeln.toml`, `entrypoint.sh`, `compose.prod.yaml` (boj-server#132). Deployment-time confirmation that the staging port really is closed at the network layer stays an operator pre-check before §2.1.
- `[ ]` TrustPolicy clause → `[x]` with the verified line reference (`elixir/lib/boj_rest/trust_policy.ex:73`) and PR (boj-server#106).
- `[x]` entries for the two additional Phase-E-supporting BoJ-side landings flagged in gateway#38's channel position: …
- `Trustfile.a2ml tier_2_gateway.status: PENDING` line stays intentionally unchecked - it's the §6.4 last-action target.

§1.5 Gateway-side prerequisites.
- `[ ]` `container/gateway-deploy.k9.ncl` exists → `[x]` with PR ref (http-capability-gateway#38, 2026-06-03), naming the five-level k9-svc pedigree (Snout / Scent / Leash / Gut / Muscle), per-environment `BACKEND_URL`, trust-source flip pattern (`"header"` staging → `"mtls"` production after §2.4 rehearsal), `max_unavailable = 0`, and `failure_mode = "fail-closed"` matching the `[SEAMS] gateway-boj-gnosis` declaration.
- `[ ]` Containerfile + `.ctp` signing entry extended with a note that `pedigree.security.signature` + `pedigree.validation.checksum` stay `PLACEHOLDER` in the k9.ncl until cerro-torre signing runs (separate operator action, key-handling discipline).
- … `cartridge-sse-post` rule.
- … `POST /cartridge/:name/sse` … X-Trust-Level cases).

CHANGELOG.md

New `### Documentation` entry under `[Unreleased]` summarising the refresh and pointing at the PR refs. Sits with the existing Phase E `### Added` entry for the NetworkPolicy (#173) and the prior loopback-bind entries.

What this PR deliberately does NOT do
- … `Trustfile.a2ml tier_2_gateway.status`. That's the §6.4 last action; flipping it before the soak windows are complete would mis-represent the deployment state.
- … `standards#100`. Same channel discipline as PR chore(deps): bump nixpkgs from `01fbdee` to `6368eda` #38 / docs(hcg-load-profile): Phase D D1 - load profile declaration (standards#99) #168 - single-lane joint-close, owner-only.
- … `gh pr view` on a merged-and-confirmed PR (verified via the GitHub MCP at preparation time).

Test plan
- Governance/Secret Scanner/Dogfood Gate/LSP/DAP/BSP CI/Hypatia Security Scan/CodeQL Security Analysis all pass - they're green on `origin/main` and this is a docs-only diff that touches no workflow input.
- … `OpenSSF Scorecard Enforcer`, `Scorecards supply-chain security`, `Instant Sync`) - known failures on `origin/main` unrelated to this PR. Same status before and after.
- … `01fbdee` to `6368eda` #38, confirm each is the cited PR.
- … `grep -n 'tier_2_gateway:' .machine_readable/contractiles/trust/Trustfile.a2ml` → `897: tier_2_gateway:` so the `status:` field is line 900). Confirmed at PR-creation time.

🤖 Generated with Claude Code
Generated by Claude Code