Skip to content

feat(ci): make BM Bossbot a deterministic merge gate#942

Closed
phernandez wants to merge 10 commits into
mainfrom
fix/bossbot-thread-gate
Closed

feat(ci): make BM Bossbot a deterministic merge gate#942
phernandez wants to merge 10 commits into
mainfrom
fix/bossbot-thread-gate

Conversation

@phernandez

@phernandez phernandez commented Jun 10, 2026

Copy link
Copy Markdown
Member

Summary

Two BM Bossbot fixes, both from today's incident review:

1. Unresolved review threads now block approval

PR #932 was merged with two unresolved Codex P2 review threads because nothing in the gate looks at review feedback: Bossbot's review prompt sees only metadata+diff, Codex connector reviews are COMMENTED-state (never flip reviewDecision), and the approval status is keyed to a head SHA with no re-evaluation when comments arrive.

  • finalize now counts unresolved reviewThreads via GraphQL (paginated) and fails the BM Bossbot Approval status when any remain — even on an approve verdict.
  • New recheck command + workflow job triggered by pull_request_review, pull_request_review_comment, and pull_request_review_thread events: new feedback flips the status to failure; resolving the last thread restores a previously earned approval for the same head SHA (approval history is read from commit statuses, so thread resolution alone can never upgrade a pending/failed review).
  • Review and recheck use separate job-level concurrency groups so thread events can't cancel an in-flight review (workflow-level concurrency previously made cross-cancellation possible).

2. PR images depict the PR's content, not the review outcome

Every generated image was an "BM BOSSBOT APPROVED" stamp because the image prompt's only source material was the review summary block (verdict/status). The prompt now:

  • sources the PR title + the author's own description (all managed bot blocks stripped) so the image expresses the theme of the whole change
  • explicitly bans approval stamps, APPROVED/SUCCESS/verdict wording, badges, checkmarks, SHA strings, and Bossbot branding
  • seeds auto-theme selection on PR number+title (stable across re-reviews)

The assets workflow step now passes --pr-title.

Test plan

  • uv run pytest tests/scripts/ -q — 30 passed (new coverage: thread-blocking finalize, GraphQL pagination, all three recheck outcomes, approval-record matching, content extraction, verdict-free prompt)
  • uv run ty check tests/scripts/ clean; workflow YAML validated

Refs #932

🤖 Generated with Claude Code

Reviewed SHA: 513fef79c5da67d7edc187329e66c17712b57103
Verdict: changes_requested
Status: failure - BM Bossbot requested changes

Summary:
Reviewed the provided diff. The deterministic gate direction is mostly coherent, but the automatic restore path for resolved review threads depends on an unsupported Actions trigger, so the required status can remain failed after all feedback is resolved.

Blocking findings:

  • Thread-resolution recheck relies on a non-triggering Actions event: .github/workflows/bm-bossbot.yml:22 adds pull_request_review_thread with resolved/unresolved types, but GitHub Actions' documented trigger list includes pull_request_review and pull_request_review_comment, not pull_request_review_thread. Resolving the last review thread therefore will not automatically run the recheck job, leaving the required BM Bossbot Approval status failed until someone manually reruns Bossbot. Use a supported trigger or another deterministic polling/status refresh path for thread resolution before relying on this as a merge gate.

Non-blocking findings:

…e PR images on PR content

Two BM Bossbot fixes:

1. Unresolved review threads now block approval. The review prompt only
   sees metadata+diff, so an approve verdict said nothing about
   outstanding feedback — #932 merged with two open Codex P2 threads.
   finalize now counts unresolved reviewThreads via GraphQL and fails
   the status when any remain. A new recheck command, triggered by
   pull_request_review / review_comment / review_thread events, flips
   the status to failure when feedback arrives after approval and
   restores a previously earned approval for the same head SHA once
   every thread is resolved. Review and recheck jobs use separate
   job-level concurrency groups so neither cancels the other.

2. PR images depict the PR's content, not the review outcome. The image
   prompt previously used the BM Bossbot review summary (verdict/status)
   as its only source material, so every image rendered an APPROVED
   stamp. The prompt now sources the PR title and the author's own
   description (managed bot blocks stripped) and explicitly bans
   approval stamps, verdict wording, badges, checkmarks, and Bossbot
   branding. Theme selection seeds on PR number+title for stability
   across re-reviews.

Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
@phernandez phernandez force-pushed the fix/bossbot-thread-gate branch from 376f297 to f9c3baf Compare June 10, 2026 04:04
…ption

Mirror the ProjectUpdateContext shape the basic-memory.yml capture flow
collects: the image prompt now receives a compact change-shape digest —
labels, linked issues with titles, commit subjects (the PR's narrative
arc), and a churn-ranked changed-files summary with totals — from a
single 'gh pr view --json' call passed as --pr-context-file (replacing
--pr-title/--pr-body-file). The digest is explicitly context, not
captions: the prompt forbids rendering paths, stats, issue numbers, or
commit subjects verbatim in the image.

Signed-off-by: phernandez <paul@basicmachines.co>
@phernandez phernandez force-pushed the fix/bossbot-thread-gate branch from 9e99e8b to ef6c47e Compare June 10, 2026 04:12
The guard banned any github.event.pull_request reference, but the
recheck job legitimately needs the numeric PR id. Allow exactly
.number and keep forbidding head checkout and attacker-controlled
string fields.

Signed-off-by: phernandez <paul@basicmachines.co>
The recheck path posts a status per review-thread event, so a busy PR
can exceed one page of statuses and the original approval record falls
off page one — resolving the last thread would then never restore it.
Page through all statuses for the BM Bossbot context.

Signed-off-by: phernandez <paul@basicmachines.co>
Remove the Codex LLM review and the per-PR image job — both spent API
tokens on every Bossbot run and the connector review already covers
code review. The gate is now fully deterministic: Tests passed for the
head SHA, non-draft, trusted author, zero unresolved review threads.
Prompt/schema files removed; guard tests updated to forbid
openai/codex-action and image generation from reappearing.

Signed-off-by: phernandez <paul@basicmachines.co>
@phernandez phernandez changed the title fix(ci): block BM Bossbot approval on unresolved review threads feat(ci): make BM Bossbot a deterministic merge gate Jun 10, 2026
phernandez and others added 4 commits June 10, 2026 09:44
Unit legs intermittently hang mid-suite (FastMCP/asyncpg cleanup-hang
family) and sit until the runner gives up, eating 20+ minutes per
occurrence. pytest-timeout turns a hang into a fast failure with a
stack dump naming the test.

Signed-off-by: phernandez <paul@basicmachines.co>
(cherry picked from commit 0295cd9)
'page=1' substring-matched 'per_page=100', so the fake served page 1
forever and the pagination loop hung — caught by the new 120s test
timeout on every CI leg.

Signed-off-by: phernandez <paul@basicmachines.co>
The batch-indexing race has now flaked three CI rounds today. Skipped
under CI only (still runs locally); #940 tracks the root cause.

Signed-off-by: phernandez <paul@basicmachines.co>
@phernandez

Copy link
Copy Markdown
Member Author

Superseded: BM Bossbot is being removed entirely (workflow disabled, required check dropped from the ruleset, files deleted via #945). The salvageable pieces of this branch — pytest-timeout and the #940 quarantine — are carried by #945.

@phernandez phernandez closed this Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant