feat(ci): make BM Bossbot a deterministic merge gate#942
Closed
phernandez wants to merge 10 commits into
Closed
Conversation
…e PR images on PR content Two BM Bossbot fixes: 1. Unresolved review threads now block approval. The review prompt only sees metadata+diff, so an approve verdict said nothing about outstanding feedback — #932 merged with two open Codex P2 threads. finalize now counts unresolved reviewThreads via GraphQL and fails the status when any remain. A new recheck command, triggered by pull_request_review / review_comment / review_thread events, flips the status to failure when feedback arrives after approval and restores a previously earned approval for the same head SHA once every thread is resolved. Review and recheck jobs use separate job-level concurrency groups so neither cancels the other. 2. PR images depict the PR's content, not the review outcome. The image prompt previously used the BM Bossbot review summary (verdict/status) as its only source material, so every image rendered an APPROVED stamp. The prompt now sources the PR title and the author's own description (managed bot blocks stripped) and explicitly bans approval stamps, verdict wording, badges, checkmarks, and Bossbot branding. Theme selection seeds on PR number+title for stability across re-reviews. Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
376f297 to
f9c3baf
Compare
…ption Mirror the ProjectUpdateContext shape the basic-memory.yml capture flow collects: the image prompt now receives a compact change-shape digest — labels, linked issues with titles, commit subjects (the PR's narrative arc), and a churn-ranked changed-files summary with totals — from a single 'gh pr view --json' call passed as --pr-context-file (replacing --pr-title/--pr-body-file). The digest is explicitly context, not captions: the prompt forbids rendering paths, stats, issue numbers, or commit subjects verbatim in the image. Signed-off-by: phernandez <paul@basicmachines.co>
9e99e8b to
ef6c47e
Compare
The guard banned any github.event.pull_request reference, but the recheck job legitimately needs the numeric PR id. Allow exactly .number and keep forbidding head checkout and attacker-controlled string fields. Signed-off-by: phernandez <paul@basicmachines.co>
The recheck path posts a status per review-thread event, so a busy PR can exceed one page of statuses and the original approval record falls off page one — resolving the last thread would then never restore it. Page through all statuses for the BM Bossbot context. Signed-off-by: phernandez <paul@basicmachines.co>
Remove the Codex LLM review and the per-PR image job — both spent API tokens on every Bossbot run and the connector review already covers code review. The gate is now fully deterministic: Tests passed for the head SHA, non-draft, trusted author, zero unresolved review threads. Prompt/schema files removed; guard tests updated to forbid openai/codex-action and image generation from reappearing. Signed-off-by: phernandez <paul@basicmachines.co>
Unit legs intermittently hang mid-suite (FastMCP/asyncpg cleanup-hang family) and sit until the runner gives up, eating 20+ minutes per occurrence. pytest-timeout turns a hang into a fast failure with a stack dump naming the test. Signed-off-by: phernandez <paul@basicmachines.co> (cherry picked from commit 0295cd9)
'page=1' substring-matched 'per_page=100', so the fake served page 1 forever and the pagination loop hung — caught by the new 120s test timeout on every CI leg. Signed-off-by: phernandez <paul@basicmachines.co>
The batch-indexing race has now flaked three CI rounds today. Skipped under CI only (still runs locally); #940 tracks the root cause. Signed-off-by: phernandez <paul@basicmachines.co>
Member
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two BM Bossbot fixes, both from today's incident review:
1. Unresolved review threads now block approval
PR #932 was merged with two unresolved Codex P2 review threads because nothing in the gate looks at review feedback: Bossbot's review prompt sees only metadata+diff, Codex connector reviews are
COMMENTED-state (never flipreviewDecision), and the approval status is keyed to a head SHA with no re-evaluation when comments arrive.finalizenow counts unresolvedreviewThreadsvia GraphQL (paginated) and fails the BM Bossbot Approval status when any remain — even on anapproveverdict.recheckcommand + workflow job triggered bypull_request_review,pull_request_review_comment, andpull_request_review_threadevents: new feedback flips the status to failure; resolving the last thread restores a previously earned approval for the same head SHA (approval history is read from commit statuses, so thread resolution alone can never upgrade a pending/failed review).2. PR images depict the PR's content, not the review outcome
Every generated image was an "BM BOSSBOT APPROVED" stamp because the image prompt's only source material was the review summary block (verdict/status). The prompt now:
The assets workflow step now passes
--pr-title.Test plan
uv run pytest tests/scripts/ -q— 30 passed (new coverage: thread-blocking finalize, GraphQL pagination, all three recheck outcomes, approval-record matching, content extraction, verdict-free prompt)uv run ty check tests/scripts/clean; workflow YAML validatedRefs #932
🤖 Generated with Claude Code
Reviewed SHA:
513fef79c5da67d7edc187329e66c17712b57103Verdict:
changes_requestedStatus:
failure- BM Bossbot requested changesSummary:
Reviewed the provided diff. The deterministic gate direction is mostly coherent, but the automatic restore path for resolved review threads depends on an unsupported Actions trigger, so the required status can remain failed after all feedback is resolved.
Blocking findings:
pull_request_review_threadwithresolved/unresolvedtypes, but GitHub Actions' documented trigger list includespull_request_reviewandpull_request_review_comment, notpull_request_review_thread. Resolving the last review thread therefore will not automatically run therecheckjob, leaving the requiredBM Bossbot Approvalstatus failed until someone manually reruns Bossbot. Use a supported trigger or another deterministic polling/status refresh path for thread resolution before relying on this as a merge gate.Non-blocking findings: