feat(ci): make BM Bossbot a deterministic merge gate by phernandez · Pull Request #942 · basicmachines-co/basic-memory

phernandez · 2026-06-10T04:02:44Z

Summary

Two BM Bossbot fixes, both from today's incident review:

1. Unresolved review threads now block approval

PR #932 was merged with two unresolved Codex P2 review threads because nothing in the gate looks at review feedback: Bossbot's review prompt sees only metadata+diff, Codex connector reviews are COMMENTED-state (never flip reviewDecision), and the approval status is keyed to a head SHA with no re-evaluation when comments arrive.

finalize now counts unresolved reviewThreads via GraphQL (paginated) and fails the BM Bossbot Approval status when any remain — even on an approve verdict.
New recheck command + workflow job triggered by pull_request_review, pull_request_review_comment, and pull_request_review_thread events: new feedback flips the status to failure; resolving the last thread restores a previously earned approval for the same head SHA (approval history is read from commit statuses, so thread resolution alone can never upgrade a pending/failed review).
Review and recheck use separate job-level concurrency groups so thread events can't cancel an in-flight review (workflow-level concurrency previously made cross-cancellation possible).

2. PR images depict the PR's content, not the review outcome

Every generated image was an "BM BOSSBOT APPROVED" stamp because the image prompt's only source material was the review summary block (verdict/status). The prompt now:

sources the PR title + the author's own description (all managed bot blocks stripped) so the image expresses the theme of the whole change
explicitly bans approval stamps, APPROVED/SUCCESS/verdict wording, badges, checkmarks, SHA strings, and Bossbot branding
seeds auto-theme selection on PR number+title (stable across re-reviews)

The assets workflow step now passes --pr-title.

Test plan

uv run pytest tests/scripts/ -q — 30 passed (new coverage: thread-blocking finalize, GraphQL pagination, all three recheck outcomes, approval-record matching, content extraction, verdict-free prompt)
uv run ty check tests/scripts/ clean; workflow YAML validated

Refs #932

🤖 Generated with Claude Code

Reviewed SHA: 513fef79c5da67d7edc187329e66c17712b57103
Verdict: changes_requested
Status: failure - BM Bossbot requested changes

Summary:
Reviewed the provided diff. The deterministic gate direction is mostly coherent, but the automatic restore path for resolved review threads depends on an unsupported Actions trigger, so the required status can remain failed after all feedback is resolved.

Blocking findings:

Thread-resolution recheck relies on a non-triggering Actions event: .github/workflows/bm-bossbot.yml:22 adds pull_request_review_thread with resolved/unresolved types, but GitHub Actions' documented trigger list includes pull_request_review and pull_request_review_comment, not pull_request_review_thread. Resolving the last review thread therefore will not automatically run the recheck job, leaving the required BM Bossbot Approval status failed until someone manually reruns Bossbot. Use a supported trigger or another deterministic polling/status refresh path for thread resolution before relying on this as a merge gate.

Non-blocking findings:

CI-only quarantine weakens sync regression coverage: tests/sync/test_sync_service.py skips the circular-relations sync test under CI. The tracking note for Timing-sensitive tests exposed by faster test suite (post-#938): circular-relations batch indexing, mcp sse routing #940 helps, but this removes CI coverage for exactly the unresolved batch-indexing race until the root cause lands. Keep the quarantine short-lived and restore CI coverage with the fix.

…e PR images on PR content Two BM Bossbot fixes: 1. Unresolved review threads now block approval. The review prompt only sees metadata+diff, so an approve verdict said nothing about outstanding feedback — #932 merged with two open Codex P2 threads. finalize now counts unresolved reviewThreads via GraphQL and fails the status when any remain. A new recheck command, triggered by pull_request_review / review_comment / review_thread events, flips the status to failure when feedback arrives after approval and restores a previously earned approval for the same head SHA once every thread is resolved. Review and recheck jobs use separate job-level concurrency groups so neither cancels the other. 2. PR images depict the PR's content, not the review outcome. The image prompt previously used the BM Bossbot review summary (verdict/status) as its only source material, so every image rendered an APPROVED stamp. The prompt now sources the PR title and the author's own description (managed bot blocks stripped) and explicitly bans approval stamps, verdict wording, badges, checkmarks, and Bossbot branding. Theme selection seeds on PR number+title for stability across re-reviews. Signed-off-by: phernandez <paul@basicmachines.co>

Signed-off-by: phernandez <paul@basicmachines.co>

…ption Mirror the ProjectUpdateContext shape the basic-memory.yml capture flow collects: the image prompt now receives a compact change-shape digest — labels, linked issues with titles, commit subjects (the PR's narrative arc), and a churn-ranked changed-files summary with totals — from a single 'gh pr view --json' call passed as --pr-context-file (replacing --pr-title/--pr-body-file). The digest is explicitly context, not captions: the prompt forbids rendering paths, stats, issue numbers, or commit subjects verbatim in the image. Signed-off-by: phernandez <paul@basicmachines.co>

The guard banned any github.event.pull_request reference, but the recheck job legitimately needs the numeric PR id. Allow exactly .number and keep forbidding head checkout and attacker-controlled string fields. Signed-off-by: phernandez <paul@basicmachines.co>

The recheck path posts a status per review-thread event, so a busy PR can exceed one page of statuses and the original approval record falls off page one — resolving the last thread would then never restore it. Page through all statuses for the BM Bossbot context. Signed-off-by: phernandez <paul@basicmachines.co>

Remove the Codex LLM review and the per-PR image job — both spent API tokens on every Bossbot run and the connector review already covers code review. The gate is now fully deterministic: Tests passed for the head SHA, non-draft, trusted author, zero unresolved review threads. Prompt/schema files removed; guard tests updated to forbid openai/codex-action and image generation from reappearing. Signed-off-by: phernandez <paul@basicmachines.co>

Unit legs intermittently hang mid-suite (FastMCP/asyncpg cleanup-hang family) and sit until the runner gives up, eating 20+ minutes per occurrence. pytest-timeout turns a hang into a fast failure with a stack dump naming the test. Signed-off-by: phernandez <paul@basicmachines.co> (cherry picked from commit 0295cd9)

'page=1' substring-matched 'per_page=100', so the fake served page 1 forever and the pagination loop hung — caught by the new 120s test timeout on every CI leg. Signed-off-by: phernandez <paul@basicmachines.co>

The batch-indexing race has now flaked three CI rounds today. Skipped under CI only (still runs locally); #940 tracks the root cause. Signed-off-by: phernandez <paul@basicmachines.co>

phernandez · 2026-06-10T19:03:58Z

Superseded: BM Bossbot is being removed entirely (workflow disabled, required check dropped from the ruleset, files deleted via #945). The salvageable pieces of this branch — pytest-timeout and the #940 quarantine — are carried by #945.

phernandez added 2 commits June 9, 2026 23:02

test(ci): satisfy ty narrowing in graphql pagination fake

f9c3baf

Signed-off-by: phernandez <paul@basicmachines.co>

phernandez force-pushed the fix/bossbot-thread-gate branch from 376f297 to f9c3baf Compare June 10, 2026 04:04

phernandez force-pushed the fix/bossbot-thread-gate branch from 9e99e8b to ef6c47e Compare June 10, 2026 04:12

phernandez added 3 commits June 9, 2026 23:31

phernandez changed the title ~~fix(ci): block BM Bossbot approval on unresolved review threads~~ feat(ci): make BM Bossbot a deterministic merge gate Jun 10, 2026

phernandez and others added 4 commits June 10, 2026 09:44

Merge branch 'main' into fix/bossbot-thread-gate

e81cdc2

fix(ci): anchor paged-status test fake on the &page separator

afa694b

'page=1' substring-matched 'per_page=100', so the fake served page 1 forever and the pagination loop hung — caught by the new 120s test timeout on every CI leg. Signed-off-by: phernandez <paul@basicmachines.co>

test(sync): quarantine circular-relations test in CI pending #940

513fef7

The batch-indexing race has now flaked three CI rounds today. Skipped under CI only (still runs locally); #940 tracks the root cause. Signed-off-by: phernandez <paul@basicmachines.co>

phernandez closed this Jun 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ci): make BM Bossbot a deterministic merge gate#942

feat(ci): make BM Bossbot a deterministic merge gate#942
phernandez wants to merge 10 commits into
mainfrom
fix/bossbot-thread-gate

phernandez commented Jun 10, 2026 •

edited by github-actions Bot

Loading

Uh oh!

phernandez commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

phernandez commented Jun 10, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Unresolved review threads now block approval

2. PR images depict the PR's content, not the review outcome

Test plan

Uh oh!

phernandez commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

phernandez commented Jun 10, 2026 •

edited by github-actions Bot

Loading