perf(ci): select impacted tests via testmon on PR builds #945
The testmon cache (branch-scoped, falling back to main's baseline) has been in place since #928, but the flags pinned --testmon-noselect, so every PR build recorded data and still ran the full suite. Flip PR builds to --testmon --testmon-forceselect like basic-memory-cloud; pushes to main keep --testmon-noselect to refresh the baseline. Signed-off-by: phernandez <paul@basicmachines.co>
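The difference between selecting and recording can be shown with a deliberately simplified, file-level sketch. pytest-testmon itself fingerprints code blocks per test rather than whole files, so this is not its algorithm. The function names, test IDs, and file paths are hypothetical. In this sketch, a test with recorded dependencies runs only when one of those dependencies changed, and a test with no recorded data always runs.

```python
from typing import Dict, Iterable, List, Set


def select_impacted(
    all_tests: List[str],
    baseline: Dict[str, Set[str]],
    changed_files: Iterable[str],
) -> List[str]:
    """Keep tests whose recorded file dependencies intersect the changed files."""
    changed = set(changed_files)
    selected = []
    for test in all_tests:
        deps = baseline.get(test)
        # No recorded data (e.g. a brand-new test): nothing proves it unaffected, so run it.
        if deps is None or deps & changed:
            selected.append(test)
    return selected


def plan(all_tests, baseline, changed_files, select: bool) -> List[str]:
    # select=False mirrors a --testmon-noselect main push: run everything and refresh the baseline.
    # select=True mirrors a --testmon --testmon-forceselect PR build: run impacted tests only.
    return select_impacted(all_tests, baseline, changed_files) if select else list(all_tests)


# Hypothetical baseline restored from the cache.
baseline = {
    "tests/test_sync.py::test_sync": {"src/sync.py"},
    "tests/test_search.py::test_query": {"src/search.py"},
}
tests = list(baseline) + ["tests/test_new.py::test_added"]
print(plan(tests, baseline, ["src/sync.py"], select=True))
# -> ['tests/test_sync.py::test_sync', 'tests/test_new.py::test_added']
```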
tests/scripts and tests/ci exercise bossbot scripts and workflow guards — pure CI tooling. They were running on all unit matrix legs (3 Pythons x 2 backends x 3 OSes). Move them to a single test-ci-tooling run inside the Static Checks job. Signed-off-by: phernandez <paul@basicmachines.co>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 99ea2514dd
CI tooling doesn't need product-suite tests burning time on every matrix leg. The bossbot status script is exercised by every PR run. Signed-off-by: phernandez <paul@basicmachines.co>
Unit legs intermittently hang mid-suite (FastMCP/asyncpg cleanup-hang family) and sit until the runner gives up, eating 20+ minutes per occurrence. pytest-timeout turns a hang into a fast failure with a stack dump naming the test. Signed-off-by: phernandez <paul@basicmachines.co>
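The idea behind a per-test timeout can be sketched with a thread-based watchdog, under stated assumptions. The test body runs in a worker thread, and if it overruns, the caller fails immediately with the hung thread's stack. This is not pytest-timeout's implementation, which provides its own signal and thread methods. run_with_timeout is a hypothetical helper.

```python
import sys
import threading
import traceback


def run_with_timeout(fn, timeout_s: float):
    """Run fn in a daemon thread; if it overruns, raise TimeoutError with its stack."""
    result = {}

    def target():
        try:
            result["value"] = fn()
        except BaseException as exc:  # surfaced in the caller below
            result["error"] = exc

    worker = threading.Thread(target=target, name=f"test:{fn.__name__}", daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # Grab the hung thread's current frame so the failure names where it is stuck.
        frame = sys._current_frames().get(worker.ident)
        stack = "".join(traceback.format_stack(frame)) if frame else "<no frame>\n"
        raise TimeoutError(f"{fn.__name__} exceeded {timeout_s}s; hung thread stack:\n{stack}")
    if "error" in result:
        raise result["error"]
    return result.get("value")
```

The hung worker is left behind as a daemon thread, so it cannot keep the process alive. A real plugin has to decide how to clean up after a hang, for example by terminating the process.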
The Postgres unit suite is the CI long pole. pytest-split divides the collection across a group matrix axis (3 shards x 3 Pythons), each shard a full job with its own Postgres service. Exit code 5 is treated as success in the recipe because a testmon-selected PR build can leave a shard empty. Testmon cache keys gain the shard group. Signed-off-by: phernandez <paul@basicmachines.co>
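The sharding and the exit-code rule can be sketched as follows, assuming per-test durations are available. The greedy longest-first assignment is in the spirit of pytest-split's duration-based splitting, not its actual algorithm. The helper names and the 1.0 s default for unknown tests are hypothetical. The value 5 is pytest's ExitCode.NO_TESTS_COLLECTED.

```python
import heapq
from typing import Dict, List

NO_TESTS_COLLECTED = 5  # pytest.ExitCode.NO_TESTS_COLLECTED


def split_tests(durations: Dict[str, float], splits: int) -> List[List[str]]:
    """Greedy split: give each test, longest first, to the currently lightest shard."""
    heap = [(0.0, i) for i in range(splits)]  # (total seconds, shard index)
    shards: List[List[str]] = [[] for _ in range(splits)]
    for test in sorted(durations, key=lambda t: (-durations[t], t)):
        total, idx = heapq.heappop(heap)
        shards[idx].append(test)
        heapq.heappush(heap, (total + durations[test], idx))
    return shards


def shard_for_group(selected: List[str], durations: Dict[str, float], splits: int, group: int) -> List[str]:
    """Split only the testmon-selected tests; group is 1-based."""
    known = {t: durations.get(t, 1.0) for t in selected}  # hypothetical default for unseen tests
    return split_tests(known, splits)[group - 1]


def normalize_exit(code: int) -> int:
    """Treat 'no tests collected' as success: a selected build can leave a shard empty."""
    return 0 if code == NO_TESTS_COLLECTED else code
```

For example, if testmon selects a single test, shards 2 and 3 receive nothing. Without normalize_exit, pytest's exit code 5 would fail those jobs.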
SQLite jobs carry the Python-version matrix; Postgres jobs carry backend coverage on 3.14 only. Postgres unit: 3 shards x 1 Python instead of 9 jobs; Postgres integration: 1 job instead of 3. Signed-off-by: phernandez <paul@basicmachines.co>
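The job counts follow from GitHub Actions expanding a matrix as the Cartesian product of its axes. The sketch below reproduces that arithmetic. Only 3.14 is named above, so the other two Python versions are placeholders. expand is a hypothetical helper.

```python
from itertools import product

PYTHONS = ["3.12", "3.13", "3.14"]  # placeholder versions; only 3.14 is named in the commit
SHARDS = [1, 2, 3]


def expand(**axes):
    """Cartesian product of matrix axes, one dict per job."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


pg_unit_before = expand(python=PYTHONS, group=SHARDS)   # 3 x 3
pg_unit_after = expand(python=["3.14"], group=SHARDS)   # 1 x 3
pg_int_before = expand(python=PYTHONS)                  # 3
pg_int_after = expand(python=["3.14"])                  # 1
```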
GitHub-hosted runners are free for public repos; Depot bills per minute. With testmon-selected PR builds, sharded Postgres units, and the semantic-search fixture fix, Depot's speed premium no longer justifies the spend. Signed-off-by: phernandez <paul@basicmachines.co>
…aseline A full-run .testmondata is a valid superset baseline for any shard: testmon selects impacted tests from it and pytest-split takes the shard's slice. Without this fallback every shard starts cold until the first post-merge main push records group-keyed baselines. Signed-off-by: phernandez <paul@basicmachines.co>
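The fallback order can be sketched as an ordered list of cache keys, most specific first. The key format and helper names are hypothetical. Real actions/cache restore-keys match by prefix rather than exactly, but exact matching keeps the sketch short.

```python
from typing import Iterable, List, Optional


def candidate_keys(prefix: str, branch: str, group: int) -> List[str]:
    keys = [
        f"{prefix}-{branch}-group{group}",  # this branch, this shard
        f"{prefix}-main-group{group}",      # main's shard-keyed baseline
        f"{prefix}-main",                   # main's full-run baseline: a superset for any shard
    ]
    return list(dict.fromkeys(keys))  # drop the duplicate when branch == "main"


def resolve(available: Iterable[str], candidates: List[str]) -> Optional[str]:
    """First candidate present in the cache, or None for a cold start (everything runs)."""
    have = set(available)
    return next((k for k in candidates if k in have), None)
```

Before any post-merge main push has recorded group-keyed baselines, only the full-run key exists. Each shard then restores that key instead of starting cold.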
A change-detection job gates every test job on code paths (src, tests, test-int, alembic, pyproject, uv.lock, justfile, the workflow itself). Docs-only rounds finish in under a minute with all jobs skipped, while the workflow still concludes successfully so the BM Bossbot gate keeps firing and the PR stays mergeable. Signed-off-by: phernandez <paul@basicmachines.co>
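The gate reduces to matching the changed files against the code paths listed above. In the sketch below, the glob patterns are an assumed rendering of that list, with the workflow file taken to be .github/workflows/test.yml. code_changed is a hypothetical helper. fnmatchcase's * also matches /, so src/* covers nested files.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Assumed globs for the paths named in the commit message.
CODE_PATHS: List[str] = [
    "src/*", "tests/*", "test-int/*", "alembic/*",
    "pyproject.toml", "uv.lock", "justfile",
    ".github/workflows/test.yml",
]


def code_changed(changed_files: Iterable[str], patterns: List[str] = CODE_PATHS) -> bool:
    """True if any changed file falls under a code path, i.e. test jobs should run."""
    return any(fnmatchcase(path, pat) for path in changed_files for pat in patterns)
```

A docs-only change such as ["docs/guide.md", "README.md"] yields False, so every gated test job is skipped.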
💡 Codex Review: basic-memory/.github/workflows/test.yml, line 45 in 3ba50e5. When a PR branch only changes an executable CI helper such as …
The Tests workflow only triggers on push; PR-branch rounds are push events, so the pull_request conditional never fired and selection was dead on arrival. Branch pushes now select; main pushes record. The paths-filter gets an explicit main base for branch pushes. Signed-off-by: phernandez <paul@basicmachines.co>
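The fix amounts to choosing the flags by ref rather than by event name. In this minimal sketch the function names are hypothetical. Under a push-only trigger the event name is always push, so the old condition can never select.

```python
NOSELECT = "--testmon-noselect"             # full suite, refreshes the baseline
SELECT = "--testmon --testmon-forceselect"  # impacted tests only


def old_flags(event_name: str) -> str:
    # Keyed on the event: never "pull_request" when the workflow only triggers on push.
    return SELECT if event_name == "pull_request" else NOSELECT


def new_flags(ref: str) -> str:
    # Keyed on the ref: main pushes record, every other branch push selects.
    return NOSELECT if ref == "refs/heads/main" else SELECT
```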
The LLM review gate burned API tokens, failed unrecoverably during the GitHub auth outage, and ended up deadlocking its own replacement PR. The workflow is disabled and its required check removed from the main ruleset; this deletes the workflow, the status/infographic scripts, and the review prompt/schema. Merge discipline (green tests + zero unresolved review threads) is enforced by the merge tooling. Signed-off-by: phernandez <paul@basicmachines.co>
Summary
The testmon cache has existed in CI since #928 (branch-scoped keys with fallback to main's baseline, wired into all five job families), but BASIC_MEMORY_TESTMON_FLAGS was pinned to --testmon-noselect, so every PR build recorded testmon data and still ran the full suite. The cache never saved any wall-clock time.
This flips PR builds to --testmon --testmon-forceselect (impacted tests only, selected against the restored baseline), matching basic-memory-cloud's CI policy. Pushes to main keep --testmon-noselect, running the full suite and refreshing the baseline that PR builds select from.
Expected effect: a repeat build of a PR re-runs only the tests impacted by the new commits, and even a PR's first build selects against main's baseline. Combined with #938, this should take typical PR rounds from ~15 min to a few minutes.
Trade-off (deliberate, same as cloud): required PR checks no longer execute the full matrix on every push. Main pushes still do, so any regression testmon misses surfaces on the merge commit.
Test plan
🤖 Generated with Claude Code
Reviewed SHA: unknown
Verdict: invalid
Status: failure - BM Bossbot review output was invalid
Summary: No summary provided.
Blocking findings:
Non-blocking findings: