
perf(ci): select impacted tests via testmon on PR builds#945

Merged
phernandez merged 13 commits into
mainfrom
perf/ci-testmon-select-on-prs
Jun 10, 2026


@phernandez (Member) commented Jun 10, 2026

Summary

The testmon cache has existed in CI since #928 (branch-scoped keys with fallback to main's baseline, wired into all five job families), but BASIC_MEMORY_TESTMON_FLAGS was pinned to --testmon-noselect, so every PR build recorded testmon data and still ran the full suite. The cache never saved any wall-clock time.

This flips PR builds to --testmon --testmon-forceselect (impacted tests only, selected against the restored baseline), matching basic-memory-cloud's CI policy. Pushes to main keep --testmon-noselect, running the full suite and refreshing the baseline that PR builds select from.

Expected effect: a repeat build of a PR re-runs only the tests impacted by the new commits, and even a PR's first build selects against main's baseline. Combined with #938, this should cut typical PR rounds from ~15 min to a few minutes.
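To make "selects against main's baseline" concrete, here is a simplified, file-level model of selection. It is a sketch only: real pytest-testmon fingerprints the code blocks each test executes rather than whole files, and the function name, test IDs, and file paths below are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def select_impacted(
    collected: Iterable[str],
    baseline: Dict[str, Set[str]],
    changed_files: Iterable[str],
) -> List[str]:
    """Keep tests that are new or whose recorded files changed since the baseline."""
    changed = set(changed_files)
    selected: List[str] = []
    for test in collected:
        deps = baseline.get(test)
        # No baseline entry (e.g. a newly added test): nothing to skip on, so run it.
        if deps is None or deps & changed:
            selected.append(test)
    return selected


# Hypothetical baseline, as if recorded by a full run on main.
baseline = {
    "tests/test_sync.py::test_sync_file": {"src/sync.py", "src/db.py"},
    "tests/test_search.py::test_query": {"src/search.py", "src/db.py"},
    "tests/test_cli.py::test_help": {"src/cli.py"},
}
collected = list(baseline) + ["tests/test_new.py::test_added"]

print(select_impacted(collected, baseline, ["src/db.py"]))
print(select_impacted(collected, baseline, ["docs/guide.md"]))
```

A commit touching only `src/db.py` re-runs the two tests that depend on it plus the new test; a docs-only commit re-runs just the new test.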

Trade-off (deliberate, same as cloud): required PR checks no longer execute the full matrix on every push. Main pushes still do, so regressions that testmon misses surface on the merge commit.

Test plan

  • YAML validated; flag values mirror the justfile's TESTMON_SELECT/REFRESH defaults and cloud's test.yml
  • CI on this PR is itself the live test (first selective run)

🤖 Generated with Claude Code

Reviewed SHA: unknown
Verdict: invalid
Status: failure - BM Bossbot review output was invalid

Summary:
No summary provided.

Blocking findings:

  • None

Non-blocking findings:

  • None

The testmon cache (branch-scoped, falling back to main's baseline) has
been in place since #928, but the flags pinned --testmon-noselect, so
every PR build recorded data and still ran the full suite. Flip PR
builds to --testmon --testmon-forceselect like basic-memory-cloud;
pushes to main keep --testmon-noselect to refresh the baseline.

Signed-off-by: phernandez <paul@basicmachines.co>
tests/scripts and tests/ci exercise bossbot scripts and workflow
guards — pure CI tooling. They were running on all unit matrix legs
(3 Pythons x 2 backends x 3 OSes). Move them to a single
test-ci-tooling run inside the Static Checks job.

Signed-off-by: phernandez <paul@basicmachines.co>

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99ea2514dd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread on .github/workflows/test.yml (outdated)
CI tooling doesn't need product-suite tests burning time on every
matrix leg. The bossbot status script is exercised by every PR run.

Signed-off-by: phernandez <paul@basicmachines.co>
Unit legs intermittently hang mid-suite (FastMCP/asyncpg cleanup-hang
family) and sit until the runner gives up, eating 20+ minutes per
occurrence. pytest-timeout turns a hang into a fast failure with a
stack dump naming the test.

Signed-off-by: phernandez <paul@basicmachines.co>
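The idea behind pytest-timeout (stop waiting after a deadline and report where the test is stuck) can be illustrated with the standard library alone. This is a hypothetical helper (`run_with_timeout`), not pytest-timeout's implementation, and the commit does not show which timeout value the suite uses.

```python
import sys
import threading
import time
import traceback
from typing import Callable, List


def run_with_timeout(fn: Callable[[], object], seconds: float, name: str) -> None:
    """Run fn on a worker thread; raise TimeoutError with its stack if it hangs."""
    errors: List[BaseException] = []

    def target() -> None:
        try:
            fn()
        except BaseException as exc:  # re-raised on the caller's thread below
            errors.append(exc)

    worker = threading.Thread(target=target, name=name, daemon=True)
    worker.start()
    worker.join(seconds)
    if worker.is_alive():
        # CPython-specific: snapshot the hung thread's current frame.
        frame = sys._current_frames().get(worker.ident)
        stack = "".join(traceback.format_stack(frame)) if frame is not None else "<no frame>\n"
        raise TimeoutError(f"{name} exceeded {seconds}s; stack of hung thread:\n{stack}")
    if errors:
        raise errors[0]


def hanging_test() -> None:
    time.sleep(5)  # stands in for a cleanup hang


try:
    run_with_timeout(hanging_test, 0.2, "hanging_test")
except TimeoutError as exc:
    print(exc)
```

The printed stack ends in `hanging_test`, which is the "stack dump naming the test" the commit relies on. This sketch cannot stop the hung thread; it only stops waiting for it.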
The Postgres unit suite is the CI long pole. pytest-split divides the
collection across a group matrix axis (3 shards x 3 Pythons), each
shard a full job with its own Postgres service. Exit code 5 is treated
as success in the recipe because a testmon-selected PR build can leave
a shard empty. Testmon cache keys gain the shard group.

Signed-off-by: phernandez <paul@basicmachines.co>
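A rough sketch of the two ideas in this commit: distributing tests across shard groups by recorded duration, and mapping pytest's exit code 5 (no tests collected) to success so an empty shard does not fail its job. The greedy longest-first assignment is one common approach and not necessarily the algorithm pytest-split uses; `split_tests` and `shard_exit_code` are hypothetical names.

```python
from typing import Dict, List

NO_TESTS_COLLECTED = 5  # pytest's exit code when a run collects zero tests


def split_tests(durations: Dict[str, float], splits: int) -> List[List[str]]:
    """Greedy longest-first: give each test to the currently lightest shard."""
    shards: List[List[str]] = [[] for _ in range(splits)]
    totals = [0.0] * splits
    for test, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        i = totals.index(min(totals))  # lightest shard so far
        shards[i].append(test)
        totals[i] += secs
    return shards


def shard_exit_code(pytest_exit: int) -> int:
    """Treat an empty shard as success; pass every other exit code through."""
    return 0 if pytest_exit == NO_TESTS_COLLECTED else pytest_exit


# After testmon selection, only two tests remain for three shard groups.
shards = split_tests({"tests/test_a.py::test_x": 30.0, "tests/test_b.py::test_y": 20.0}, 3)
for group, tests in enumerate(shards, start=1):
    print(group, tests)
print(shard_exit_code(NO_TESTS_COLLECTED))
```

Group 3 ends up empty, which is exactly the case where pytest would exit with 5 and the recipe needs to report success.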
SQLite jobs carry the Python-version matrix; Postgres jobs carry
backend coverage on 3.14 only. Postgres unit: 3 shards x 1 Python
instead of 9 jobs; Postgres integration: 1 job instead of 3.

Signed-off-by: phernandez <paul@basicmachines.co>
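The job-count arithmetic can be checked by enumerating the Postgres matrix before and after. The Python versions other than 3.14 are assumptions for illustration (the commit names only 3.14, and the earlier commit says "3 Pythons"), and the entry dictionaries only loosely resemble GitHub Actions matrix entries.

```python
from itertools import product
from typing import Dict, List, Sequence

# Assumed versions: the PR says "3 Pythons" but names only 3.14.
ALL_PYTHONS = ("3.12", "3.13", "3.14")
POSTGRES_PYTHONS = ("3.14",)
SHARD_GROUPS = (1, 2, 3)


def postgres_unit_jobs(pythons: Sequence[str]) -> List[Dict[str, object]]:
    """One sharded unit job per (Python, shard group) pair."""
    return [{"python": py, "group": g} for py, g in product(pythons, SHARD_GROUPS)]


def postgres_integration_jobs(pythons: Sequence[str]) -> List[Dict[str, object]]:
    """Integration tests are not sharded: one job per Python."""
    return [{"python": py} for py in pythons]


before = (len(postgres_unit_jobs(ALL_PYTHONS)), len(postgres_integration_jobs(ALL_PYTHONS)))
after = (len(postgres_unit_jobs(POSTGRES_PYTHONS)), len(postgres_integration_jobs(POSTGRES_PYTHONS)))
print(before, after)  # (9, 3) (3, 1)
```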
GitHub-hosted runners are free for public repos; Depot bills per
minute. With testmon-selected PR builds, sharded Postgres units, and
the semantic-search fixture fix, Depot's speed premium no longer
justifies the spend.

Signed-off-by: phernandez <paul@basicmachines.co>
…aseline

A full-run .testmondata is a valid superset baseline for any shard:
testmon selects impacted tests from it and pytest-split takes the
shard's slice. Without this fallback every shard starts cold until the
first post-merge main push records group-keyed baselines.

Signed-off-by: phernandez <paul@basicmachines.co>
A change-detection job gates every test job on code paths (src, tests,
test-int, alembic, pyproject, uv.lock, justfile, the workflow itself).
Docs-only rounds finish in under a minute with all jobs skipped, while
the workflow still concludes successfully so the BM Bossbot gate keeps
firing and the PR stays mergeable.

Signed-off-by: phernandez <paul@basicmachines.co>
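A minimal sketch of the change-detection decision, assuming glob patterns derived from the paths listed in this commit message (the exact filter syntax in the workflow, and where alembic lives, are not shown here). `code_changed` is a hypothetical name standing in for the job's `code` output.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Assumed globs for the paths named in the commit; fnmatch's "*" also matches "/".
CODE_PATTERNS = (
    "src/*",
    "tests/*",
    "test-int/*",
    "alembic/*",
    "pyproject.toml",
    "uv.lock",
    "justfile",
    ".github/workflows/test.yml",
)


def code_changed(changed_files: Iterable[str]) -> bool:
    """True if any changed path matches a code pattern; False means test jobs can be skipped."""
    return any(fnmatchcase(path, pattern) for path in changed_files for pattern in CODE_PATTERNS)


print(code_changed(["docs/guide.md", "README.md"]))  # False
print(code_changed(["src/basic_memory/sync.py"]))  # True
```

With this list, a change touching only `scripts/testmon_cache.py` also yields False, which is the gap the Codex review on this PR points out.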
@chatgpt-codex-connector


💡 Codex Review

- '.github/workflows/test.yml'

P2: Include CI script changes in the test filter

When a PR branch only changes an executable CI helper such as scripts/testmon_cache.py, scripts/bm_bossbot_status.py, or scripts/generate_pr_infographic.py, this new paths filter leaves needs.changes.outputs.code false, so every static/test job is skipped even though the Tests workflow still succeeds and triggers the BM Bossbot workflow_run gate. These scripts are invoked by the justfile and BM Bossbot workflow, so script-only regressions can now merge without the Python checks that previously covered them; add the relevant scripts/** entries to this filter or route those scripts through another required check.


phernandez and others added 2 commits June 10, 2026 12:08
The Tests workflow only triggers on push; PR-branch rounds are push
events, so the pull_request conditional never fired and selection was
dead on arrival. Branch pushes now select; main pushes record. The
paths-filter gets an explicit main base for branch pushes.

Signed-off-by: phernandez <paul@basicmachines.co>
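Because the workflow only runs on push events, the pushed ref rather than the event name has to decide between selecting and recording. In the workflow this is a YAML expression; the Python below only restates the rule, taking GITHUB_REF (set by GitHub Actions, e.g. `refs/heads/main`) as input, with `testmon_flags` as a hypothetical name.

```python
import os
from typing import Mapping, Optional

SELECT_FLAGS = "--testmon --testmon-forceselect"  # branch pushes: run impacted tests only
RECORD_FLAGS = "--testmon-noselect"  # main pushes: run everything, refresh the baseline


def testmon_flags(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick BASIC_MEMORY_TESTMON_FLAGS from the pushed ref, not the event name."""
    env = os.environ if env is None else env
    return RECORD_FLAGS if env.get("GITHUB_REF") == "refs/heads/main" else SELECT_FLAGS


print(testmon_flags({"GITHUB_REF": "refs/heads/perf/ci-testmon-select-on-prs"}))
```

A condition on `pull_request` would never match here, since every run of this workflow is a push; keying on the ref avoids that.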
@chatgpt-codex-connector


You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

The LLM review gate burned API tokens, failed unrecoverably during the
GitHub auth outage, and ended up deadlocking its own replacement PR.
The workflow is disabled and its required check removed from the main
ruleset; this deletes the workflow, the status/infographic scripts, and
the review prompt/schema. Merge discipline (green tests + zero
unresolved review threads) is enforced by the merge tooling.

Signed-off-by: phernandez <paul@basicmachines.co>
The batch-indexing race has now flaked three CI rounds today. Skipped
under CI only (still runs locally); #940 tracks the root cause.

Signed-off-by: phernandez <paul@basicmachines.co>
(cherry picked from commit 513fef7)
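A CI-only skip typically keys off the `CI` environment variable, which GitHub Actions sets to `true`. The helper below is a hypothetical sketch; the commit does not show how the skip is actually written.

```python
import os
from typing import Mapping, Optional


def running_in_ci(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the CI variable is set to "true" (as GitHub Actions does)."""
    env = os.environ if env is None else env
    return env.get("CI", "").lower() == "true"


print(running_in_ci({"CI": "true"}), running_in_ci({}))
```

In a test module it could be applied as `@pytest.mark.skipif(running_in_ci(), reason="batch-indexing race, see #940")`, so the test still runs locally.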

@phernandez phernandez merged commit 2f7ef13 into main Jun 10, 2026
27 checks passed
@phernandez phernandez deleted the perf/ci-testmon-select-on-prs branch June 10, 2026 19:21