Merged
11 changes: 11 additions & 0 deletions CHANGES.md
@@ -315,11 +315,22 @@ To be released.
that keep `triggerSinks` allowlisting enabled. This change is published
as benchmark scenario schema version 2. [[#744], [#785], [#801], [#802]]

- Added `fedify bench compare` for CI-friendly performance regression gates.
The command checks out base and head refs into temporary worktrees, starts
the benchmark target for each ref, runs the same suite, and fails when the
head regresses beyond `--max-regression` plus the measured per-run noise
band. Benchmark scenarios now run three times by default and aggregate
repeated runs with median latency/throughput and pessimistic correctness
results. This change is published as benchmark report schema version 3
and comparison report schema version 1. [[#744], [#786], [#804]]

[#783]: https://github.com/fedify-dev/fedify/issues/783
[#784]: https://github.com/fedify-dev/fedify/issues/784
[#785]: https://github.com/fedify-dev/fedify/issues/785
[#786]: https://github.com/fedify-dev/fedify/issues/786
[#801]: https://github.com/fedify-dev/fedify/pull/801
[#802]: https://github.com/fedify-dev/fedify/pull/802
[#804]: https://github.com/fedify-dev/fedify/pull/804

### @fedify/fixture

99 changes: 97 additions & 2 deletions docs/manual/benchmarking.md
@@ -100,7 +100,6 @@ crypto cost is real.
> types, a few options the format accepts are also not implemented yet and are
> rejected up front with a clear message:
>
-> - `runs` greater than `1` (repeated runs).
> - An `inbox` `activity` that is not a `Create` carrying an embedded `Note`;
> that is, a non-`Create` `type`, a non-`Note` `object.type`, or
> `embedObject: false`.
@@ -262,6 +261,29 @@ Signing is kept off the send critical path, set per scenario with `signing`:
(open-loop only; Poisson arrivals may still sign a few extra during the
run).

### Repeated runs

Each scenario runs three times by default. Set `runs` in `defaults` to change
the whole suite, or set `runs` on one scenario to override the default for that
scenario:

~~~~ yaml
defaults:
  runs: 5
scenarios:
- name: ci-smoke
  type: webfinger
  runs: 1
  recipient: acct:alice@localhost
~~~~

Repeated runs are aggregated for stable CI gates. Latency and throughput
metrics use the median run, request totals and error buckets are summed, queue
depth uses the worst observed maximum, and `successRate` uses the worst run so
one bad run is not hidden by clean neighbors. The JSON report records
`runCount` for every scenario and includes per-run measurements in `runs` when
the scenario ran more than once.
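
The aggregation rules above can be sketched roughly as follows. This is a minimal illustration, not the CLI's implementation: the `RunResult` field names are hypothetical and not taken from the report schema, "median run" is read here as a per-metric median, and averaging the two middle values for an even run count is an assumption.

```typescript
// Hypothetical per-run measurements; field names are illustrative only and
// are not taken from the Fedify report schema.
interface RunResult {
  latencyP95Ms: number;
  throughputPerSec: number;
  requests: number;
  errors: number;
  maxQueueDepth: number;
  successRate: number; // ratio in the range 0..1
}

interface AggregatedResult extends RunResult {
  runCount: number;
}

// Median of a non-empty list. For an even count this averages the two
// middle values, which is an assumption made for this sketch.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

function aggregateRuns(runs: RunResult[]): AggregatedResult {
  if (runs.length === 0) throw new Error("at least one run is required");
  return {
    // Latency and throughput: median across runs.
    latencyP95Ms: median(runs.map((r) => r.latencyP95Ms)),
    throughputPerSec: median(runs.map((r) => r.throughputPerSec)),
    // Request totals and error counts: summed across runs.
    requests: runs.reduce((sum, r) => sum + r.requests, 0),
    errors: runs.reduce((sum, r) => sum + r.errors, 0),
    // Queue depth: worst observed maximum.
    maxQueueDepth: Math.max(...runs.map((r) => r.maxQueueDepth)),
    // Success rate: worst run, so one bad run is not averaged away.
    successRate: Math.min(...runs.map((r) => r.successRate)),
    runCount: runs.length,
  };
}
```

With three runs where only the second has errors, the aggregate keeps that run's lower `successRate` and higher queue depth, while latency and throughput come from the middle values.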

### Output

Choose the format with `--format text` (default), `json`, or `markdown`;
@@ -288,7 +310,80 @@ CI check. Keep CI gates on robust signals such as success rate, error counts,
and gross throughput or latency floors; precise latency-percentile regression
belongs in a controlled environment, not a shared CI runner.

-[report schema]: https://json-schema.fedify.dev/bench/report-v2.json
+[report schema]: https://json-schema.fedify.dev/bench/report-v3.json

### Comparing two revisions

Use `fedify bench compare` when a CI job should compare a change against a base
revision on the same runner instead of relying on an absolute threshold:

~~~~ sh
fedify bench compare \
  --base origin/main \
  --head HEAD \
  --file scenario.yaml \
  --start-command "pnpm dev" \
  --ready-url http://127.0.0.1:3000/health \
  --max-regression 15%
~~~~

The command creates temporary detached worktrees for the base and head refs,
starts the target command inside each worktree, waits for `--ready-url`, then
runs the same suite from the current checkout against that target. The two
targets run sequentially, so they can use the same port. Dependencies are not
installed automatically; either prepare both refs in the job before comparing
or make `--start-command` perform the needed build/start steps.

If `--target` is omitted, the benchmark target defaults to the origin of
`--ready-url`. Pass `--target` when readiness and benchmark traffic use
different URLs. The comparison report can be written as text, JSON, or
Markdown with the same `--format` and `--output` options; JSON validates
against the [comparison report schema].
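
The `--target` fallback described above amounts to taking the origin of the readiness URL. A minimal sketch, assuming the standard WHATWG `URL` class; `resolveTarget` is a hypothetical helper name, not a function from the CLI:

```typescript
// Hypothetical helper: use the explicit target when given, otherwise fall
// back to the origin (scheme, host, and port) of the readiness URL.
function resolveTarget(readyUrl: string, target?: string): string {
  return target ?? new URL(readyUrl).origin;
}

// The readiness path is dropped; only the origin is kept.
console.log(resolveTarget("http://127.0.0.1:3000/health"));
```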

`--max-regression` accepts either a ratio such as `0.15` or a percentage such
as `15%`. For each scenario, `fedify bench compare` compares performance
metrics from the scenario's `expect` block when they are latency or rate
metrics; if no such metric is present, it compares `latency.p95` and
`throughputPerSec`. A head result passes when the measured regression is
within `--max-regression` plus the observed per-run noise band. The command
exits with status 1 when the head run fails its own `expect` gate or a
comparison exceeds that allowance; configuration and orchestration failures
exit with status 2.
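
The gate described above can be sketched as follows. This is an illustration under stated assumptions, not the command's actual code: the function names are hypothetical, the noise band is taken as an input because its computation is not described here, and the formula for higher-is-better metrics (a drop relative to the base) is assumed by symmetry with the lower-is-better case.

```typescript
type Direction = "lower-is-better" | "higher-is-better";

// Accepts either a ratio such as "0.15" or a percentage such as "15%".
function parseMaxRegression(text: string): number {
  const trimmed = text.trim();
  const isPercent = trimmed.endsWith("%");
  const digits = isPercent ? trimmed.slice(0, -1).trim() : trimmed;
  const value = Number(digits);
  if (digits === "" || !Number.isFinite(value) || value < 0) {
    throw new Error(`invalid --max-regression value: ${text}`);
  }
  return isPercent ? value / 100 : value;
}

interface MetricInput {
  base: number;
  head: number;
  direction: Direction;
  noiseBand: number; // observed per-run noise, supplied by the caller
  maxRegression: number; // ratio, e.g. 0.15
}

// Relative regression of head against base; positive means head is worse.
// The higher-is-better branch is an assumption made for this sketch.
function compareMetric(input: MetricInput) {
  if (input.base <= 0) throw new Error("base must be positive");
  const delta = input.direction === "lower-is-better"
    ? input.head - input.base
    : input.base - input.head;
  const regression = delta / input.base;
  const allowedRegression = input.maxRegression + input.noiseBand;
  return { regression, allowedRegression, pass: regression <= allowedRegression };
}

// Status 1 when the head fails its own `expect` gate or any comparison
// fails; configuration and orchestration errors (status 2) are out of scope.
function exitStatus(headExpectPassed: boolean, passes: boolean[]): number {
  return headExpectPassed && passes.every((p) => p) ? 0 : 1;
}
```

For a `latency.p95` of 91 on the base and 94 on the head, the regression is 3/91, or about 3.3%; with `--max-regression 15%` and a noise band of 0.02 the allowance is about 17%, so the comparison passes.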

Use short, broad suites in shared CI:

~~~~ yaml
defaults:
  runs: 3
  duration: 20s
  warmup: 5s
scenarios:
- name: inbox-ci
  type: inbox
  # ...
  expect:
    successRate: ">= 99%"
    latency.p95: "< 500ms"
~~~~

Use a controlled performance runner for narrower regression checks:

~~~~ yaml
defaults:
  runs: 7
  duration: 2m
  warmup: 20s
scenarios:
- name: inbox-lab
  type: inbox
  # ...
  expect:
    successRate: ">= 99.9%"
    latency.p95: "< 120ms"
    throughputPerSec: "> 250/s"
~~~~

[comparison report schema]: https://json-schema.fedify.dev/bench/compare-report-v1.json

### Safety

83 changes: 83 additions & 0 deletions packages/cli/src/bench/__fixtures__/compare-reports/basic.json
@@ -0,0 +1,83 @@
{
"$schema": "https://json-schema.fedify.dev/bench/compare-report-v1.json",
"schemaVersion": 1,
"tool": { "name": "@fedify/cli", "version": "2.3.0" },
"environment": {
"runtime": "deno",
"runtimeVersion": "2.5.0",
"os": "linux",
"cpuCount": 16
},
"startedAt": "2026-06-04T12:00:00.000Z",
"finishedAt": "2026-06-04T12:03:00.000Z",
"suite": { "name": "Inbox regression suite", "configHash": "sha256:abc123" },
"maxRegression": 0.15,
"base": {
"ref": "origin/main",
"report": {
"$schema": "https://json-schema.fedify.dev/bench/report-v3.json",
"schemaVersion": 3,
"tool": { "name": "@fedify/cli", "version": "2.3.0" },
"environment": {
"runtime": "deno",
"runtimeVersion": "2.5.0",
"os": "linux",
"cpuCount": 16
},
"target": {
"url": "http://localhost:3000",
"fedifyVersion": "2.3.0",
"statsAvailable": true
},
"startedAt": "2026-06-04T12:00:00.000Z",
"finishedAt": "2026-06-04T12:01:00.000Z",
"suite": {
"name": "Inbox regression suite",
"configHash": "sha256:abc123"
},
"passed": true,
"scenarios": []
}
},
"head": {
"ref": "HEAD",
"report": {
"$schema": "https://json-schema.fedify.dev/bench/report-v3.json",
"schemaVersion": 3,
"tool": { "name": "@fedify/cli", "version": "2.3.0" },
"environment": {
"runtime": "deno",
"runtimeVersion": "2.5.0",
"os": "linux",
"cpuCount": 16
},
"target": {
"url": "http://localhost:3000",
"fedifyVersion": "2.3.0",
"statsAvailable": true
},
"startedAt": "2026-06-04T12:02:00.000Z",
"finishedAt": "2026-06-04T12:03:00.000Z",
"suite": {
"name": "Inbox regression suite",
"configHash": "sha256:abc123"
},
"passed": true,
"scenarios": []
}
},
"comparisons": [
{
"scenario": "inbox-shared",
"metric": "latency.p95",
"direction": "lower-is-better",
"base": 91,
"head": 94,
"regression": 0.03296703296703297,
"noiseBand": 0.02,
"allowedRegression": 0.16999999999999998,
"pass": true
}
],
"passed": true
}
7 changes: 4 additions & 3 deletions packages/cli/src/bench/__fixtures__/reports/inbox-report.json
@@ -1,6 +1,6 @@
{
-"$schema": "https://json-schema.fedify.dev/bench/report-v2.json",
-"schemaVersion": 2,
+"$schema": "https://json-schema.fedify.dev/bench/report-v3.json",
+"schemaVersion": 3,
"tool": { "name": "@fedify/cli", "version": "2.3.0" },
"environment": {
"runtime": "deno",
@@ -86,7 +86,8 @@
"pass": true
}
],
-"passed": true
+"passed": true,
+"runCount": 1
}
]
}