complexity-reduction 3/4: microvm-coverage tooling + self-test checks (M1–M6) #26
Merged
Add a -coverage-out flag to the aggregator that regenerates the
coverage-func.out + coverage-per-package.tsv artifacts inside rawDir
from a supplied Go coverage profile, before ingestCoverage runs.
The update-quality-report wrapper will use this to feed a merged host
+ microvm coverage profile back through the aggregator without
rebuilding the entire Nix derivation.
regenerateCoverageArtifacts(rawDir, profile)
1. go tool cover -func=<profile> → <rawDir>/coverage-func.out
2. buildPerPackageTSV — pure-Go port of the awk in
nix/quality-report/default.nix; dedupes atomic-mode block
repetitions by max-count and aggregates per package directory.
New tests (ratchet_test.go):
TestBuildPerPackageTSV_table — 5 rows covering atomic-dedup,
non-module-line filtering, empty profile, zero-statement blocks
TestRegenerateCoverageArtifacts_writesFiles — verifies both
artifacts land in rawDir (skips if go tool cover unavailable)
TestRegenerateCoverageArtifacts_missingProfileErrs
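The dedupe-and-aggregate step described for buildPerPackageTSV can be sketched roughly as follows. This is a minimal illustration, not the code in tools/quality-report: the function name perPackageTSV, the output columns (package, covered statements, total statements, percent), and the sample import paths are assumptions, and the non-module-line filtering mentioned in the tests is omitted.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"sort"
	"strconv"
	"strings"
)

// perPackageTSV is a hypothetical stand-in for buildPerPackageTSV.
// The column layout (package, covered, total, percent) is an assumption.
func perPackageTSV(profile string) string {
	type block struct{ stmts, count int }
	blocks := map[string]block{} // key: "file.go:start,end"

	sc := bufio.NewScanner(strings.NewReader(profile))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skips the "mode: ..." header (2 fields) and malformed lines.
		if len(fields) != 3 || !strings.Contains(fields[0], ":") {
			continue
		}
		stmts, err1 := strconv.Atoi(fields[1])
		count, err2 := strconv.Atoi(fields[2])
		if err1 != nil || err2 != nil {
			continue
		}
		// A merged profile can repeat a block; keep the highest count seen.
		if b, ok := blocks[fields[0]]; !ok || count > b.count {
			blocks[fields[0]] = block{stmts, count}
		}
	}

	type agg struct{ covered, total int }
	pkgs := map[string]*agg{}
	for key, b := range blocks {
		file := key[:strings.LastIndex(key, ":")]
		dir := path.Dir(file) // package directory
		a := pkgs[dir]
		if a == nil {
			a = &agg{}
			pkgs[dir] = a
		}
		a.total += b.stmts
		if b.count > 0 {
			a.covered += b.stmts
		}
	}

	names := make([]string, 0, len(pkgs))
	for name := range pkgs {
		names = append(names, name)
	}
	sort.Strings(names)

	var sb strings.Builder
	for _, name := range names {
		a := pkgs[name]
		pct := 0.0
		if a.total > 0 { // packages with only zero-statement blocks report 0.0
			pct = 100 * float64(a.covered) / float64(a.total)
		}
		fmt.Fprintf(&sb, "%s\t%d\t%d\t%.1f\n", name, a.covered, a.total, pct)
	}
	return sb.String()
}

func main() {
	profile := `mode: atomic
example.com/m/pkg/a/x.go:1.1,2.2 2 0
example.com/m/pkg/a/x.go:1.1,2.2 2 3
example.com/m/pkg/a/x.go:3.1,4.2 2 0
example.com/m/pkg/b/y.go:1.1,2.2 1 1
`
	fmt.Print(perPackageTSV(profile))
}
```

Keying on the full "file:range" string means a block repeated with different counts collapses to one entry, so its statements are counted once in the package total.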
Extend the shell wrapper with an optional --with-microvm flag:
1. Boot the coverage-instrumented microvm via
`nix run .#microvm-x86_64-lifecycle-coverage` and scrape the
Go coverage data dump from its serial console into a tempdir.
2. Build the regular .#quality-report (host coverage in raw/).
3. Merge VM dir + host coverage.out via .#coverage-merge.
4. Copy the read-only Nix-store raw/ into a writable temp dir, then
re-run the quality-report aggregator with the new -coverage-out
flag pointing at the merged profile.
5. Overwrite docs/quality-report.md with the merged-numbers report.
Without --with-microvm, behaviour is unchanged: a single Nix build,
copy the markdown.
Falls back gracefully:
- If the microvm lifecycle scrapes zero coverage files (KVM
unavailable, VM crashed, etc.) we WARN + fall back to host-only.
Also adds versions.go to runtimeInputs so the aggregator re-run has
Go on PATH (regenerateCoverageArtifacts shells out to `go tool
cover -func`).
Microvm coverage merge wired into update-quality-report --with-microvm:
cmd/xtcp2 92.4% → 95.9% (daemon runDaemon now exercised in VM)
pkg/xtcp 85.2% → 87.1% (netlink/ns paths under real kernel)
pkg/xtcpnl 91.4% → 91.8%
Overall 90.3% → 91.1%
The lifecycle test exited non-zero (one self-test check failed) so only 2 coverage files were scraped. Adding the iouring-flavor VM merge in a follow-up will pick up more io_uring paths.
Extend coverage-merge.nix to accept multiple --vm-dir flags
(concatenated via covdata textfmt's comma-separated -i). Extend
update-quality-report --with-microvm to run BOTH coverage VMs
sequentially:
1. .#microvm-x86_64-lifecycle-coverage (stdlib build) — exercises
the syscall netlinker + namespace/ns_watch paths
2. .#microvm-x86_64-lifecycle-coverage-iouring (iouring build) —
exercises netlinker_iouring + io_uring.Ring paths that the
stdlib VM can't reach (different build tag)
The merge picks up every block covered by either VM, then the
existing host+VM merge in coverage-merge.nix takes the union of
host coverage and the combined VM coverage.
Falls back gracefully:
- If either VM scrapes zero files, that --vm-dir is skipped.
- If BOTH scrape zero, we WARN + fall back to host-only (same
behaviour as before this commit, just lifted to the two-VM
aggregate check).
…elete)
Add two new checks to nix/microvms/self-test.nix that exercise xtcp2's
namespace-watching pipeline end-to-end inside the coverage VM:
Check 8 (NS_LIFECYCLE):
ip netns add xtcp_test_ns_a → fsnotify Create
→ watchNsNamespace dispatchNsFsEvent
→ nsAdd
→ netNamespaceInstance
→ openAndSetNSWithRetries (real
Open + Setns syscalls)
→ syscall.Socket(AF_NETLINK) + Bind
→ createNetlinkersAndStore
→ spawns a per-ns netlinker goroutine
ip netns delete → fsnotify Remove → nsDelete teardown
Check 9 (NS_TRAFFIC):
Same as above, but ALSO creates a TCP listener + client pair
inside the new ns. The per-ns netlinker polls inet_diag in that
ns; the daemon's Netlinker.packets counter bumps. This drives
the full netlinkerSyscall body — Recvfrom on a real netlink fd,
Deserialize on real (not garbage) netlink bytes, every per-
attribute deserializer that finds a present attribute.
Assertions read xtcp_counts metric vector via curl /metrics
(function="watchNamespaces" event-counter; function=
"netNamespaceInstance" start-counter; function="Netlinker"
packets-counter).
Both checks fall back gracefully if iproute2/nc are missing on
PATH; iproute2 is already in self-test runtimeInputs.
nix/microvms/default.nix: extend the coverage + coverage-iouring
lifecycle sentinelRe so the new sentinels surface in the harness
output (default filter hid them).
The metric_value awk filter was matching on task="…" but the actual label key in pkg/xtcp/prometheus.go is variable="…". With the wrong key both before/after queries returned 0, masking whether netNamespaceInstance actually started.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Check 8 NS_LIFECYCLE failed silently (evt:0→0→0); add /run/netns listing, ip-netns-add stderr capture, and per-call metric-row dumps so the next run reveals whether fsnotify saw the create. Check 9 NS_TRAFFIC now matches both Netlinker and NetlinkerIoUring packet counters so it works in both coverage VM flavors.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ls is fine for a single-call diagnostic dump in the self-test where shellcheck's SC2012 (prefer find) adds no value.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous filter passed 'function="X",variable="Y"' as a single substring, but Prometheus prints labels in alphabetical order (function, type, variable), so type="..." sits between them and the substring never matched. Counters always returned 0.
Switch metric_value to two separate substring args (function + variable) that are both required to be present in a row. Drop the verbose diagnostic dumps now that the root cause is identified.
After this fix:
XTCP2_SELF_TEST_NS_LIFECYCLE_PASS (evt:0→1→2 inst:1→2)
XTCP2_SELF_TEST_NS_TRAFFIC_PASS (Netlinker.packets:20→36)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
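To make the matching bug concrete, here is a small Go sketch of the corrected rule: every label fragment must appear somewhere in the row, independent of order. The real metric_value helper is an awk filter in the self-test; the function name metricValue and the sample exposition line (including its label values) are illustrative assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// metricValue returns the value of the first exposition row for metric that
// contains every label fragment. Fragments are matched independently, so a
// label printed between them (such as type="...") no longer breaks the match.
func metricValue(exposition, metric string, labels ...string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, metric+"{") {
			continue
		}
		matched := true
		for _, l := range labels {
			if !strings.Contains(line, l) {
				matched = false
				break
			}
		}
		if !matched {
			continue
		}
		// Assumes rows carry no trailing timestamp, so the last field is the value.
		fields := strings.Fields(line)
		v, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			continue
		}
		return v, true
	}
	return 0, false
}

func main() {
	// Illustrative row only; label values are made up for the example.
	sample := `xtcp_counts{function="Netlinker",type="count",variable="packets"} 36`

	_, ok := metricValue(sample, "xtcp_counts", `function="Netlinker",variable="packets"`)
	fmt.Println("single substring matched:", ok)

	v, ok := metricValue(sample, "xtcp_counts", `function="Netlinker"`, `variable="packets"`)
	fmt.Println("two fragments matched:", ok, v)
}
```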
- tools/quality-report/main.go:381 exec.Command → exec.CommandContext
with a 30s timeout so the cover -func step can be cancelled cleanly.
- tools/quality-report/main.go:414 remove unused blockKey struct
(leftover from an earlier per-block dedupe approach that lives now in
seenStmt/seenMaxCount maps).
- pkg/xtcpnl/xtcpnl_fatalf_test.go:95 drop int64(tv.Usec) — Timeval.Usec
is already int64 on linux/amd64.
- pkg/xtcp/destinations_{kafka,valkey}_test.go gofmt struct-field alignment.
- docs/quality-report.md regenerated with NS_LIFECYCLE + NS_TRAFFIC
passing in both coverage VMs (evt:0→1→2 inst:1→2,
Netlinker.packets:20→36).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The production adapter showed 0% coverage even with the dest_valkey build tag because every other test bypassed it via the newValkeyClientFn factory seam. Drive each adapter method against an unreachable port with a short-deadline context so Publish + Ping surface dial errors and Close is exercised cleanly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
xtcp2's netNsCandidateDirs probes /run/netns/ AND /run/docker/netns/. The coverage VM only pre-created the first, so the daemon's second watchNsNamespace goroutine (for the docker path) never spawned and that whole branch read 0% coverage.
Pre-create /run/docker/netns/ via systemd.tmpfiles in coverage VMs and add Check 10: ip-netns-add → mount --bind into /run/docker/netns/ to fire fsnotify Create on the docker dir. This mirrors what docker actually does at the filesystem level when spawning a container; no docker daemon is required.
Result (stdlib coverage VM):
XTCP2_SELF_TEST_NS_DOCKER_PASS (evt:4→5→6 inst:3→4)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds NS_DOCKER_PASS in both coverage VMs (evt:4→5→6 inst:3→4), confirming the /run/docker/netns/ watch path is now exercised end-to-end. The redisClientAdapter Publish/Ping/Close tests also landed: the valkey production adapter went from 0% to 100% and dropped off the gaps list.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three latent bugs that had Check 3 + 5 failing in every flavor since the
self-test was first introduced:
1. cmd/xtcp2client default port was 8888, but the daemon listens on
8889 (cmd/xtcp2 grpcPortCst). Every gRPC roundtrip from xtcp2client
to a default daemon was a silent connection refused. Bump the
client's default to 8889 to match, and add an explicit -port flag
so this footgun is at least configurable. Pinned the constant with
a CLAUDE comment about keeping the two in lockstep.
2. Self-test Check 3 grep'd /var/log/xtcp2.jsonl, but xtcp2 has no
file destination type — vmConfig.json's "type":"file" is
aspirational, never wired into RegisterDestination. The file
literally never existed. Rewrote Check 3 as a metric-driven
assertion: poll xtcp_counts{variable="p"} until ANY Netlinker has
parsed at least one inet_diag socket (which is the only end-to-end
signal Check 3 was ever trying to verify).
3. Self-test Check 5 invoked xtcp2client with `-addr host:port` but
the flag set is `-target host` + `-port num` (now exists). Updated.
Result (stdlib coverage VM):
XTCP2_SELF_TEST_NETLINK_PASS (Netlinker parsed 3 sockets via inet_diag)
XTCP2_SELF_TEST_GRPC_ROUNDTRIP_PASS (xtcp2client rc=124, produced output)
XTCP2_SELF_TEST_OVERALL_PASS
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
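A minimal sketch of the client-side flag handling described in items 1 and 3, assuming the standard flag package. The -target and -port flag names and the 8889 default come from the commit message; the "localhost" default for -target, the constant name, and the helper name dialTarget are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"os"
	"strconv"
)

// defaultGRPCPort must stay in lockstep with the daemon's listen port
// (8889 per the commit message). The constant name is hypothetical.
const defaultGRPCPort = 8889

// dialTarget parses -target and -port and returns "host:port".
func dialTarget(args []string) (string, error) {
	fs := flag.NewFlagSet("client", flag.ContinueOnError)
	target := fs.String("target", "localhost", "daemon host (default is an assumption)")
	port := fs.Int("port", defaultGRPCPort, "daemon gRPC port")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return net.JoinHostPort(*target, strconv.Itoa(*port)), nil
}

func main() {
	addr, err := dialTarget(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println(addr)
}
```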
The syscall netlinker increments xtcp_counts{function="Netlinker",
variable="p",type="count"} by the number of inet_diag sockets parsed
per recv (netlinker.go:194), but the io_uring path discarded
Deserialize's first return. Effect: dashboards + the self-test never
saw iouring-flavor inet_diag activity reflected in the parsed-socket
metric — the counter just stayed at 0 even while NetlinkerIoUring.packets
was bumping each cycle.
Capture the count and emit the symmetric counter so iouring runs are
observable on the same dashboards as syscall runs.
Result: iouring coverage VM now hits NETLINK_PASS + OVERALL_PASS.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
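The shape of this fix is small: stop discarding the first return value and add it to the counter. The sketch below uses stand-ins throughout; deserialize, recordSize, parsedSockets, and handleRecv are hypothetical and do not reflect the real Deserialize signature or the Prometheus counter vector in pkg/xtcp.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// recordSize is a made-up fixed record length for this illustration.
const recordSize = 4

// parsedSockets stands in for the variable="p" counter.
var parsedSockets atomic.Int64

// deserialize is a stand-in that returns how many records it parsed.
func deserialize(buf []byte) (int, error) {
	if len(buf)%recordSize != 0 {
		return 0, errors.New("truncated record")
	}
	return len(buf) / recordSize, nil
}

// handleRecv processes one receive buffer and records the parsed count.
func handleRecv(buf []byte) error {
	// Before the fix the count was dropped: _, err := deserialize(buf)
	n, err := deserialize(buf)
	if err != nil {
		return err
	}
	parsedSockets.Add(int64(n))
	return nil
}

func main() {
	_ = handleRecv(make([]byte, 12))
	_ = handleRecv(make([]byte, 8))
	fmt.Println("parsed sockets:", parsedSockets.Load())
}
```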
After fixing the xtcp2client port mismatch (Check 5), the file-sink mirage in Check 3, and the NetlinkerIoUring missing-p-counter bug, both stdlib and iouring coverage VMs now hit OVERALL_PASS, with all 10 self-test checks green:
SYSTEMD METRICS NETLINK BINARIES_HELP GRPC_ROUNDTRIP NS_INSPECT NSTEST NS_LIFECYCLE NS_TRAFFIC NS_DOCKER
Total coverage 91.9%. pkg/xtcp 89.4%.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…g test
- destinations_kafka.go: delete registerProtobufSchemaRestful, which was marked lint:ignore U1000 "historical reference; not called". The whole function (35 lines, 13 stmts) was inflating pkg/xtcp's coverage denominator without ever being exercised. The "bytes" import was only used in this dead code; remove that too.
- destinations_kafka_test.go: add TestNewKafkaDest_debugLog to cover the 5 log.Println calls inside the `if x.debugLevel > 10` block that newKafkaDest's happy-path test skips.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
SA9003 (empty branch) was flagged inside an `if r := recover(); r != nil`
block in poller_helpers_test.go. The intent is to swallow the panic
deliberately — collapse to `defer func() { _ = recover() }()` so the
recover stays explicit but the empty-branch warning is gone.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
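For reference, the collapsed form behaves the same as the original empty-branch version: recover is still called directly by the deferred function, so the panic is stopped. A minimal sketch, with runIgnoringPanic as a hypothetical wrapper:

```go
package main

import "fmt"

// runIgnoringPanic calls f and deliberately swallows any panic it raises.
func runIgnoringPanic(f func()) {
	// recover must be called directly by the deferred function to take effect.
	defer func() { _ = recover() }()
	f()
}

func main() {
	runIgnoringPanic(func() { panic("boom") })
	fmt.Println("still running")
}
```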
Final state after the M5/M6/check-fix arc:
Total findings: 0
Total coverage: 92.4%
pkg/xtcp: 90.8% (was 89.4%; cleared the below-90pct finding)
Coverage VMs: XTCP2_SELF_TEST_OVERALL_PASS (stdlib + iouring)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
complexity-reduction 3/4 — cluster B2: microvm-coverage tooling + self-test checks (commits 43–63). Applied cleanly onto main.
- quality-report -coverage-out + per-package TSV regen, update-quality-report --with-microvm wiring, merge of the iouring VM coverage, and self-test checks that drive ns-lifecycle / TCP-in-ns / /run/docker/netns watch paths so those code paths are actually covered.
- OVERALL_PASS.
- NetlinkerIoUring parsed-socket count counter; cover redisClientAdapter Publish/Ping/Close; delete dead registerProtobufSchemaRestful + add a newKafkaDest debug-log test; SA9003 / noctx / unused / unconvert lint cleanup.
Lands on docs: refresh report (0 findings, 92.4% coverage, all checks green).
Testing
- go vet ./... + gofmt -l .: clean (go 1.25; no gofmt-forward needed this batch).
- go test -ldflags=-checklinkname=0 -tags 'dest_kafka dest_nats dest_nsq dest_valkey' ./...: entire suite green.
Note: the M-series wiring targets coverage produced by the coverage microVMs; the in-VM runs themselves aren't executed here (KVM/heavy). The Go + tooling changes are verified by build/vet/test.
🤖 Generated with Claude Code