test(mem-repro): passthrough memory-leak reproduction harness#396
test(mem-repro): passthrough memory-leak reproduction harness#396coderdan wants to merge 1 commit into
Conversation
Tooling used to reproduce and root-cause the passthrough memory leak (BUG-300, fixed in #395): a Go load generator (loadgen/), a sustained-load soak runner with cgroup RSS sampling (soak.sh), single-run and host-process variants, a plaintext schema, and a probe for the named-prepared-statement path (prepleak/). summary.md documents the trigger (empty encrypt config), root cause, and verification. Self-contained under scripts/mem-repro/; not part of the Rust build. Raw run outputs (results/) are git-ignored. The jemalloc allocator experiment from the investigation is intentionally not included. Relates to BUG-300.
📝 WalkthroughWalkthroughThis PR adds a complete memory reproduction harness for BUG-300, a passthrough memory leak occurring with empty encryption configs on unconfigured proxy deployments. The changes include comprehensive incident documentation, a SQL schema for test workloads, Go-based load generators, memory sampling utilities, and bash orchestration scripts for both host-based and containerized test execution. ChangesMemory Reproduction Harness for BUG-300
Sequence Diagram(s)sequenceDiagram
participant Operator
participant Orchestrator as Test Orchestrator
participant Sampler as Memory Sampler
participant LoadGen as Load Generator
participant Proxy
participant Database
Operator->>Orchestrator: run.sh or soak.sh
Orchestrator->>Sampler: Start background RSS sampling
Orchestrator->>Orchestrator: Capture baseline memory
Orchestrator->>LoadGen: Launch load generator
LoadGen->>Proxy: Insert operations (many transactions)
Proxy->>Database: Execute INSERTs via passthrough
Sampler->>Proxy: Poll RSS continuously
Database->>Proxy: Acknowledge writes
Proxy->>LoadGen: Transaction commits
Orchestrator->>Orchestrator: Wait cooldown period
Orchestrator->>Sampler: Stop sampling
Orchestrator->>Orchestrator: Compute peak/final RSS from CSV
Orchestrator->>Operator: Report memory metrics and leak detection
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Adds a self-contained reproduction harness under scripts/mem-repro/ to reliably surface, measure, and document the passthrough memory leak (BUG-300 / follow-up to #395), including Docker-based runners, RSS sampling, a Go load generator, and a written investigation summary.
Changes:
- Added shell scripts to run single-shot and soak-style repros while sampling RSS (host + Docker/cgroup).
- Added Go-based load generators (
loadgen/,prepleak/) plus Go module metadata for reproducible runs. - Added schema + write-up docs (
schema.sql,summary.md,README.md) and ignoredresults/outputs.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 20 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/mem-repro/README.md | Documents harness purpose, layout, prerequisites, and how to reproduce/compare variants |
| scripts/mem-repro/summary.md | Detailed incident summary, root cause explanation, and reproduction table/results |
| scripts/mem-repro/soak.sh | Sustained-load Docker soak runner with cgroup memory sampling and CSV output |
| scripts/mem-repro/run-docker.sh | Single-run Docker-based runner + cgroup memory sampling |
| scripts/mem-repro/run.sh | Single-run host-process runner + RSS sampling via ps |
| scripts/mem-repro/sample-rss.sh | Standalone RSS/VSZ sampler that appends to CSV |
| scripts/mem-repro/schema.sql | Plaintext schema to keep traffic passthrough for the repro |
| scripts/mem-repro/loadgen/main.go | Concurrent insert load generator using pgxpool |
| scripts/mem-repro/prepleak/main.go | Probe for named-prepared-statement retention path |
| scripts/mem-repro/go.mod | Go module for loadgen/ and prepleak/ |
| scripts/mem-repro/go.sum | Go dependency lockfile for the harness |
| scripts/mem-repro/.gitignore | Ignores generated results/ output directory |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ROUNDS="${2:-6}" | ||
| COUNT="${3:-50000}" | ||
| WORKERS="${4:-16}" | ||
| NET="$(docker inspect proxy --format '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}' | head -n1)" |
| DSN="postgres://${PGUSER}:${PGPASS}@proxy:6432/${PGDB}?sslmode=disable" | ||
|
|
||
| # Resolve the docker network the proxy is attached to. | ||
| NET="$(docker inspect proxy --format '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}' | head -n1)" |
| docker run --rm --network "${NET}" -v "${HERE}:/src" -w /src -e GOFLAGS=-mod=mod golang:1.22-bookworm \ | ||
| bash -c "go mod tidy >/dev/null 2>&1; go run ./loadgen -count ${COUNT} -workers ${WORKERS} -dsn '${DSN}'" >/dev/null 2>&1 || true |
| docker run --rm --network "${NET}" \ | ||
| -v "${HERE}:/src" -w /src \ | ||
| -e GOFLAGS=-mod=mod \ | ||
| golang:1.22-bookworm \ | ||
| bash -c "go mod tidy >/dev/null 2>&1; go run ./loadgen -count ${COUNT} -workers ${WORKERS} -dsn '${DSN}'" |
| if err := tx.Commit(ctx); err != nil { | ||
| log.Printf("Insert failed to commit: %v", err) | ||
| atomic.AddInt64(&completed, 1) | ||
| continue | ||
| } |
| HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
| REPO_ROOT="$(cd "${HERE}/../.." && pwd)" | ||
| LABEL="${1:?usage: soak.sh <label> [rounds] [count] [workers]}" | ||
| ROUNDS="${2:-6}" |
| HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
| REPO_ROOT="$(cd "${HERE}/../.." && pwd)" | ||
|
|
||
| LABEL="${1:?usage: run-docker.sh <label> [count] [workers]}" | ||
| COUNT="${2:-20000}" |
| HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
| REPO_ROOT="$(cd "${HERE}/../.." && pwd)" | ||
|
|
||
| LABEL="${1:?usage: run.sh <label> [count] [workers] [dsn]}" | ||
| COUNT="${2:-20000}" |
| Harness: `scripts/mem-repro/` (`soak.sh`). Allocator is irrelevant (glibc == | ||
| jemalloc); the leak is logical retention, not fragmentation. |
| # Runs several back-to-back rounds of simple-protocol inserts (max per-statement | ||
| # churn) and keeps sampling through a cooldown to see whether RSS is released. |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (1)
scripts/mem-repro/summary.md (1)
44-58: ⚡ Quick winUse symbols or permalinks instead of live source line numbers.
These
backend.rs:175/frontend.rs:380references will drift as the code changes, which makes the root-cause section hard to verify later. Symbol names alone, or commit-permalinks tied to the investigation snapshot, will age much better.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/mem-repro/summary.md` around lines 44 - 58, Replace fragile numeric file:line references in the root-cause section with stable symbols or a commit permalink: reference the backend symbol(s) (e.g., the function that returns early such as backend::execute or backend::handle_statement and the methods complete_execution()/finish_session()), the frontend symbols (e.g., frontend::start_session, frontend::set_execute*), and context::is_passthrough / encrypt_config.is_empty / mapping_disabled, or add a commit-permalink for the investigation snapshot; update the three places that currently say "backend.rs:175", "frontend.rs:380", and "context/mod.rs:780" to use those symbol names or permalinks so the explanation remains verifiable as code moves.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/mem-repro/loadgen/main.go`:
- Around line 155-188: The worker currently treats Begin/Exec/Commit failures as
"completed" (atomic.AddInt64(&completed, 1)) and continues, causing main() to
finish with exit 0 even on DB errors; change the worker to propagate the first
error back to main by sending the error (with context cancel or a buffered errCh
and sync.Once) whenever pool.Begin(ctx), tx.Exec(ctx, ...), or tx.Commit(ctx)
fail (and still rollback where appropriate), stop incrementing completed for
failed attempts so completed only counts successful commits, and have main()
listen for that error and return/exit non-zero on the first reported error.
- Around line 17-25: The flags validation is incomplete: after flag.Parse() add
checks that *count and *workers are positive (>0) and fail fast; specifically,
validate the variables count and workers (and keep the existing dsn check) and
call log.Fatal with clear messages like "count must be > 0" or "workers must be
> 0" when invalid so main (and the insert loop that uses *count and worker
goroutines that rely on *workers) never proceeds with negative or zero values.
In `@scripts/mem-repro/run-docker.sh`:
- Around line 29-30: The NET assignment concatenates all network names with no
separator, producing an invalid --network when proxy is attached to multiple
networks; change the docker inspect template so each network key is emitted on
its own line and then keep the head -n1 selection. Concretely, update the NET
assignment (the variable NET and its docker inspect invocation) to use a newline
in the template, e.g. '{{range $k,$v :=
.NetworkSettings.Networks}}{{$k}}{{"\n"}}{{end}}' | head -n1, so NET contains
exactly one valid network name.
In `@scripts/mem-repro/sample-rss.sh`:
- Around line 24-29: The script currently resolves PID by taking the first pgrep
match into PID, which is unsafe when multiple processes match PATTERN; change
the logic around the PID resolution (the PID variable and the pgrep -f
"$PATTERN" invocation) to require exactly one match: run pgrep -f "$PATTERN",
count matches, and if count == 0 exit with a clear error, if count > 1 exit with
a clear error asking the user to provide an explicit PID or tighten PATTERN (do
not silently pick head -n1). Ensure error messages reference PATTERN and the
ambiguous match count so callers know why it failed.
In `@scripts/mem-repro/schema.sql`:
- Around line 10-19: The schema must be deterministic: replace the current
CREATE TABLE IF NOT EXISTS for credit_data_order_v2 with a deterministic setup
(e.g., run DROP TABLE IF EXISTS credit_data_order_v2; then CREATE TABLE ... with
the exact columns and defaults) or switch to a dedicated harness table name and
ensure the harness that expects the plaintext passthrough table (the loadgen
main harness) targets that name; alternatively add an explicit schema
verification step that inspects credit_data_order_v2 columns/types and fails
fast on drift so the loadgen harness cannot silently run against an incompatible
shape.
In `@scripts/mem-repro/soak.sh`:
- Line 17: The NET assignment in soak.sh uses a docker inspect template that
concatenates all network names into one token; change the template passed to
docker inspect in the NET assignment (the command starting with docker inspect
proxy --format ...) to emit one network name per line (e.g. add a newline inside
the range template) so that piping to head -n1 returns a single valid network;
update the NET variable assignment accordingly (same fix as applied to
run-docker.sh) so the template prints each .NetworkSettings.Networks key on its
own line before head -n1.
- Around line 29-32: The loop currently swallows failures because the docker run
... go run ./loadgen command ends with "|| true"; remove that "|| true" and
instead check the command's exit status (the docker run invocation inside the
for r in $(seq 1 "${ROUNDS}") loop) immediately after it; if the exit code is
non‑zero, echo an error mentioning the LABEL and round number and exit non‑zero
(or break) so failed load rounds are not treated as successful before printing
the RSS summary.
In `@scripts/mem-repro/summary.md`:
- Around line 11-13: The fenced code block containing the environment variable
line "CS_DEVELOPMENT__DISABLE_MAPPING=true" needs a language tag; change the
opening triple backticks from ``` to ```bash (or ```sh) so the snippet is marked
as shell and markdownlint stops flagging it, e.g., update the fence around that
exact snippet in the summary.md file.
---
Nitpick comments:
In `@scripts/mem-repro/summary.md`:
- Around line 44-58: Replace fragile numeric file:line references in the
root-cause section with stable symbols or a commit permalink: reference the
backend symbol(s) (e.g., the function that returns early such as
backend::execute or backend::handle_statement and the methods
complete_execution()/finish_session()), the frontend symbols (e.g.,
frontend::start_session, frontend::set_execute*), and context::is_passthrough /
encrypt_config.is_empty / mapping_disabled, or add a commit-permalink for the
investigation snapshot; update the three places that currently say
"backend.rs:175", "frontend.rs:380", and "context/mod.rs:780" to use those
symbol names or permalinks so the explanation remains verifiable as code moves.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e1af32e3-7568-4047-9c4a-7f816f99d5dd
⛔ Files ignored due to path filters (1)
scripts/mem-repro/go.sumis excluded by!**/*.sum
📒 Files selected for processing (11)
scripts/mem-repro/.gitignorescripts/mem-repro/README.mdscripts/mem-repro/go.modscripts/mem-repro/loadgen/main.goscripts/mem-repro/prepleak/main.goscripts/mem-repro/run-docker.shscripts/mem-repro/run.shscripts/mem-repro/sample-rss.shscripts/mem-repro/schema.sqlscripts/mem-repro/soak.shscripts/mem-repro/summary.md
| var ( | ||
| dsn = flag.String("dsn", "postgres://bloomuser:password@127.0.0.1:6432/pdts_db?sslmode=disable", "Database connection string (DSN)") | ||
| count = flag.Int("count", 10000, "Number of inserts to perform") | ||
| workers = flag.Int("workers", 10, "Number of concurrent workers") | ||
| ) | ||
| flag.Parse() | ||
|
|
||
| if *dsn == "" { | ||
| log.Fatal("DSN is required. Use -dsn flag to provide database connection string") |
There was a problem hiding this comment.
Validate -count and -workers as positive values.
Right now Line 143 will panic for a negative -count, and -workers=0 falls through to a successful Done! without inserting anything. This harness should fail fast on invalid inputs instead of producing misleading no-op runs.
Proposed fix
flag.Parse()
if *dsn == "" {
log.Fatal("DSN is required. Use -dsn flag to provide database connection string")
}
+ if *count <= 0 {
+ log.Fatal("-count must be > 0")
+ }
+ if *workers <= 0 {
+ log.Fatal("-workers must be > 0")
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| var ( | |
| dsn = flag.String("dsn", "postgres://bloomuser:password@127.0.0.1:6432/pdts_db?sslmode=disable", "Database connection string (DSN)") | |
| count = flag.Int("count", 10000, "Number of inserts to perform") | |
| workers = flag.Int("workers", 10, "Number of concurrent workers") | |
| ) | |
| flag.Parse() | |
| if *dsn == "" { | |
| log.Fatal("DSN is required. Use -dsn flag to provide database connection string") | |
| var ( | |
| dsn = flag.String("dsn", "postgres://bloomuser:password@127.0.0.1:6432/pdts_db?sslmode=disable", "Database connection string (DSN)") | |
| count = flag.Int("count", 10000, "Number of inserts to perform") | |
| workers = flag.Int("workers", 10, "Number of concurrent workers") | |
| ) | |
| flag.Parse() | |
| if *dsn == "" { | |
| log.Fatal("DSN is required. Use -dsn flag to provide database connection string") | |
| } | |
| if *count <= 0 { | |
| log.Fatal("-count must be > 0") | |
| } | |
| if *workers <= 0 { | |
| log.Fatal("-workers must be > 0") | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/mem-repro/loadgen/main.go` around lines 17 - 25, The flags validation
is incomplete: after flag.Parse() add checks that *count and *workers are
positive (>0) and fail fast; specifically, validate the variables count and
workers (and keep the existing dsn check) and call log.Fatal with clear messages
like "count must be > 0" or "workers must be > 0" when invalid so main (and the
insert loop that uses *count and worker goroutines that rely on *workers) never
proceeds with negative or zero values.
| tx, err := pool.Begin(ctx) | ||
| if err != nil { | ||
| log.Printf("Insert failed to begin tx: %v", err) | ||
| atomic.AddInt64(&completed, 1) | ||
| continue | ||
| } | ||
|
|
||
| // NOTE: The original repro issued `SET CIPHERSTASH.KEYSET_NAME` | ||
| // here to mirror repo.setKeysetNameOnTx. For the passthrough | ||
| // (no-encryption) memory test that SET is omitted: this proxy has | ||
| // a default keyset configured, which rejects the SET, and it is | ||
| // irrelevant to the per-statement type-check churn we measure. | ||
|
|
||
| _, err = tx.Exec(ctx, query, | ||
| id, | ||
| orgID, | ||
| orderID, | ||
| false, | ||
| testJSON, | ||
| testJSON, | ||
| now, | ||
| now, | ||
| ) | ||
| if err != nil { | ||
| tx.Rollback(ctx) | ||
| log.Printf("Insert failed: %v", err) | ||
| atomic.AddInt64(&completed, 1) | ||
| continue | ||
| } | ||
|
|
||
| if err := tx.Commit(ctx); err != nil { | ||
| log.Printf("Insert failed to commit: %v", err) | ||
| atomic.AddInt64(&completed, 1) | ||
| continue |
There was a problem hiding this comment.
Fail the run when a worker hits a database error.
Lines 156-188 currently count begin/exec/commit failures as completed work and still let main() exit 0 with Done!. That makes a broken repro look successful and can invalidate any RSS series collected around it. Please propagate the first worker error back to main and return a non-zero exit instead of continuing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/mem-repro/loadgen/main.go` around lines 155 - 188, The worker
currently treats Begin/Exec/Commit failures as "completed"
(atomic.AddInt64(&completed, 1)) and continues, causing main() to finish with
exit 0 even on DB errors; change the worker to propagate the first error back to
main by sending the error (with context cancel or a buffered errCh and
sync.Once) whenever pool.Begin(ctx), tx.Exec(ctx, ...), or tx.Commit(ctx) fail
(and still rollback where appropriate), stop incrementing completed for failed
attempts so completed only counts successful commits, and have main() listen for
that error and return/exit non-zero on the first reported error.
| # Resolve the docker network the proxy is attached to. | ||
| NET="$(docker inspect proxy --format '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}' | head -n1)" |
There was a problem hiding this comment.
Pick one Docker network name instead of concatenating all of them.
Line 30 renders every attached network key with no separator, so a multi-network proxy container produces an invalid --network value. head -n1 only works if the template prints one network per line.
Proposed fix
-NET="$(docker inspect proxy --format '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}' | head -n1)"
+NET="$(docker inspect proxy --format '{{range $k,$v := .NetworkSettings.Networks}}{{println $k}}{{end}}' | head -n1)"
+if [[ -z "${NET}" ]]; then
+ echo "Could not determine the Docker network for container 'proxy'." >&2
+ exit 1
+fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Resolve the docker network the proxy is attached to. | |
| NET="$(docker inspect proxy --format '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}' | head -n1)" | |
| # Resolve the docker network the proxy is attached to. | |
| NET="$(docker inspect proxy --format '{{range $k,$v := .NetworkSettings.Networks}}{{println $k}}{{end}}' | head -n1)" | |
| if [[ -z "${NET}" ]]; then | |
| echo "Could not determine the Docker network for container 'proxy'." >&2 | |
| exit 1 | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/mem-repro/run-docker.sh` around lines 29 - 30, The NET assignment
concatenates all network names with no separator, producing an invalid --network
when proxy is attached to multiple networks; change the docker inspect template
so each network key is emitted on its own line and then keep the head -n1
selection. Concretely, update the NET assignment (the variable NET and its
docker inspect invocation) to use a newline in the template, e.g. '{{range $k,$v
:= .NetworkSettings.Networks}}{{$k}}{{"\n"}}{{end}}' | head -n1, so NET contains
exactly one valid network name.
| # Resolve the PID once so we follow a single process for the whole run. | ||
| PID="$(pgrep -f "$PATTERN" | head -n1 || true)" | ||
| if [[ -z "${PID}" ]]; then | ||
| echo "No process matching '${PATTERN}' is running. Start the proxy first." >&2 | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
Reject ambiguous process matches instead of sampling the first PID.
Lines 25-29 silently pick the first pgrep match. If more than one process matches the pattern, the CSV can end up tracking the wrong process while still looking valid. For a repro harness, it's safer to require exactly one match or ask for an explicit PID.
Proposed fix
-# Resolve the PID once so we follow a single process for the whole run.
-PID="$(pgrep -f "$PATTERN" | head -n1 || true)"
-if [[ -z "${PID}" ]]; then
+# Resolve the PID once so we follow a single process for the whole run.
+mapfile -t PIDS < <(pgrep -f "$PATTERN" || true)
+if (( ${`#PIDS`[@]} == 0 )); then
echo "No process matching '${PATTERN}' is running. Start the proxy first." >&2
exit 1
fi
+if (( ${`#PIDS`[@]} > 1 )); then
+ echo "Pattern '${PATTERN}' matched multiple processes: ${PIDS[*]}" >&2
+ echo "Use a more specific pattern or extend the script to accept an explicit PID." >&2
+ exit 1
+fi
+PID="${PIDS[0]}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Resolve the PID once so we follow a single process for the whole run. | |
| PID="$(pgrep -f "$PATTERN" | head -n1 || true)" | |
| if [[ -z "${PID}" ]]; then | |
| echo "No process matching '${PATTERN}' is running. Start the proxy first." >&2 | |
| exit 1 | |
| fi | |
| # Resolve the PID once so we follow a single process for the whole run. | |
| mapfile -t PIDS < <(pgrep -f "$PATTERN" || true) | |
| if (( ${`#PIDS`[@]} == 0 )); then | |
| echo "No process matching '${PATTERN}' is running. Start the proxy first." >&2 | |
| exit 1 | |
| fi | |
| if (( ${`#PIDS`[@]} > 1 )); then | |
| echo "Pattern '${PATTERN}' matched multiple processes: ${PIDS[*]}" >&2 | |
| echo "Use a more specific pattern or extend the script to accept an explicit PID." >&2 | |
| exit 1 | |
| fi | |
| PID="${PIDS[0]}" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/mem-repro/sample-rss.sh` around lines 24 - 29, The script currently
resolves PID by taking the first pgrep match into PID, which is unsafe when
multiple processes match PATTERN; change the logic around the PID resolution
(the PID variable and the pgrep -f "$PATTERN" invocation) to require exactly one
match: run pgrep -f "$PATTERN", count matches, and if count == 0 exit with a
clear error, if count > 1 exit with a clear error asking the user to provide an
explicit PID or tighten PATTERN (do not silently pick head -n1). Ensure error
messages reference PATTERN and the ambiguous match count so callers know why it
failed.
| CREATE TABLE IF NOT EXISTS credit_data_order_v2 ( | ||
| id uuid PRIMARY KEY, | ||
| organization_id uuid NOT NULL, | ||
| order_id uuid NOT NULL, | ||
| account_review boolean NOT NULL DEFAULT false, | ||
| full_report jsonb NOT NULL, | ||
| raw_report jsonb NOT NULL, | ||
| created_at timestamptz NOT NULL DEFAULT now(), | ||
| updated_at timestamptz NOT NULL DEFAULT now() | ||
| ); |
There was a problem hiding this comment.
Make the repro schema deterministic.
CREATE TABLE IF NOT EXISTS will silently reuse any existing credit_data_order_v2, even if it still has a different shape from a prior test setup. That breaks the “self-contained repro harness” goal, because scripts/mem-repro/loadgen/main.go:136-177 will then run against whatever schema happened to be there instead of the plaintext passthrough table this harness expects.
Consider dropping/recreating the table in a scratch DB, using a dedicated harness table name, or adding an explicit schema verification step that fails on drift.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/mem-repro/schema.sql` around lines 10 - 19, The schema must be
deterministic: replace the current CREATE TABLE IF NOT EXISTS for
credit_data_order_v2 with a deterministic setup (e.g., run DROP TABLE IF EXISTS
credit_data_order_v2; then CREATE TABLE ... with the exact columns and defaults)
or switch to a dedicated harness table name and ensure the harness that expects
the plaintext passthrough table (the loadgen main harness) targets that name;
alternatively add an explicit schema verification step that inspects
credit_data_order_v2 columns/types and fails fast on drift so the loadgen
harness cannot silently run against an incompatible shape.
| ROUNDS="${2:-6}" | ||
| COUNT="${3:-50000}" | ||
| WORKERS="${4:-16}" | ||
| NET="$(docker inspect proxy --format '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}' | head -n1)" |
There was a problem hiding this comment.
Fix Docker network resolution here as well.
Line 17 has the same concatenation bug as run-docker.sh: if proxy is attached to multiple networks, the template returns a single invalid combined name. Emit one network per line before head -n1.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/mem-repro/soak.sh` at line 17, The NET assignment in soak.sh uses a
docker inspect template that concatenates all network names into one token;
change the template passed to docker inspect in the NET assignment (the command
starting with docker inspect proxy --format ...) to emit one network name per
line (e.g. add a newline inside the range template) so that piping to head -n1
returns a single valid network; update the NET variable assignment accordingly
(same fix as applied to run-docker.sh) so the template prints each
.NetworkSettings.Networks key on its own line before head -n1.
| for r in $(seq 1 "${ROUNDS}"); do | ||
| docker run --rm --network "${NET}" -v "${HERE}:/src" -w /src -e GOFLAGS=-mod=mod golang:1.22-bookworm \ | ||
| bash -c "go mod tidy >/dev/null 2>&1; go run ./loadgen -count ${COUNT} -workers ${WORKERS} -dsn '${DSN}'" >/dev/null 2>&1 || true | ||
| echo "[$LABEL] round ${r}/${ROUNDS} done; RSS=$(docker exec proxy cat /sys/fs/cgroup/memory.current|awk '{printf "%.1f",$1/1048576}') MiB" |
There was a problem hiding this comment.
Do not swallow failed load rounds.
Line 31 ends the docker run ... go run command with || true, so the script prints round ... done even when the load generator fails. That makes the final RSS summary unsafe to interpret because it may describe a run that never actually exercised the proxy.
Proposed fix
- docker run --rm --network "${NET}" -v "${HERE}:/src" -w /src -e GOFLAGS=-mod=mod golang:1.22-bookworm \
- bash -c "go mod tidy >/dev/null 2>&1; go run ./loadgen -count ${COUNT} -workers ${WORKERS} -dsn '${DSN}'" >/dev/null 2>&1 || true
+ docker run --rm --network "${NET}" -v "${HERE}:/src" -w /src -e GOFLAGS=-mod=mod golang:1.22-bookworm \
+ bash -c "go mod tidy >/dev/null 2>&1; go run ./loadgen -count ${COUNT} -workers ${WORKERS} -dsn '${DSN}'" >/dev/null📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for r in $(seq 1 "${ROUNDS}"); do | |
| docker run --rm --network "${NET}" -v "${HERE}:/src" -w /src -e GOFLAGS=-mod=mod golang:1.22-bookworm \ | |
| bash -c "go mod tidy >/dev/null 2>&1; go run ./loadgen -count ${COUNT} -workers ${WORKERS} -dsn '${DSN}'" >/dev/null 2>&1 || true | |
| echo "[$LABEL] round ${r}/${ROUNDS} done; RSS=$(docker exec proxy cat /sys/fs/cgroup/memory.current|awk '{printf "%.1f",$1/1048576}') MiB" | |
| for r in $(seq 1 "${ROUNDS}"); do | |
| docker run --rm --network "${NET}" -v "${HERE}:/src" -w /src -e GOFLAGS=-mod=mod golang:1.22-bookworm \ | |
| bash -c "go mod tidy >/dev/null 2>&1; go run ./loadgen -count ${COUNT} -workers ${WORKERS} -dsn '${DSN}'" >/dev/null | |
| echo "[$LABEL] round ${r}/${ROUNDS} done; RSS=$(docker exec proxy cat /sys/fs/cgroup/memory.current|awk '{printf "%.1f",$1/1048576}') MiB" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/mem-repro/soak.sh` around lines 29 - 32, The loop currently swallows
failures because the docker run ... go run ./loadgen command ends with "||
true"; remove that "|| true" and instead check the command's exit status (the
docker run invocation inside the for r in $(seq 1 "${ROUNDS}") loop) immediately
after it; if the exit code is non‑zero, echo an error mentioning the LABEL and
round number and exit non‑zero (or break) so failed load rounds are not treated
as successful before printing the RSS summary.
| ``` | ||
| CS_DEVELOPMENT__DISABLE_MAPPING=true | ||
| ``` |
There was a problem hiding this comment.
Add a language tag to this fenced block.
markdownlint is already flagging this snippet; marking it as bash/sh keeps the doc lint-clean.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 11-11: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/mem-repro/summary.md` around lines 11 - 13, The fenced code block
containing the environment variable line "CS_DEVELOPMENT__DISABLE_MAPPING=true"
needs a language tag; change the opening triple backticks from ``` to ```bash
(or ```sh) so the snippet is marked as shell and markdownlint stops flagging it,
e.g., update the fence around that exact snippet in the summary.md file.
Follow-up to #395 (BUG-300). Adds the harness used to reproduce and root-cause the passthrough memory leak, so the result is reproducible by anyone.
Contents (
scripts/mem-repro/)soak.sh— sustained load + cgroup RSS sampling; the script that surfaces the leak.run-docker.sh/run.sh— single-run variants (container / host process).sample-rss.sh— standalone RSS sampler.loadgen/— Go load generator (parameterised INSERTs via pgx);prepleak/— probe for the named-prepared-statement path.schema.sql— plaintext table so all traffic is passthrough.summary.md— trigger (empty encrypt config), root cause, reproduction table, and the fix (fix(passthrough): drain per-statement queues to stop memory leak (BUG-300) #395).Notes
scripts/mem-repro/; not part of the Rust build. No local Go needed for the dockerised runners.results/) are git-ignored.How it was used
Empty the encrypt config, then
soak.sh. On an affected build RSS climbs monotonically (300k inserts → 317 MiB, no release); with #395 it plateaus (~49 MiB).Summary by CodeRabbit
Bug Fixes
Documentation
Tests