diff --git a/PRPs/PRP-showcase-workspace-E5-release-gate.md b/PRPs/PRP-showcase-workspace-E5-release-gate.md
new file mode 100644
index 00000000..2c28cdf5
--- /dev/null
+++ b/PRPs/PRP-showcase-workspace-E5-release-gate.md
@@ -0,0 +1,642 @@
+name: "PRP showcase-workspace-E5 — release gate: 8-preset dogfood + workspace-mode dogfood + doc sweep + umbrella close-out"
+description: |
+  Issue #401 (epic E5 of umbrella #389, milestone showcase-workspace).
+  Release-gate epic: NO production code. Deliverables are (a) an executed
+  verification — a per-preset dogfood matrix across all 8 ScenarioPreset cards
+  on /showcase plus a workspace-mode (preservation=keep) dogfood with
+  list/Load/Replay + tag retrieval, on a fresh-DB stack; (b) a tracked docs
+  sweep — a Showcase-workspace section in docs/_base/RUNBOOKS.md and a
+  showcase_workspace aggregate + ubiquitous-language entry in
+  docs/_base/DOMAIN_MODEL.md; (c) evidence recorded on #401, umbrella #389
+  ticked + closed. If any dogfood check fails OUTSIDE the documented
+  expected-outcome matrix, the gate STOPS and files a fix issue — it never
+  fixes forward inside this epic.
+
+---
+
+## Goal
+
+Close umbrella #389 (showcase workspace — preserve, restore, replay) on **proof,
+not per-epic merges**. E1 #390, E2 #391, E3 #392, E4 #393 are all CLOSED and
+shipped in v0.2.22; nothing has yet verified their *combined* behavior across
+all 8 presets, nor the workspace keep-path on `showcase_rich` (E4's manual
+dogfood covered `demo_minimal` only), and the deferred RUNBOOKS/DOMAIN_MODEL
+documentation never landed.
+
+1. **8-preset dogfood matrix** — fresh-DB stack, then one `/showcase` run per
+   `ScenarioPreset` card with **Re-seed first** ticked (+ **Reset database**
+   where the matrix below requires it). Record green / expected-skip /
+   expected-fail per the RUNBOOKS entry-28 matrix; any deviation → STOP RULE.
+2. **Workspace-mode dogfood** — one `preservation="keep"` run each on
+   `demo_minimal` and `showcase_rich` (these double as those presets' matrix
+   rows). Verify: exactly one new `showcase_workspace` row per run, list/detail
+   endpoints, UI **Load** (config + artifacts re-attach) and **Replay** (new
+   row, green pipeline, no 409/500), and `GET /scenarios?tags=workspace:`
+   returns the showcase-saved plans (E3, `showcase_rich` only).
+3. **Docs sweep** — add the Showcase-workspace operational section to
+   `docs/_base/RUNBOOKS.md` and the `showcase_workspace` aggregate +
+   `workspace` ubiquitous-language entry to `docs/_base/DOMAIN_MODEL.md`
+   (both currently have ZERO "workspace" mentions — verified 2026-06-12).
+4. **Regression coverage (verify only)** —
+   `tests/test_e2e_demo.py::test_demo_replay_same_config_twice` green in CI
+   (CI runs the full pytest incl. integration; latest dev run 27427250799 ✅)
+   and green in a targeted local re-run.
+5. **Close-out** — evidence comment on #401; tick ALL satisfied checkboxes on
+   #389 (live body has 11/11 unticked — drift) and fix the E5 line
+   ("not yet created" → "#401"); close #389; close #401 last.
+
+**End state**: #389 and #401 CLOSED with linked evidence; the two `docs/_base/`
+files document workspace semantics; this PRP file committed (`docs(repo)`
+precedent b1c8593).
+
+## Why
+
+- Every umbrella #389 success criterion is implemented but **none is ticked
+  with evidence**, and three of six are only provable by a live multi-preset
+  dogfood (8-preset green/skip matrix; restore/replay without 409/500;
+  workspace-tag retrieval).
+- E4's dogfood covered `demo_minimal` keep-runs only. The `showcase_rich`
+  keep-path is the one that exercises E3 tagging (planning phase exists only
+  there) and the 24-step `created_objects` recording — untested live as a
+  whole.
+- The umbrella explicitly deferred RUNBOOKS/DOMAIN_MODEL documentation to E5;
+  operators currently have no runbook for replay-of-`reset=true` destructive
+  semantics, non-unique names, row accumulation (no DELETE), or the
+  `holiday_rush` union-window replay trap.
+
+## What
+
+A verification campaign plus a docs-only repo change. No `app/`, `frontend/`,
+or `alembic/` change is in scope. Tracked changes: this PRP file +
+`docs/_base/RUNBOOKS.md` + `docs/_base/DOMAIN_MODEL.md`, one branch
+(`docs/showcase-workspace-e5-gate`), one PR into `dev`.
+
+### Success Criteria (mirror of #401 sub-tasks)
+
+- [ ] Fresh-DB stack built via the **DROP/CREATE DATABASE** procedure (NOT
+      `down -v` — see Known Gotchas) + `alembic upgrade head` clean.
+- [ ] 8/8 preset matrix executed and recorded; every outcome matches the
+      expected-outcome matrix (Known Gotchas) — zero undocumented ❌.
+- [ ] `demo_minimal` keep-run: 1 new workspace row (status `completed`),
+      listed in **Saved workspaces**, Load restores config + artifacts panel,
+      Replay completes green with a NEW distinct `workspace_id`.
+- [ ] `showcase_rich` keep-run: same as above PLUS `created_objects` carries
+      `winning_run_id`/`v2_run_id`/`alias`/`scenario_plan_ids`/`batch_id`, and
+      `GET /scenarios?tags=workspace:` returns ≥1 plan tagged
+      `["showcase", …, "source:showcase", "workspace:"]`.
+- [ ] Legacy frame back-compat re-confirmed: one run WITHOUT workspace fields
+      behaves as today (no workspace row created for it).
+- [ ] `test_demo_replay_same_config_twice` green: targeted local run + CI
+      citation.
+- [ ] RUNBOOKS.md gains the Showcase-workspace section (4 mandated topics);
+      DOMAIN_MODEL.md gains the aggregate + ubiquitous-language row.
+- [ ] Five validation gates green on the docs branch (ruff, format, mypy,
+      pyright, unit pytest) + targeted frontend vitest for the workspace
+      components.
+- [ ] Evidence on #401; #389 checkboxes ticked + E5 line fixed; #389 closed;
+      #401 closed.
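The matrix and keep-run criteria above are mechanical enough to check in a script before writing the #401 evidence comment. The following is a minimal sketch of a hypothetical helper that is not part of the repo. It rests on several assumptions:

- Step outcomes are recorded as plain strings (`passed`, `skipped`, `warning`, `failed`).
- The sparse failure steps are named `features` and `backtest`.
- `GET /demo/workspaces` list items expose `workspace_id`, `status` and `name`.

The allowed non-green table is transcribed from the Known Gotchas matrix.

```python
"""Sketch: check recorded E5 dogfood evidence against the expected matrix.

Hypothetical helper, not part of the repo. Step statuses are assumed to be
plain strings: "passed", "skipped", "warning" or "failed".
"""
from __future__ import annotations

PRESETS = (
    "demo_minimal", "retail_standard", "high_variance", "stockout_heavy",
    "new_launches", "sparse", "holiday_rush", "showcase_rich",
)
# sparse may fail at these steps (all-NaN WAPE gate); step names are assumed.
SPARSE_FAIL_STEPS = {"features", "backtest"}
# showcase_rich steps allowed to be non-green (RUNBOOKS items 9-26).
RICH_ALLOWED = {
    "agent_hitl_flow": "skipped",
    "rag_index_subset": "skipped",
    "rag_retrieve_probe": "skipped",
    "verify": "skipped",
    "champion_compat_compare": "skipped",
    "safer_promote_flow": "skipped",
    "batch_preset": "warning",
    "ops_snapshot": "warning",
}


def matrix_deviations(preset: str, steps: dict[str, str]) -> list[str]:
    """Return every deviation from the expected-outcome matrix (empty = conformant)."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    problems: list[str] = []
    sparse_failed = preset == "sparse" and any(
        steps.get(name) == "failed" for name in SPARSE_FAIL_STEPS
    )
    expected_count = 24 if preset == "showcase_rich" else 11
    # Assumption: a documented sparse failure ends the run early, so fewer
    # steps may be recorded; the step count is only enforced otherwise.
    if not sparse_failed and len(steps) != expected_count:
        problems.append(f"{preset}: {len(steps)} steps, expected {expected_count}")
    for name, status in steps.items():
        if status == "passed":
            continue
        if sparse_failed and name in SPARSE_FAIL_STEPS and status == "failed":
            continue
        if preset == "showcase_rich" and RICH_ALLOWED.get(name) == status:
            continue
        problems.append(f"{preset}: step {name} is {status}")
    return problems


def keep_run_deviations(
    before: list[dict[str, str]], after: list[dict[str, str]], name: str
) -> list[str]:
    """Compare 'workspaces' lists captured before and after one keep-run or Replay."""
    seen = {row["workspace_id"] for row in before}
    new_rows = [row for row in after if row["workspace_id"] not in seen]
    if len(new_rows) != 1:
        return [f"expected exactly 1 new workspace row, got {len(new_rows)}"]
    row = new_rows[0]
    problems: list[str] = []
    if row.get("status") != "completed":
        problems.append(f"{row['workspace_id']}: status {row.get('status')}")
    if row.get("name") != name:
        problems.append(f"{row['workspace_id']}: name {row.get('name')}")
    return problems
```

Feed `before` and `after` from the `workspaces` array of `GET /demo/workspaces`, captured around each keep-run and around each Replay. A Replay must also yield exactly one new row. The list is newest first, so use the same `limit` for both snapshots. That way the new row always lands on the first page. Any non-empty result is a deviation under the STOP RULE.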
+
+## All Needed Context
+
+### Documentation & References
+
+```yaml
+# ── The gate's contract ──────────────────────────────────────────────────────
+- issue: "#401 — gh issue view 401"
+  why: The epic's six sub-tasks this PRP encodes verbatim.
+
+- issue: "#389 — gh issue view 389 --json body"
+  why: "Umbrella. DRIFT (verified 2026-06-12): ALL 11 checkboxes unticked
+    (5 decomposition + 6 success criteria); the E5 decomposition line still
+    reads 'not yet created'. Tick every satisfied box, update the E5 line to
+    '#401', close with a close-out comment. E1-E4 = #390 #391 #392 #393,
+    all CLOSED, shipped v0.2.22."
+
+- file: PRPs/PRP-reliability-E6-release-gate.md
+  why: "The release-gate precedent this PRP mirrors (STOP rule, evidence
+    format, close-out order). ONE CORRECTION: its 'docker compose down -v'
+    fresh-stack step is superseded — see the fresh-stack procedure below."
+
+- file: PRPs/PRP-showcase-workspace-E4-restore-replay.md
+  why: "Restore-vs-Replay designed semantics (replay is always keep; config
+    verbatim incl. reset/skip_seed; no provenance column; no DELETE) — the
+    semantics the RUNBOOKS section must document."
+
+# ── What 'green' means per preset ────────────────────────────────────────────
+- file: docs/_base/RUNBOOKS.md
+  why: "'Showcase page (/showcase) pipeline fails at step X' items 1-28.
+    Entry 28 is THE per-preset expected-outcome matrix (sparse may-fail,
+    holiday_rush pinned window + union-range trap, others green). Items
+    9-26 list acceptable ⏭️/⚠️ on showcase_rich. ALSO the doc-sweep target:
+    add the new '### Showcase workspace …' section AFTER this incident
+    section's closing Notes paragraph."
+
+- file: app/features/demo/pipeline.py
+  why: "_phase_table(scenario) (line 2528): showcase_rich = 24 steps / 10
+    phases (data 7, modeling 2, decision 5, portfolio 1, planning 2,
+    knowledge 3, verify 1, agents 1, ops 1, cleanup 1); ALL other presets =
+    the legacy 11-step / 6-phase table. _SCENARIO_SEED_PROFILE (lines
+    513-538): showcase_rich/retail_standard/high_variance/stockout_heavy
+    5×15×180d, new_launches 5×25×180d, holiday_rush PINNED
+    2024-10-01..2024-12-31. READ-ONLY."
+
+# ── Workspace surface (what the keep-runs must prove) ────────────────────────
+- file: app/features/demo/models.py
+  why: "showcase_workspace table (line 57): workspace_id String(32) UNIQUE,
+    status CHECK ∈ {running, completed, failed} (lines 32-34), name
+    NON-unique, seed/scenario/reset/skip_seed config columns,
+    store_id/product_id/date_start/date_end grain columns, created_objects
+    JSONB (sparse keys: winning_run_id, v2_run_id, v2_model_path, alias,
+    agent_session_id, batch_id, scenario_plan_ids, scenario_artifact_key,
+    train_model_types, stale_alias_run_id), result_summary JSONB. Soft
+    references only — NO FKs. This is the DOMAIN_MODEL aggregate source."
+
+- file: app/features/demo/routes.py
+  why: "GET /demo/workspaces (lines 70-97; response {workspaces:[…]}, newest
+    first, limit 1-100/offset), GET /demo/workspaces/{id} (100-125; 404
+    problem+json when missing), POST /demo/run (41-67), WS /demo/stream
+    (128-156; workspace_name without preservation='keep' → one error
+    event). No DELETE endpoint exists — verified; that's a runbook fact."
+
+- file: app/features/demo/schemas.py
+  why: "DemoRunRequest defaults: seed=42, reset=false, skip_seed=true,
+    scenario='demo_minimal', preservation='ephemeral', workspace_name=None;
+    workspace_name pattern ^[a-z0-9][a-z0-9\\-_]*$, ≤100 chars. ScenarioPreset
+    = the 8 enum values. WorkspaceListItem vs WorkspaceDetailResponse
+    (detail adds grain/window + created_objects)."
+
+- file: tests/test_e2e_demo.py
+  why: "test_demo_replay_same_config_twice (line ~561, @pytest.mark.integration):
+    POSTs the IDENTICAL keep-body (seed 42, reset=true, skip_seed=false,
+    demo_minimal, workspace_name='replay-regression') twice against a
+    SUBPROCESS uvicorn on :8124; asserts both pass, distinct workspace_ids,
+    both listed completed. The #146/#324 regression guard. NOTE: it RESETS
+    the DB — never run it mid-dogfood."
+
+# ── Frontend dogfood surface ─────────────────────────────────────────────────
+- file: frontend/src/pages/showcase.tsx
+  why: "Controls: scenario card grid (8 presets), 'Re-seed first' →
+    skip_seed=false, 'Reset database' → reset=true, seed input, 'Save as
+    workspace' checkbox (line 332) + name input (344, mirrors the backend
+    pattern), Run/Stop. handleReplayWorkspace (174-186) re-submits the
+    recorded config VERBATIM with preservation='keep' (+ recorded name).
+    Starting any run detaches a loaded workspace (140)."
+
+- file: frontend/src/components/demo/WorkspacePanel.tsx
+  why: "'Saved workspaces' panel — Load (restore config + artifacts, no run)
+    and Replay buttons per row; reset=true rows render a destructive-styled
+    marker (line ~38/94). Vitest: WorkspacePanel.test.tsx,
+    WorkspaceArtifactsPanel.test.tsx, RunHistoryStrip.test.tsx,
+    ScenarioPicker.test.tsx."
+
+- file: frontend/src/components/demo/ScenarioPicker.tsx
+  why: "The 8 preset cards with wall-clock estimates; sparse carries
+    caveatKind='expected-skip' ('May fail at features/backtest (NaN WAPE) —
+    expected; see runbook', line ~66-72)."
+
+# ── Doc-sweep targets ────────────────────────────────────────────────────────
+- file: docs/_base/DOMAIN_MODEL.md
+  why: "Sweep target. Add '### showcase_workspace (Demo)' under Core
+    Aggregates (mirror the scenario_plan entry's shape: Root / JSONB fields
+    / Invariants), one Ubiquitous Language row ('workspace' vs seeder
+    'scenario' vs 'scenario plan'), and one Entity Relationship Summary
+    line (soft-references, no FK)."
+
+- file: docs/_base/API_CONTRACTS.md
+  why: "READ-ONLY here — E4 already documented the workspace endpoints + WS
+    fields (commit ee844f1). Cross-check the docs sweep against it; do NOT
+    duplicate endpoint tables into RUNBOOKS."
+
+# ── Close-out mechanics ──────────────────────────────────────────────────────
+- file: .claude/rules/umbrella-issue.md
+  why: "Write discipline for gh mutations: dry-run echo → idempotent check →
+    approval gate → confirm. Applies to the #389 body edit + closes."
+
+- file: .claude/rules/output-formatting.md
+  why: "Evidence-comment format: emoji status indicators, box separators,
+    ≤40 lines."
+```
+
+### Current Codebase tree (verification-relevant subset)
+
+```bash
+app/features/demo/models.py          # showcase_workspace ORM (E1)
+app/features/demo/pipeline.py        # _phase_table + _SCENARIO_SEED_PROFILE
+app/features/demo/routes.py          # /demo/run, /demo/workspaces[,/{id}], WS
+app/features/demo/schemas.py         # DemoRunRequest, ScenarioPreset, Workspace*
+app/features/demo/tests/             # test_workspace.py, test_routes.py, test_pipeline.py
+tests/test_e2e_demo.py               # test_demo_replay_same_config_twice (:561)
+frontend/src/pages/showcase.tsx      # dogfood entry point
+frontend/src/components/demo/       # WorkspacePanel, ScenarioPicker, … (+ vitest)
+docs/_base/RUNBOOKS.md               # sweep target 1 (zero 'workspace' today)
+docs/_base/DOMAIN_MODEL.md           # sweep target 2 (zero 'workspace' today)
+docker-compose.gpu.yml               # GPU overlay — REQUIRED for ollama legs
+docker-compose.lan.yml               # untracked local overlay — NOT used here
+```
+
+### Desired Codebase tree (files added/modified)
+
+```bash
+PRPs/PRP-showcase-workspace-E5-release-gate.md  # ADD — this file
+docs/_base/RUNBOOKS.md                          # MOD — +'### Showcase workspace' section
+docs/_base/DOMAIN_MODEL.md                      # MOD — +aggregate, +UL row, +ER line
+# No app/, frontend/, or alembic/ change is in scope.
+```
+
+### Known Gotchas & Environment Quirks
+
+```python
+# ── STOP RULE (governs the whole epic) ───────────────────────────────────────
+# If ANY preset run or workspace check deviates from the expected-outcome
+# matrix below: capture evidence (step table / screenshot / response body),
+# open a NEW fix issue referencing #389 + #401, comment the failure on #401,
+# and STOP the close-out. The docs sweep (Task 7) still lands — it documents
+# already-shipped E1-E4 semantics and is independent of dogfood outcomes.
+# A DOCUMENTED expected-fail (sparse) or sanctioned ⏭️/⚠️ is NOT a deviation.
+
+# ── Fresh stack — SUPERSEDES the reliability-E6 procedure ────────────────────
+# NEVER `docker compose down -v`: it removes ALL named volumes incl.
+# forecastlab_ollama_models (pulled gemma4/qwen3 models, expensive to rebuild).
+# Fresh-DB equivalent (memory: fresh-stack-gate-procedure, hit 2026-06-12):
+#   docker compose --profile gpu down --remove-orphans
+#   docker compose -f docker-compose.yml -f docker-compose.gpu.yml --profile gpu up -d
+#   docker compose exec -T postgres psql -U forecastlab -d postgres \
+#     -c "DROP DATABASE IF EXISTS forecastlab WITH (FORCE);" \
+#     -c "CREATE DATABASE forecastlab OWNER forecastlab;"
+#   uv run alembic upgrade head          # cold-boot proof on the empty DB
+# GOTCHA: WITHOUT the gpu overlay, ollama runs CPU-only and the showcase
+# rag_index_subset step HARD-FAILS (probe says reachable=True but the cold
+# qwen3-embedding:4b load exceeds the 60s embedding ReadTimeout → 502).
+# Verify `docker exec forecastlab-ollama nvidia-smi` works, then WARM the
+# embedder before any showcase_rich run (~41s cold-on-GPU, ~2.4s warm):
+#   curl -s localhost:11434/api/embed -d '{"model":"qwen3-embedding:4b","input":"warmup"}'
+# GOTCHA: the fresh DB wipes app_config runtime overrides — agent model
+# reverts to .env (agent_default_model=ollama:gemma4-agent on this host).
+# Re-check GET /config/ai after boot.
+# GOTCHA: a stale uvicorn from a prior session can hold :8123 — curl then hits
+# OLD code. Run `lsof -iTCP:8123 -sTCP:LISTEN` and kill stale PIDs first.
+# Run the backend as LOCAL uvicorn from the REPO ROOT (host-filesystem
+# artifacts for verify/feature-metadata; docs/ visible to rag_index_subset —
+# the compose backend image lacks docs/, which is why docker-compose.lan.yml
+# exists; do NOT use that overlay here). pnpm 11 depsStatusCheck can stall
+# `pnpm dev`, so start Vite directly:
+#   cd frontend && ./node_modules/.bin/vite --host 0.0.0.0
+
+# ── Per-preset expected-outcome matrix (RUNBOOKS entry 28 — the gate's spec) ─
+# Every run: 'Re-seed first' TICKED (skip_seed=false). seed=42.
+#   demo_minimal     11 steps GREEN (this run = the demo_minimal keep-run)
+#   retail_standard  11 steps GREEN
+#   high_variance    11 steps GREEN
+#   stockout_heavy   11 steps GREEN
+#   new_launches     11 steps GREEN
+#   sparse           11 steps GREEN **or documented FAIL** at features/backtest
+#                    (50% missing grains / all-NaN WAPE gate) — the card
+#                    carries the expected-skip badge; either outcome is
+#                    matrix-conformant; record which occurred
+#   holiday_rush     11 steps GREEN — tick **Reset database** TOO (pinned
+#                    2024-10-01..12-31 window; re-seed without reset ADDS
+#                    rows → /seeder/status reports the union range)
+#   showcase_rich    24 steps / 10 phases GREEN — run LAST, tick **Reset
+#                    database** TOO (clears holiday_rush's pinned window so
+#                    the 180d today-anchored window seeds clean; also clears
+#                    accumulated model_run rows). This run = the
+#                    showcase_rich keep-run.
+# ACCEPTABLE non-green steps on showcase_rich (RUNBOOKS items 9-26):
+#   agent_hitl_flow ⏭️ (KNOWN on this host: gemma4-agent 2B reliably skips —
+#     no Approve button appears; memory showcase-crypto-randomuuid-lan-crash),
+#   rag_index_subset / rag_retrieve_probe ⏭️ (provider unreachable/rejected —
+#     should NOT happen with the GPU overlay + warm-up; investigate if hit),
+#   verify ⏭️ (V2 prophet_like winner — artifact roots differ),
+#   champion_compat_compare / safer_promote_flow ⏭️ (missing V1/V2 — should
+#     NOT happen with Re-seed; investigate),
+#   batch_preset ⚠️ (90s poll timeout),
+#   ops_snapshot ⚠️.
+# ANY other ❌/⏭️ = deviation → STOP RULE.
+# Only ONE pipeline at a time (module asyncio.Lock; 2nd start → one error
+# event / 409; Stop releases the lock in ~5s). Budget: ~90s-3min per 11-step
+# run, 5-10 min for showcase_rich; whole matrix ~25-40 min.
+
+# ── Workspace-mode mechanics ─────────────────────────────────────────────────
+# workspace_name pattern ^[a-z0-9][a-z0-9\-_]*$ (lowercase!) ≤100 chars — use
+# e5-gate-minimal / e5-gate-rich. The UI cannot submit a name without the
+# 'Save as workspace' checkbox ticked; over raw WS, workspace_name without
+# preservation='keep' → one error event (negative probe, optional).
+# Replay re-submits reset/skip_seed VERBATIM: replaying a reset=true row IS
+# DESTRUCTIVE (wipes + reseeds) — that's designed semantics (E4) and a
+# mandated RUNBOOKS topic, not a bug. Names are NON-unique by design; every
+# replay creates a NEW row. Rows accumulate; there is NO DELETE endpoint.
+# localStorage run-history ('forecastlab.showcase.runs.v1', FIFO 5) EXCLUDES
+# workspace runs — keep-runs appear only in the server-backed panel.
+# GET /scenarios?tags=workspace: