diff --git a/.gitignore b/.gitignore index f7432f39..062e8379 100644 --- a/.gitignore +++ b/.gitignore @@ -48,3 +48,6 @@ HANDOFF.md # Local CI / dogfood logs and screenshots (per-session, never committed) .ci-logs/ docs/manual_hun/ + +# Understand-Anything knowledge-graph generator state (local-only, multi-MB) +.understand-anything/ diff --git a/PRPs/PRP-flow-pack-E3-flow-umbrella.md b/PRPs/PRP-flow-pack-E3-flow-umbrella.md new file mode 100644 index 00000000..4a388fd1 --- /dev/null +++ b/PRPs/PRP-flow-pack-E3-flow-umbrella.md @@ -0,0 +1,577 @@ +name: "PRP — flow-pack E3 (/flow-umbrella: rule-driven umbrella issue creation)" +description: | + E3 parallel epic of the flow: command-suite integration. Lands the tracked + docs/flow-pack/commands/flow-umbrella.md template + the local .claude runtime install. + The command generates a 7-field umbrella body from the approved V2 ship list, echoes a + dry-run, waits for approval, and creates the umbrella GitHub issue. Parallel with E2/E4; + depends only on E1 (foundation/labels/milestone) being merged. + + + +## Issue links +- Umbrella: **#368** — feat(repo): integrate flow-pack methodology as the flow: command suite +- This epic: **#372** — flow-pack E3 — /flow-umbrella (7-field umbrella body, approval-gated write) +- Milestone: **#1 flow-pack-suite** · labels: `epic`, `flow` +- Depends on: **E1 #369** (merged via PR #370) — labels/milestone/tracked-docs foundation + +--- + +## Goal + +Implement the **E3 /flow-umbrella command**: the tracked template plus its local runtime install +that generates and creates a GitHub umbrella issue from the approved V2 ship list. The end state: +a user runs `/flow-umbrella ` (after `/flow-brainstorm` has produced a V2 list), +inspects the dry-run echo of the full 7-field body, types "approve," and the umbrella issue is +created with `umbrella` + `flow` + type labels and the active milestone attached. + +**Deliverable:** 2 files (tracked template + local runtime install). No epic creation. 
No +milestone/label creation (must exist from E1). No commit/push. Parallel with E2 (#371) and E4 +(#373). + +## Why + +- The flow: suite's Stage 2 ("Decompose") has no umbrella-creation command yet. + `/flow-brainstorm` produces a V2 ship list but has nowhere to hand it off. +- E3 fills that gap: it wires the approved V2 list into the 7-field umbrella contract + (`umbrella-issue.md`) and materializes it as a GitHub issue that `/flow-epics` can then + decompose into child epics. +- The durable-source split (`.claude/` gitignored) requires a tracked template. Without E3's + `docs/flow-pack/commands/flow-umbrella.md`, a fresh clone loses the command. + +## What + +A docs-first delivery: tracked command template → byte-copy local install → working `/flow-umbrella`. + +### Success Criteria +- [ ] `docs/flow-pack/commands/flow-umbrella.md` exists (tracked canonical spec/template). +- [ ] `.claude/commands/flow/flow-umbrella.md` present, byte-identical to the tracked template + (confirmed by `diff -q`). +- [ ] `git check-ignore .claude/commands/flow/flow-umbrella.md` prints the path (confirms it's + gitignored — not the durable artifact). +- [ ] Fresh-clone recovery works: `cp docs/flow-pack/commands/*.md .claude/commands/flow/` + reproduces the local command. +- [ ] `/flow-umbrella` validates prerequisites (labels/milestone) before drafting. +- [ ] `/flow-umbrella` performs idempotency check before dry-run. +- [ ] `/flow-umbrella` dry-run echoes the full `gh issue create` command + body before any write. +- [ ] `/flow-umbrella` approval gate prevents write on any response other than "approve." +- [ ] The created umbrella issue carries all 7 required sections, labels `umbrella`+`flow`+type, + and the active milestone. +- [ ] Every created artifact carries a provenance header linking to its source. +- [ ] E2/E4/E5 NOT implemented here. No epic creation, no sub-issue linking. 
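The type label required by the success criteria is derived from the issue title. This PRP's gotchas define it as the first token before `(` in a conventional-commit title, defaulting to `feat` when the title is ambiguous. A minimal shell sketch of that rule follows. `derive_type_label` is a hypothetical helper name, and treating anything other than a single lowercase word as "ambiguous" is an assumption rather than something the source specifies.

```shell
# Hypothetical helper (not part of the flow-pack suite): derive the type
# label from a conventional-commit style title such as "feat(repo): ...".
derive_type_label() {
  local title="$1" type=""

  # Take everything before the first "(" when one is present.
  case "$title" in
    *"("*) type="${title%%"("*}" ;;
  esac

  # Assumption: only a single lowercase word counts as unambiguous;
  # anything else (empty, spaces, capitals, digits) falls back to "feat".
  case "$type" in
    ""|*[!a-z]*) printf '%s\n' "feat" ;;
    *)           printf '%s\n' "$type" ;;
  esac
}
```

The command would still need to confirm that the derived label exists in the repository before attaching it, as the prerequisite check requires.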
+ +--- + +## All Needed Context + +### Documentation & References +```yaml +# THE PATTERN TO MIRROR — read these before writing anything +- file: PRPs/PRP-flow-pack-E1-foundation.md + why: The E3 PRP's sibling; mirror its exact PRP structure. This implementation follows the + same two-file pattern (tracked template + local install). Read every section heading. + +- file: docs/flow-pack/commands/flow-prime.md + why: The only existing flow: command. Mirror its file structure exactly: + frontmatter YAML (description:) → provenance HTML comment → # Title → ## Objective → + ## Process (numbered steps with bash blocks) → ## Output Format (fenced block) → + ## Arguments ($ARGUMENTS line). flow-umbrella.md must follow this layout. + +- file: docs/flow-pack-methodology.md + sections: + - "## Stage 2 — Decompose — /flow-umbrella" (steps 1–4 + next-pointer spec) + - "## Umbrella contract (7-field body)" (exact field names + content rules) + - "## Durable-source split" (table: tracked vs local vs purpose) + - "## Fresh-clone recovery" (the cp command + diff check) + why: The authoritative spec for what /flow-umbrella does and the 7-field contract. + +- file: .claude/rules/umbrella-issue.md + sections: + - "## Umbrella body — 7-field contract" (field table with rules) + - "## Write discipline" (dry-run echo / idempotent check / approval gate / rate-delay) + - "## Labels and milestone" (required labels + milestone policy) + - "## Source-of-truth split (CRITICAL)" (the durable vs local split + recovery) + why: The agent contract for umbrella creation; flow-umbrella.md must implement every clause. + +# LIVE UMBRELLA EXAMPLE (for 7-field body reference) +- bash: "gh issue view 368 --json body --jq '.body'" + why: A real 7-field umbrella body created for this exact project. Use as a reference for + tone, section depth, and the "not yet created" placeholder pattern in Decomposition. 
+ +# WORKING STATE +- file: .flow/brainstorm-log.md + why: Contains the V2 ship list, defer list with reasons, and the 5-dim scores that + /flow-umbrella reads to synthesize the 7-field body. Shows V2 items that became + the E1–E5 epics of #368. + +- file: .flow/state.md + sections: "FLOW-PRIME:YOU-ARE-HERE" marker block + why: Contains the active milestone name, label status, and current branch/version that + /flow-umbrella uses to validate prerequisites. + +# HAND-OFF SPEC +- file: docs/flow-pack-methodology.md + section: "## Stage 3 — Execute (delegated)" and the FLAI mapping table rows for + "Umbrella issue creation (01)" and "Epic creation + linking (01)" + why: Confirms /flow-umbrella ends with "→ Next: /flow-epics #N" and that epic creation + belongs entirely to /flow-epics (E4 #373). Do not blur the boundary. +``` + +### Current Codebase tree (relevant slice) +```bash +docs/ + flow-pack-methodology.md # ✅ tracked; § Stage 2 = /flow-umbrella spec + flow-pack/ + commands/ + flow-prime.md # ✅ tracked; MIRROR this structure + # flow-umbrella.md does NOT exist yet — to create +.claude/ + commands/flow/ + flow-prime.md # ✅ local install (gitignored); byte-copy of tracked + # flow-umbrella.md does NOT exist yet — to create + rules/ + umbrella-issue.md # ✅ local rule; contains 7-field contract + write discipline +.flow/ + state.md # working state; has You-Are-Here with milestone/labels + brainstorm-log.md # V2 ship list + defer list +PRPs/ + PRP-flow-pack-E1-foundation.md # ✅ the sibling PRP — mirror its layout + PRP-flow-pack-E3-flow-umbrella.md # this PRP +``` + +### Desired Codebase tree (files to add + responsibility) +```bash +docs/ + flow-pack/ + commands/ + flow-umbrella.md # TRACKED durable template/spec for /flow-umbrella + # Contains: frontmatter, provenance, full 9-step process, + # output format, $ARGUMENTS spec. + # Source of truth for the command. 
+.claude/ + commands/flow/ + flow-umbrella.md # LOCAL install — byte-copy of the tracked template (gitignored). + # Claude Code reads this when the user types /flow-umbrella. + # Recovery: cp docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/ +``` + +### Known Gotchas & Quirks +```text +# CRITICAL: .claude/ is gitignored (confirmed: /.claude and .claude in .gitignore). +# The local command at .claude/commands/flow/flow-umbrella.md is NEVER the durable +# artifact. Durable truth = docs/flow-pack/commands/flow-umbrella.md. The same +# gitignore split from E1 applies here — never treat the local copy as the source of truth. + +# CRITICAL: Local install must be byte-identical to tracked template. Verify: +# diff -q docs/flow-pack/commands/flow-umbrella.md \ +# .claude/commands/flow/flow-umbrella.md && echo "OK no drift" +# If they drift, the tracked template wins. Recovery = cp. + +# CRITICAL: The 7-field body must contain ALL 7 sections with their exact headings: +# ## Summary / ## Approach / ## Decomposition / ## Out of scope (explicit) / +# ## Success criteria / ## Risks / ## Tracking +# A body missing any section = not done (umbrella-issue.md invariant). + +# CRITICAL: Write discipline order — the command MUST do these in sequence: +# 1. prerequisites check (labels + milestone exist) ← fail fast before any draft +# 2. idempotency check (existing issue title search) ← skip create if already exists +# 3. draft 7-field body +# 4. dry-run echo (full body + gh command) +# 5. approval gate (block on user input) +# 6. execute gh issue create (only on "approve") +# 7. confirm (gh issue view) +# 8. print gate + next pointer +# Never swap 1 and 2. Never skip 4 or 5. + +# CRITICAL: gh issue create --body with multi-line content. +# Use --body-file (NOT --body "...") to avoid shell quoting issues +# with multi-line markdown bodies. Pattern: +# cat > /tmp/umbrella-body.md << 'BODY_EOF' +# [body content] +# BODY_EOF +# gh issue create --title "..." 
--body-file /tmp/umbrella-body.md --label ... --milestone ... +# gh CLI --body-file has been available since gh v1.x; verified against E1 groundwork. + +# CRITICAL: Type label derivation. +# The title follows conventional-commit format: "feat(repo): ". +# The type label is the first token before "(" — default "feat" if ambiguous. +# The type label must exist in the repo (e.g., gh label list | grep feat). +# Umbrella-issue.md says: "Labels ⊇ umbrella label set (plus the `epic` label)" for epics, +# but for the UMBRELLA issue itself: umbrella + flow + type label. + +# GOTCHA: Decomposition section — epic #N refs. +# When /flow-umbrella runs, epic issues don't exist yet. Use PROPOSED descriptions +# with "(not yet created)" suffixes. Pattern from live #368: +# - [ ] **E1 — Foundation** (blocks all): — not yet created +# - [ ] **E2 — Parallel**: — not yet created +# Do NOT put fake "#N" refs for unborn issues. /flow-epics will assign real numbers. + +# GOTCHA: Idempotency check searches open issues only (--state open). +# A closed umbrella with the same title won't block creation. This is intentional: +# a closed umbrella = finished initiative; a new one may legitimately start. + +# GOTCHA: Milestone name matching in gh CLI. +# gh issue create --milestone "" requires the EXACT milestone title string +# (case-sensitive). Always read the milestone name from .flow/state.md You-Are-Here +# or from `gh api repos/{owner}/{repo}/milestones --jq '.[0].title'` rather than +# hard-coding it. + +# GOTCHA: commit-format.md requires every commit to reference an open issue. +# If a commit is needed (authorized by user), reference #372 — that's E3's issue. +# Branch = feat/flow-pack-e3-flow-umbrella. But NO commit/push happens in this PRP. + +# SCOPE: Do NOT create flow-brainstorm, flow-epics, or any other command here. +# E3 ships /flow-umbrella only. E2 (#371) and E4 (#373) are separate parallel epics. +# E5 is the release gate and remains deferred. 
+``` + +--- + +## Implementation Blueprint + +### list of tasks (dependency order) +```yaml +Task 1 — CREATE docs/flow-pack/commands/flow-umbrella.md (tracked canonical template): + - MIRROR structure of: docs/flow-pack/commands/flow-prime.md + (frontmatter YAML → provenance HTML comment → # Title → ## Objective → ## Process + (9 numbered steps, bash blocks) → ## Output Format (fenced block) → ## Arguments) + - INCLUDE frontmatter: "description: Generate and create umbrella GitHub issue from V2 ship list" + - INCLUDE provenance header naming: docs/flow-pack-methodology.md (§ Stage 2), umbrella-issue.md + - SPEC the 9-step process (see § Per-task notes below for full content) + - INCLUDE the output format block (gate + next-command pointer) + - INCLUDE "$ARGUMENTS" line (initiative description; derived from brainstorm log if omitted) + - HEADER: provenance comment → docs/flow-pack-methodology.md, .claude/rules/umbrella-issue.md + +Task 2 — INSTALL .claude/commands/flow/flow-umbrella.md (local runtime copy): + - GENERATE as a byte-copy: + cp docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/flow-umbrella.md + - VERIFY no drift: + diff -q docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/flow-umbrella.md + && echo "OK no drift" + - DO NOT hand-edit the local copy — copy only. + +Task 3 — VERIFY the durable-source split: + - git check-ignore .claude/commands/flow/flow-umbrella.md (must print the path) + - diff -q docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/flow-umbrella.md + - git status --short (only docs/flow-pack/commands/flow-umbrella.md should be a new tracked file) + - Confirm .claude/commands/flow/flow-umbrella.md is NOT staged. 
+ +Task 4 — VERIFY fresh-clone recovery (optional, proves robustness): + - Simulate: rm -f .claude/commands/flow/flow-umbrella.md + - Regenerate: cp docs/flow-pack/commands/*.md .claude/commands/flow/ + - Confirm: diff -q docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/flow-umbrella.md +``` + +### Per-task notes — full /flow-umbrella command content spec + +**Task 1 is the high-value task.** The command file must contain the following process, verbatim +or faithful to this spec. Each step maps to a clause in `umbrella-issue.md` § Write discipline. + +```text +## Process + +### 1. Read context + - Load .flow/brainstorm-log.md: extract V2 ship list, defer list (with reasons), initiative title. + - Load .flow/state.md FLOW-PRIME:YOU-ARE-HERE block: extract milestone name, type label, branch. + - $ARGUMENTS overrides the initiative description if provided. + - If .flow/brainstorm-log.md is missing or has no V2 section, print: + "ERROR: No V2 ship list found. Run /flow-brainstorm first, then re-run /flow-umbrella." + and stop. + +### 2. Validate prerequisites + Commands to run: + gh label list --json name --jq '[.[].name]' # check for umbrella, flow, + gh api repos/{owner}/{repo}/milestones \ + --jq '.[] | select(.state=="open") | .title' # check milestone exists + If any required label or milestone is missing: + print the exact remediation commands: gh label create for a label, or a + gh api POST to repos/{owner}/{repo}/milestones for a milestone (the gh CLI has no + built-in milestone command). + STOP — do not proceed to draft. + +### 3. Idempotency check + Command: + gh issue list --state open \ + --search "" \ + --json number,title \ + --jq '.[0] // empty' + If an open issue with the same title exists: + print "Umbrella #N already exists: — skipping create." + jump to step 9 (print gate + next-pointer with the existing number). + +### 4. Draft the 7-field body + Synthesize from context (step 1). 
ALL 7 sections required: + + ## Summary + + + ## Approach + + + ## Decomposition + Phase taxonomy (invariant from docs/flow-pack-methodology.md § Invariants): + - Exactly ONE Foundation epic (blocks all others) + - N Parallel epics (run concurrently after Foundation) + - Exactly ONE Release-gate epic (closes ONLY after Foundation + all Parallel) + Format per entry: + - [ ] **EN — ** (): — not yet created + Use "not yet created" because /flow-epics hasn't run. Do NOT invent fake #N refs. + + ## Out of scope (explicit) + + Each line must end with " — reason: ". + Never leave a blank reason (umbrella-issue.md invariant: every defer has a reason). + + ## Success criteria + Checkbox list. Each criterion must be independently verifiable by an outside reviewer: + - [ ] + + ## Risks + Markdown table. One row per risk, one mitigation per row: + | Risk | Mitigation | + |------|------------| + | | | + + ## Tracking + - Source of truth: `docs/flow-pack-methodology.md` + working state `.flow/state.md` + - Milestone: + - **One-pass confidence: X/10** () + +### 5. Dry-run echo + Print the EXACT commands to be executed (do not run them yet): + cat > /tmp/umbrella-body.md << 'BODY_EOF' + [full 7-field body as it will be submitted — show every line] + BODY_EOF + gh issue create \ + --title "" \ + --body-file /tmp/umbrella-body.md \ + --label "umbrella" --label "flow" --label "<type>" \ + --milestone "<milestone-name>" + +### 6. Approval gate + Print: + "────────────────────────────────────────── + Awaiting approval. Type 'approve' to create the umbrella issue. + Any other response = abort (no write). + ──────────────────────────────────────────" + Wait for user input. On "approve" (case-insensitive): proceed to step 7. + On anything else: print "Aborted — no issue created." and stop. + +### 7. 
Execute + Run: + cat > /tmp/umbrella-body.md << 'BODY_EOF' + [7-field body] + BODY_EOF + gh issue create \ + --title "<title>" \ + --body-file /tmp/umbrella-body.md \ + --label "umbrella" --label "flow" --label "<type>" \ + --milestone "<milestone-name>" + Capture the issue URL; extract the issue number N from the URL. + +### 8. Confirm + Run: + gh issue view <N> --json number,title,labels,milestone \ + --jq '"#\(.number): \(.title) [\(.labels | map(.name) | join(","))] milestone=\(.milestone.title // "none")"' + If labels or milestone are missing, print remediation: + gh issue edit <N> --add-label <missing-label> + gh issue edit <N> --milestone "<milestone-name>" + +### 9. Gate and next-command + Gate ✅ UMBRELLA CREATED when: created + all 7 sections present + labels ✅ + milestone ✅ + Gate ❌ FAILED when: gh issue create returned non-zero or confirm shows missing labels/milestone. + Always print the next-command pointer, even on failure (so the user knows where to go): + → Next: /flow-epics #<N> +``` + +**Output format block to include at the end of the command file:** +```text +## Output Format + +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + 🏗️ flow-umbrella: Umbrella Issue +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + +📋 Context + V2 ship items: N | Defer items: M + Initiative: <title> + Milestone: <name> | Labels: umbrella ✅ flow ✅ <type> ✅ + +📋 Prerequisite check + umbrella label: [✅/❌] | flow label: [✅/❌] | <type> label: [✅/❌] + Milestone <name>: [✅/❌] | Existing umbrella: [#N / none] + +📋 Dry-run + cat > /tmp/umbrella-body.md ... + gh issue create --title "..." --body-file /tmp/umbrella-body.md --label ... --milestone ... + [full body printed] + +──────────────────────────────────────────── + Awaiting approval. Type "approve" to create. 
+──────────────────────────────────────────── + +[After approval:] + +📋 Created + ✅ gh issue create → #N: <title> + Labels: umbrella, flow, <type> | Milestone: <name> + +──────────────────────────────────────────── + [✅/❌] UMBRELLA CREATED → #N +──────────────────────────────────────────── + +→ Next: /flow-epics #N +``` + +### Integration Points +```yaml +DOCS (tracked): + - add: docs/flow-pack/commands/flow-umbrella.md + - no modifications to docs/flow-pack-methodology.md required + (it already describes /flow-umbrella in § Stage 2; no new content needed) + +CLAUDE (local, gitignored): + - add: .claude/commands/flow/flow-umbrella.md (byte-copy via cp) + +HAND-OFF: + - /flow-umbrella ends with "→ Next: /flow-epics #N" + - /flow-epics is E4 (#373); /flow-umbrella does NOT call it — it only prints the pointer + - base_prp:prp-create is invoked per epic AFTER /flow-epics, not by /flow-umbrella +``` + +--- + +## Validation Loop + +### Level 1: File presence + durable-source split +```bash +# tracked source of truth exists +test -f docs/flow-pack/commands/flow-umbrella.md && echo "OK tracked" + +# local install exists and is gitignored (NOT durable) +test -f .claude/commands/flow/flow-umbrella.md && echo "OK local" +git check-ignore .claude/commands/flow/flow-umbrella.md # must print the path + +# local install == tracked template (no drift) +diff -q docs/flow-pack/commands/flow-umbrella.md \ + .claude/commands/flow/flow-umbrella.md && echo "OK no drift" + +# only docs/flow-pack/commands/flow-umbrella.md is a new tracked file; .claude/** not staged +git status --short +``` + +Expected output: +- `OK tracked`, `OK local`, path printed by gitignore check, `OK no drift` +- `git status` shows one new `A` entry: `docs/flow-pack/commands/flow-umbrella.md` +- `.claude/commands/flow/flow-umbrella.md` does NOT appear in `git status` + +### Level 2: Fresh-clone recovery reproduction +```bash +# simulate recovery: blow away the local install, regenerate, confirm identical +rm -f 
.claude/commands/flow/flow-umbrella.md +cp docs/flow-pack/commands/*.md .claude/commands/flow/ +diff -q docs/flow-pack/commands/flow-umbrella.md \ + .claude/commands/flow/flow-umbrella.md && echo "OK recovery reproduces local" +``` + +### Level 3: Command structure smoke (manual inspection) +```bash +# Confirm all 9 process steps are present in the tracked template +grep -c "^### [0-9]\." docs/flow-pack/commands/flow-umbrella.md +# Expected: 9 + +# Confirm all 7 body section headings appear in the spec +grep "## Summary\|## Approach\|## Decomposition\|## Out of scope\|## Success criteria\|## Risks\|## Tracking" \ + docs/flow-pack/commands/flow-umbrella.md | wc -l +# Expected: 7 + +# Confirm dry-run and approval-gate keywords present +grep -c "dry.run\|DRY RUN\|approve\|approval" docs/flow-pack/commands/flow-umbrella.md +# Expected: >= 4 (dry-run in step 5, approval in step 6) + +# Confirm the next-command pointer spec is present +grep "flow-epics" docs/flow-pack/commands/flow-umbrella.md +# Expected: at least one match showing "→ Next: /flow-epics #N" +``` + +### Level 4: Interactive smoke (post-install, manual) +```text +# In a Claude Code session, type: /flow-umbrella test-initiative +# Verify the command: +# - Reads .flow/brainstorm-log.md (or notes its absence with a helpful error) +# - Runs prerequisite checks (labels + milestone) and reports results +# - Performs idempotency check +# - Prints a dry-run echo with full 7-field body +# - Blocks on approval gate (does NOT create without "approve") +# - On "approve": creates the issue and prints #N + gate result +# - Ends with "→ Next: /flow-epics #N" +# (No automated assertion — interactive command; verify output sections manually.) +``` + +--- + +## Tests / checks required +- [ ] Level 1 file-presence + gitignore + no-drift checks all pass (4 assertions green). +- [ ] Level 2 recovery reproduces the local install byte-for-byte. 
+- [ ] Level 3 structure smoke: 9 steps present, 7 section headings, ≥4 dry-run/approval hits, + flow-epics pointer present. +- [ ] Provenance header present in both created files + (`grep -l "provenance" docs/flow-pack/commands/flow-umbrella.md .claude/commands/flow/flow-umbrella.md`). +- [ ] `git status --short` shows `docs/flow-pack/commands/flow-umbrella.md` as the ONLY new tracked + file; `.claude/commands/flow/flow-umbrella.md` does NOT appear. +- [ ] No standard repo gate is broken (markdown-only change → ruff/mypy/pyright/pytest unaffected; + run `uv run ruff check . && uv run pytest -v -m "not integration"` to confirm green). +- [ ] `docs/flow-pack/commands/flow-prime.md` structure matches `flow-umbrella.md` structure + (frontmatter → provenance → title → Objective → Process → Output Format → Arguments). +- [ ] E2/E4/E5 NOT implemented; no GitHub issues created. + +--- + +## Final Validation Checklist +- [ ] Both files created + byte-identical (diff clean). +- [ ] Durable-source split holds: `docs/flow-pack/commands/flow-umbrella.md` tracked; `.claude/commands/flow/flow-umbrella.md` gitignored + regenerable. +- [ ] Command spec contains all 9 process steps in the correct order (read context → prereq → idempotent → draft → dry-run → approval → execute → confirm → gate + next-pointer). +- [ ] 7-field body headings all present in the spec with correct names and field rules. +- [ ] Write discipline clauses all present: dry-run echo, approval gate, idempotency check, no write without "approve." +- [ ] --body-file approach used (not --body "...") for multi-line body safety. +- [ ] Type label derivation documented in the command spec. +- [ ] "→ Next: /flow-epics #N" pointer present in output format. +- [ ] E2 (#371) and E4 (#373) NOT touched; no epic creation in this scope. +- [ ] No commit/push performed; `uv.lock` + `docker-compose.lan.yml` left untouched. 
+- [ ] Branch for implementation: `feat/flow-pack-e3-flow-umbrella` off `dev`; commit (when user authorizes) references `(#372)`. + +--- + +## Anti-Patterns to Avoid +- ❌ Treating `.claude/commands/flow/` as the source of truth (it's gitignored — durable truth is `docs/flow-pack/commands/flow-umbrella.md`). +- ❌ Hand-editing the local install instead of copying from the tracked template. +- ❌ Swapping the write-discipline order (prerequisite check must come before idempotency check, which must come before dry-run, which must come before execute). +- ❌ Using `--body "..."` instead of `--body-file` for multi-line gh issue create bodies. +- ❌ Putting fake `#N` refs for unborn epic issues in the Decomposition section. +- ❌ Creating epic issues inside /flow-umbrella — that is /flow-epics (E4 #373). +- ❌ Creating milestone or labels inside /flow-umbrella — those must exist from E1; the command validates and fails fast if missing. +- ❌ Diverging from the `docs/flow-pack/commands/flow-prime.md` file structure (frontmatter, section headings, output-format block, Arguments line). +- ❌ Staging `uv.lock` / `docker-compose.lan.yml` (pre-existing dirty worktree — leave alone). +- ❌ Implementing any part of E2 (/flow-brainstorm), E4 (/flow-epics), or E5 here. + +--- + +## Confidence Score: 9/10 + +One-pass likelihood is high: +- Pattern is fully established by the E1 PRP + flow-prime.md (mirror exactly). +- All 9 process steps are specified to the line level, including exact `gh` commands. +- Methodology is fully reverse-engineered and dogfooded (`.flow/` state docs). +- Work is markdown-only: zero Python/TS runtime risk, no type system, no DB. +- Durable-source split is already understood by the E1 precedent. 
+ +−1 for two authoring judgment calls: (a) the quality of the synthesized 7-field body (depends on +what the V2 ship list says — must be coherent and match the live umbrella #368 style), and (b) +correctly handling the case where `.flow/brainstorm-log.md` is absent or partially formed (the +command must degrade gracefully with a clear error + `→ /flow-brainstorm` pointer). diff --git a/PRPs/PRP-flow-pack-E4-flow-epics.md b/PRPs/PRP-flow-pack-E4-flow-epics.md new file mode 100644 index 00000000..77b9b720 --- /dev/null +++ b/PRPs/PRP-flow-pack-E4-flow-epics.md @@ -0,0 +1,594 @@ +name: "PRP — flow-pack E4 /flow-epics (tracked template + local install + epic decomposition + sub-issue linking)" +description: | + E4 of the flow: command-suite integration. Ships the /flow-epics command: reads an umbrella + issue decomposition, creates phase-ordered epic issues with idempotent guards, links them as + sub-issues via the GitHub REST API, and hands off to base_prp:prp-create per epic. + Parallel epic — runs after E1 (#369 merged). Docs-only: no app/backend/frontend changes. + +<!-- provenance: docs/flow-pack-methodology.md § "/flow-epics — epic issues" + § "Epic contract" + + § "Hierarchy-as-data (REST API)" + § "Durable-source split". + Epic scope: GitHub issue #373 body. + Structural pattern: PRPs/PRP-flow-pack-E1-foundation.md --> + +## Issue links +- Umbrella: **#368** — feat(repo): integrate flow-pack methodology as the flow: command suite +- This epic: **#373** — flow-pack E4 — /flow-epics (epic decomposition, sub-issue linking, prp-create handoff) +- Milestone: **#1 flow-pack-suite** · labels: `epic`, `flow` + +--- + +## Goal + +Implement the **E4 deliverable** of the `flow:` command suite: the `/flow-epics` command. + +End state: a user can run `/flow-epics 368` (or `/flow-epics #368`) and the command will: +1. Read umbrella #368's decomposition section to build an epic inventory +2. 
Check which epics already exist (title search) and which are already linked (GraphQL) +3. For each epic not yet a sub-issue: dry-run echo → user "approve" → `gh issue create` → rate-delay → `gh api` sub-issue link +4. Skip E5 Release gate (deferred per #373 scope) +5. Verify the final sub-issue list via GraphQL +6. Print gate result + per-epic handoff to `base_prp:prp-create` + +**Deliverable:** 2 files + 1 documented recovery path (see Desired tree). No E2/E3/E5 behavior. +No app/backend/frontend code changes. No commit/push without explicit user authorization. + +## Why + +- The `flow:` suite's value ends at `/flow-umbrella` unless epics exist as linked sub-issues. + `/flow-epics` closes the loop: umbrella → epics → base_prp:prp-create per epic. +- Hierarchy-as-data (REST sub-issue API) is required so project-board grouping, closure rollup, + and dependency ordering work natively. Body `#N` mentions alone are documentation, not data. +- E4 is parallel with E2/E3; it can ship independently once E1 is merged. +- The write-discipline invariants (dry-run/idempotent/approval/rate-delay/confirm) are already + fully specified in `docs/flow-pack-methodology.md` and `.claude/rules/umbrella-issue.md` — + this PRP operationalises them into a reusable slash-command. + +## What + +A docs-first deliverable: tracked canonical template → local runtime install → working `/flow-epics`. + +### Success Criteria +- [ ] Tracked `docs/flow-pack/commands/flow-epics.md` exists with a complete, self-contained + command spec (all 8 required sections — see Task 1). +- [ ] Local `.claude/commands/flow/flow-epics.md` present, byte-regenerable from the tracked template. +- [ ] Fresh-clone recovery documented and verified: + `cp docs/flow-pack/commands/*.md .claude/commands/flow/` reproduces both command files. +- [ ] Running `/flow-epics 368` shows a correct inventory (E1 exists/linked, E2–E4 exist, E5 deferred) + without writing anything until the user approves. 
+- [ ] Every dry-run echo shows the exact `gh issue create` + `gh api sub_issues POST` commands before + any execution. +- [ ] All created/linked epics carry labels `epic` + `flow` + `feat` + milestone `flow-pack-suite`. +- [ ] Sub-issue links verified via GraphQL after each write batch. +- [ ] No GitHub write without explicit user "approve" for each write operation. +- [ ] E5 Release gate shown as ⏭️ SKIP (deferred) — never auto-created. +- [ ] No app/backend/frontend code touched; `uv.lock` / `docker-compose.lan.yml` / uncommitted + `flow-prime.md` left untouched. + +## All Needed Context + +### Documentation & References +```yaml +# SOURCE OF TRUTH — read these first +- file: docs/flow-pack-methodology.md + section: > + "/flow-epics — epic issues" (Step 1–4 of Stage 2 — Decompose), + "Epic contract" (phase blockquote + Purpose + Sub-tasks), + "Hierarchy-as-data (REST API)" (exact gh api calls), + "Durable-source split" (tracked vs gitignored) + why: > + Authoritative spec; exact gh api patterns; epic body contract; + portability invariants the command must uphold + +- file: .claude/rules/umbrella-issue.md + section: > + "Epic body — phase contract" (blockquote templates), + "Hierarchy-as-data" (POST endpoint, no native gh cmd), + "Write discipline" (5-step dry-run/idempotent/approval/rate-delay/confirm), + "Labels and milestone" (label superset rule) + why: Rule-level contract the command encodes; exact write discipline steps + +# PATTERN TO MIRROR — match structure exactly +- file: docs/flow-pack/commands/flow-prime.md + why: > + The only other tracked command template; mirror its YAML frontmatter, + HTML provenance comment, section headings, $ARGUMENTS convention, + inline !`bash` commands, and output-format block word-for-word in style + +# LIVE UMBRELLA BEING SERVED +- issue: "#368" + command: "gh issue view 368 --json number,title,body,labels,milestone" + why: > + Live umbrella whose Decomposition section defines which epics to create; + its current 
sub-issues list is the ground truth for idempotency + +# EXISTING EPICS (idempotency ground truth) +- issue: "#369 E1 Foundation — merged/linked" +- issue: "#371 E2 Parallel — /flow-brainstorm" +- issue: "#372 E3 Parallel — /flow-umbrella" +- issue: "#373 E4 Parallel — /flow-epics (this epic)" + why: All four exist; command must detect them and skip create, only link if not yet a sub-issue + +# E1 PRP STRUCTURAL REFERENCE +- file: PRPs/PRP-flow-pack-E1-foundation.md + why: > + Exact PRP structure pattern — frontmatter, issue links, goal/why/what, + context YAML, current+desired trees, Known Gotchas block, task list format, + integration points YAML, validation levels, final checklist, confidence score + +# CONSTRAINTS +- file: CLAUDE.md + section: "Learnings — .claude/ is gitignored" + critical: > + Local install is NOT the durable artifact. docs/flow-pack/** is tracked. + .claude/commands/flow/flow-epics.md must be gitignored (verify with git check-ignore). + +- file: .claude/rules/output-formatting.md + why: > + Emoji status indicators (✅ ❌ ⏭️ ⚠️ 🔄) + box-drawing separators (━━━/────) + the command's printed output must match exactly + +- file: .claude/rules/commit-format.md + why: Branch name (feat/flow-pack-e4-flow-epics) + commit format (feat(docs,repo): ... 
(#373)) +``` + +### Current Codebase tree (relevant slice) +```bash +docs/ + flow-pack-methodology.md # TRACKED — source-of-truth spec + flow-pack/ + commands/ + flow-prime.md # TRACKED — structural pattern to mirror (MODIFIED unstaged) +.claude/ + commands/flow/ + flow-prime.md # LOCAL — byte-copy of tracked template + rules/ + umbrella-issue.md # LOCAL — write-discipline + sub-issue API contract +PRPs/ + PRP-flow-pack-E1-foundation.md # REFERENCE — PRP structural pattern + PRP-flow-pack-E4-flow-epics.md # THIS PRP (being executed) +``` + +### Desired Codebase tree (files to add + responsibility) +```bash +docs/ + flow-pack/ + commands/ + flow-epics.md # TRACKED — canonical template/spec for /flow-epics + # source of truth; the committed, portable contract +.claude/ + commands/flow/ + flow-epics.md # LOCAL install — regenerable byte-copy of the tracked + # template (gitignored; NOT durable) +``` + +### Known Gotchas & Quirks +```text +# CRITICAL: No native `gh` sub-issue command exists (cli/cli#10298). ALWAYS use: +# gh api repos/w7-mgfcode/ForecastLabAI/issues/{umbrella_N}/sub_issues \ +# -X POST -F sub_issue_id={epic_N} \ +# --header "GitHub-Next-Preview: true" +# The --header "GitHub-Next-Preview: true" is REQUIRED for the REST POST write. +# It is NOT required for GraphQL read queries. + +# CRITICAL: Idempotent check BEFORE create — search by exact issue title: +# gh issue list --search "<exact title>" --json number,title \ +# --jq '.[0].number // "none"' +# If result != "none" → issue already exists → skip create, proceed to link check only. +# GitHub search is fuzzy — verify the returned title matches character-for-character. + +# CRITICAL: Idempotent link check — fetch current sub-issues BEFORE any POST: +# gh api graphql -f query=' +# { repository(owner:"w7-mgfcode", name:"ForecastLabAI") { +# issue(number: N) { subIssues(first: 20) { nodes { number title } } } +# } }' +# If epic_N already appears in subIssues.nodes → skip POST (already linked). 
+ +# GOTCHA: sleep 1 (rate-delay) between consecutive gh api WRITE calls — mandatory, not advisory. +# This applies between (create + link) pairs AND between two consecutive creates. +# Do NOT sleep between a read and a write, only between consecutive writes. + +# GOTCHA: Epic title naming convention — must match the umbrella decomposition text exactly so +# idempotent search finds existing issues: +# Pattern from live E1–E4: "feat(repo): flow-pack <phase-label> — <scope description>" +# Example: "feat(repo): flow-pack E5 — release gate (end-to-end dogfood + portability manifest)" + +# GOTCHA: Label superset rule — epic labels must include ALL umbrella labels minus "umbrella", +# plus "epic". Umbrella #368 has: umbrella, flow → epic labels: epic, flow, feat. +# Read umbrella labels from: gh issue view <N> --json labels --jq '[.labels[].name]' +# Then remove "umbrella" from the list and add "epic". + +# GOTCHA: docs/flow-pack/commands/flow-prime.md is MODIFIED (not staged) in the current worktree. +# Run `git diff docs/flow-pack/commands/flow-prime.md` before branching to see the delta. +# Do NOT stage or commit that file as part of this PRP — it belongs to a separate fix. +# Only stage docs/flow-pack/commands/flow-epics.md (the new tracked file). + +# SCOPE BOUNDARY — E5 is DEFERRED: +# Issue #373 Out-of-scope: "Release-gate epic (E5) — deferred until E2–E4 are implemented." +# When the command encounters the Release-gate line in the umbrella decomposition, it MUST +# show it as ⏭️ SKIP (deferred) and never auto-create it. This is a hard scope boundary. 
+ +# SCOPE BOUNDARY — command creates epics, nothing else: +# - Does NOT author PRP content (→ base_prp:prp-create) +# - Does NOT create sub-tasks within epics (→ issue-to-subtasks) +# - Does NOT implement any epic's feature code +# - Does NOT run any validation gates (ruff/mypy/pytest) — markdown-only change +``` + +## Implementation Blueprint + +### Tasks (dependency order) + +```yaml +Task 1 — CREATE docs/flow-pack/commands/flow-epics.md (tracked canonical template): + + MIRROR structure of: docs/flow-pack/commands/flow-prime.md + The file must contain ALL 8 sections in this exact order: + + ── Section A: YAML frontmatter ── + --- + description: Create phase-ordered epic issues from an umbrella decomposition, link via REST + sub-issues API, and hand off to base_prp:prp-create per epic + --- + + ── Section B: HTML provenance comment ── + <!-- provenance: docs/flow-pack-methodology.md §"/flow-epics — epic issues" + §"Epic contract" + + §"Hierarchy-as-data (REST API)". Source of truth for .claude/commands/flow/flow-epics.md. + Recovery: cp docs/flow-pack/commands/flow-epics.md .claude/commands/flow/flow-epics.md + Full methodology: docs/flow-pack-methodology.md --> + + ── Section C: Title + Objective ── + # flow-epics: Epic Decomposition + ## Objective + One-paragraph prose: "Read the umbrella issue's Decomposition section, create N phase-ordered + epic issues (Foundation → Parallel → Release gate) with idempotent guards, link each as a + sub-issue of the umbrella via the GitHub REST API, and hand off to base_prp:prp-create per + open epic. Skips epics that already exist and/or are already linked." + + ── Section D: Arguments ── + ## Arguments + "$ARGUMENTS — umbrella issue number (e.g. 368 or #368). Required. + If omitted, reads the active umbrella number from the 'In-progress issues' block in + .flow/state.md (looks for the [umbrella,flow]-labeled entry)." + + ── Section E: Process (7 numbered steps with inline !`bash` commands) ── + ## Process + + ### 1. 
Parse argument + Strip '#' prefix from $ARGUMENTS if present. + If empty: read .flow/state.md and find the [umbrella,flow] open issue number. + !`grep -m1 "umbrella,flow" .flow/state.md` + + ### 2. Fetch umbrella + !`gh issue view <N> --json number,title,body,labels,milestone` + Abort with ❌ if not found or if the `umbrella` label is absent. + + ### 3. Extract decomposition + Parse the body: find lines between "## Decomposition" and the next "##" heading. + For each bullet line: + - Detect phase marker: "Foundation" / "Parallel" / "Release gate" (from the bold label) + - Extract scope description (used to construct the epic title and Purpose) + - Detect any embedded "#N" ref already in the line (pre-existing issue pointer) + - Flag as SKIP if the line contains "(deferred)" OR the phase is Release gate + AND the scope mentions "not yet created" or "deferred" + + ### 4. Pre-flight checks + !`gh label list --json name --jq '[.[].name] as $have | ["epic","flow","feat"] - $have | length == 0'` + Abort with ❌ if result is false (missing labels; names are matched exactly). + !`gh api repos/w7-mgfcode/ForecastLabAI/milestones --jq '.[].title'` + Abort with ❌ if "flow-pack-suite" not present. + + ### 5. Idempotent inventory + Fetch current sub-issues of the umbrella: + !`gh api graphql -f query=' + { repository(owner:"w7-mgfcode", name:"ForecastLabAI") { + issue(number: <N>) { + subIssues(first: 20) { nodes { number title state } } + } + } }'` + + For each epic in the decomposition (except SKIP items): + a. Search for existing issue by title: + !`gh issue list --search "<exact epic title>" --json number,title \ + --jq '.[0] | "\(.number // "none") \(.title // "")"'` + → Record as EXISTS #M / NOT_FOUND + b. Check if already in the GraphQL subIssues list above → LINKED / UNLINKED + + Print inventory table: + [phase] [title-summary] [exists] [linked] + Foundation E1 foundation ... ✅ #369 ✅ linked + Parallel E2 /flow-brainstorm ... ✅ #371 ✅/❌ ? + Parallel E3 /flow-umbrella ... ✅ #372 ✅/❌ ? 
+ Parallel E4 /flow-epics ... ✅ #373 ✅/❌ ? + Release gate E5 dogfood ... ⏭️ deferred — + + ### 6. Create + link loop (write-discipline, per epic not yet linked) + For each epic where LINKED=false AND phase != SKIP: + + 6a. If NOT_FOUND → CREATE the issue: + Compose body using the epic body template (see § Epic body template). + Echo dry-run: + ┌─ DRY-RUN ──────────────────────────────────────────────┐ + │ gh issue create \ │ + │ --title "<title>" \ │ + │ --body "<body>" \ │ + │ --label epic --label flow --label feat \ │ + │ --milestone "flow-pack-suite" │ + └────────────────────────────────────────────────────────┘ + APPROVAL GATE: "Type 'approve' to create, anything else to skip." + If approved: execute gh issue create; capture issue number M from output. + RATE-DELAY: sleep 1 + + 6b. Link to umbrella (whether newly created or already existing but UNLINKED): + Echo dry-run: + ┌─ DRY-RUN ──────────────────────────────────────────────┐ + │ gh api repos/w7-mgfcode/ForecastLabAI/issues/<N>/sub_issues \│ + │ -X POST -F sub_issue_id=<M> \ │ + │ --header "GitHub-Next-Preview: true" │ + └────────────────────────────────────────────────────────┘ + APPROVAL GATE: "Type 'approve' to link, anything else to skip." + If approved: execute gh api POST; confirm: + !`gh issue view <M> --json number,title,labels` + RATE-DELAY: sleep 1 + + ### 7. Verify + gate + Re-fetch sub-issues via GraphQL (same query as step 5) to confirm final state. + Print gate result + handoff (see § Output format). + + ── Section F: Epic body template ── + ## Epic body template + + Use one of these three templates based on phase. Fill <angle-bracket> placeholders. + + FOUNDATION epic: + ``` + > Sub-issue of #<umbrella_N> (umbrella: <umbrella_title>). 
Foundation — blocks Epics #<P1>, #<P2>, … + + ## Purpose + + <One-paragraph scope description extracted from the umbrella decomposition line.> + + ## Sub-tasks + + _To be decomposed via `issue-to-subtasks` when this epic is picked up._ + ``` + + PARALLEL epic: + ``` + > Sub-issue of #<umbrella_N> (umbrella: <umbrella_title>). Parallel after Foundation (E1 #<foundation_N>). + + ## Purpose + + <One-paragraph scope description.> + + ## Sub-tasks + + _To be decomposed via `issue-to-subtasks` when this epic is picked up._ + ``` + + RELEASE GATE epic: + ``` + > Sub-issue of #<umbrella_N> (umbrella: <umbrella_title>). Release gate — closes only after Foundation + all Parallel epics close. + + ## Purpose + + <One-paragraph scope description.> + + ## Sub-tasks + + _To be decomposed via `issue-to-subtasks` when this epic is picked up._ + ``` + + ── Section G: Output format ── + ## Output format + + ``` + ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + 🔗 flow-epics: Epic Decomposition + ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + + 📋 Umbrella #<N> — <title> + Phase structure: 1 Foundation · M Parallel · 1 Release gate (deferred) + + 📋 Epic inventory + [phase] [exists] [linked] + ✅ #369 E1 Foundation exists linked + ✅ #371 E2 Parallel exists ❌ not linked + ✅ #372 E3 Parallel exists ❌ not linked + ✅ #373 E4 Parallel exists ❌ not linked + ⏭️ E5 Release gate deferred — + + 📋 Dry-run: writes pending (awaiting approval) + [dry-run block per epic needing a link] + + 📋 Actions taken (after approvals) + ✅ #371 linked under #<N> + ✅ #372 linked under #<N> + ✅ #373 linked under #<N> + ⏭️ E5 deferred — skipped + + ──────────────────────────────────────────── + ✅ EPICS LINKED — 4/5 epics under #<N> (E5 deferred) + ──────────────────────────────────────────── + + → Next: base_prp:prp-create per open epic: + - /base_prp:prp-create (#371 — /flow-brainstorm) + - /base_prp:prp-create (#372 — /flow-umbrella) + (E4 #373 = this PRP, currently executing) + ``` + + ── Section H: Reuse-map (match 
umbrella-issue.md style) ── + ## Reuse-map + + | Need | Tool | + |------|------| + | Codebase + context priming | core_piv_loop:prime | + | Epic → 5 executable subtasks | issue-to-subtasks skill | + | PRP authoring per epic | base_prp:prp-create | + | Session continuity | writing-session-handoffs | + | Rules audit | audit-rules-drift | + | Umbrella creation | /flow-umbrella | + + +Task 2 — INSTALL .claude/commands/flow/flow-epics.md (local runtime copy): + + REGENERATE as a byte-copy: cp docs/flow-pack/commands/flow-epics.md .claude/commands/flow/ + + VERIFY no drift: + diff -q docs/flow-pack/commands/flow-epics.md .claude/commands/flow/flow-epics.md + → must produce no output (identical files) + + CONFIRM gitignored: + git check-ignore .claude/commands/flow/flow-epics.md + → must print the path (confirms it is ignored and will never appear in git status as tracked) +``` + +### Integration Points +```yaml +DOCS (tracked — the only committed change): + - add: docs/flow-pack/commands/flow-epics.md + +CLAUDE (local, gitignored — never staged or committed): + - add: .claude/commands/flow/flow-epics.md + +DIRTY WORKTREE (handle carefully): + - INSPECT before branching: git diff docs/flow-pack/commands/flow-prime.md + - DO NOT STAGE: uv.lock, docker-compose.lan.yml, docs/flow-pack/commands/flow-prime.md + - IF flow-prime.md modification is a bug fix relevant to this work, commit it separately first + +BRANCH: + - create off dev: feat/flow-pack-e4-flow-epics + - verify: git branch --show-current + +COMMIT (only when user explicitly authorizes — no auto-commit): + - format: feat(docs,repo): flow-pack E4 — /flow-epics command template + local install (#373) + - stage only: docs/flow-pack/commands/flow-epics.md + - .claude/commands/flow/flow-epics.md is gitignored — it will NOT appear in git status + - verify pre-commit hook passes: .claude/hooks/check-commit-format.sh + +METHODOLOGY DOC (no changes needed): + - docs/flow-pack-methodology.md already documents /flow-epics 
fully — do NOT edit it +``` + +## Validation Loop + +### Level 1: File presence + durable-source split +```bash +# tracked source of truth exists and has content +test -f docs/flow-pack/commands/flow-epics.md \ + && wc -l docs/flow-pack/commands/flow-epics.md \ + && echo "OK: tracked file present" + +# local install exists +test -f .claude/commands/flow/flow-epics.md && echo "OK: local install present" + +# local install is gitignored (CRITICAL — must print the file path, not empty) +git check-ignore .claude/commands/flow/flow-epics.md +# expected output: .claude/commands/flow/flow-epics.md + +# local install == tracked template (no drift) +diff -q docs/flow-pack/commands/flow-epics.md .claude/commands/flow/flow-epics.md \ + && echo "OK: no drift between tracked and local" + +# only docs/flow-pack/commands/flow-epics.md is a new tracked addition +# .claude/** must NOT appear as staged or tracked +git status --short +# expected: ?? docs/flow-pack/commands/flow-epics.md while untracked (A once staged; possibly also M flow-prime.md, unstaged) +``` + +### Level 2: Fresh-clone recovery reproduction +```bash +# simulate a fresh clone by removing the local install, then regenerate +rm -f .claude/commands/flow/flow-epics.md +mkdir -p .claude/commands/flow # a real fresh clone has no .claude/ tree (gitignored) +cp docs/flow-pack/commands/*.md .claude/commands/flow/ + +# verify recovery reproduces the file byte-for-byte +diff -q docs/flow-pack/commands/flow-epics.md .claude/commands/flow/flow-epics.md \ + && echo "OK: recovery reproduces local install" + +# both flow-prime and flow-epics should be present after cp +ls .claude/commands/flow/ +# expected: flow-epics.md flow-prime.md +``` + +### Level 3: Smoke test — dry-run against live umbrella #368 +```bash +# In a Claude Code session, invoke: +# /flow-epics 368 +# +# Verify the printed output contains ALL of the following (no gh writes yet): +# ✅ Inventory table is shown with correct issue numbers for E1–E4 +# ✅ E5 Release gate shown as ⏭️ SKIP (deferred) — NOT created, NOT in dry-run queue +# ✅ Any UNLINKED epics (E2/E3/E4) shown in the 
dry-run pending section +# ✅ Dry-run block echoes the exact gh api POST command with --header "GitHub-Next-Preview: true" +# ✅ "Type 'approve' to link" gate appears before any write executes +# ✅ Command ends with "→ Next: base_prp:prp-create" for each open epic +# +# This is interactive verification — no automated assertion. +# After confirming the dry-run, optionally approve the link operations for E2/E3/E4. +``` + +## Tests / checks required +- [ ] Level 1: file-presence + gitignore + no-drift — all assertions pass. +- [ ] Level 2: recovery reproduces local install from tracked template. +- [ ] Level 3: `/flow-epics 368` dry-run shows correct inventory; E5 shows ⏭️ SKIP. +- [ ] `docs/flow-pack/commands/flow-epics.md` contains provenance HTML comment: + `grep "provenance:" docs/flow-pack/commands/flow-epics.md` +- [ ] All 8 required sections present in the command spec: + `grep -E "^## (Objective|Arguments|Process|Epic body template|Output format|Reuse-map)" docs/flow-pack/commands/flow-epics.md | wc -l` + → must print 6 (+ frontmatter and provenance = 8 total) +- [ ] Sub-issue link command uses `gh api ... --header "GitHub-Next-Preview: true"`: + `grep "GitHub-Next-Preview" docs/flow-pack/commands/flow-epics.md` +- [ ] Idempotent check uses `--jq '.[0].number // "none"'`: + `grep '"none"' docs/flow-pack/commands/flow-epics.md` +- [ ] No mention of E2/E3/E5 implementation logic in flow-epics.md — E5 is only shown as SKIP. +- [ ] Standard repo gates unaffected (markdown-only change): + `uv run ruff check . && uv run mypy app/ && uv run pytest -v -m "not integration"` + → all must be green (unchanged by this PRP) + +## Final Validation Checklist +- [ ] 2 files created: `docs/flow-pack/commands/flow-epics.md` (tracked) and + `.claude/commands/flow/flow-epics.md` (local, gitignored). +- [ ] Durable-source split holds: docs tracked, .claude ignored + regenerable from `cp`. 
+- [ ] Command spec is self-contained: an agent reading only `docs/flow-pack/commands/flow-epics.md` + and codebase can implement `/flow-epics` correctly without additional context. +- [ ] All write-discipline invariants encoded in the Process section: + dry-run echo → idempotent check → approval gate → rate-delay → confirm. +- [ ] Sub-issue REST API used correctly: + `gh api ... -X POST -F sub_issue_id=N --header "GitHub-Next-Preview: true"` for writes, + GraphQL for reads. +- [ ] E5 Release gate hard-coded as ⏭️ SKIP (never auto-created). +- [ ] E2/E3 behavior not included in flow-epics.md. +- [ ] Branch is `feat/flow-pack-e4-flow-epics` off `dev`; commit references `(#373)`. +- [ ] No commit/push performed by this PRP execution unless explicitly requested by the user. +- [ ] `uv.lock` + `docker-compose.lan.yml` + unstaged `flow-prime.md` left untouched. +- [ ] Provenance header present in `docs/flow-pack/commands/flow-epics.md`. + +## Anti-Patterns to Avoid +- ❌ Using `gh` CLI native sub-issue support or any undocumented extension — `gh api POST` directly. +- ❌ Omitting `--header "GitHub-Next-Preview: true"` from the sub-issue POST call. +- ❌ Skipping the dry-run echo before any `gh` write. +- ❌ Auto-proceeding after dry-run without waiting for "approve". +- ❌ Auto-creating E5 (Release gate) — explicitly deferred per #373 scope. +- ❌ Writing app/backend/frontend code (this is a docs-only PRP). +- ❌ Staging `uv.lock` / `docker-compose.lan.yml` / `flow-prime.md`. +- ❌ Treating `.claude/commands/flow/flow-epics.md` as the committed source of truth. +- ❌ Letting the local install drift from the tracked template after writing. +- ❌ Implementing E2 (/flow-brainstorm) or E3 (/flow-umbrella) behavior here. + +--- + +## Confidence Score: 7/10 + +One-pass likelihood is moderate-to-high: methodology is fully documented, `gh api` patterns are +verified in the umbrella rule and methodology doc, and this is markdown-only (no runtime/type +risk). −3 for: + +1. 
**Decomposition parser complexity** — the umbrella body uses varied text patterns (bold labels, + embedded `#N` refs, "(deferred)" markers, "not yet created" prose); the implementing agent + must handle all variants without getting any text extraction subtly wrong. +2. **Two-phase idempotency** — `exists?` + `linked?` are independent checks that must compose + correctly (four states: exists+linked, exists+unlinked, not-exists, deferred). Getting one + case wrong silently skips a link or attempts a duplicate create. +3. **Authoring a ~150-line markdown spec** — the command file is long and the implementing agent + must maintain section ordering, template exactness (epic blockquote wording), and style + consistency with `flow-prime.md` throughout. diff --git a/PRPs/flow-brainstorm.md b/PRPs/flow-brainstorm.md new file mode 100644 index 00000000..2060abc2 --- /dev/null +++ b/PRPs/flow-brainstorm.md @@ -0,0 +1,704 @@ +name: "flow-pack E2 — /flow-brainstorm command" +description: | + Implement the /flow-brainstorm command: the V1-naive-plan → 3-read-only-agent-research → + 5-dimensional-score → V2-ship/defer pipeline for the flow-pack methodology suite. + Delivers two files (tracked template + local install); no backend, frontend, + migration, or runtime changes. + +**Issue:** #371 | **Umbrella:** #368 | **Branch:** `feat/flow-brainstorm-command` +**Depends on:** E1 #369 merged (flow-prime live, `docs/flow-pack/commands/` exists, labels/milestone created). +**Working-tree caveat:** `docker-compose.lan.yml` (untracked) + `uv.lock` (M) are pre-existing — do NOT stage either. + +--- + +## Goal + +Implement `/flow-brainstorm` as E2 of the flow-pack suite. 
Deliverables are two files: + +| File | Action | Role | +|------|--------|------| +| `docs/flow-pack/commands/flow-brainstorm.md` | CREATE | Tracked canonical template — committed, source of truth | +| `.claude/commands/flow/flow-brainstorm.md` | CREATE | Local runtime install — gitignored, byte-copy of tracked template | + +No `app/`, `frontend/`, `alembic/`, or any runtime code is touched. No GitHub writes. +No E3 (`/flow-umbrella`) or E4 (`/flow-epics`) behavior is implemented. + +### Success Criteria + +- [ ] `docs/flow-pack/commands/flow-brainstorm.md` exists, committed under `docs(repo): add /flow-brainstorm command — E2 of flow-pack suite (#371)` +- [ ] `.claude/commands/flow/flow-brainstorm.md` is a byte-copy (`diff` exits 0) +- [ ] Command file follows the exact structure of `docs/flow-pack/commands/flow-prime.md` (frontmatter, provenance comment, numbered steps, output format block, $ARGUMENTS section) +- [ ] All 5 scoring dimensions defined: Value, Risk, Readiness, Complexity, Evidence (1–10 each, max 50) +- [ ] Score bands enforced: ≥ 40 → SHIP, 36–39 → NEGOTIATE (human gate), < 36 → DEFER with written reason +- [ ] Exactly 3 read-only subagents: A (Known Issues), B (Best Practices), C (Dependencies) +- [ ] `.flow/brainstorm-log.md` append rules documented (append-only, round-numbered) +- [ ] Human approval gate printed before next-command pointer +- [ ] Next-command pointer: `→ Next: /flow-umbrella <initiative>` +- [ ] Only `docs/flow-pack/commands/flow-brainstorm.md` committed; `.claude/` never staged + +--- + +## Why + +The flow-pack pipeline (`docs/flow-pack-methodology.md`) is a 4-command chain: + +``` +/flow-prime → /flow-brainstorm → /flow-umbrella → /flow-epics +``` + +E1 (flow-prime, PR #370) is merged. E2 delivers the second link: it turns a baseline snapshot +into a scored, human-approved V2 ship/defer list ready for `/flow-umbrella`. Without it, the +pipeline breaks at the first handoff — a user can prime but cannot plan. 
+ +The command is pure tooling: no application code, no database changes, no new runtime +dependencies. It encodes the "V1 → critique → 3-agent research → 5-dim score → V2" pattern +defined in `docs/flow-pack-methodology.md § /flow-brainstorm`. + +--- + +## What + +### Behavior summary (what the command does when invoked) + +1. Reads initiative description from `$ARGUMENTS` or falls back to `.flow/state.md` "Gap" +2. Produces **V1** — flat bullet list of 5–10 items, from baseline alone, unscored, labeled "V1" +3. Applies **critique gate** — tags each V1 item with `{assumption, scope-creep, no-evidence}`; does NOT modify V1 +4. Spawns **exactly 3 read-only research subagents** (Agent tool) in parallel: + - Agent A — Known Issues: open bugs and prior incidents relevant to V1 + - Agent B — Best Practices: docs, existing skills, reuse candidates + - Agent C — Dependencies: blockers, upstream availability, API confirmation +5. **Scores** every V1 item on 5 dimensions (1–10 each, max 50) +6. Applies **score-band rule**: ≥ 40 SHIP · 36–39 NEGOTIATE (stop for human) · < 36 DEFER +7. Waits for human approval on negotiate-zone items before constructing V2 +8. Produces **V2**: ship list + defer list with explicit one-clause reasons + X/10 confidence +9. **Appends** decision trail (V1, scores, defer reasons) to `.flow/brainstorm-log.md` +10. Waits for **human approval gate** on the V2 list +11. 
Prints gate result + `→ Next: /flow-umbrella <initiative>` + +### What the command does NOT do + +- Does not create GitHub issues (E3 /flow-umbrella) +- Does not generate 7-field umbrella bodies (E3) +- Does not link sub-issues (E4 /flow-epics) +- Does not write to `.flow/state.md` (that is owned by /flow-prime) +- Does not make any GitHub writes before explicit human approval + +--- + +## All Needed Context + +### Documentation & References + +```yaml +- file: docs/flow-pack-methodology.md + section: "§ /flow-brainstorm — V1 → score → V2" and "§ Invariants" + why: The AUTHORITATIVE spec. Contains the 5 dimensions, score bands, 3-agent mandates, + and invariants. Read this section before writing the command — the PRP quotes the + key facts but the methodology doc is the single source of truth. + +- file: docs/flow-pack/commands/flow-prime.md + why: The CANONICAL PATTERN for flow: command files. Mirror its structure exactly: + YAML frontmatter (description:), HTML provenance comment block, ## heading, + ## Objective paragraph, ## Process numbered steps, !-prefix bash commands, + ## Output Format fenced block, ## Arguments section. Do not invent structure. + +- file: .flow/brainstorm-log.md + why: The EXISTING append log (created during E1 dogfood). Shows the exact format to + replicate: provenance comment on creation, ## Round N — YYYY-MM-DD heading, + V1 bullets, critique flags, 5-dim score notation "Value/Risk/Readiness/Complexity/Evidence", + SHIP/NEGOTIATE/DEFER notation, user-response line. The command must append in exactly + this format. + +- file: .flow/state.md + why: Input to the command at runtime. "You are here" and "Gap" sections provide initiative + context when $ARGUMENTS is absent. Also holds the phase status the command should + update to "[x] Phase N — /flow-brainstorm done" (see Update rules below). + +- file: .claude/rules/umbrella-issue.md + why: Downstream contract — shows what /flow-umbrella expects as input from /flow-brainstorm. 
+ Confirms the V2 ship list is the ONLY durable output; V1 + scores are transient + working-state artifacts (invariant: "V1 is transient"). +``` + +### Desired codebase tree + +``` +docs/flow-pack/commands/ + flow-prime.md ← existing (pattern reference, DO NOT modify) + flow-brainstorm.md ← CREATE (tracked canonical template) + +.claude/commands/flow/ + flow-prime.md ← existing + flow-brainstorm.md ← CREATE (byte-copy of tracked template) + +.flow/ ← existing (runtime working dir, NOT committed) + state.md ← existing (input at runtime) + brainstorm-log.md ← existing (append target at runtime) +``` + +### Known Gotchas + +``` +# GOTCHA 1: .claude/ is gitignored — NEVER commit the local install +# docs/flow-pack/commands/flow-brainstorm.md = durable source of truth (commit this) +# .claude/commands/flow/flow-brainstorm.md = regenerable local install (do NOT commit) +# Source: docs/flow-pack-methodology.md § "Durable-source split" +# Recovery line: cp docs/flow-pack/commands/flow-brainstorm.md .claude/commands/flow/flow-brainstorm.md +# This MUST appear in the provenance comment of the command file. + +# GOTCHA 2: Subagents are PROSE instructions, NOT ! bash commands +# The 3 research subagents are invoked via the Agent tool, described as prose in the +# command file: "Spawn 3 read-only research subagents in parallel (Agent tool): ..." +# They are NOT `!`-prefixed lines. A `!` prefix runs bash; Agent tool invocations +# are instructional prose that Claude follows when executing the command. +# Example from flow-prime.md: !`git log -5 --oneline` is bash. +# "Spawn Agent A (Known Issues) with the following prompt: ..." is agent invocation prose. + +# GOTCHA 3: brainstorm-log.md is APPEND-ONLY (NOT HTML markers like state.md) +# state.md uses <!-- FLOW-PRIME:...:START/END --> marker pairs (replacement strategy). +# brainstorm-log.md is different — each run appends a NEW ## Round N section. +# NEVER overwrite or replace previous rounds. 
+# Update rule in the command file must say: +# - File absent → create with provenance header + ## Round 1 section +# - File exists → count current max N, append ## Round (N+1) — <date> + +# GOTCHA 4: Score-band NEGOTIATE (36–39) requires a HARD STOP +# Items in 36–39 MUST be surfaced to the human before constructing V2. +# The command must STOP and present the negotiate items with their scores and +# one-sentence rationale, asking the human to decide: ship or defer. +# Only after the human responds does V2 get constructed. +# Do NOT auto-ship negotiate items. Per methodology § Invariants: "Score bands are hard." + +# GOTCHA 5: Every DEFER item MUST have an explicit one-clause written reason +# Per methodology § Invariants: "Every defer has a reason. A defer item with no +# written reason is a process failure." +# Format: "<item title> (score: X/50): DEFER — <one clause reason>" +# Acceptable reason: "DEFER — overlaps existing analyzing-ai-repos skill; revisit if a +# future initiative needs deep reverse-engineering." +# NOT acceptable: "DEFER" alone, or "DEFER — not needed now" + +# GOTCHA 6: V1 must be explicitly labeled "V1" and UNSCORED +# Per methodology: "labeled 'V1' explicitly" and the item list is "unscored". +# Do not add dimension scores in V1. V1 is the raw brainstorm before research. +# The critique gate TAGS items (flags) but does not score them. + +# GOTCHA 7: Critique flags are LABELS, NOT FIXES +# The critique gate attaches zero or more flags to each V1 item: +# assumption — relies on an unverified fact +# scope-creep — touches E3/E4/E5 or out-of-scope systems +# no-evidence — no concrete codebase grounding for the need +# The flags focus research agents but do NOT change V1 text. + +# GOTCHA 8: $ARGUMENTS fallback chain +# 1. If $ARGUMENTS is non-empty, use it as the initiative description. +# 2. Else, read .flow/state.md "Gap" line. +# 3. Else, ask the user: "What initiative should I brainstorm? 
(1–3 sentences)" + +# GOTCHA 9: The command file IS the agent instruction — no code runs +# Unlike a Python module or TypeScript file, the command file is read by Claude Code +# and followed as instructions. "Implementation" = writing the markdown correctly. +# The PRP's task is to specify the exact content of that markdown file. +``` + +--- + +## Implementation Blueprint + +There are three tasks: (1) write the tracked template file, (2) create the local install copy, (3) commit the tracked template. + +--- + +### Task 1: Write `docs/flow-pack/commands/flow-brainstorm.md` + +Write this exact file. Every section is specified below. Mirror the structure of +`docs/flow-pack/commands/flow-prime.md` — do not invent new structural conventions. + +--- + +**File content spec** (write verbatim, substituting `<YYYY-MM-DD>` with today): + +```markdown +--- +description: V1 naive plan → 3-read-only-agent research → 5-dim score → V2 ship/defer list +--- + +<!-- provenance: flow-pack methodology stage 2 (V1 → V2 planning pipeline). + Source of truth: docs/flow-pack/commands/flow-brainstorm.md (tracked). + Local install: .claude/commands/flow/flow-brainstorm.md (gitignored, regenerable from this file). + Recovery: cp docs/flow-pack/commands/flow-brainstorm.md .claude/commands/flow/flow-brainstorm.md + Full methodology: docs/flow-pack-methodology.md --> + +# flow-brainstorm: V1 → Score → V2 + +## Objective + +Turn a baseline initiative description into a scored, human-approved V2 ship/defer list ready +for `/flow-umbrella`. Produces three outputs: + +1. **V1** — flat bullet list of 5–10 candidate items, from baseline alone, unscored, labeled "V1". +2. **V2** — approved ship list + explicit defer list + X/10 one-pass confidence score. +3. **Log entry** — full decision trail appended to `.flow/brainstorm-log.md`. + +The three read-only research subagents are the engine of this command. 
Claude spawns exactly 3 +(Agent A — Known Issues, Agent B — Best Practices, Agent C — Dependencies) via the Agent tool, +waits for all three, then synthesizes their findings into the score table. + +This command makes NO GitHub writes. It ends by printing the approved V2 list and the next-command +pointer. All GitHub writes (issue creation, labeling, linking) belong to E3 `/flow-umbrella`. + +**DELEGATION:** Do not re-implement codebase priming. If the baseline context needs refreshing, +run `/flow-prime` first. + +## Process + +### 1. Read baseline context + +!`ls .flow/ 2>/dev/null || echo "(no .flow/ directory yet)"` + +Determine the initiative description: +- If `$ARGUMENTS` is non-empty → use it. +- Else → read `.flow/state.md` and extract the "Gap" line from the "You are here" section. +- Else → ask the user: "What initiative should I brainstorm? Provide 1–3 sentences." + +Read `.flow/brainstorm-log.md` (if it exists) to determine the current round count. The new +round will be Round N+1 (or Round 1 if the file does not exist yet). + +!`test -f .flow/brainstorm-log.md && grep -c "^## Round" .flow/brainstorm-log.md || echo "0"` + +### 2. Produce V1 — naive plan (UNSCORED) + +Generate a flat bullet list of 5–10 candidate items **from baseline knowledge only** — no research +yet. Every item must be: + +- **Unscored** — no dimension scores; plain text only. +- **Labeled "V1"** — the section heading must read `## V1 — Naive Plan (N items, unscored)`. +- **Descriptive** — format: `- <item title>: <one-sentence description of what and why>`. + +Coverage heuristics: include obvious high-value items, known technical debt, upstreams that may +be blocked, and at least one item that is likely out of scope (to stress-test the critique gate). + +### 3. Critique gate — tag V1 items (do NOT fix them) + +For each V1 item, attach zero or more flags. Flags are labels only — do not change V1 text. 
+ +| Flag | When to apply | +|------|---------------| +| `assumption` | Relies on a fact not verified against the codebase or docs | +| `scope-creep` | Touches E3/E4/E5 behavior or an out-of-scope system | +| `no-evidence` | No concrete codebase grounding for the stated need | + +Present as: `- <item title> [assumption, scope-creep]` or `- <item title> [none]`. + +The flags guide the research agents. An `assumption`-flagged item means "Agent A should verify +this claim." A `scope-creep` flag means "Agent B should confirm boundaries." + +### 4. Spawn 3 read-only research subagents in parallel + +Invoke the **Agent tool** to spawn all three concurrently. Each subagent is read-only — it MUST +NOT write files or make GitHub writes. Pass the V1 items + critique flags in the prompt. + +**Agent A — Known Issues** + +Prompt: +``` +You are a read-only research agent. You MUST NOT write files or make GitHub writes. + +Initiative: <initiative-description> +V1 items (with critique flags): <paste V1 list with flags> + +Task: Read the open GitHub issues, recent git log, and .flow/state.md. +Report: + 1. Which V1 items are blocked by or related to open issues? (cite #N) + 2. Which V1 items are partially done (recent branches/PRs touching them)? + 3. Which V1 `assumption` flags are contradicted by known incidents or bugs? + +Output: concise bullet list, #N refs where applicable. Read-only. +``` + +**Agent B — Best Practices** + +Prompt: +``` +You are a read-only research agent. You MUST NOT write files or make GitHub writes. + +Initiative: <initiative-description> +V1 items (with critique flags): <paste V1 list with flags> + +Task: Read CLAUDE.md, AGENTS.md, docs/flow-pack-methodology.md, and .claude/rules/. +Report: + 1. Which V1 items align with or contradict current best practices? + 2. Which V1 items are already covered by an existing skill or command? (reuse opportunity) + 3. Which V1 `scope-creep` flags are confirmed — item truly belongs to E3/E4/E5? 
+ +Output: concise bullet list. Read-only. +``` + +**Agent C — Dependencies** + +Prompt: +``` +You are a read-only research agent. You MUST NOT write files or make GitHub writes. + +Initiative: <initiative-description> +V1 items (with critique flags): <paste V1 list with flags> + +Task: Read pyproject.toml, frontend/package.json, docker-compose.yml, +and docs/_base/API_CONTRACTS.md. +Report: + 1. Which V1 items have unresolved upstream dependencies or API blockers? + 2. Which V1 `no-evidence` flags are confirmed — no codebase grounding found? + 3. Any dependency pinning or version conflicts that affect V1 items? + +Output: concise bullet list. Read-only. +``` + +Wait for all three agents before proceeding. + +### 5. Score V1 items on 5 dimensions + +Use agent findings as evidence for the Evidence dimension. Score each item 1–10 per dimension: + +| Dimension | 1 = low | 10 = high | Note | +|-----------|---------|-----------|------| +| **Value** | Cosmetic / irrelevant | Core user outcome | — | +| **Risk** | High risk, many unknowns | Low risk, well-understood | Scored inversely: higher risk = lower score | +| **Readiness** | Many blockers open | All upstreams clear | Blocked = lower score | +| **Complexity** | Enormous effort | Trivial | Scored inversely: higher complexity = lower score | +| **Evidence** | Pure assumption | Fully verified by agents | Directly from agent reports | + +Note: Risk and Complexity score INVERSELY — a low-risk, low-complexity item scores 9–10, not 1–2. +(A high-risk item is less desirable, so it scores lower on the Risk dimension.) + +Present the score table: + +``` +| Item | Value | Risk | Readiness | Complexity | Evidence | Total | Band | +|------|-------|------|-----------|------------|----------|-------|------| +| ... 
| 8 | 7 | 9 | 6 | 9 | 39 | 🟡 NEGOTIATE | +``` + +Band indicators: +- `✅ SHIP` — total ≥ 40 +- `🟡 NEGOTIATE` — total 36–39 (requires human decision before V2) +- `❌ DEFER` — total < 36 (requires explicit one-clause written reason) + +### 6. Handle negotiation zone (36–39 items) + +If any items score 36–39, **STOP and surface to human** before constructing V2: + +``` +N item(s) are in the negotiation zone (score 36–39): + + - <item>: score 38. Rationale: <one sentence from agent reports>. + Research note: Agent B flagged this as covered by an existing skill (reuse potential). + +Decision needed for each item — respond 'ship', 'defer', or 'defer: <reason>': +``` + +Wait for human response for each negotiate item. Record the decision in the round log. + +If all items are SHIP or DEFER, skip this step. + +### 7. Produce V2 — ship list and defer list + +**V2 ship list** (items scoring ≥ 40, plus negotiate items the human shipped): + +``` +## V2 — Ship List + +1. <item title> (score: X/50): <one-sentence rationale drawing on agent evidence> +2. ... +``` + +**Defer list** (items scoring < 36, plus negotiate items the human deferred): + +``` +## Defer List + +- <item title> (score: X/50): DEFER — <explicit one-clause reason> +``` + +Every defer item MUST have an explicit reason. "DEFER — not needed now" is not acceptable. +Good example: "DEFER — overlaps the existing `analyzing-ai-repos` skill; fold into /flow-prime +if deep external analysis is needed." + +**One-pass confidence score** on the V2 ship list: + +``` +One-pass confidence: X/10 — <one sentence: what gives confidence and what remains uncertain> +``` + +### 8. Append to `.flow/brainstorm-log.md` + +Update rules: +- **File absent** → create with provenance header + `# /flow-brainstorm — decision log` + first round section. +- **File exists** → count existing `## Round` headings, append `## Round (N+1) — <date>`. +- **NEVER overwrite previous rounds.** The log is append-only. 
+ +Provenance header (write only on creation): +``` +<!-- provenance: /flow-brainstorm decision trail. Append-only. NOT committed. --> +# /flow-brainstorm — decision log +``` + +Round section format (exact fields — one paragraph per field, bold label): + +```markdown +## Round N — YYYY-MM-DD + +**Initiative:** <initiative description> +**V1 (N items, unscored):** (1) <item1> (2) <item2> ... +**Critique flags:** <"item title [flags]" for flagged items, or "none"> +**Research:** spawned 3 read-only subagents (A Known Issues, B Best Practices, C Dependencies) +**Agent findings (evidence-backed):** +- A: <key findings, one line> +- B: <key findings, one line> +- C: <key findings, one line> +**5-dim scores (Value/Risk/Readiness/Complexity/Evidence, ≥40 ship):** +- <item title> V/R/Re/C/E=total ✅ SHIP / 🟡 NEGOTIATE → <decision> / ❌ DEFER +**V2 SHIP:** <item1>, <item2>, ... **DEFER:** <item> — <reason>; ... +**One-pass confidence:** X/10 — <rationale> +**User response:** <what the human decided at the approval gate> +``` + +### 9. Human approval gate + +Print V2 ship list and defer list in full. Print the one-pass confidence score. + +``` +──────────────────────────────────────────── + Approve V2 ship list? + 'approve' → write log entry + print next-command pointer + 'revise: <instruction>' → adjust scores or categorizations +──────────────────────────────────────────── +``` + +After human approves, write the log entry (Step 8) with `User response: approved`. + +### 10. Gate result and next-command + +Print using the Output Format below. + +## Output Format + +``` +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + 💡 flow-brainstorm: V1 → Score → V2 +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + +📋 Baseline Context + Initiative: <description> + Source: [.flow/state.md gap | $ARGUMENTS] + Brainstorm round: N (log entry Round N appended) + +📋 V1 — Naive Plan (N items, unscored) + 1. <item title>: <one-sentence description> [flags or none] + 2. ... 
+ +📋 Research (3 agents — parallel) + Agent A (Known Issues): <2-line summary> + Agent B (Best Practices): <2-line summary> + Agent C (Dependencies): <2-line summary> + +📋 Scoring + | Item | V | R | Re | C | E | Total | Band | + |------|----|----|----|----|----|-------|------| + ... + +📋 V2 — Approved List + Ship (N items): <item1>, <item2>, ... + Defer (M items): <item> — <reason>; ... + One-pass confidence: X/10 + +──────────────────────────────────────────── + ✅ V2 APPROVED → .flow/brainstorm-log.md updated (Round N) +──────────────────────────────────────────── + +→ Next: /flow-umbrella <initiative> +``` + +## Arguments + +`$ARGUMENTS` — the initiative description, passed as free text +(e.g., `/flow-brainstorm add batch forecasting to the system`). +If omitted, the command falls back to `.flow/state.md` Gap line; if state.md is absent, +asks the user directly. Passed through to the gate result and the next-command pointer. +``` + +--- + +### Task 2: Create `.claude/commands/flow/flow-brainstorm.md` (local runtime install) + +After writing Task 1, create the local install as a byte-copy: + +```bash +cp docs/flow-pack/commands/flow-brainstorm.md .claude/commands/flow/flow-brainstorm.md +``` + +Verify no drift: + +```bash +diff docs/flow-pack/commands/flow-brainstorm.md .claude/commands/flow/flow-brainstorm.md \ + && echo "OK — identical" || echo "DRIFT DETECTED — fix before proceeding" +``` + +The local install MUST NOT be committed (`.claude/` is gitignored). Its only purpose is to +make `/flow:flow-brainstorm` available in Claude Code for the current working session. 
+ +--- + +### Task 3: Commit the tracked template only + +Stage ONLY the tracked template: + +```bash +# Verify issue #371 is open before committing +gh issue view 371 --json state --jq '.state' # must return "OPEN" + +# Stage ONLY the tracked template +git add docs/flow-pack/commands/flow-brainstorm.md + +# Verify staged files — must show only the tracked template +git diff --cached --name-only + +# Commit +git commit -m "docs(repo): add /flow-brainstorm command — E2 of flow-pack suite (#371)" +``` + +**Do NOT stage:** +- `.claude/commands/flow/flow-brainstorm.md` (gitignored; it is expected that `git add` skips it) +- `uv.lock` (pre-existing modification unrelated to this PRP) +- `docker-compose.lan.yml` (local-only untracked file) + +--- + +## Validation Loop + +### Level 1: File existence and byte-identity + +```bash +# Both files must exist +test -f docs/flow-pack/commands/flow-brainstorm.md && echo "tracked: OK" || echo "tracked: MISSING" +test -f .claude/commands/flow/flow-brainstorm.md && echo "local: OK" || echo "local: MISSING" + +# No drift +diff docs/flow-pack/commands/flow-brainstorm.md .claude/commands/flow/flow-brainstorm.md \ + && echo "identical: OK" || echo "DRIFT: fix with cp" +``` + +### Level 2: Content completeness + +```bash +F=docs/flow-pack/commands/flow-brainstorm.md + +# Frontmatter +head -3 "$F" | grep -q "description:" && echo "frontmatter: OK" || echo "frontmatter: MISSING" + +# Provenance comment — must match flow-prime pattern +grep -q "Source of truth: docs/flow-pack/commands/flow-brainstorm.md" "$F" && echo "provenance: OK" || echo "provenance: MISSING" + +# All 5 scoring dimensions (three alternatives: **Dim**, "Dim |", or "Dim:") +for dim in Value Risk Readiness Complexity Evidence; do + grep -q "\*\*${dim}\*\*\|${dim} |\|${dim}:" "$F" && echo "dim-${dim}: OK" || echo "dim-${dim}: MISSING" +done + +# Score bands +grep -q "≥ 40\|>= 40\|≥40" "$F" && echo "band-40: OK" || echo "band-40: MISSING" +grep -q "36–39\|36-39" "$F" && echo "band-36-39: OK" || echo "band-36-39: MISSING" +grep -q "< 36\|<36" "$F" && echo 
"band-36: OK" || echo "band-36: MISSING" + +# Exactly 3 named agents +grep -c "Agent [ABC]" "$F" | xargs -I{} sh -c '[ {} -ge 3 ] && echo "3-agents: OK" || echo "3-agents: MISSING"' + +# Append-only rule for brainstorm-log +grep -q "append\|Append-only" "$F" && echo "append-rule: OK" || echo "append-rule: MISSING" + +# Next-command pointer +grep -q "flow-umbrella" "$F" && echo "next-cmd: OK" || echo "next-cmd: MISSING" + +# $ARGUMENTS section +grep -q "ARGUMENTS\|\$ARGUMENTS" "$F" && echo "args: OK" || echo "args: MISSING" +``` + +### Level 3: Commit integrity + +```bash +# Commit message format +git log -1 --format='%s' | grep -E "^docs\(repo\): add /flow-brainstorm command" \ + && echo "commit-msg: OK" || echo "commit-msg: WRONG" + +# Issue reference +git log -1 --format='%s' | grep -q "#371" && echo "issue-ref: OK" || echo "issue-ref: MISSING" + +# .claude/ not committed +git show --name-only HEAD | grep ".claude/" \ + && echo "ERROR: .claude/ committed — must not be" || echo ".claude/-clean: OK" + +# uv.lock not committed +git show --name-only HEAD | grep "uv.lock" \ + && echo "ERROR: uv.lock committed — unstage it" || echo "uv.lock-clean: OK" + +# Only the tracked template committed +git show --name-only HEAD | grep -v "^commit\|^Author\|^Date\|^$\|^ " \ + | grep -v "^docs/flow-pack/commands/flow-brainstorm.md$" \ + && echo "UNEXPECTED FILES in commit" || echo "commit-scope: OK" +``` + +### Level 4: Functional smoke test (manual, after commit) + +In Claude Code, run: +``` +/flow:flow-brainstorm test initiative +``` + +Verify sequentially: +1. ✅ V1 produced: 5–10 items, labeled "V1 — Naive Plan", no dimension scores visible +2. ✅ Critique flags applied: each item annotated with `[assumption|scope-creep|no-evidence|none]` +3. ✅ 3 subagents spawned: Agent A, Agent B, Agent C appear in Agent tool output +4. ✅ Score table produced: all 5 columns (V, R, Re, C, E) + Total + Band +5. ✅ Score bands applied: items labeled SHIP / NEGOTIATE / DEFER +6. 
✅ Negotiate gate fires (if any 36–39 items): human asked before V2 constructed +7. ✅ Defer items carry explicit one-clause reasons +8. ✅ `.flow/brainstorm-log.md` appended with new `## Round N` section +9. ✅ Approval gate prints before next-command pointer +10. ✅ Gate result ends with `→ Next: /flow-umbrella <initiative>` + +--- + +## Final Validation Checklist + +- [ ] `docs/flow-pack/commands/flow-brainstorm.md` exists, committed, correct branch +- [ ] `.claude/commands/flow/flow-brainstorm.md` is a byte-copy (`diff` exits 0) +- [ ] Frontmatter `description:` present in tracked template +- [ ] Provenance comment matches flow-prime.md pattern (source-of-truth, recovery cp line) +- [ ] All 5 scoring dimensions named: Value, Risk, Readiness, Complexity, Evidence +- [ ] Score bands documented: ≥ 40 SHIP · 36–39 NEGOTIATE · < 36 DEFER +- [ ] Negotiate zone requires human stop — not auto-shipped +- [ ] Every DEFER item carries explicit one-clause reason (invariant from methodology) +- [ ] Exactly 3 subagents: A (Known Issues), B (Best Practices), C (Dependencies) +- [ ] Subagent prompts are read-only — "MUST NOT write files or make GitHub writes" +- [ ] `.flow/brainstorm-log.md` append rules documented (append-only, round-numbered) +- [ ] Brainstorm log: provenance header only on creation; round section on every run +- [ ] Human approval gate documented before the next-command pointer +- [ ] Next-command pointer: `→ Next: /flow-umbrella <initiative>` +- [ ] `$ARGUMENTS` fallback chain documented (args → state.md gap → ask user) +- [ ] Only `docs/flow-pack/commands/flow-brainstorm.md` committed; `.claude/` never staged +- [ ] `uv.lock` and `docker-compose.lan.yml` NOT staged +- [ ] Commit message: `docs(repo): add /flow-brainstorm command — E2 of flow-pack suite (#371)` +- [ ] `gh issue view 371 --json state --jq '.state'` returns `"OPEN"` before commit +- [ ] Level 1–3 validation scripts all pass + +--- + +## Anti-Patterns to Avoid + +- ❌ Don't commit 
`.claude/commands/flow/flow-brainstorm.md` — it is gitignored and local-only +- ❌ Don't implement GitHub writes (issue create / label / sub-issue link) — E3/E4 scope +- ❌ Don't auto-ship negotiate-zone items (36–39) — always stop and ask the human +- ❌ Don't write a defer item without a one-clause explicit reason — process failure per invariants +- ❌ Don't spawn 2 agents or 4 agents — exactly 3, always +- ❌ Don't use HTML START/END markers in brainstorm-log.md — that pattern belongs to state.md +- ❌ Don't score V1 items before running the research agents — scoring depends on Evidence +- ❌ Don't stage `uv.lock` or `docker-compose.lan.yml` +- ❌ Don't invent new structural conventions for the command file — mirror flow-prime.md exactly +- ❌ Don't use `!` prefix for Agent tool invocations — `!` is for bash, not for subagent spawning diff --git a/docs/ONBOARDING.md b/docs/ONBOARDING.md new file mode 100644 index 00000000..5b814928 --- /dev/null +++ b/docs/ONBOARDING.md @@ -0,0 +1,146 @@ +# ForecastLabAI — Onboarding Guide + +> Generated from the Understand-Anything knowledge graph (`.understand-anything/knowledge-graph.json`). +> Snapshot: commit `1f7fd82` · 860 files · 2,434 graph nodes · 5,011 edges · 8 layers. +> This is a navigational map, not a substitute for `README.md`, `AGENTS.md`, or `docs/_base/*`. + +--- + +## 1. Project Overview + +**ForecastLabAI** is a portfolio-grade, **single-host retail demand-forecasting system** that exercises the full ML lifecycle and runs end-to-end with one `docker compose up`. It covers: data platform → ingest → time-safe feature engineering → forecasting → backtesting → model registry → RAG → agentic layer → React dashboard. 
+ +| | | +|---|---| +| **Backend** | Python 3.12 · FastAPI · SQLAlchemy 2.0 (async) · Pydantic v2 · Alembic · structlog | +| **Database** | PostgreSQL 16 + **pgvector** (vector store lives in the same container — no separate service) | +| **ML / AI** | pandas · NumPy · scikit-learn · LightGBM/XGBoost (opt-in) · PydanticAI · OpenAI · Anthropic · tiktoken | +| **Frontend** | React 19 · TypeScript · Vite · Tailwind CSS · shadcn/ui · TanStack Query/Table · React Router · Recharts | +| **Tooling** | uv (Python) · pnpm (JS) · Docker/Compose · GitHub Actions + release-please · ruff · mypy/pyright `--strict` · pytest/Vitest | + +**Defining traits to internalize early:** +- **Vertical-slice architecture** — every domain lives under `app/features/<slice>/{models,schemas,service,routes,tests}.py`. A slice may **not** import another slice; cross-cutting code goes through `app/core/` or `app/shared/`. +- **Time-safety is the load-bearing invariant** — feature engineering must never read past the caller's `cutoff_date`. `app/features/featuresets/tests/test_leakage.py` is the executable spec; it must never be weakened. +- **Single-host by design** — no managed-cloud SDK in the core path; `docker compose up` is the only prerequisite besides Python + Node. +- **Docs-first** — work flows `INITIAL-*.md` → `PRPs/PRP-*.md` → vertical-slice implementation → CI gates. + +--- + +## 2. Architecture Layers + +The graph groups all 864 file-level nodes into **8 layers**: + +| Layer | Files | What lives here | +|-------|------:|-----------------| +| **Backend Core & Infrastructure** | 67 | `app/core/*` (config, db engine, logging, middleware, problem-details, health) + `app/shared/*` (cross-slice ORM, seeder "The Forge"). The cross-cutting foundation every slice depends on. | +| **Backend Feature Slices** | 261 | The 17+ vertical domain slices under `app/features/` (forecasting, agents, registry, rag, scenarios, backtesting, analytics, batch, demo, …), each self-contained. 
| +| **Data & Migrations** | 23 | Alembic `versions/*` (forward-only migration chain) + SQL example queries that define/evolve the Postgres+pgvector schema. | +| **Frontend (React SPA)** | 240 | `frontend/src/` — pages, shadcn/ui components, TanStack Query hooks, API client/lib, `types/api.ts`, build config. | +| **Documentation & PRPs** | 202 | `docs/`, ADRs, phase guides, and the `PRPs/` / `INITIAL-*` requirement plans that gate every slice. | +| **CI/CD & Containerization** | 26 | `.github/workflows/*`, Dockerfiles, `docker-compose*.yml`, devcontainer. | +| **Scripts & Demos** | 36 | CLI utilities + demo drivers (`scripts/`, `examples/`) outside the app/frontend trees. | +| **Project Configuration** | 9 | Root tooling/env config (`pyproject.toml`, lockfiles, pre-commit, release-please, `.env.example`). | + +--- + +## 3. Key Concepts & Patterns + +- **The vertical slice (read one to learn all).** `models.py` (SQLAlchemy ORM) → `schemas.py` (Pydantic v2 boundary) → `service.py` (business logic) → `routes.py` (HTTP) → `tests/`. The **registry** slice is the cleanest exemplar. +- **RFC 7807 errors everywhere.** All errors return `application/problem+json` via `app/core/problem_details.py` / `app/core/exceptions.py` — never a bare 500, never an ad-hoc error shape. +- **Config through one door.** Feature code reads `app/core/config.get_settings()` (cached singleton) — never `os.environ` directly. Use `pathlib.Path`, never `os.path`. +- **Async ORM.** `app/core/database.py` owns the async engine, session-maker, `get_db` dependency, and declarative `Base`. Every model inherits `Base`; every service opens a session through `get_db`. +- **Time-safe features.** Lags via `shift(k)`, rolling via `shift(1).rolling(...)`, entity-aware `groupby` — enforced by `test_leakage.py`. +- **Forward-only migrations.** Once an Alembic migration merges, never edit it — add a new one. CI's `migration-check` replays the chain on a fresh DB every PR. 
+- **HITL agent gate.** Mutating PydanticAI tools (`create_alias`, `archive_run`, `save_scenario`) block on human approval via `agent_require_approval`. Never widen the agent's mutation surface without adding the tool there. +- **Registry trust model.** A run moves `pending → running → success/failed → archived`; an alias may point only to a `success` run; artifacts are SHA-256-verified with path-traversal prevention. + +--- + +## 4. Guided Tour (recommended reading order) + +A 14-step path from entry point to single-host deploy. Each step names the files to open. + +1. **Project Overview** — `README.md` + `AGENTS.md`. The roadmap, stack, validation gates, and vertical-slice brief every later step assumes. +2. **Application Entry Point** — `app/main.py`. FastAPI factory: wires every slice's router, CORS, request-ID middleware, RFC 7807 handlers, lifespan. The bird's-eye map of the backend surface. +3. **Core: Config, DB, Errors** — `app/core/config.py` (cached `get_settings()`), `app/core/database.py` (async engine, `get_db`, `Base`), `app/core/problem_details.py`. Highest-fan-in backend files — breakage cascades. +4. **The Data Platform (Domain Model)** — `app/features/data_platform/models.py`. The 7-table retail core (store/product/calendar/sales_daily/price_history/promotion/inventory) + Phase-2 tables. The vocabulary the whole system speaks; grain = one `sales_daily` row per store × product × date. +5. **Time-Safe Feature Engineering** — `app/features/featuresets/service.py` + `tests/test_leakage.py`. Leakage-prevented lag/rolling/calendar/exogenous/lifecycle features; the test is the spec. +6. **Forecasting & Backtesting** — `forecasting/service.py` + `models.py` (model zoo), `backtesting/splitter.py` (expanding/sliding folds), `backtesting/metrics.py` (MAE/sMAPE/WAPE/bias/RMSE + per-bucket). +7. **A Slice End-to-End: the Model Registry** — `registry/{models,schemas,service,routes}.py` + `storage.py`. 
Run state machine, comparable-run/feature-frame invariants, aliases, SHA-256 artifact integrity. +8. **Database Migrations** — `alembic/versions/`. Forward-only chain applied via `alembic upgrade head` at container start; CI replays it every PR. +9. **RAG Knowledge Base (pgvector)** — `rag/service.py`. Idempotent (content-hash) indexing + HNSW retrieval inside the same Postgres container; embedding dim is fixed per provider. +10. **The Agentic Layer with HITL** — `agents/service.py`, `deps.py`, `websocket.py`. PydanticAI sessions, streaming, and the human-in-the-loop approval gate for mutating tools. +11. **Frontend Contract & Data Layer** — `frontend/src/types/api.ts` (mirrors backend schemas; most-depended-on file in the repo), `lib/api.ts` (fetch + RFC 7807 → typed `ApiError`), `hooks/use-demo-pipeline.ts`. +12. **A Key Page: the Showcase** — `frontend/src/pages/showcase.tsx` (drives the live demo pipeline in-browser) + `knowledge.tsx` (RAG corpus + semantic search). +13. **The End-to-End Demo Pipeline** — `app/features/demo/pipeline.py`. Capstone: seed → features → train → backtest → register → alias → RAG → agent in-process; mirrors `scripts/run_demo.py`. +14. **Containerization, CI, Config** — `docker-compose.yml`, `Dockerfile.backend`, `.github/workflows/ci.yml` (4 blocking gates), `pyproject.toml`. + +--- + +## 5. File Map — the highest-leverage files + +**Most-depended-on (fan-in) — change these carefully:** + +| File | Importers | Role | +|------|----------:|------| +| `frontend/src/types/api.ts` | 116 | Single source of truth for backend schema types | +| `app/core/config.py` | 68 | Cached settings singleton | +| `app/core/database.py` | 51 | Async engine / session / `Base` | +| `frontend/src/components/ui/button.tsx` | 47 | shadcn primitive | +| `frontend/src/components/ui/card.tsx` | 46 | shadcn primitive | +| `frontend/src/lib/utils.ts` | 44 | FE utility helpers (`cn`, etc.) 
| +| `app/features/data_platform/models.py` | 43 | De-facto shared ORM layer (all fact-table FKs) | +| `frontend/src/lib/api.ts` | 42 | Fetch wrapper + RFC 7807 parsing | +| `app/core/logging.py` | 41 | structlog setup | +| `app/features/forecasting/schemas.py` | 34 | Forecast train/predict contracts | +| `app/shared/seeder/config.py` | 33 | Seeder scenario presets ("The Forge") | +| `app/main.py` | 28 | Router/middleware wiring hub | + +**By layer (entry points to start reading):** +- **Backend Core** → `app/main.py`, `app/core/{config,database,problem_details,exceptions,logging,middleware,health}.py` +- **Feature Slices** → pick one and read M→S→S→R→T; `registry/` is the model slice, `forecasting/` and `agents/` are the richest +- **Data & Migrations** → `alembic/versions/*` (newest = current schema), `examples/*.sql` +- **Frontend** → `types/api.ts` → `lib/api.ts` → `hooks/*` → `pages/showcase.tsx` +- **Scripts & Demos** → `scripts/run_demo.py`, `scripts/seed_random.py` + +--- + +## 6. Complexity Hotspots — approach carefully + +Files the analyzer rated **complex**. Concentrated in **batch**, **forecasting**, **analytics**, **agents/tools**, and **backtesting**: + +- **Batch slice** — `batch/runner.py` (bounded-concurrency async runner w/ cancel/drain), `batch/service.py`, `batch/models.py`, `batch/tests/test_runner.py`. Concurrency + cancellation semantics; read the tests alongside the code. +- **Forecasting** — `forecasting/models.py` (baseline→regression→LightGBM/XGBoost/prophet-like factory), `forecasting/service.py` (leakage-safe regression matrices), `forecasting/schemas.py` (config union), plus its test suite (`test_service`, `test_models`, `test_feature_metadata`, `test_persistence`, `test_schemas`). +- **Analytics** — `analytics/{service,routes,schemas}.py` + integration tests (SQL GROUP-BY aggregation; date-range validation). 
+- **Agents / tools** — `agents/tools/registry_tools.py` & `backtesting_tools.py` (HITL-gated mutations), `agents/tests/test_tools.py`. +- **Backtesting** — `backtesting/metrics.py` (metric math + per-bucket aggregation), `backtesting/schemas.py`. +- **Core** — `app/core/exceptions.py` (domain exception hierarchy → RFC 7807 handlers). + +> Gotcha worth flagging: Pydantic v2 **strict mode** on request bodies 422s ISO-string values for `date`/`datetime`/`UUID`/`Decimal` unless the field has `Field(strict=False, ...)` — see `forecasting/tests/test_schemas.py` and `app/core/tests/test_strict_mode_policy.py`. + +--- + +## 7. Getting Started (validation gates) + +```bash +cp .env.example .env # set OPENAI_API_KEY / ANTHROPIC_API_KEY +docker compose up -d # Postgres+pgvector on :5433 +uv sync --extra dev # backend deps (Python 3.12) +uv run alembic upgrade head # migrations +uv run uvicorn app.main:app --reload --port 8123 +cd frontend && corepack enable pnpm && pnpm install && pnpm dev # UI on :5173 +``` + +Run before every commit (all five gate merge in CI): + +```bash +uv run ruff check . && uv run ruff format --check . +uv run mypy app/ && uv run pyright app/ # both --strict +uv run pytest -v -m "not integration" +``` + +`make demo` runs the full end-to-end pipeline; the **Showcase** page (`/showcase`) drives it live in-browser. + +--- + +*Explore interactively:* `/understand-anything:understand-dashboard` · *Ask questions:* `/understand-anything:understand-chat` · *Deep-dive a file:* `/understand-anything:understand-explain`. diff --git a/docs/_base/ARCHITECTURE.md b/docs/_base/ARCHITECTURE.md index 2c087069..01804385 100644 --- a/docs/_base/ARCHITECTURE.md +++ b/docs/_base/ARCHITECTURE.md @@ -7,7 +7,7 @@ ### What This Repo Owns - The entire stack: FastAPI backend (`app/`), React 19 SPA (`frontend/`), Alembic migrations (`alembic/`), data seeder (`app/shared/seeder/` + `scripts/seed_random.py`), `.claude/` policy + skills + hooks, docs (`docs/`, `PRPs/` incl. 
`PRPs/INITIAL/`). - 7-table retail data platform (`store`, `product`, `calendar`, `sales_daily`, `price_history`, `promotion`, `inventory_snapshot_daily`) + registry, jobs, RAG sources/chunks, agent sessions. -- 11 backend vertical slices under `app/features/` + cross-cutting `app/core/` + `app/shared/`. +- 19 backend vertical slices under `app/features/` + cross-cutting `app/core/` + `app/shared/`. ### What This Repo Depends On | Dependency | Interface | Owner | Change Process | @@ -34,7 +34,7 @@ ForecastLabAI repo ├── app/ # FastAPI process (uvicorn :8123) │ ├── core/ # config, db engine, logging, middleware, problem-details, health │ ├── shared/ # cross-slice models + seeder ("The Forge") -│ └── features/<slice>/ # vertical slices (11 of them) +│ └── features/<slice>/ # vertical slices (19 of them) └── frontend/ # Vite dev server :5173 (proxies → :8123) ``` diff --git a/docs/_base/REPO_MAP_INDEX.md b/docs/_base/REPO_MAP_INDEX.md index 5300f157..f43f3440 100644 --- a/docs/_base/REPO_MAP_INDEX.md +++ b/docs/_base/REPO_MAP_INDEX.md @@ -6,7 +6,7 @@ ## System at a Glance -ForecastLabAI is a portfolio-grade, single-host retail-demand-forecasting system. One developer maintains it; one `docker-compose up` brings it up. The backend is FastAPI + SQLAlchemy 2.0 async against PostgreSQL 16 + pgvector; the frontend is React 19 + Vite + Tailwind 4 + shadcn/ui. Eleven vertical slices under `app/features/` cover the full lifecycle (data platform → ingest → features → forecasting → backtesting → registry → RAG → agents → dashboard surfaces). Pre-1.0; release-please drives SemVer; merges flow `dev` → `main`. +ForecastLabAI is a portfolio-grade, single-host retail-demand-forecasting system. One developer maintains it; one `docker-compose up` brings it up. The backend is FastAPI + SQLAlchemy 2.0 async against PostgreSQL 16 + pgvector; the frontend is React 19 + Vite + Tailwind 4 + shadcn/ui. 
Nineteen vertical slices under `app/features/` cover the full lifecycle (data platform → ingest → features → forecasting → backtesting → registry → RAG → agents → dashboard surfaces). Pre-1.0; release-please drives SemVer; merges flow `dev` → `main`. ## Document Index diff --git a/docs/_repoKB-deepdive/deepdive-ai-engineer.md b/docs/_repoKB-deepdive/deepdive-ai-engineer.md new file mode 100644 index 00000000..4c8ef8b0 --- /dev/null +++ b/docs/_repoKB-deepdive/deepdive-ai-engineer.md @@ -0,0 +1,256 @@ +# Deep Dive: AI Engineer + +## Scope + +This document studies ForecastLabAI through the AI engineer lens: forecasting mechanics, feature safety, model execution, registry semantics, scenario logic, RAG pipeline, agent orchestration, and AI risk controls. + +## 1. Research + +### AI surface area in the repo + +ForecastLabAI has four distinct AI/ML layers: + +1. classical forecasting and feature-engineered prediction +2. evaluation and model selection +3. retrieval-augmented generation +4. tool-using agents with approval-gated mutations + +That separation is important. This is not one generic "AI layer." It is several different reasoning and execution systems sharing one product. + +### Forecasting system + +`app/features/forecasting/service.py` is the central forecasting orchestrator. It handles: + +- training data loading +- model instantiation +- feature-frame handling +- artifact persistence +- prediction +- feature metadata extraction + +The service explicitly documents time-safety and contains version-aware feature-frame logic. That is the right shape for a forecasting platform where leakage is a core product risk. + +### Feature safety model + +The repo treats time-safety as a first-class invariant: + +- lagged features +- shifted rolling windows +- feature-frame contracts in shared code +- leakage tests treated as specification + +This is stronger than many ML repos that mention leakage conceptually but do not operationalize it. 
Here, time-safety is both an implementation rule and a testing rule. + +### Model inventory and execution + +The forecasting layer supports a mix of: + +- naive and seasonal baselines +- moving average variants +- regression-style feature-aware models +- optional heavier models such as LightGBM and XGBoost behind flags + +This is a sensible AI-engineering tradeoff: + +- cheap baselines for control and comparability +- feature-aware models for richer behavior +- optional advanced learners so the core install stays light + +### Backtesting and model selection + +The repo goes beyond training into disciplined evaluation: + +- time-series CV +- fold metrics +- horizon-bucket metrics +- candidate ranking +- champion workflows +- promotion to aliases + +This matters because the AI story is not "we can forecast," but "we can evaluate, compare, and operationalize forecasts." + +### Scenario simulation + +`app/features/scenarios/service.py` introduces two different planning methods: + +1. heuristic post-forecast adjustment +2. `model_exogenous` re-forecasting for feature-aware models + +That distinction is product-important and technically honest. The code explicitly labels which path is heuristic and which path is model-driven. + +### Registry and artifact lifecycle + +Model runs live in registry tables while artifacts live on disk. The registry tracks: + +- configs +- metrics +- runtime info +- feature frame version and metadata +- aliasing and compare flows + +This is the core reproducibility seam of the ML system. Without it, scenario planning, explainability, and promotion would be much weaker. 
+ +### RAG pipeline + +`app/features/rag/service.py` shows a standard but careful retrieval pipeline: + +- source ingest from content or path +- content hashing for idempotency +- path traversal protection +- chunking by source type +- embedding generation +- pgvector storage +- semantic retrieval with thresholds + +The local/provider-switchable design is especially practical for this repo's single-host identity. + +### Agent system + +`app/features/agents/service.py` orchestrates: + +- session lifecycle +- agent selection +- streaming +- tool execution +- approval state +- token and tool-call accounting + +The service intentionally forces sequential tool execution because all tools share one DB session and concurrent use would violate SQLAlchemy session constraints. That is a good example of AI engineering being informed by infrastructure reality. + +### AI provider control + +`app/core/config.py` and the config slice expose runtime control over: + +- agent default and fallback model +- embedding provider +- embedding dimensions +- Ollama host/model +- approval requirements +- session limits and timeouts + +The ability to switch providers live from the product is one of the stronger AI-platform features in the repo. + +## 2. Compose A Role-Based Plan + +### AI engineer reading plan + +Recommended order: + +1. `app/core/config.py` +2. `app/features/featuresets/*` +3. `app/features/forecasting/service.py` +4. `app/features/backtesting/*` +5. `app/features/model_selection/*` +6. `app/features/registry/*` +7. `app/features/scenarios/*` +8. `app/features/rag/service.py` +9. `app/features/agents/service.py` +10. `app/features/agents/tools/*` +11. `frontend/src/pages/knowledge.tsx` +12. `frontend/src/pages/chat.tsx` + +### AI systems review plan + +Review the repo in these four layers: + +1. Predictive ML + - feature safety + - train/predict contract + - artifact compatibility +2. 
Evaluation and governance + - backtest output shape + - candidate ranking + - alias lifecycle +3. Retrieval + - ingestion safety + - chunking strategy + - embedding/provider constraints +4. Agents + - tool exposure + - approval gate correctness + - session and streaming behavior + +### High-value improvement plan + +An AI engineer would likely prioritize: + +1. clearer model-bundle versioning and compatibility guarantees +2. stronger observability around token use, retrieval quality, and tool outcomes +3. offline evaluation harnesses for retrieval and agent quality +4. explicit latency and cost reporting per provider and workflow +5. tighter provenance reporting across agent answers and scenario-save actions + +## 3. Validate + +### AI engineering strengths + +1. Leakage is treated as a real systems constraint. +2. Model governance is not an afterthought. +3. Scenario simulation is transparent about heuristic versus model-driven logic. +4. RAG indexing is idempotent and includes path safety checks. +5. Agent mutation is approval-gated. +6. Sequential tool execution avoids unsafe session concurrency. + +### AI engineering risks + +1. Single-host execution couples ML latency to API runtime. +2. Artifact compatibility can become subtle as feature-frame versions evolve. +3. Retrieval quality depends on provider/model settings that can change at runtime. +4. Agent reliability depends on tool schemas, provider behavior, and approval UX all staying aligned. +5. There is limited built-in evaluation telemetry for retrieval and agent quality compared with the maturity of the forecasting layer. + +### AI risk controls already present + +- strict Pydantic validation at boundaries +- explicit provider allow-lists +- approval-required mutation tools +- request/session limits +- timeout and retry controls +- content-hash idempotency for RAG +- no direct unsafe execution of model output + +## 4. 
Generate + +## Generated AI Engineering Findings + +### What kind of AI system this is + +ForecastLabAI is an applied AI product with two very different forms of intelligence: + +1. deterministic statistical/ML forecasting +2. probabilistic LLM-based reasoning and tool use + +The repo handles them separately enough to stay sane, while still exposing them inside one product. + +### Strongest AI design choices + +1. treating time-safety as non-negotiable +2. keeping baseline models in the product +3. preserving registry and artifact provenance +4. making scenario methods explicit +5. showing the RAG corpus directly in the UI +6. forcing human approval for mutating agent actions + +### Where the AI system is most mature + +The predictive ML and governance story feels the most mature: + +- forecasting +- backtesting +- registry +- model selection +- scenario compatibility awareness + +The RAG and agent layers are credible and well integrated, but they still have more room for evaluation and observability depth than the classical forecasting side. + +### Recommended AI engineering priorities + +1. Add retrieval-quality and agent-quality evaluation fixtures. +2. Surface token, provider, retrieval, and tool-call metrics more explicitly. +3. Formalize artifact and feature-frame compatibility in one canonical contract. +4. Keep expanding provenance in agent-visible and user-visible outputs. +5. Protect the distinction between deterministic model workflows and LLM-generated reasoning; that clarity is a strength. + +### Final AI engineer view + +This is a serious applied AI repository because it does not collapse every intelligence problem into "call an LLM." It uses the right tool for each layer: statistical models for forecasting, retrieval for grounded context, and agents for guided orchestration. The next maturity step is better evaluation and observability around the LLM-powered layers, not more raw capability. 
diff --git a/docs/_repoKB-deepdive/deepdive-product-manager.md b/docs/_repoKB-deepdive/deepdive-product-manager.md new file mode 100644 index 00000000..ec71810a --- /dev/null +++ b/docs/_repoKB-deepdive/deepdive-product-manager.md @@ -0,0 +1,230 @@ +# Deep Dive: Product Manager + +## Scope + +This document studies ForecastLabAI as a product manager would: product surface, audience, workflows, differentiation, maturity, roadmap pressure, and delivery implications. + +## 1. Research + +### Product identity + +ForecastLabAI is a portfolio-grade retail demand forecasting product, not just an ML service and not just a dashboard. The repo combines: + +- synthetic retail data generation +- exploratory analytics +- forecasting workflows +- backtesting +- model governance +- scenario planning +- knowledge retrieval +- chat-driven AI assistance +- an operator-facing UI + +That matters because the product story is end-to-end value, not one isolated feature. + +### Implied target audience + +The product appears aimed at four overlapping audiences: + +1. reviewers hiring for platform, ML, or applied AI roles +2. technical stakeholders evaluating architecture breadth +3. operators exploring demand, model quality, and forecast actions +4. 
builders who want a local-first forecasting reference system + +It is not aimed at: + +- multi-tenant enterprise admins +- external consumers via public auth flows +- real-time decisioning users +- non-technical retail end users + +### Main user-facing workflows + +The route map in `frontend/src/App.tsx` exposes the product's real information architecture: + +- `dashboard`: KPI snapshot and top performers +- `showcase`: guided end-to-end live pipeline +- `ops`: operational system state +- `explorer/*`: drill into stores, products, runs, jobs, sales +- `visualize/forecast`: forecast execution +- `visualize/backtest`: model evaluation +- `visualize/demand`: demand planning view +- `visualize/planner`: what-if scenario planning +- `visualize/batch`: batch execution +- `visualize/champion`: champion selection +- `knowledge`: RAG corpus and semantic retrieval +- `chat`: agent interaction +- `guide`: agent education +- `admin`: AI model and provider controls + +This is a mature workflow map for a pre-1.0 product. + +### Core value propositions + +The repo currently offers these strong product claims: + +1. "See the whole forecasting lifecycle in one local product." +2. "Move from raw retail data to model governance and AI assistance." +3. "Inspect not just predictions, but provenance, aliases, backtests, scenario deltas, and knowledge sources." +4. "Switch AI providers and embeddings without restarting the app." +5. "Run a complete live showcase from the browser or CLI." + +### Product differentiation + +The differentiator is not raw forecasting novelty. The differentiator is integration quality: + +- forecasting plus governance +- planning plus agent workflows +- RAG plus live system state +- demoability plus local reproducibility + +Many projects demonstrate one of those. This repo demonstrates their connection. 
+ +### Product maturity signals + +Signals that the product is beyond an internal prototype: + +- dedicated guide and user-guide docs +- multiple specialized visual workflows +- champion-selection and batch-runner surfaces +- scenario library and compare flow +- admin UI for AI provider management +- knowledge page exposing RAG sources and live retrieval +- approval-gated agent actions + +### Product constraints + +The explicit product guardrails are strong: + +- single-host +- no auth/RBAC +- no multi-tenancy +- no streaming architecture +- retail demand forecasting only + +These constraints narrow the addressable market, but sharpen the product identity. + +## 2. Compose A Role-Based Plan + +### Product manager reading plan + +A PM should read the product through these artifacts: + +1. `README.md` +2. `docs/user-guide/*` +3. `frontend/src/App.tsx` +4. `frontend/src/pages/showcase.tsx` +5. `frontend/src/pages/knowledge.tsx` +6. `frontend/src/pages/chat.tsx` +7. `docs/_base/API_CONTRACTS.md` +8. `.claude/rules/product-vision.md` + +### Product analysis plan + +Analyze the repo through these questions: + +1. What problem story is the product telling? +2. Which workflows are most polished today? +3. Which workflows are demonstrably complete, versus technically available but less integrated? +4. Which user journeys require too much prior knowledge? +5. Which capabilities are platform foundations versus presentation layers? + +### Product segmentation plan + +The current product can be segmented into four capability groups: + +1. Retail analytics and exploration +2. Forecasting and model operations +3. Planning and decision support +4. AI-assisted knowledge and action + +That grouping is useful for roadmap and documentation, because the repo now spans more than one narrative. + +### Product roadmap framing + +Near-term roadmap should probably focus on deepening coherence more than widening scope: + +1. tighter cross-linking between pages and workflows +2. 
stronger in-product explanations of model and scenario outputs +3. better operational visibility for long-running jobs and AI provider state +4. smoother "happy path" narrative for first-time reviewers +5. less conceptual separation between forecast intelligence, governance, knowledge, and agent actions + +## 3. Validate + +### Evidence that the product story is real + +- `showcase.tsx` turns the demo pipeline into a live product experience. +- `knowledge.tsx` exposes both indexed corpus and live system state. +- `chat.tsx` supports session creation, streaming, and approval workflows. +- `use-model-selection.ts` shows a full operator workflow, not just a static page. +- `README.md` and `docs/user-guide/*` describe practical usage, not theoretical future plans. + +### Product strengths + +1. Strong breadth with credible implementation +2. Good local demo story +3. Clear relationship between analytics, forecasting, governance, and AI +4. Frontend route structure reflects actual user jobs to be done +5. Admin/provider management keeps AI from feeling bolted on + +### Product weaknesses + +1. The breadth can dilute first-time comprehension. +2. There are multiple advanced surfaces competing for attention. +3. Some product stories are better documented than they are narratively unified in the UI. +4. No auth means some enterprise-flavored workflows remain intentionally absent. +5. The portfolio identity can mask which capabilities are intended as "hero" features. + +### Product risks + +1. Scope creep beyond the single-host product identity +2. More features without stronger onboarding hierarchy +3. AI features outpacing explanation and trust framing +4. operational complexity becoming visible before operational tooling catches up + +## 4. Generate + +## Generated Product Findings + +### What product this really is + +ForecastLabAI is best understood as a ForecastOps workbench with built-in AI and evidence surfaces. 
It is not merely a forecasting API and not merely an AI chat wrapper around docs. + +### Strongest product narratives + +The strongest narratives today are: + +1. "End-to-end forecasting platform on one machine" +2. "Forecast plus compare plus promote plus monitor" +3. "What-if planning tied to real runs" +4. "Knowledge-aware assistant with visible corpus and guarded actions" + +### Best current hero experiences + +1. Showcase +2. Champion selector +3. What-if planner +4. Knowledge page +5. Chat plus approval flow + +These are the places where the product demonstrates differentiated value rather than just plumbing. + +### Product opportunities + +1. Stronger first-run narrative across Dashboard -> Showcase -> Explorer -> Planner -> Knowledge -> Chat +2. Better opinionated defaults and guidance around model-choice workflows +3. More surfaced trust signals around scenario quality and AI answer provenance +4. Better role-oriented views for analyst, operator, and reviewer personas + +### Recommended PM priorities + +1. Clarify primary personas and map each page to one. +2. Define two or three canonical demos instead of one broad capability inventory. +3. Tighten in-product copy around decision support and limitations. +4. Make cross-page navigation reinforce the product story. +5. Preserve the local-first identity; it is part of the differentiation. + +### Final PM view + +The product is already rich enough that the next challenge is curation, not raw capability count. The codebase proves that the platform can do a lot. Product management now needs to decide what the user should understand first, second, and third so the strongest value is obvious in a five-minute walkthrough. 
diff --git a/docs/_repoKB-deepdive/deepdive-software-architect.md b/docs/_repoKB-deepdive/deepdive-software-architect.md new file mode 100644 index 00000000..2930b36b --- /dev/null +++ b/docs/_repoKB-deepdive/deepdive-software-architect.md @@ -0,0 +1,292 @@ +# Deep Dive: Software Architect + +## Scope + +This document studies ForecastLabAI as a systems architect would: platform boundaries, component ownership, runtime topology, persistence model, coupling, scaling limits, and evolution paths. It is grounded in `app/main.py`, `app/core/*`, `app/features/*`, `frontend/src/*`, `docker-compose.yml`, `pyproject.toml`, `frontend/package.json`, `alembic/versions/*`, `Makefile`, and the base docs under `docs/_base/`. + +## 1. Research + +### System identity + +ForecastLabAI is a single-host, end-to-end retail demand forecasting product. The repository intentionally owns the full loop: + +1. Data platform +2. Batch ingest +3. Leakage-safe feature engineering +4. Forecast training and prediction +5. Backtesting +6. Registry and aliases +7. Scenario planning +8. RAG knowledge base +9. Agentic workflows with approval gates +10. React dashboard surfaces + +That identity is enforced both socially and structurally: + +- `app/features/<slice>/` defines vertical slices. +- `docker-compose.yml` keeps runtime local and single-host. +- `.claude/rules/product-vision.md` rejects multi-tenant SaaS, streaming infra, and managed-cloud-first expansion. + +### Architecture style + +The backend is a modular monolith with vertical-slice boundaries. 
Each slice usually exposes: + +- `models.py` +- `schemas.py` +- `service.py` +- `routes.py` +- `tests/` + +Cross-cutting concerns live in: + +- `app/core/` for config, database, middleware, exceptions, health, problem-details, logging +- `app/shared/` for reusable data-model and feature-frame logic + +The frontend is a route-driven SPA with: + +- page composition in `frontend/src/pages/*` +- reusable domain components in `frontend/src/components/*` +- server-state access in `frontend/src/hooks/*` +- a thin fetch wrapper in `frontend/src/lib/api.ts` + +### Runtime topology + +The production-like local topology is small by design: + +- Postgres 16 + pgvector +- one FastAPI process +- one Vite/React UI +- optional Ollama container for local embeddings + +This yields strong demo portability and a low operational surface, but it also centralizes all CPU-bound training, backtesting, and agent orchestration onto one host. + +### Entrypoints and wiring + +`app/main.py` is the composition root. It wires: + +- middleware +- exception handling +- router registration for all slices +- startup config override replay + +This makes `app/main.py` a high-blast-radius file. Any architectural shift that changes router composition, middleware order, or startup behavior lands here. + +### Persistence model + +The data plane mixes three persistence styles: + +1. relational warehouse-like retail data in `app/features/data_platform/models.py` +2. JSONB-heavy operational metadata in registry, jobs, sessions, scenarios +3. pgvector-backed chunk storage in RAG tables + +That is a pragmatic design for a portfolio system: + +- relational where grain and joins matter +- JSONB where flexibility matters +- vector columns where retrieval matters + +The tradeoff is schema readability: business-critical semantics live partly in migrations and partly in JSON payload conventions. + +### API surface + +The API is broad and coherent. 
Key groups are: + +- exploratory read APIs: `/dimensions`, `/analytics`, `/ops` +- operational execution APIs: `/forecasting`, `/backtesting`, `/jobs`, `/batch`, `/model-selection` +- model governance APIs: `/registry`, `/config` +- AI APIs: `/rag`, `/agents` +- demo and seeding APIs: `/seeder`, `/demo` +- planning APIs: `/scenarios` + +Error handling is normalized through RFC 7807 problem details, which is a strong contract decision for a system with many slices. + +### Frontend information architecture + +The frontend is organized around user intent rather than backend slices: + +- Dashboard +- Showcase +- Ops +- Explorer +- Visualize +- Knowledge +- Chat +- Guide +- Admin + +This is the right abstraction. The backend is phase-oriented; the UI is workflow-oriented. + +### Testing and governance + +The repository is heavy on validation: + +- `ruff` +- `mypy --strict` +- `pyright --strict` +- unit tests +- integration tests +- migration checks + +Observed footprint from the repo: + +- 328 Python files under `app/` +- 176 backend test files +- 229 frontend TS/TSX files +- 57 frontend test files +- 18 Alembic migrations + +That is a meaningful sign of architecture discipline for a pre-1.0 system. + +## 2. Compose A Role-Based Plan + +### Architectural reading plan + +For an architect onboarding to this codebase, the minimum effective reading order is: + +1. `AGENTS.md` +2. `docs/_base/ARCHITECTURE.md` +3. `docs/_base/API_CONTRACTS.md` +4. `docs/_base/DOMAIN_MODEL.md` +5. `app/main.py` +6. `app/core/config.py` +7. `app/core/database.py` +8. `app/features/data_platform/models.py` +9. `app/features/forecasting/service.py` +10. `app/features/rag/service.py` +11. `app/features/agents/service.py` +12. `frontend/src/App.tsx` + +### Architecture review plan + +Review the system in these lenses: + +1. Boundary integrity + - verify slices depend on `core` and `shared`, not freely on each other + - pay attention to lazy-import seams already used to break cycles +2. 
Runtime concentration + - identify CPU-heavy paths that still run inline on the API host + - compare jobs, batch, model selection, demo pipeline, and agent activity +3. Data durability + - map what is canonical in tables versus in JSONB versus on disk artifacts +4. Contract stability + - inspect how frontend hooks depend on backend shapes and polling behavior +5. AI safety posture + - inspect where retrieval, tool calling, approval gates, and provider switches can fail + +### Architecture decisions already present + +The codebase has already made these strategic decisions: + +- modular monolith over microservices +- async FastAPI over sync API server +- Postgres as both OLTP-ish store and vector store +- file-based model artifacts instead of external artifact services +- local-first provider switching rather than cloud orchestration +- workflow visibility in-product rather than in external ops tooling + +### Near-term architecture planning topics + +An architect would likely focus next on: + +1. formalizing cross-slice dependency rules with automated checks +2. isolating CPU-heavy training/backtesting from request latency +3. making artifact and JSONB conventions easier to inspect and evolve +4. strengthening app-level observability beyond logs and request IDs +5. reducing hidden coupling between demo orchestration and slice APIs + +## 3. Validate + +### Evidence that the current architecture is coherent + +- The slice map in `app/main.py` matches the product lifecycle. +- `app/core/config.py` centralizes runtime control instead of scattering env reads. +- `app/core/database.py` keeps session creation standardized. +- Multiple services use documented lazy imports to avoid import-cycle collapse. +- `frontend/src/App.tsx` is route-structured and uses lazy loading, which fits the breadth of the UI. +- `docker-compose.yml` keeps the full stack reproducible on one machine. 
+- `docs/_base/API_CONTRACTS.md` and `docs/_base/DOMAIN_MODEL.md` already track core system invariants. + +### Architectural strengths + +1. Strong vertical-slice organization +2. Clear local deploy story +3. Typed boundaries everywhere +4. Explicit anti-leakage posture in forecasting/featuresets +5. Practical AI safety guardrails with approval-required mutating tools +6. Good UX-to-backend alignment through workflow-based frontend pages + +### Architectural tensions + +1. Single-host simplicity versus CPU-heavy ML workflows +2. Slice purity versus necessary cross-slice orchestration +3. JSONB flexibility versus discoverability and query clarity +4. Broad product scope versus maintainability for one repo and one host +5. Local-first AI flexibility versus provider-specific runtime drift + +### Main risks + +1. `app/main.py` as central blast radius +2. long-running work inside the application process +3. file artifact lifecycle complexity +4. limited observability for concurrency and performance debugging +5. increasing product breadth without a stronger architecture map of ownership and dependency budgets + +## 4. Generate + +## Generated Architectural Findings + +### High-level assessment + +ForecastLabAI is a well-shaped modular monolith. It has enough structure to feel like a real platform, but it still preserves a single-machine demo story. That balance is the repository's main architectural achievement. + +### What the architecture optimizes for + +It optimizes for: + +- demonstrability +- local reproducibility +- architectural breadth +- typed boundaries +- explainable workflows + +It does not optimize for: + +- horizontal scale +- high-throughput asynchronous execution +- multi-tenant isolation +- cloud-native elasticity + +Those are intentional non-goals, not omissions. + +### Primary architectural seams + +The most important seams in the system are: + +1. `core` vs feature slices +2. relational facts/dimensions vs JSONB operational state +3. 
artifact-on-disk vs metadata-in-registry +4. backend phase APIs vs frontend workflow pages +5. deterministic ML pipeline logic vs probabilistic LLM/agent flows + +### Best-fit mental model + +Treat the repo as four systems sharing one host: + +1. a retail analytics API +2. an ML execution engine +3. an AI retrieval-and-agent layer +4. an operator-facing product shell + +The design works because those systems are colocated but not completely blended. + +### Recommended architectural priorities + +1. Add dependency-graph enforcement for slice boundaries. +2. Make long-running model work more explicitly job-owned and easier to isolate. +3. Introduce richer observability around durations, failures, queue-like backlogs, and artifact usage. +4. Publish a canonical artifact contract covering model bundle versions, registry metadata, and scenario compatibility. +5. Continue treating time-safety, RFC 7807, and approval-gated mutation as non-negotiable architectural invariants. + +### Final architect view + +This repository is already beyond a toy demo. Its value is not just that it has many features, but that those features are connected through consistent contracts. The next architectural challenge is no longer "can it do the whole flow?" but "can the whole flow keep growing without hidden coupling and host saturation?" diff --git a/docs/_repoKB-deepdive/deepdive-software-developer.md b/docs/_repoKB-deepdive/deepdive-software-developer.md new file mode 100644 index 00000000..3fc787e0 --- /dev/null +++ b/docs/_repoKB-deepdive/deepdive-software-developer.md @@ -0,0 +1,281 @@ +# Deep Dive: Software Developer + +## Scope + +This document studies ForecastLabAI as a working developer would: how to navigate it, how the backend and frontend are wired, which files matter first, how to add features safely, and where the sharp edges are. + +## 1. Research + +### First impression of the repo + +ForecastLabAI is not a starter template. 
It is a broad working application with: + +- a slice-based FastAPI backend +- a route-rich React frontend +- an end-to-end demo path +- ML, RAG, and agent workflows +- real migrations and tests + +The repo is large, but its structure is disciplined enough that a developer can move with confidence if they follow local patterns. + +### High-value entrypoints + +For a developer, these are the first files worth reading: + +- `AGENTS.md` +- `pyproject.toml` +- `frontend/package.json` +- `app/main.py` +- `app/core/config.py` +- `app/core/database.py` +- `frontend/src/App.tsx` +- `frontend/src/lib/api.ts` + +These files define the rules, the stack, the app wiring, and the transport contract. + +### Backend development model + +The backend is organized around vertical slices under `app/features/`. Each slice typically owns: + +- route handlers +- schemas +- business/service logic +- persistence models when needed +- tests + +This is the key implementation rule: do not start by asking "which helper can I invent?" Start by asking "which slice owns this behavior?" + +### Backend slice inventory + +The repo currently contains these major backend feature areas: + +- analytics +- agents +- backtesting +- batch +- config +- data_platform +- demo +- dimensions +- explainability +- featuresets +- forecasting +- ingest +- jobs +- model_selection +- ops +- rag +- registry +- scenarios +- seeder + +### Frontend development model + +The frontend is structured by workflow: + +- pages under `frontend/src/pages/*` +- reusable components under `frontend/src/components/*` +- API/state hooks under `frontend/src/hooks/*` +- transport and utility helpers under `frontend/src/lib/*` + +That means the usual page implementation path is: + +1. page composes the workflow +2. hooks fetch and mutate data +3. components render domain-specific UI +4. helpers format or transform data + +### API access pattern + +`frontend/src/lib/api.ts` is the frontend transport seam. 
It: + +- builds URLs from `VITE_API_BASE_URL` +- serializes JSON request bodies +- parses JSON and `application/problem+json` +- throws typed `ApiError` failures + +This is important because frontend fixes should generally use this helper instead of raw fetches. + +### React Query pattern + +Hooks such as `frontend/src/hooks/use-model-selection.ts` show the preferred pattern: + +- one hook per query or mutation +- stable `queryKey`s +- mutation success invalidation or cache seeding +- polling only where workflow state requires it + +This keeps page components focused on state transitions and rendering. + +### WebSocket pattern + +`frontend/src/hooks/use-websocket.ts` wraps: + +- connection status +- JSON message parsing +- reconnect logic +- send/disconnect/reconnect helpers + +That avoids scattering WebSocket lifecycle logic through pages like `chat.tsx` and `showcase.tsx`. + +### Testing footprint + +Observed from repo inspection: + +- 328 Python files under `app/` +- 176 backend test files +- 229 frontend TS/TSX files +- 57 frontend test files + +This is a repo where tests are part of the implementation surface. A developer should expect to touch them. + +## 2. Compose A Role-Based Plan + +### Developer onboarding plan + +Recommended order: + +1. read `AGENTS.md` +2. read `README.md` +3. inspect `app/main.py` +4. inspect `frontend/src/App.tsx` +5. pick one backend slice end to end +6. pick one frontend workflow end to end +7. inspect the matching tests + +### Safe change plan + +When implementing anything non-trivial: + +1. find the owner slice or page workflow +2. read route, schema, service, and tests together +3. inspect adjacent frontend code if the change is user-visible +4. reuse existing helper patterns +5. add or update tests before calling the work done + +### Backend change workflow + +For endpoint or service changes: + +1. inspect `routes.py` +2. inspect `schemas.py` +3. inspect `service.py` +4. inspect relevant `models.py` +5. inspect `tests/` +6. 
patch the narrowest owning surface + +### Frontend change workflow + +For UI or workflow changes: + +1. inspect page component +2. inspect relevant hook +3. inspect domain component +4. inspect utility/helper module if any +5. inspect matching tests +6. patch the narrowest surface + +### When to use `core` or `shared` + +Move code to `app/core/` only when it is truly cross-cutting platform behavior: + +- config +- logging +- database/session +- middleware +- error handling + +Move code to `app/shared/` when multiple slices need the same pure or semi-pure logic. Forecast feature-frame logic is a good example of this pattern. + +### Developer risk map + +Handle these areas carefully: + +- `app/main.py` +- `app/core/database.py` +- `app/core/problem_details.py` +- `app/features/featuresets/tests/test_leakage.py` +- `alembic/versions/*` + +These are high-blast-radius files or rules. + +## 3. Validate + +### Evidence that the repo is developer-friendly + +- Commands are clearly documented. +- Stack configuration is centralized. +- Slice structure is consistent. +- There are many examples of the preferred patterns. +- Frontend transport is standardized. +- WebSocket behavior is abstracted. +- Quality gates are explicit and strict. + +### Backend sharp edges + +1. Import cycles between slices can happen; some services already use lazy imports to avoid them. +2. Long-running work may be triggered from API-managed workflows. +3. Artifact and run compatibility rules span multiple slices. +4. Time-safety requirements make "small" ML changes riskier than they first appear. + +### Frontend sharp edges + +1. Polling workflows can hide backend state assumptions. +2. Route-level UX often depends on specific backend response fields. +3. WebSocket flows need careful streaming and terminal-state handling. +4. Multiple advanced pages can share subtle utility logic. 
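Frontend sharp edge 3 (streaming and terminal-state handling) can be illustrated with a small pure reducer. This is a minimal sketch, not the repository's `use-websocket.ts` implementation: only the event names `text_delta`, `approval_required`, and `complete` come from these docs. The `error` event, every field name, and the `StreamState` shape are assumptions made for illustration.

```typescript
// Hypothetical event shapes. Only the names text_delta, approval_required,
// and complete appear in the docs; the error event and all fields are assumed.
type StreamEvent =
  | { type: "text_delta"; delta: string }
  | { type: "approval_required"; actionId: string }
  | { type: "complete" }
  | { type: "error"; detail: string };

type StreamStatus = "streaming" | "awaiting_approval" | "done" | "failed";

interface StreamState {
  status: StreamStatus;
  text: string;
  pendingActionId: string | null;
  error: string | null;
}

const initialState: StreamState = {
  status: "streaming",
  text: "",
  pendingActionId: null,
  error: null,
};

// A stream is terminal once it has completed or failed.
function isTerminal(state: StreamState): boolean {
  return state.status === "done" || state.status === "failed";
}

// Pure transition function: given the current state and one event,
// return the next state without mutating the input.
function reduceStream(state: StreamState, event: StreamEvent): StreamState {
  // Late events after a terminal state are ignored so the UI cannot
  // flip back from "done" to "streaming".
  if (isTerminal(state)) return state;

  switch (event.type) {
    case "text_delta":
      // A delta arriving after an approval prompt means the agent resumed.
      return {
        ...state,
        status: "streaming",
        text: state.text + event.delta,
        pendingActionId: null,
      };
    case "approval_required":
      return { ...state, status: "awaiting_approval", pendingActionId: event.actionId };
    case "complete":
      return { ...state, status: "done", pendingActionId: null };
    case "error":
      return { ...state, status: "failed", pendingActionId: null, error: event.detail };
    default:
      return state;
  }
}
```

Keeping the transition logic in a pure function like this makes the terminal-state rule testable without opening a socket. A page would fold incoming messages through `reduceStream` and render from the resulting `status`.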
+ +### Practical verification habits + +For backend: + +- run relevant slice tests first +- run integration tests if schema or DB behavior changed +- verify error paths still return RFC 7807 responses + +For frontend: + +- run the nearest component or utility tests first +- verify page behavior against the real endpoint contract +- check loading, empty, success, and error states + +For cross-stack: + +- verify request/response field names exactly +- verify polling and WebSocket state transitions +- verify any new config field is reflected end to end + +## 4. Generate + +## Generated Developer Findings + +### Best mental model + +The repo is easiest to work in when you think in workflows, not just files. The backend slices and frontend pages are different projections of the same product flows. + +### Biggest strengths for day-to-day development + +1. consistent backend slice architecture +2. consistent frontend route and hook layering +3. strong validation gates +4. real examples of nearly every pattern you need +5. explicit local-first runtime story + +### Biggest developer risks + +1. changing shared forecasting assumptions without updating downstream consumers +2. breaking import-order or dependency assumptions in backend slices +3. drifting frontend expectations away from backend contracts +4. under-testing changes that touch AI, ML, or orchestration surfaces + +### Recommended developer heuristics + +1. Stay inside the owner slice until forced out. +2. Treat tests as part of the design. +3. Prefer additive schema changes over broad rewrites. +4. Inspect the workflow end to end before patching. +5. Respect the repo's invariants: time-safety, migrations, strict typing, RFC 7807, approval-gated mutation. + +### Final developer view + +ForecastLabAI is broad but navigable. It rewards disciplined developers who follow established seams and punishes casual cross-cutting edits. 
The fastest correct path is to read the owner slice, read its tests, and change the smallest coherent unit. diff --git a/docs/arch-dia.md b/docs/arch-dia.md new file mode 100644 index 00000000..e0407f6a --- /dev/null +++ b/docs/arch-dia.md @@ -0,0 +1,346 @@ +# ForecastLabAI Architecture Diagrams + +This document provides diagram-first views of the repository's logic, workflows, stack, APIs, architecture, and reusable patterns. The diagrams are grounded in the inspected code under `app/`, `frontend/src/`, `docker-compose.yml`, and the base docs. + +## 1. System context + +```mermaid +flowchart LR + User[User or Reviewer] + UI[React SPA<br/>Vite + React Query + Router] + API[FastAPI App<br/>app/main.py] + DB[(PostgreSQL 16 + pgvector)] + Artifacts[(Local Artifacts<br/>models backtests registry)] + Providers[OpenAI / Anthropic / Gemini / Ollama] + + User --> UI + UI --> API + API --> DB + API --> Artifacts + API --> Providers +``` + +## 2. Backend slice map + +```mermaid +flowchart TB + Main[app/main.py] + Core[app/core] + Shared[app/shared] + + Main --> Core + Main --> Dimensions + Main --> Analytics + Main --> Ingest + Main --> Featuresets + Main --> Forecasting + Main --> Backtesting + Main --> Registry + Main --> Scenarios + Main --> RAG + Main --> Agents + Main --> Jobs + Main --> Batch + Main --> ModelSelection + Main --> Ops + Main --> Seeder + Main --> Demo + Main --> Config + Main --> Explainability + + Featuresets --> Shared + Forecasting --> Shared + Scenarios --> Shared + Backtesting --> Shared +``` + +## 3. 
Request handling pattern + +```mermaid +sequenceDiagram + participant Browser + participant Route as FastAPI route + participant Schema as Pydantic schema + participant Service as Slice service + participant DB as AsyncSession + + Browser->>Route: HTTP request + Route->>Schema: Validate request body/query + Route->>Service: Call typed service method + Service->>DB: Read/write data + DB-->>Service: Rows/state + Service-->>Route: Response model + Route-->>Browser: JSON or problem+json +``` + +## 4. Retail data model + +```mermaid +erDiagram + STORE ||--o{ SALES_DAILY : sells + PRODUCT ||--o{ SALES_DAILY : sold_as + CALENDAR ||--o{ SALES_DAILY : dated_by + STORE ||--o{ PRICE_HISTORY : prices + PRODUCT ||--o{ PRICE_HISTORY : price_subject + STORE ||--o{ PROMOTION : promotes + PRODUCT ||--o{ PROMOTION : promotion_subject + STORE ||--o{ INVENTORY_SNAPSHOT_DAILY : stocks + PRODUCT ||--o{ INVENTORY_SNAPSHOT_DAILY : stocked_item + CALENDAR ||--o{ INVENTORY_SNAPSHOT_DAILY : snapshot_date +``` + +## 5. Forecast training flow + +```mermaid +flowchart LR + Sales[Sales and retail history] + Features[Time-safe feature assembly] + Train[ForecastingService.train_model] + Model[Trained model bundle] + Registry[Registry run metadata] + + Sales --> Features + Features --> Train + Train --> Model + Train --> Registry +``` + +## 6. Prediction and planning flow + +```mermaid +flowchart LR + Run[Registered run or artifact] + Predict[Predict endpoint] + Scenario[Scenario simulation] + Planner[Planner UI] + + Run --> Predict + Predict --> Planner + Run --> Scenario + Scenario --> Planner +``` + +## 7. 
Backtesting and champion selection + +```mermaid +flowchart TD + Availability[Pair availability] + Candidates[Candidate model configs] + Backtests[Backtest each candidate] + Rank[Rank by metrics] + Winner[Winner summary] + Train[Train selected or winner] + Promote[Promote alias] + + Availability --> Candidates + Candidates --> Backtests + Backtests --> Rank + Rank --> Winner + Winner --> Train + Train --> Promote +``` + +## 8. Registry and artifact governance + +```mermaid +flowchart LR + Train[Training workflow] + Artifact[Model artifact on disk] + Run[(model_run)] + Alias[(run_alias)] + Compare[Compare and verify APIs] + + Train --> Artifact + Train --> Run + Run --> Alias + Run --> Compare + Artifact --> Compare +``` + +## 9. RAG indexing workflow + +```mermaid +flowchart LR + Source[Markdown / OpenAPI / docs file] + Hash[Content hash] + Chunk[Chunker] + Embed[Embedding provider] + Store[(rag_source + rag_chunk)] + + Source --> Hash + Hash --> Chunk + Chunk --> Embed + Embed --> Store +``` + +## 10. RAG retrieval workflow + +```mermaid +sequenceDiagram + participant UI as Knowledge page or agent + participant API as /rag/retrieve + participant VDB as pgvector chunks + participant Provider as Embedding provider + + UI->>API: query text + API->>Provider: embed query + Provider-->>API: query vector + API->>VDB: similarity search + VDB-->>API: ranked chunks + API-->>UI: citations and excerpts +``` + +## 11. 
Agent chat and approval flow + +```mermaid +sequenceDiagram + participant User + participant ChatUI as Chat page + participant WS as /agents/stream + participant Agent as AgentService + participant Tools as Agent tools + participant Approve as /agents/sessions/{id}/approve + + User->>ChatUI: send message + ChatUI->>WS: session_id + message + WS->>Agent: invoke agent + Agent->>Tools: tool call + alt approval required + Agent-->>ChatUI: approval_required + User->>ChatUI: approve or reject + ChatUI->>Approve: decision + Approve-->>Agent: continue or stop + end + Agent-->>ChatUI: text_delta / complete +``` + +## 12. Demo pipeline orchestration + +```mermaid +flowchart LR + Start[Showcase or make demo] + Seed[Seeder] + Features[Featuresets] + Train[Train models] + Backtest[Backtest] + Register[Register winner] + Alias[Create alias] + Knowledge[RAG probe] + Agent[Agent probe] + Finish[Summary] + + Start --> Seed + Seed --> Features + Features --> Train + Train --> Backtest + Backtest --> Register + Register --> Alias + Alias --> Knowledge + Knowledge --> Agent + Agent --> Finish +``` + +## 13. Frontend route topology + +```mermaid +flowchart TD + App[frontend/src/App.tsx] + Dashboard[Dashboard] + Showcase[Showcase] + Ops[Ops] + Explorer[Explorer pages] + Visualize[Visualize pages] + Knowledge[Knowledge] + Chat[Chat] + Guide[Guide] + Admin[Admin] + + App --> Dashboard + App --> Showcase + App --> Ops + App --> Explorer + App --> Visualize + App --> Knowledge + App --> Chat + App --> Guide + App --> Admin +``` + +## 14. Frontend data-flow pattern + +```mermaid +flowchart LR + Page[Page] + Hook[React Query hook] + Api[api helper] + Backend[FastAPI endpoint] + + Page --> Hook + Hook --> Api + Api --> Backend +``` + +## 15. 
Runtime deployment topology + +```mermaid +flowchart TB + subgraph Compose + Postgres[postgres<br/>pgvector/pg16] + Backend[backend<br/>uvicorn + FastAPI] + Frontend[frontend<br/>Vite] + Ollama[ollama<br/>optional GPU profile] + end + + Frontend --> Backend + Backend --> Postgres + Backend --> Ollama +``` + +## 16. CI/CD flow + +```mermaid +flowchart LR + Dev[Feature branch] + PR[PR to dev] + CI[lint + typecheck + tests + migration check] + MergeDev[Merge to dev] + MainPR[PR dev to main] + ReleasePR[release-please Release PR] + Tag[Tag and release artifacts] + + Dev --> PR + PR --> CI + CI --> MergeDev + MergeDev --> MainPR + MainPR --> CI + CI --> ReleasePR + ReleasePR --> Tag +``` + +## 17. Reusable architectural patterns + +```mermaid +mindmap + root((Reusable patterns)) + Vertical slice + routes + schemas + service + models + tests + Shared backend contracts + settings + db session + problem details + logging + Frontend workflow pattern + page + hook + component + lib helper + AI safety pattern + schema validated tools + approval gate + provider allow-lists + timeouts and caps +``` diff --git a/docs/arch-techstack.md b/docs/arch-techstack.md new file mode 100644 index 00000000..f5c4c81b --- /dev/null +++ b/docs/arch-techstack.md @@ -0,0 +1,247 @@ +# ForecastLabAI Technical Concepts And Tech Stack + +## Overview + +ForecastLabAI is a single-host retail demand forecasting platform implemented as a modular monolith with a React SPA frontend. It combines data-platform, ML, RAG, and agentic capabilities in one repository and one local runtime topology. 
+ +## Layered technical model + +| Layer | Main technology | Responsibility | +|---|---|---| +| UI | React 19, TypeScript 5.9, Vite 7, Tailwind 4, shadcn/ui | Workflow surfaces, charts, controls, streaming UX | +| API | FastAPI, Pydantic v2 | Typed HTTP and WebSocket contracts | +| Services | Python service modules per slice | Business logic and orchestration | +| Persistence | SQLAlchemy 2.0 async, PostgreSQL 16, pgvector | Relational data, JSONB state, vector retrieval | +| ML | pandas, numpy, scikit-learn, joblib, optional LightGBM/XGBoost | Forecast training, prediction, evaluation | +| AI | PydanticAI, OpenAI, Anthropic, optional Gemini and Ollama | RAG embeddings, agent reasoning, tool use | +| Tooling | uv, pnpm, Alembic, Ruff, mypy, pyright, pytest | Development, quality, migration, release | + +## Backend stack + +### Runtime and framework + +- Python 3.12 +- FastAPI +- Uvicorn +- Pydantic v2 +- Pydantic Settings v2 + +Key concepts: + +- async request handling +- typed request and response models +- RFC 7807 error contracts +- startup lifecycle hooks +- WebSocket streaming for agents and demo pipeline + +### Data access + +- SQLAlchemy 2.0 async ORM +- `asyncpg` driver +- session creation via `app/core/database.py` +- migration management via Alembic + +Key concepts: + +- `Mapped[]` ORM typing +- `mapped_column()` +- async session dependency injection +- commit/rollback at request scope + +### Persistence patterns + +ForecastLabAI uses three persistence patterns: + +1. relational dimensions and facts + - `store`, `product`, `calendar`, `sales_daily`, `price_history`, `promotion`, `inventory_snapshot_daily` +2. operational JSONB-rich entities + - runs, jobs, sessions, scenarios, config +3. 
vector-backed retrieval entities + - RAG sources and chunks with embeddings + +### Cross-cutting backend concepts + +- centralized settings in `app/core/config.py` +- structured logging via `structlog` +- request correlation via middleware +- problem-details serialization for failures +- strict type-checking as a design constraint, not just a lint step + +## Frontend stack + +### Core libraries + +- React 19 +- TypeScript 5.9 +- Vite 7 +- React Router +- TanStack Query +- TanStack Table +- Recharts +- Tailwind CSS 4 +- shadcn/ui and Radix primitives +- Lucide icons + +### Frontend concepts + +- route-oriented application shell +- lazy-loaded page modules +- React Query hooks for API state +- reusable domain components +- helper libraries for formatting and transform logic +- dedicated WebSocket hook for streaming flows + +### Major frontend domains + +- dashboard and KPI summaries +- explorer pages for stores, products, jobs, runs, and sales +- visualize pages for forecasting, backtesting, batch, champion selection, demand, and planning +- showcase page for the full demo pipeline +- knowledge page for RAG state and retrieval +- chat page for agent interaction +- admin page for AI provider and model settings + +## ML and forecasting stack + +### Core packages + +- pandas +- numpy +- scikit-learn +- joblib +- optional LightGBM +- optional XGBoost + +### Core concepts + +- time-safe feature engineering +- train/predict split by service boundary +- backtesting with time-series folds +- model-family and feature metadata +- persisted model bundles +- registry-backed governance and aliases + +### ML design choices + +- baselines remain first-class +- advanced models are optional extras +- artifact persistence is local filesystem based +- scenario simulation differentiates heuristic and model-driven methods +- model selection is a distinct workflow, not a side effect of training + +## RAG and agent stack + +### RAG + +- pgvector +- OpenAI embeddings or Ollama embeddings +- 
chunkers by source type +- similarity retrieval with thresholding +- idempotent indexing using content hashes + +### Agents + +- PydanticAI +- Anthropic and OpenAI as main hosted providers +- optional Gemini identifiers supported in config +- tool-calling with schema validation +- approval gate for mutating actions +- session persistence in Postgres +- streaming token/tool events over WebSocket + +### AI control-plane concepts + +- live provider switching through config APIs +- fallback model support +- session TTL and tool-call caps +- timeout and retry controls +- explicit allow-lists for model identifier providers + +## Database and schema stack + +### Database + +- PostgreSQL 16 +- pgvector extension +- local port `5433` to container `5432` + +### Migration management + +- Alembic +- forward-only migration policy after merge +- 18 migrations observed in the repo at inspection time + +### Data model concepts + +- star-schema-like retail data platform +- JSONB for flexible operational entities +- vector embeddings inside Postgres instead of a separate vector store + +## Development and quality stack + +### Backend package and environment tooling + +- `uv` +- `.env` + Pydantic settings + +### Frontend package tooling + +- `pnpm` +- corepack-enabled workflow + +### Quality gates + +- Ruff +- mypy `--strict` +- pyright `--strict` +- pytest + +### CI/CD + +- GitHub Actions +- release-please + +Key pipeline concepts: + +- blocking lint, typecheck, test, and migration jobs +- Release PR flow from `dev` to `main` +- wheel and sdist build on release creation + +## Runtime topology + +### Core local services + +1. Postgres +2. backend API +3. frontend dev server +4. optional Ollama + +### Container strategy + +- Docker Compose for local orchestration +- bind mounts for hot reload +- shared named volume for artifacts +- health checks for all main services + +## Architectural conventions enforced by the stack + +1. Vertical slices own their business logic. +2. 
`core` and `shared` are the sanctioned cross-cutting surfaces. +3. Schema changes require migrations. +4. API boundaries require Pydantic validation. +5. Time-safe feature engineering is mandatory. +6. AI mutation tools require approval. +7. The product must remain single-host runnable. + +## Why this stack fits the repo + +The stack fits because the product needs: + +- a fast local development loop +- typed API and schema boundaries +- strong data tooling for forecasting +- one database that can handle relational and vector workloads +- a modern dashboard frontend +- enough AI flexibility to compare hosted and local providers + +The stack would be a poor fit for a high-scale multi-tenant SaaS, but that is not the repository's goal. diff --git a/docs/flow-pack/commands/flow-prime.md b/docs/flow-pack/commands/flow-prime.md index 579aff41..c46d4432 100644 --- a/docs/flow-pack/commands/flow-prime.md +++ b/docs/flow-pack/commands/flow-prime.md @@ -46,7 +46,7 @@ Gather the five GitHub categories: !`gh issue list --state open --limit 20 --json number,title,labels --jq '.[] | "#\(.number): \(.title) [\(.labels | map(.name) | join(","))]"'` -!`gh milestone list --json number,title,state --jq '.[] | "#\(.number) \(.title) (\(.state))"'` +!`gh api repos/{owner}/{repo}/milestones --jq '.[] | "#\(.number) \(.title) (\(.state))"'` !`gh label list --json name --jq '[.[].name] | sort | join(", ")'`