12 changes: 7 additions & 5 deletions .agent-plan.md
@@ -42,11 +42,13 @@ early against the known-good lead-scoring path + physical reorg into
 Status: `LTV-M0` landed (#102, #103, #106). `LTV-M1`: `LTV-Pb` merged (#104);
 `LTV-Pc` (pLTV feature/task specs) still outstanding. `LTV-M2`: `LTV-Pd` (#107)
 and `LTV-Pe` (#108) merged (scheme protocol + render seam). `LTV-Pf` (physical
-move, **hard break / no shims** per D12) split into Pf.1 (compute core —
-simulation/mechanisms/structure moved) opened as **#109**, and Pf.2 (render
-move, pending). Verified byte-identical. Sibling `leadforge-datasets-private`
-build scripts must update to the new import paths (breakage issue filed). Next:
-`LTV-Pf.2` (render), then `LTV-Pg` (scaffold `schemes/lifecycle/`).
+move, **hard break / no shims** per D12): Pf.1 (compute core —
+simulation/mechanisms/structure) merged (#109); Pf.2 (lead-scoring render —
+snapshots/relational_snapshot_safe/tasks moved + relational.py split so the
+shared write_relational_tables stays in the envelope) opened as **#110**. Both
+byte-identical. Sibling `leadforge-datasets-private` consumes bundle files, not
+internals — no lockstep update needed (heads-up issue #8 filed). Next:
+`LTV-Pg` (scaffold `schemes/lifecycle/` + relocate the lead-scoring schema specs).

 ---

18 changes: 14 additions & 4 deletions CHANGELOG.md
@@ -21,10 +21,20 @@ back-compat shims, by design):
 | `leadforge.simulation.*` | `leadforge.schemes.lead_scoring.simulation.*` |
 | `leadforge.mechanisms.*` | `leadforge.schemes.lead_scoring.mechanisms.*` |
 | `leadforge.structure.*` | `leadforge.schemes.lead_scoring.structure.*` |
-
-`render/{snapshots,relational,tasks}` and the lead-scoring `schema` specs
-relocate in follow-up PRs. Consumers importing internals (e.g. the
-`leadforge-datasets-private` build scripts) must update to the new paths;
+| `leadforge.render.snapshots` | `leadforge.schemes.lead_scoring.render.snapshots` |
+| `leadforge.render.relational_snapshot_safe` | `leadforge.schemes.lead_scoring.render.relational_snapshot_safe` |
+| `leadforge.render.tasks` | `leadforge.schemes.lead_scoring.render.tasks` |
+| `leadforge.render.relational:to_dataframes` | `leadforge.schemes.lead_scoring.render.relational:to_dataframes` |
+| `leadforge.render.relational:write_relational_tables` | `leadforge.render.relational_io:write_relational_tables` |
+
+The flat `leadforge.render.relational` module is **removed**: its 9-table
+assembler (`to_dataframes`) moved to the scheme, and the scheme-agnostic writer
+(`write_relational_tables`) moved to the new `leadforge.render.relational_io`
+(renamed to avoid a basename clash with the scheme's `relational.py`).
+`leadforge.render` remains the shared bundle-output envelope
+(`relational_io` + `manifests`). The lead-scoring `schema`
+specs relocate in a follow-up PR (LTV-Pg). Consumers importing internals (e.g.
+the `leadforge-datasets-private` build scripts) must update to the new paths;
 the package stays on the `1.x` line (the public contract did not change).

 ### CLI surfaces v4 fields
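For consumers updating their imports, the path table in the CHANGELOG hunk above can be written down as a small lookup. This is only a sketch: `migrate_import`, `MOVED_MODULES`, and `SPLIT_SYMBOLS` are hypothetical names invented for illustration and are not part of `leadforge`; the old and new paths themselves are copied from the table.

```python
from __future__ import annotations

_SCHEME = "leadforge.schemes.lead_scoring"

# Whole-module moves, copied from the CHANGELOG table (hypothetical helper data).
MOVED_MODULES = {
    "leadforge.simulation": f"{_SCHEME}.simulation",
    "leadforge.mechanisms": f"{_SCHEME}.mechanisms",
    "leadforge.structure": f"{_SCHEME}.structure",
    "leadforge.render.snapshots": f"{_SCHEME}.render.snapshots",
    "leadforge.render.relational_snapshot_safe": f"{_SCHEME}.render.relational_snapshot_safe",
    "leadforge.render.tasks": f"{_SCHEME}.render.tasks",
}

# The flat render.relational module was split, so its two names land in
# different modules.
SPLIT_SYMBOLS = {
    ("leadforge.render.relational", "to_dataframes"): f"{_SCHEME}.render.relational",
    ("leadforge.render.relational", "write_relational_tables"): "leadforge.render.relational_io",
}


def migrate_import(module: str, name: str) -> tuple[str, str]:
    """Return the post-reorg (module, name) for `from module import name`."""
    split_target = SPLIT_SYMBOLS.get((module, name))
    if split_target is not None:
        return split_target, name
    for old, new in MOVED_MODULES.items():
        # Match on dotted-component boundaries so a prefix such as
        # "leadforge.render.tasks" never matches an unrelated sibling module.
        if module == old or module.startswith(old + "."):
            return new + module[len(old):], name
    return module, name  # not affected by the reorg
```

Modules that stay in the shared envelope, such as `leadforge.render.manifests`, pass through unchanged.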
25 changes: 12 additions & 13 deletions CLAUDE.md
@@ -159,11 +159,11 @@ leadforge/
 schema/ entities.py, relationships.py, events.py, features.py, tasks.py, dictionaries.py
 schemes/ base.py (GenerationScheme protocol + SCHEME_REGISTRY);
 lead_scoring/ — the lead-scoring scheme: __init__.py (build_world/
-write_bundle) + simulation/, mechanisms/, structure/ (moved in
-LTV-Pf.1). render/ + lead-scoring schema specs migrate here in
-LTV-Pf.2 / LTV-Pg. See docs/ltv/design.md §2.5.
-render/ relational.py (+ write_relational_tables), snapshots.py, manifests.py, tasks.py
-# lead-scoring render still here pending LTV-Pf.2
+write_bundle) + simulation/, mechanisms/, structure/, render/
+(moved in LTV-Pf.1/Pf.2). Lead-scoring schema specs migrate
+here in LTV-Pg. See docs/ltv/design.md §2.5.
+render/ relational_io.py (write_relational_tables — shared writer), manifests.py
+# shared bundle-output envelope
 exposure/ modes.py, filters.py, redaction.py
 validation/ invariants.py, artifact_checks.py, realism.py, difficulty.py, drift.py
 recipes/ registry.py, b2b_saas_procurement_v1/{recipe,narrative,schema,motifs,difficulty_profiles}.yaml
@@ -248,14 +248,13 @@ leadforge/ # Python package root
 │ ├── __init__.py # build_world() + write_bundle()
 │ ├── structure/ # Hidden world graph (WorldGraph, motifs, sampler)
 │ ├── mechanisms/ # Node/edge behavior (policies, hazards, scores, …)
-│ └── simulation/ # World evolution (engine, population, state)
-│ # NOTE (LTV-M2 reorg in progress): render/{snapshots,relational,tasks}
-│ # relocate under schemes/lead_scoring/ in a follow-up; schema specs split
-│ # in LTV-Pg. See docs/ltv/design.md §2.5 for the target layout.
-├── render/ # Bundle output (envelope + not-yet-moved lead-scoring render)
-│ ├── snapshots.py # build_snapshot() — ML-ready lead table
-│ ├── relational.py # to_dataframes() — 9-table dict
-│ ├── tasks.py # write_task_splits() — train/valid/test Parquet
+│ ├── simulation/ # World evolution (engine, population, state)
+│ └── render/ # Lead-scoring render: snapshots, relational
+│ # (to_dataframes), relational_snapshot_safe, tasks
+│ # NOTE (LTV-M2 reorg in progress): lead-scoring schema specs split in LTV-Pg.
+│ # See docs/ltv/design.md §2.5 for the target layout.
+├── render/ # Shared bundle-output envelope
+│ ├── relational_io.py # write_relational_tables() — shared table writer
 │ └── manifests.py # build_manifest(), write_manifest()
 ├── exposure/ # Truth filtering
 │ ├── modes.py # apply_exposure() dispatch
12 changes: 6 additions & 6 deletions docs/ltv/roadmap.md
@@ -42,7 +42,7 @@ protocol + registry, with the package physically reorganized into
 |-----------|------------|-----|------------|
 | `LTV-M0` | Planning + design lock | `LTV-Pa` | #102, #103 (+ scheme reframe) |
 | `LTV-M1` | Lifecycle schema foundation | `LTV-Pb`, `LTV-Pc` | #104 (Pb) |
-| `LTV-M2` | Generation-scheme architecture + physical reorg | `LTV-Pd`, `LTV-Pe`, `LTV-Pf`, `LTV-Pg` | #107 (Pd), #108 (Pe), #109 (Pf.1) |
+| `LTV-M2` | Generation-scheme architecture + physical reorg | `LTV-Pd`, `LTV-Pe`, `LTV-Pf`, `LTV-Pg` | #107 (Pd), #108 (Pe), #109 (Pf.1), #110 (Pf.2) |
 | `LTV-M3` | Customer population + lifecycle world | `LTV-Ph`, `LTV-Pi` | |
 | `LTV-M4` | Lifecycle simulation engine | `LTV-Pj`, `LTV-Pk` | |
 | `LTV-M5` | Customer snapshots + pLTV targets (both regimes) | `LTV-Pl`, `LTV-Pm` | |
@@ -125,11 +125,11 @@ Total: ~19 PRs across 9 milestones.
 - [x] **`LTV-Pf.1`** — compute core: `simulation/` + `mechanisms/` +
 `structure/` moved as whole directories (21 file renames, all callers
 rewritten). Verified byte-identical; full suite green. (**PR #109**)
-- [ ] **`LTV-Pf.2`** — render: relocate `render/{snapshots,relational,tasks}`
-under the scheme, splitting `render/relational.py` so the shared
-`write_relational_tables` stays in the envelope while the 9-table
-`to_dataframes` moves. (The lead-scoring `schema` specs split lands with
-`LTV-Pg`.)
+- [x] **`LTV-Pf.2`** — render: relocated `render/{snapshots,relational_snapshot_safe,tasks}`
+under `schemes/lead_scoring/render/`, and split `render/relational.py` so the
+shared `write_relational_tables` stays in the envelope while the 9-table
+`to_dataframes` moved. Verified byte-identical; full suite green. (**PR #110**)
+(The lead-scoring `schema` specs split lands with `LTV-Pg`.)
 - Tests: full suite + hash-determinism green; public API imports unchanged.
 - Labels: `type: refactor`, `layer: schema`, `layer: simulation`, `layer: render`
 - [ ] **`LTV-Pg`** — `refactor: scaffold schemes/lifecycle/ + relocate LTV-Pb/Pc specs`.
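The roadmap entries above describe the moves as "verified byte-identical" with "hash-determinism green", but the diff does not show how that check is run. One way such a check could look, purely as an assumption, is to hash every file of a bundle directory in a stable order and compare the digests from before and after the refactor. `tree_digest` is a hypothetical helper, not a `leadforge` function.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Return one SHA-256 hex digest covering every file under *root*.

    Files are visited in sorted relative-path order so the digest does not
    depend on filesystem listing order.
    """
    digest = hashlib.sha256()
    files = sorted(
        (path.relative_to(root).as_posix(), path)
        for path in root.rglob("*")
        if path.is_file()
    )
    for rel, path in files:
        data = path.read_bytes()
        # Feed the path, a separator, the length, and the bytes so that
        # renames and content changes both alter the digest.
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Two bundles generated from the same recipe and seed, one on each side of the refactor, would then count as byte-identical when `tree_digest(old_bundle) == tree_digest(new_bundle)`.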
2 changes: 1 addition & 1 deletion leadforge/exposure/filters.py
@@ -31,7 +31,7 @@ class BundleFilter:
 relational_snapshot_safe: Whether the relational ``tables/`` dict
 must be projected onto the snapshot-safe shape before being
 written. When ``True``, the bundle writer routes through
-:func:`leadforge.render.relational_snapshot_safe.to_dataframes_snapshot_safe`,
+:func:`leadforge.schemes.lead_scoring.render.relational_snapshot_safe.to_dataframes_snapshot_safe`,
 which strips :data:`leadforge.validation.leakage_probes.BANNED_LEAD_COLUMNS`
 from ``leads``, :data:`~leadforge.validation.leakage_probes.BANNED_OPP_COLUMNS`
 from ``opportunities``, filters event tables per-lead by
4 changes: 2 additions & 2 deletions leadforge/render/manifests.py
@@ -43,7 +43,7 @@
 # self-describing (``null`` means full-horizon, legacy behaviour).
 # "5" — PR 2.2: ``student_public`` bundles route through the
 # snapshot-safe relational export (
-# :mod:`leadforge.render.relational_snapshot_safe`). Public
+# :mod:`leadforge.schemes.lead_scoring.render.relational_snapshot_safe`). Public
 # ``leads`` drops ``converted_within_90_days`` /
 # ``conversion_timestamp``; public ``opportunities`` drops
 # ``close_outcome`` / ``closed_at``; public bundles omit
@@ -91,7 +91,7 @@ def build_manifest(
 package internals. Defaults to ``[]`` (nothing redacted).
 relational_snapshot_safe: ``True`` if the relational ``tables/``
 were projected through
-:func:`leadforge.render.relational_snapshot_safe.to_dataframes_snapshot_safe`
+:func:`leadforge.schemes.lead_scoring.render.relational_snapshot_safe.to_dataframes_snapshot_safe`
 before being written. Recorded in the manifest so a tool
 reading a v5+ bundle can tell from the manifest alone whether
 ``tables/`` is the snapshot-safe (public) shape or the
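The column-stripping part of the snapshot-safe projection described in the two `manifests.py` hunks above can be illustrated with a toy version. This is a sketch under stated assumptions: it uses plain lists of dicts instead of DataFrames, `project_snapshot_safe` is a hypothetical name, and the banned-column sets contain only the four columns named in the manifest comment (the real `BANNED_LEAD_COLUMNS` and `BANNED_OPP_COLUMNS` in `leadforge.validation.leakage_probes` may hold more). Per-lead event filtering and omitted tables are not modelled, because the diff context is truncated before it describes them.

```python
from __future__ import annotations

# Assumed contents, taken only from the manifest comment in this diff.
BANNED_LEAD_COLUMNS = frozenset({"converted_within_90_days", "conversion_timestamp"})
BANNED_OPP_COLUMNS = frozenset({"close_outcome", "closed_at"})

_BANNED_BY_TABLE = {
    "leads": BANNED_LEAD_COLUMNS,
    "opportunities": BANNED_OPP_COLUMNS,
}


def project_snapshot_safe(
    tables: dict[str, list[dict[str, object]]],
) -> dict[str, list[dict[str, object]]]:
    """Return a copy of *tables* with banned columns removed per table."""
    projected = {}
    for name, rows in tables.items():
        banned = _BANNED_BY_TABLE.get(name, frozenset())
        # Tables without a banned set pass through with all columns intact.
        projected[name] = [
            {col: val for col, val in row.items() if col not in banned}
            for row in rows
        ]
    return projected
```

A manifest written alongside such a projection would record `relational_snapshot_safe` as `True`, which is the flag the `build_manifest` docstring describes.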
56 changes: 56 additions & 0 deletions leadforge/render/relational_io.py
@@ -0,0 +1,56 @@
+"""Shared relational-table writer (bundle-output envelope).
+
+:func:`write_relational_tables` is the scheme-agnostic step that serialises a
+``{table_name: DataFrame}`` dict to a bundle's ``tables/`` directory. Each
+generation scheme decides the relational *shape* (which tables, any
+snapshot-safe projection) and then calls this to write them. The lead-scoring
+table *assembler* (``to_dataframes``) lives with its scheme in
+:mod:`leadforge.schemes.lead_scoring.render.relational`.
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+import pandas as pd
+
+if TYPE_CHECKING:
+    from collections.abc import Collection
+    from pathlib import Path
+
+
+def write_relational_tables(
+    dfs: dict[str, pd.DataFrame],
+    tables_dir: Path,
+    *,
+    redacted: Collection[str] = frozenset(),
+) -> dict[str, int]:
+    """Write a ``{table_name: DataFrame}`` dict to *tables_dir* as Parquet.
+
+    A shared, scheme-agnostic envelope step used by each scheme's
+    ``write_bundle``: it drops any *redacted* columns present in a table,
+    writes one ``<name>.parquet`` per entry, and returns ``{table_name:
+    row_count}``. The relational *shape* (which tables, snapshot-safe
+    projection) is the scheme's concern and is decided before calling this.
+
+    Args:
+        dfs: Mapping of table name → DataFrame, already projected to the
+            published shape (e.g. snapshot-safe for ``student_public``).
+        tables_dir: Destination directory (created if absent).
+        redacted: Column names to strip from any table that contains them.
+
+    Returns:
+        Row count per written table, in *dfs* iteration order.
+    """
+    from leadforge.schema.tables import write_parquet
+
+    tables_dir.mkdir(parents=True, exist_ok=True)
+    row_counts: dict[str, int] = {}
+    for table_name, df in dfs.items():
+        if redacted:
+            cols_to_drop = [c for c in redacted if c in df.columns]
+            if cols_to_drop:
+                df = df.drop(columns=cols_to_drop)
+        write_parquet(df, tables_dir / f"{table_name}.parquet")
+        row_counts[table_name] = len(df)
+    return row_counts
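The contract of `write_relational_tables` (drop redacted columns, write one file per table, return row counts in iteration order) can be exercised without `leadforge` or pandas using a stand-in. The sketch below is not the project's writer: `write_tables_csv` is a hypothetical name, it takes lists of dicts instead of DataFrames, and it writes CSV instead of Parquet so that it runs on the standard library alone.

```python
from __future__ import annotations

import csv
from collections.abc import Collection
from pathlib import Path


def write_tables_csv(
    tables: dict[str, list[dict[str, object]]],
    tables_dir: Path,
    *,
    redacted: Collection[str] = frozenset(),
) -> dict[str, int]:
    """Stand-in for the envelope writer: one ``<name>.csv`` per table."""
    tables_dir.mkdir(parents=True, exist_ok=True)
    row_counts: dict[str, int] = {}
    for table_name, rows in tables.items():
        # Column order comes from the first row; redacted names are left out.
        first_row = rows[0] if rows else {}
        fieldnames = [col for col in first_row if col not in redacted]
        out_path = tables_dir / f"{table_name}.csv"
        with out_path.open("w", newline="", encoding="utf-8") as handle:
            # extrasaction="ignore" silently skips the redacted keys.
            writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
        row_counts[table_name] = len(rows)
    return row_counts
```

As in the real function, the caller decides the table shape first and passes the result in; the writer only strips `redacted` columns and serialises.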
11 changes: 7 additions & 4 deletions leadforge/schemes/lead_scoring/__init__.py
@@ -162,13 +162,16 @@ def write_bundle(
     from leadforge.exposure.modes import apply_exposure
     from leadforge.narrative.dataset_card import render_dataset_card
     from leadforge.render.manifests import build_manifest, write_manifest
-    from leadforge.render.relational import to_dataframes, write_relational_tables
-    from leadforge.render.relational_snapshot_safe import to_dataframes_snapshot_safe
-    from leadforge.render.snapshots import build_snapshot
-    from leadforge.render.tasks import write_task_splits
+    from leadforge.render.relational_io import write_relational_tables
     from leadforge.schema.dictionaries import write_feature_dictionary
     from leadforge.schema.features import LEAD_SNAPSHOT_FEATURES, redacted_columns_for
     from leadforge.schema.tasks import task_manifest_for_config
+    from leadforge.schemes.lead_scoring.render.relational import to_dataframes
+    from leadforge.schemes.lead_scoring.render.relational_snapshot_safe import (
+        to_dataframes_snapshot_safe,
+    )
+    from leadforge.schemes.lead_scoring.render.snapshots import build_snapshot
+    from leadforge.schemes.lead_scoring.render.tasks import write_task_splits

     if (
         bundle.simulation_result is None
Empty file.
@@ -27,9 +27,6 @@
 )

 if TYPE_CHECKING:
-    from collections.abc import Collection
-    from pathlib import Path
-
     from leadforge.schemes.lead_scoring.simulation.engine import SimulationResult
     from leadforge.schemes.lead_scoring.simulation.population import PopulationResult

@@ -85,40 +82,3 @@ def to_dataframes(
             df = src.cls.empty_dataframe()
         dfs[table_name] = df
     return dfs
-
-
-def write_relational_tables(
-    dfs: dict[str, pd.DataFrame],
-    tables_dir: Path,
-    *,
-    redacted: Collection[str] = frozenset(),
-) -> dict[str, int]:
-    """Write a ``{table_name: DataFrame}`` dict to *tables_dir* as Parquet.
-
-    A shared, scheme-agnostic envelope step used by each scheme's
-    ``write_bundle``: it drops any *redacted* columns present in a table,
-    writes one ``<name>.parquet`` per entry, and returns ``{table_name:
-    row_count}``. The relational *shape* (which tables, snapshot-safe
-    projection) is the scheme's concern and is decided before calling this.
-
-    Args:
-        dfs: Mapping of table name → DataFrame, already projected to the
-            published shape (e.g. snapshot-safe for ``student_public``).
-        tables_dir: Destination directory (created if absent).
-        redacted: Column names to strip from any table that contains them.
-
-    Returns:
-        Row count per written table, in *dfs* iteration order.
-    """
-    from leadforge.schema.tables import write_parquet
-
-    tables_dir.mkdir(parents=True, exist_ok=True)
-    row_counts: dict[str, int] = {}
-    for table_name, df in dfs.items():
-        if redacted:
-            cols_to_drop = [c for c in redacted if c in df.columns]
-            if cols_to_drop:
-                df = df.drop(columns=cols_to_drop)
-        write_parquet(df, tables_dir / f"{table_name}.parquet")
-        row_counts[table_name] = len(df)
-    return row_counts
@@ -35,7 +35,7 @@ def write_task_splits(

 Args:
 snapshot: Lead snapshot DataFrame from
-:func:`~leadforge.render.snapshots.build_snapshot`.
+:func:`~leadforge.schemes.lead_scoring.render.snapshots.build_snapshot`.
 out_dir: Parent directory for task outputs (typically
 ``bundle_root / "tasks"``).
 seed: Seed used for deterministic row shuffle.
2 changes: 1 addition & 1 deletion leadforge/validation/leakage_probes.py
@@ -64,7 +64,7 @@

 # ---------------------------------------------------------------------------
 # Snapshot-safe contract — single source of truth for "what is leakage".
-# ``leadforge.render.relational_snapshot_safe`` (writer) and
+# ``leadforge.schemes.lead_scoring.render.relational_snapshot_safe`` (writer) and
 # ``leadforge.render.manifests`` (manifest's structural_redactions) import
 # from here so the writer and the validator share one definition.
 # ---------------------------------------------------------------------------
2 changes: 1 addition & 1 deletion scripts/build_midproject_lead_scoring.py
@@ -31,7 +31,7 @@
     softcap_expected_acv,
     subsample,
 )
-from leadforge.render.snapshots import build_snapshot
+from leadforge.schemes.lead_scoring.render.snapshots import build_snapshot

 # ---------------------------------------------------------------------------
 # Orchestration
2 changes: 1 addition & 1 deletion scripts/build_v4_snapshot.py
@@ -20,7 +20,7 @@
 import pandas as pd

 from leadforge.api.generator import Generator
-from leadforge.render.snapshots import build_snapshot
+from leadforge.schemes.lead_scoring.render.snapshots import build_snapshot

 # ---------------------------------------------------------------------------
 # Constants
2 changes: 1 addition & 1 deletion scripts/build_v5_snapshot.py
@@ -32,7 +32,7 @@
     rename_and_select,
     subsample,
 )
-from leadforge.render.snapshots import build_snapshot
+from leadforge.schemes.lead_scoring.render.snapshots import build_snapshot

 # ---------------------------------------------------------------------------
 # Orchestration (stays in script — depends on Generator)
2 changes: 1 addition & 1 deletion scripts/build_v6_snapshot.py
@@ -39,7 +39,7 @@
     softcap_expected_acv,
     subsample,
 )
-from leadforge.render.snapshots import build_snapshot
+from leadforge.schemes.lead_scoring.render.snapshots import build_snapshot

 # ---------------------------------------------------------------------------
 # Orchestration
2 changes: 1 addition & 1 deletion scripts/build_v7_snapshot.py
@@ -38,7 +38,7 @@
     softcap_expected_acv,
     subsample,
 )
-from leadforge.render.snapshots import build_snapshot
+from leadforge.schemes.lead_scoring.render.snapshots import build_snapshot

 # ---------------------------------------------------------------------------
 # Orchestration
2 changes: 1 addition & 1 deletion scripts/spike_category_signal.py
@@ -26,7 +26,7 @@

 from leadforge.api.generator import Generator
 from leadforge.core.rng import RNGRoot
-from leadforge.render.snapshots import build_snapshot
+from leadforge.schemes.lead_scoring.render.snapshots import build_snapshot
 from leadforge.schemes.lead_scoring.simulation.engine import simulate_world
 from leadforge.schemes.lead_scoring.simulation.population import PopulationResult, build_population
 from leadforge.schemes.lead_scoring.structure.sampler import sample_hidden_graph
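Because the move is a hard break with no shims, any script still importing from the old render modules will fail at import time. A quick way to find such leftovers before running anything is to scan the source with the standard `ast` module. This is a sketch only: `stale_imports` and `REMOVED_MODULES` are hypothetical names, and the module list covers just the four render modules this PR moved or removed.

```python
from __future__ import annotations

import ast

# Modules that no longer exist under leadforge.render after this PR
# (assumed list, derived from the CHANGELOG entries in this diff).
REMOVED_MODULES = (
    "leadforge.render.snapshots",
    "leadforge.render.relational",
    "leadforge.render.relational_snapshot_safe",
    "leadforge.render.tasks",
)


def stale_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module) for each import of a removed module."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            # Compare on dotted-component boundaries, so the surviving
            # leadforge.render.relational_io is not reported.
            if any(module == old or module.startswith(old + ".") for old in REMOVED_MODULES):
                hits.append((node.lineno, module))
    return sorted(hits)
```

Running it over each file in `scripts/` and `tests/` would list the lines that still need the new `leadforge.schemes.lead_scoring.render.*` paths.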
2 changes: 1 addition & 1 deletion tests/integration/test_snapshot_safe_bundle.py
@@ -2,7 +2,7 @@

 Covers the contract turned on in PR 2.2: ``student_public`` bundles
 route ``tables/`` through
-:func:`leadforge.render.relational_snapshot_safe.to_dataframes_snapshot_safe`
+:func:`leadforge.schemes.lead_scoring.render.relational_snapshot_safe.to_dataframes_snapshot_safe`
 (the structural fix against the alpha-bundle reconstruction paths
 A-E), ``research_instructor`` bundles keep the full-horizon export,
 and the manifest is self-describing via ``relational_snapshot_safe``,
2 changes: 1 addition & 1 deletion tests/render/test_relational_snapshot_safe.py
@@ -5,7 +5,7 @@
 import pandas as pd
 import pytest

-from leadforge.render.relational_snapshot_safe import (
+from leadforge.schemes.lead_scoring.render.relational_snapshot_safe import (
     BANNED_LEAD_COLUMNS,
     BANNED_OPP_COLUMNS,
     BANNED_TABLES,