feat(render,validation): snapshot-safe relational export + leakage validator (PR 2.1) by shaypal5 · Pull Request #63 · leadforge-dev/leadforge

shaypal5 · 2026-05-05T17:39:26Z

Summary

PR 2.1 of the v1 dataset release sequence (roadmap §"Phase 2 — Snapshot-safe relational export"). Module-level only — adds the structural fix that the Phase 1 audit (PR #62) verified empirically against, plus the matching validator. PR 2.2 will wire these through the bundle writer and bump BUNDLE_SCHEMA_VERSION 4 → 5.

What's in

leadforge/render/relational_snapshot_safe.py — to_dataframes_snapshot_safe(dfs, *, snapshot_day) projects the full-horizon dict onto the public-bundle shape: drops BANNED_LEAD_COLUMNS from leads, BANNED_OPP_COLUMNS from opportunities, filters event tables (touches/sessions/sales_activities/opportunities) per-lead by lead_created_at + snapshot_day, omits customers/subscriptions, passes accounts/contacts unchanged. Pure function, no I/O. leadforge/render/relational.py is untouched — the full-horizon export still backs research_instructor.
leadforge/validation/relational_leakage.py — five probes (probe_banned_columns, probe_banned_tables, probe_deterministic_reconstruction, probe_snapshot_window, probe_bonus_model_auc) feeding LeakageFinding/LeakageReport/RelationalLeakageError. Two orchestrators: run_all_probes (file-based, what PR 2.2 will call from bundle_checks.validate_bundle) and run_all_probes_on_dataframes (in-memory, for unit tests). Bonus-model band defaults to 0.65 with a # TODO(PR 3.3) for per-tier calibration. Constants imported from the render module so writer + validator share a single source of truth.
scripts/probe_relational_leakage.py — deterministic_relational_reconstruction lifted into the package; the script re-exports it from leadforge.validation.relational_leakage. The function is now defined once and shared by the alpha-bundle audit (PR 1.1) and the validator (this PR). tests/scripts/test_probe_relational_leakage.py is unchanged and still passes.
Tests — tests/render/test_relational_snapshot_safe.py (14 property tests) + tests/validation/test_relational_leakage.py (23 probe tests covering clean + tampered synthetic bundles, file-based orchestrator, ergonomics).

What's NOT in (deferred to PR 2.2)

Wiring through leadforge/exposure/filters.py and leadforge/api/bundle.py.
BUNDLE_SCHEMA_VERSION 4 → 5 and the relational_snapshot_safe manifest field.
Plumbing the validator into leadforge/validation/bundle_checks.py.
Regenerating the alpha bundles under release/.

Acceptance (from the handoff)

pytest tests/render/test_relational_snapshot_safe.py tests/validation/test_relational_leakage.py passes (14 + 23 = 37 tests).
The validator, run against a synthetic public bundle that does contain the four leakage channels, fails with a structured finding per channel.
The validator, run against a public bundle produced by to_dataframes_snapshot_safe, passes every probe.
scripts/probe_relational_leakage.py re-uses the lifted function rather than duplicating logic — the script and the package are one implementation. (Per the handoff sign-off, this PR also takes PR 3.1's lift-into-package work, since deferring it would have meant re-implementation here.)

Verification

pytest: 1001 passed (37 new + 964 existing).
ruff check + ruff format --check: clean.
mypy leadforge/: no issues across 80 source files.

References

docs/release/v1_release_roadmap.md §"Phase 2 — Snapshot-safe relational export"
docs/release/v1_release_design.md §"Snapshot-safe relational export"
docs/release/v1_current_state_audit.md §"Pointer to the structural fix — PR 2.1"
Phase 1 audit: docs: PR 1.1 — relational leakage audit on alpha bundles #62

Test plan

Targeted: pytest tests/render/test_relational_snapshot_safe.py tests/validation/test_relational_leakage.py tests/scripts/test_probe_relational_leakage.py.
Full suite: pytest.
Lint: ruff check . && ruff format --check ..
Types: mypy leadforge/.
PR 2.2 will additionally regenerate the alpha bundles and confirm that re-running the now-wired validator on release/{intro,intermediate,advanced} produces an empty LeakageReport.

🤖 Generated with Claude Code

…lidator PR 2.1 of the v1 dataset release sequence. Module-level only — adds the structural fix audited against in PR 1.1, plus the matching validator that asserts the fix is in place on any bundle claiming to be student_public. PR 2.2 will wire these through bundle.py / exposure/filters.py and bump BUNDLE_SCHEMA_VERSION 4 → 5. leadforge/render/relational_snapshot_safe.py to_dataframes_snapshot_safe(dfs, *, snapshot_day) projects the full-horizon dict from leadforge.render.relational.to_dataframes onto the public-bundle shape: - leads: drops BANNED_LEAD_COLUMNS (converted_within_90_days, conversion_timestamp). - opportunities: drops BANNED_OPP_COLUMNS (close_outcome, closed_at) and filters per-lead by created_at <= lead_created_at + snapshot_day. - touches/sessions/sales_activities: filtered per-lead on each timestamp column to the same boundary (defence-in-depth — alpha bundles already pass G4.4 because the 90d horizon bounds events). - customers/subscriptions: omitted entirely (BANNED_TABLES). - accounts/contacts: pass-through. Pure function, no I/O, deterministic. leadforge/render/relational.py is unchanged (full-horizon export still backs research_instructor). leadforge/validation/relational_leakage.py Five probes feeding LeakageFinding / LeakageReport: - probe_banned_columns — Path A column scan. - probe_banned_tables — customers/subscriptions absent. - probe_deterministic_reconstruction — paths B/C/D zero-hit on the join graph. - probe_snapshot_window — event timestamps within lead_created_at + snapshot_day. - probe_bonus_model_auc — 5-fold CV LR + HistGBM on n_opps / max_acv / mean_acv; default ceiling 0.65 with a TODO(PR 3.3) for per-tier calibration. Two orchestrators (file-based and in-memory) plus RelationalLeakageError carrying the LeakageReport. Constants imported from the render module so writer + validator share one source of truth for "snapshot-safe". scripts/probe_relational_leakage.py deterministic_relational_reconstruction lifted into the package (where PR 3.1 was already slated to land it); the script now re-exports it from leadforge.validation.relational_leakage. No behavioural change — existing tests/scripts/test_probe_relational_leakage.py still passes against `probe_module.deterministic_relational_reconstruction`. Tests tests/render/test_relational_snapshot_safe.py — 14 property tests: banned cols absent; BANNED_TABLES absent from output dict; per-lead snapshot-window invariant holds; accounts/contacts pass-through; idempotent on already-safe input; deterministic across two calls; no input mutation; canonical output table order; missing-optional tolerance; snapshot_day=0 edge; negative-day raises; missing leads raises; missing anchor cols raises. tests/validation/test_relational_leakage.py — 23 probe tests over a 50-lead synthetic bundle (clean + tampered): every probe silent on the clean bundle (orchestrator + individual); each leakage channel re-introduced one at a time fires the matching probe with the expected detail; deterministic probe does not flag Path A (covered by banned_columns); bonus-model fires when customers re-introduced; bonus-model skips cleanly when label unavailable; orchestrator aggregates findings across channels; raise_if_failing carries the report; file-based run_all_probes reads/exits cleanly on missing tables/leads; package and script exports of deterministic_relational_reconstruction agree. All 1001 existing tests still pass; ruff + mypy clean. Refs: docs/release/v1_release_roadmap.md §"Phase 2"; docs/release/v1_release_design.md §"Snapshot-safe relational export"; docs/release/v1_current_state_audit.md §"Pointer to the structural fix" Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Self-review followup to PR 2.1. Both ``_build_anchor`` (in ``relational_snapshot_safe``) and ``probe_snapshot_window`` (in ``relational_leakage``) build a per-lead anchor by selecting ``leads[['lead_id','lead_created_at']]`` and merging it back onto each event table. If ``leads.lead_id`` were not unique, the left-merge would broadcast and silently inflate event-table row counts (export) or produce a mask whose length doesn't match the event frame (probe). ``deterministic_relational_reconstruction`` already asserts the same invariant; align both new modules with it so a duplicate-key bug surfaces with a clear ValueError instead of a quiet wrong answer. Adds two regression tests (one per module) that pass duplicated leads and assert the matching ValueError. Tests now total 1003 passing; ruff + mypy still clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

This PR introduces the module-level building blocks for Phase 2 “snapshot-safe relational export”: a pure transformation that projects full-horizon relational tables into the student_public shape (removing known reconstruction channels), plus a matching leakage validator with multiple probes and orchestrators. It also consolidates the deterministic reconstruction logic into a single canonical implementation and adds focused test coverage around both the exporter and validator.

Changes:

Added to_dataframes_snapshot_safe(..., snapshot_day=...) to enforce the public relational table contract (drop banned columns/tables and filter event tables to the snapshot window).
Added leadforge.validation.relational_leakage with a structured findings/report model, five probes, and both in-memory and file-based orchestrators.
Updated the existing leakage probe script to re-export the now-canonical deterministic reconstruction function; added comprehensive tests for exporter and validator behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`leadforge/render/relational_snapshot_safe.py`	Implements snapshot-safe projection (banned columns/tables + per-lead snapshot window filtering).
`leadforge/validation/relational_leakage.py`	Adds leakage probes, report types, orchestrators, and canonical deterministic reconstruction logic.
`scripts/probe_relational_leakage.py`	Re-exports deterministic reconstruction from the package to avoid duplicated logic.
`tests/render/test_relational_snapshot_safe.py`	Property tests for column/table dropping, snapshot filtering, idempotence, and determinism.
`tests/validation/test_relational_leakage.py`	Probe tests for clean vs tampered bundles plus orchestrator/report ergonomics.
`.agent-plan.md`	Updates roadmap checklist to mark Phase 2 module work complete in PR 2.1.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Self-review-driven cleanup of the structural/leakage modules from 9ebf571 / 2922eb8. Behaviour-preserving for the structural probes; the bonus-model probe goes from default-on to opt-in, which is the substantive correctness change. P0 — substantive fixes * Bonus-model probe is now opt-in. Orchestrators (``run_all_probes`` / ``run_all_probes_on_dataframes``) gain ``bonus_model_max_auc: float | None = None`` and skip the probe unless that threshold is set. The probe function itself now requires ``max_auc`` as a keyword (no ``DEFAULT_MAX_BONUS_AUC``). Rationale: PR 3.3 owns the calibrated per-tier band; PR 2.1 must not default-on an uncalibrated 0.65 ceiling that would fail every alpha bundle when PR 2.2 wires it into validate_bundle. * ``_resolve_label`` rejects misaligned labels with a clear ``ValueError`` instead of letting them silently no-op via the binary-cardinality skip — a leakage validator that hides bugs by design defeats its own purpose. * ``RelationalLeakageError`` now inherits from ``LeadforgeError`` instead of ``AssertionError`` (which gets stripped under ``python -O`` and is the wrong base class for a domain error). P1 — architecture / naming * Contract constants (``BANNED_LEAD_COLUMNS`` / ``BANNED_OPP_COLUMNS`` / ``BANNED_TABLES`` / ``SNAPSHOT_FILTERED_TABLES``) move from the render module to the validator. The validator owns the definition of "leakage"; the writer enforces it and imports the contract. This flips one render→validation import direction we had backwards. * ``EVENT_TABLES`` → ``SNAPSHOT_FILTERED_TABLES``. The original name implied "all event tables", but ``opportunities`` is an entity table that we per-lead-filter by ``created_at``; the new name says what the constant is for. * ``CHANNEL_DETERMINISTIC_PATH`` → ``CHANNEL_JOIN_RECONSTRUCTION``. Path A is a deterministic reconstruction too — it is just a column read. The probe deliberately scopes itself to join-graph paths (B/C/D) and delegates Path A to ``probe_banned_columns``; the channel constant now matches. Tests * Two new tests cover the new behaviours: - ``test_bonus_model_probe_rejects_misaligned_label`` — the ``_resolve_label`` guard fires loudly on a non-lead_id index. - ``test_orchestrator_skips_bonus_probe_by_default`` / ``test_orchestrator_runs_bonus_probe_when_enabled`` — opt-in behaviour at the orchestrator level. * Existing clean-bundle tests no longer pass ``label`` to the structural orchestrator (it is unused without ``bonus_model_max_auc``); the bonus-enabled test path is exercised separately. 1007 tests pass (was 1003); ruff + mypy clean across leadforge/. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

shaypal5 · 2026-05-05T17:56:54Z

Self-review refactor pushed (`df15a24`)

I asked myself "what would a senior dev hate about this PR" and found enough to warrant a follow-up commit. Behaviour of the structural probes is preserved; the bonus-model probe goes from default-on to opt-in, which is the substantive change.

P0 — substantive fixes

1. Bonus-model probe is now opt-in. I had DEFAULT_MAX_BONUS_AUC = 0.65 baked into the orchestrator, but the brief explicitly framed this probe as "optional" with a "placeholder" threshold awaiting PR 3.3 calibration. Default-on would have meant every alpha bundle fails an arbitrarily-chosen ceiling the moment PR 2.2 wires the orchestrator into validate_bundle. Fixed by making bonus_model_max_auc: float | None = None the orchestrator default — the probe runs only when the caller passes a threshold.

2. _resolve_label would silently no-op on misaligned labels. If a caller passed a label Series not indexed by lead_id, reindex(features.index) produced all-NaN, then the binary-cardinality skip (y.nunique(dropna=True) < 2) returned no findings. A leakage validator that hides bugs by design is anti-purpose. Now raises ValueError if label.index.name != "lead_id".

3. RelationalLeakageError was inheriting from AssertionError — wrong base. AssertionError gets stripped under python -O, and the project already has LeadforgeError (leadforge/core/exceptions.py) as the standard domain-error root. Switched.

P1 — architecture / naming

4. Contract constants were in the wrong subpackage. BANNED_LEAD_COLUMNS / BANNED_OPP_COLUMNS / BANNED_TABLES / SNAPSHOT_FILTERED_TABLES describe what counts as leakage — that's the validator's job. Moved them from render/relational_snapshot_safe.py to validation/relational_leakage.py; the writer module now imports the contract. Fixes one render→validation dependency that was pointed the wrong way.

5. EVENT_TABLES → SNAPSHOT_FILTERED_TABLES. The original name implied "all event tables", but opportunities is an entity table I just happen to per-lead-filter by created_at. New name says what the constant is for.

6. CHANNEL_DETERMINISTIC_PATH → CHANNEL_JOIN_RECONSTRUCTION. Path A is a deterministic reconstruction too — it's just a column read instead of a join. The probe deliberately scopes to paths B/C/D and delegates Path A to probe_banned_columns; the channel constant now reflects that.

Tests added

test_bonus_model_probe_rejects_misaligned_label — _resolve_label guard fires loudly on a non-lead_id index.
test_orchestrator_skips_bonus_probe_by_default / test_orchestrator_runs_bonus_probe_when_enabled — opt-in semantics at the orchestrator level.

Verification

1007 tests pass (was 1003); ruff + mypy clean across leadforge/.

🤖 Generated with Claude Code

Two legitimate edge cases raised on PR #63: 1. Partial labels would crash with an opaque error. ``y_series.reindex(features.index).astype(int)`` raises ``IntCastingNaNError`` when the supplied ``label`` is missing any ``lead_id`` present in the bundle, because reindex introduces NaN and ``astype(int)`` chokes. Now the probe raises a clear ``ValueError`` that names the gap and tells the caller to either complete the label or omit it. 2. ``StratifiedKFold(n_splits=5)`` would crash on small or highly imbalanced bundles. When the smaller class has fewer than 5 members, sklearn raises ``ValueError`` from ``skf.split``. Now the probe sizes ``n_splits = min(5, min_class_count)`` and skips silently (no finding, no error) when even ``n_splits=2`` is impossible — that is a sample-size constraint, not a leakage finding. Two new tests cover the partial-label rejection, and two cover the dynamic-n_splits / too-small-to-CV paths. Finding messages now report the actual fold count (e.g. ``3-fold CV AUC...``) instead of a hard-coded ``5-fold``. 1010 tests pass (was 1007); ruff + mypy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…an events Address two new Copilot review threads on PR #63 (both legitimate): * COPILOT-3 (relational_snapshot_safe.py:126) — ``_build_anchor`` parsed ``lead_created_at`` with ``pd.to_datetime`` but never validated the result. A NaT anchor would propagate through ``ts <= NaT`` -> NaN -> ``fillna(False)`` and silently drop every event for the affected lead. Now coerce to NaT explicitly with ``errors="coerce"`` and raise ``ValueError`` (with a 5-lead sample of offending lead_ids) if any NaT remains. Mirrors the same posture as the existing duplicate-lead_id check: structural defects fail loudly rather than producing a quietly wrong output. * COPILOT-4 (relational_leakage.py:382) — ``probe_snapshot_window`` had the same NaT-anchor issue, plus a related orphan-event blind spot: an event row whose ``lead_id`` is absent from ``leads`` gets a NaT cutoff after the left-merge, and the ``fillna(False)`` in the violation count silently treats it as a non-violation. An orphan event is a structural FK violation AND a leakage attack surface — a tampered bundle could insert post-snapshot events tied to lead_ids absent from the public leads table to bypass the snapshot filter. Both cases now raise ``ValueError`` with offending lead_id samples. Tests added: * ``test_nat_lead_created_at_raises`` / ``test_unparseable_lead_created_at_raises`` — render-side anchor. * ``test_snapshot_window_nat_lead_created_at_raises`` / ``test_snapshot_window_orphan_event_raises`` — validator-side. Did not refactor ``_build_anchor`` into a shared helper between the two modules; PR 2.1 keeps render and validation independent in their anchor handling, and PR 3.1 (which already plans to unify the leakage probes) can DRY if the duplication starts to matter. 1014 tests pass (was 1010); ruff + mypy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-05T18:26:45Z

pr-agent-context report:

This run includes unresolved review comments on PR #63 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: leadforge/render/relational_snapshot_safe.py
URL: https://github.com/leadforge-dev/leadforge/pull/63#discussion_r3190571467
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    _build_anchor() converts lead_created_at with pd.to_datetime() but doesn’t validate that all values are non-null / successfully parsed. If any lead has a missing created_at (NaT), downstream filtering will silently drop all events for that lead because comparisons against NaT become False, masking data-quality issues. Consider asserting no NaT after parsing (or using errors='raise' plus an explicit null check) and raising a ValueError that points to the offending lead_ids.

## COPILOT-2
Location: leadforge/validation/relational_leakage.py:408
URL: https://github.com/leadforge-dev/leadforge/pull/63#discussion_r3190571543
Root author: copilot-pull-request-reviewer

Comment:
    probe_snapshot_window() merges event tables to leads with how='left' and then counts ts > cutoff, but rows whose lead_id doesn’t exist in leads (or leads with null lead_created_at) result in NaT cutoffs and are silently treated as non-violations due to the fillna(False). This can let structurally invalid bundles pass the snapshot-window probe. Consider explicitly checking for missing anchors after the merge (e.g., merged['lead_created_at'].isna()) and raising a ValueError or emitting a dedicated finding for missing lead references / missing lead_created_at.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25394565987 attempt 1
Comment timestamp: 2026-05-05T18:25:54.737721+00:00
PR head commit: ce2b3d34a0e16be638d9854b74310566bca80203

Copilot AI review requested due to automatic review settings May 5, 2026 17:39

shaypal5 added this to the dataset: leadforge-lead-scoring-v1 milestone May 5, 2026

shaypal5 added type: feature New capability layer: render render/ bundle and artifact output layer: validation validation/ invariants and checks labels May 5, 2026

Copilot started reviewing on behalf of shaypal5 May 5, 2026 17:40 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed May 5, 2026

View reviewed changes

Comment thread leadforge/validation/relational_leakage.py

Comment thread leadforge/validation/relational_leakage.py

This comment has been minimized.

Sign in to view

Copilot AI review requested due to automatic review settings May 5, 2026 18:02

Copilot started reviewing on behalf of shaypal5 May 5, 2026 18:03 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed May 5, 2026

View reviewed changes

Comment thread leadforge/render/relational_snapshot_safe.py Outdated

Comment thread leadforge/validation/relational_leakage.py

shaypal5 merged commit 4392efc into main May 5, 2026
8 checks passed

shaypal5 deleted the feat/snapshot-safe-relational-export branch May 5, 2026 19:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(render,validation): snapshot-safe relational export + leakage validator (PR 2.1)#63

feat(render,validation): snapshot-safe relational export + leakage validator (PR 2.1)#63
shaypal5 merged 5 commits into
mainfrom
feat/snapshot-safe-relational-export

shaypal5 commented May 5, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

shaypal5 commented May 5, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shaypal5 commented May 5, 2026

Summary

What's in

What's NOT in (deferred to PR 2.2)

Acceptance (from the handoff)

Verification

References

Test plan

Uh oh!

This comment has been minimized.

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

shaypal5 commented May 5, 2026

Self-review refactor pushed (df15a24)

P0 — substantive fixes

P1 — architecture / naming

Tests added

Verification

Uh oh!

This comment has been minimized.

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Self-review refactor pushed (`df15a24`)