feat(render,validation): snapshot-safe relational export + leakage validator (PR 2.1)#63
Conversation
…lidator
PR 2.1 of the v1 dataset release sequence. Module-level only — adds
the structural fix audited against in PR 1.1, plus the matching
validator that asserts the fix is in place on any bundle claiming to
be student_public. PR 2.2 will wire these through bundle.py /
exposure/filters.py and bump BUNDLE_SCHEMA_VERSION 4 → 5.
leadforge/render/relational_snapshot_safe.py
to_dataframes_snapshot_safe(dfs, *, snapshot_day) projects the
full-horizon dict from leadforge.render.relational.to_dataframes onto
the public-bundle shape:
- leads: drops BANNED_LEAD_COLUMNS
(converted_within_90_days, conversion_timestamp).
- opportunities: drops BANNED_OPP_COLUMNS (close_outcome,
closed_at) and filters per-lead by
created_at <= lead_created_at + snapshot_day.
- touches/sessions/sales_activities: filtered per-lead on each
timestamp column to the same boundary
(defence-in-depth — alpha bundles already pass
G4.4 because the 90d horizon bounds events).
- customers/subscriptions: omitted entirely (BANNED_TABLES).
- accounts/contacts: pass-through.
Pure function, no I/O, deterministic. leadforge/render/relational.py
is unchanged (full-horizon export still backs research_instructor).
leadforge/validation/relational_leakage.py
Five probes feeding LeakageFinding / LeakageReport:
- probe_banned_columns — Path A column scan.
- probe_banned_tables — customers/subscriptions absent.
- probe_deterministic_reconstruction — paths B/C/D zero-hit on
the join graph.
- probe_snapshot_window — event timestamps within
lead_created_at + snapshot_day.
- probe_bonus_model_auc — 5-fold CV LR + HistGBM on
n_opps / max_acv / mean_acv;
default ceiling 0.65 with a
TODO(PR 3.3) for per-tier
calibration.
Two orchestrators (file-based and in-memory) plus
RelationalLeakageError carrying the LeakageReport. Constants
imported from the render module so writer + validator share one
source of truth for "snapshot-safe".
scripts/probe_relational_leakage.py
deterministic_relational_reconstruction lifted into the package
(where PR 3.1 was already slated to land it); the script now
re-exports it from leadforge.validation.relational_leakage. No
behavioural change — existing tests/scripts/test_probe_relational_leakage.py
still passes against `probe_module.deterministic_relational_reconstruction`.
Tests
tests/render/test_relational_snapshot_safe.py — 14 property tests:
banned cols absent; BANNED_TABLES absent from output dict; per-lead
snapshot-window invariant holds; accounts/contacts pass-through;
idempotent on already-safe input; deterministic across two calls;
no input mutation; canonical output table order; missing-optional
tolerance; snapshot_day=0 edge; negative-day raises; missing leads
raises; missing anchor cols raises.
tests/validation/test_relational_leakage.py — 23 probe tests over
a 50-lead synthetic bundle (clean + tampered): every probe silent
on the clean bundle (orchestrator + individual); each leakage
channel re-introduced one at a time fires the matching probe with
the expected detail; deterministic probe does not flag Path A
(covered by banned_columns); bonus-model fires when customers
re-introduced; bonus-model skips cleanly when label unavailable;
orchestrator aggregates findings across channels;
raise_if_failing carries the report; file-based run_all_probes
reads/exits cleanly on missing tables/leads; package and script
exports of deterministic_relational_reconstruction agree.
All 1001 existing tests still pass; ruff + mypy clean.
Refs: docs/release/v1_release_roadmap.md §"Phase 2";
docs/release/v1_release_design.md §"Snapshot-safe relational export";
docs/release/v1_current_state_audit.md §"Pointer to the structural fix"
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
Self-review followup to PR 2.1. Both ``_build_anchor`` (in ``relational_snapshot_safe``) and ``probe_snapshot_window`` (in ``relational_leakage``) build a per-lead anchor by selecting ``leads[['lead_id','lead_created_at']]`` and merging it back onto each event table. If ``leads.lead_id`` were not unique, the left-merge would broadcast and silently inflate event-table row counts (export) or produce a mask whose length doesn't match the event frame (probe). ``deterministic_relational_reconstruction`` already asserts the same invariant; align both new modules with it so a duplicate-key bug surfaces with a clear ValueError instead of a quiet wrong answer. Adds two regression tests (one per module) that pass duplicated leads and assert the matching ValueError. Tests now total 1003 passing; ruff + mypy still clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR introduces the module-level building blocks for Phase 2 “snapshot-safe relational export”: a pure transformation that projects full-horizon relational tables into the student_public shape (removing known reconstruction channels), plus a matching leakage validator with multiple probes and orchestrators. It also consolidates the deterministic reconstruction logic into a single canonical implementation and adds focused test coverage around both the exporter and validator.
Changes:
- Added
to_dataframes_snapshot_safe(..., snapshot_day=...)to enforce the public relational table contract (drop banned columns/tables and filter event tables to the snapshot window). - Added
leadforge.validation.relational_leakagewith a structured findings/report model, five probes, and both in-memory and file-based orchestrators. - Updated the existing leakage probe script to re-export the now-canonical deterministic reconstruction function; added comprehensive tests for exporter and validator behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
leadforge/render/relational_snapshot_safe.py |
Implements snapshot-safe projection (banned columns/tables + per-lead snapshot window filtering). |
leadforge/validation/relational_leakage.py |
Adds leakage probes, report types, orchestrators, and canonical deterministic reconstruction logic. |
scripts/probe_relational_leakage.py |
Re-exports deterministic reconstruction from the package to avoid duplicated logic. |
tests/render/test_relational_snapshot_safe.py |
Property tests for column/table dropping, snapshot filtering, idempotence, and determinism. |
tests/validation/test_relational_leakage.py |
Probe tests for clean vs tampered bundles plus orchestrator/report ergonomics. |
.agent-plan.md |
Updates roadmap checklist to mark Phase 2 module work complete in PR 2.1. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Self-review-driven cleanup of the structural/leakage modules from 9ebf571 / 2922eb8. Behaviour-preserving for the structural probes; the bonus-model probe goes from default-on to opt-in, which is the substantive correctness change. P0 — substantive fixes * Bonus-model probe is now opt-in. Orchestrators (``run_all_probes`` / ``run_all_probes_on_dataframes``) gain ``bonus_model_max_auc: float | None = None`` and skip the probe unless that threshold is set. The probe function itself now requires ``max_auc`` as a keyword (no ``DEFAULT_MAX_BONUS_AUC``). Rationale: PR 3.3 owns the calibrated per-tier band; PR 2.1 must not default-on an uncalibrated 0.65 ceiling that would fail every alpha bundle when PR 2.2 wires it into validate_bundle. * ``_resolve_label`` rejects misaligned labels with a clear ``ValueError`` instead of letting them silently no-op via the binary-cardinality skip — a leakage validator that hides bugs by design defeats its own purpose. * ``RelationalLeakageError`` now inherits from ``LeadforgeError`` instead of ``AssertionError`` (which gets stripped under ``python -O`` and is the wrong base class for a domain error). P1 — architecture / naming * Contract constants (``BANNED_LEAD_COLUMNS`` / ``BANNED_OPP_COLUMNS`` / ``BANNED_TABLES`` / ``SNAPSHOT_FILTERED_TABLES``) move from the render module to the validator. The validator owns the definition of "leakage"; the writer enforces it and imports the contract. This flips one render→validation import direction we had backwards. * ``EVENT_TABLES`` → ``SNAPSHOT_FILTERED_TABLES``. The original name implied "all event tables", but ``opportunities`` is an entity table that we per-lead-filter by ``created_at``; the new name says what the constant is for. * ``CHANNEL_DETERMINISTIC_PATH`` → ``CHANNEL_JOIN_RECONSTRUCTION``. Path A is a deterministic reconstruction too — it is just a column read. The probe deliberately scopes itself to join-graph paths (B/C/D) and delegates Path A to ``probe_banned_columns``; the channel constant now matches. Tests * Two new tests cover the new behaviours: - ``test_bonus_model_probe_rejects_misaligned_label`` — the ``_resolve_label`` guard fires loudly on a non-lead_id index. - ``test_orchestrator_skips_bonus_probe_by_default`` / ``test_orchestrator_runs_bonus_probe_when_enabled`` — opt-in behaviour at the orchestrator level. * Existing clean-bundle tests no longer pass ``label`` to the structural orchestrator (it is unused without ``bonus_model_max_auc``); the bonus-enabled test path is exercised separately. 1007 tests pass (was 1003); ruff + mypy clean across leadforge/. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Self-review refactor pushed (df15a24)I asked myself "what would a senior dev hate about this PR" and found enough to warrant a follow-up commit. Behaviour of the structural probes is preserved; the bonus-model probe goes from default-on to opt-in, which is the substantive change. P0 — substantive fixes1. Bonus-model probe is now opt-in. I had 2. 3. P1 — architecture / naming4. Contract constants were in the wrong subpackage. 5. 6. Tests added
Verification1007 tests pass (was 1003); ruff + mypy clean across 🤖 Generated with Claude Code |
This comment has been minimized.
This comment has been minimized.
Two legitimate edge cases raised on PR #63: 1. Partial labels would crash with an opaque error. ``y_series.reindex(features.index).astype(int)`` raises ``IntCastingNaNError`` when the supplied ``label`` is missing any ``lead_id`` present in the bundle, because reindex introduces NaN and ``astype(int)`` chokes. Now the probe raises a clear ``ValueError`` that names the gap and tells the caller to either complete the label or omit it. 2. ``StratifiedKFold(n_splits=5)`` would crash on small or highly imbalanced bundles. When the smaller class has fewer than 5 members, sklearn raises ``ValueError`` from ``skf.split``. Now the probe sizes ``n_splits = min(5, min_class_count)`` and skips silently (no finding, no error) when even ``n_splits=2`` is impossible — that is a sample-size constraint, not a leakage finding. Two new tests cover the partial-label rejection, and two cover the dynamic-n_splits / too-small-to-CV paths. Finding messages now report the actual fold count (e.g. ``3-fold CV AUC...``) instead of a hard-coded ``5-fold``. 1010 tests pass (was 1007); ruff + mypy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…an events Address two new Copilot review threads on PR #63 (both legitimate): * COPILOT-3 (relational_snapshot_safe.py:126) — ``_build_anchor`` parsed ``lead_created_at`` with ``pd.to_datetime`` but never validated the result. A NaT anchor would propagate through ``ts <= NaT`` -> NaN -> ``fillna(False)`` and silently drop every event for the affected lead. Now coerce to NaT explicitly with ``errors="coerce"`` and raise ``ValueError`` (with a 5-lead sample of offending lead_ids) if any NaT remains. Mirrors the same posture as the existing duplicate-lead_id check: structural defects fail loudly rather than producing a quietly wrong output. * COPILOT-4 (relational_leakage.py:382) — ``probe_snapshot_window`` had the same NaT-anchor issue, plus a related orphan-event blind spot: an event row whose ``lead_id`` is absent from ``leads`` gets a NaT cutoff after the left-merge, and the ``fillna(False)`` in the violation count silently treats it as a non-violation. An orphan event is a structural FK violation AND a leakage attack surface — a tampered bundle could insert post-snapshot events tied to lead_ids absent from the public leads table to bypass the snapshot filter. Both cases now raise ``ValueError`` with offending lead_id samples. Tests added: * ``test_nat_lead_created_at_raises`` / ``test_unparseable_lead_created_at_raises`` — render-side anchor. * ``test_snapshot_window_nat_lead_created_at_raises`` / ``test_snapshot_window_orphan_event_raises`` — validator-side. Did not refactor ``_build_anchor`` into a shared helper between the two modules; PR 2.1 keeps render and validation independent in their anchor handling, and PR 3.1 (which already plans to unify the leakage probes) can DRY if the duplication starts to matter. 1014 tests pass (was 1010); ruff + mypy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
pr-agent-context report: This run includes unresolved review comments on PR #63 in repository https://github.com/leadforge-dev/leadforge
For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.
After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.
# Copilot Comments
## COPILOT-1
Location: leadforge/render/relational_snapshot_safe.py
URL: https://github.com/leadforge-dev/leadforge/pull/63#discussion_r3190571467
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
_build_anchor() converts lead_created_at with pd.to_datetime() but doesn’t validate that all values are non-null / successfully parsed. If any lead has a missing created_at (NaT), downstream filtering will silently drop all events for that lead because comparisons against NaT become False, masking data-quality issues. Consider asserting no NaT after parsing (or using errors='raise' plus an explicit null check) and raising a ValueError that points to the offending lead_ids.
## COPILOT-2
Location: leadforge/validation/relational_leakage.py:408
URL: https://github.com/leadforge-dev/leadforge/pull/63#discussion_r3190571543
Root author: copilot-pull-request-reviewer
Comment:
probe_snapshot_window() merges event tables to leads with how='left' and then counts ts > cutoff, but rows whose lead_id doesn’t exist in leads (or leads with null lead_created_at) result in NaT cutoffs and are silently treated as non-violations due to the fillna(False). This can let structurally invalid bundles pass the snapshot-window probe. Consider explicitly checking for missing anchors after the merge (e.g., merged['lead_created_at'].isna()) and raising a ValueError or emitting a dedicated finding for missing lead references / missing lead_created_at.Run metadata: |
Summary
PR 2.1 of the v1 dataset release sequence (roadmap §"Phase 2 — Snapshot-safe relational export"). Module-level only — adds the structural fix that the Phase 1 audit (PR #62) verified empirically against, plus the matching validator. PR 2.2 will wire these through the bundle writer and bump
BUNDLE_SCHEMA_VERSION4 → 5.What's in
leadforge/render/relational_snapshot_safe.py—to_dataframes_snapshot_safe(dfs, *, snapshot_day)projects the full-horizon dict onto the public-bundle shape: dropsBANNED_LEAD_COLUMNSfrom leads,BANNED_OPP_COLUMNSfrom opportunities, filters event tables (touches/sessions/sales_activities/opportunities) per-lead bylead_created_at + snapshot_day, omitscustomers/subscriptions, passes accounts/contacts unchanged. Pure function, no I/O.leadforge/render/relational.pyis untouched — the full-horizon export still backsresearch_instructor.leadforge/validation/relational_leakage.py— five probes (probe_banned_columns,probe_banned_tables,probe_deterministic_reconstruction,probe_snapshot_window,probe_bonus_model_auc) feedingLeakageFinding/LeakageReport/RelationalLeakageError. Two orchestrators:run_all_probes(file-based, what PR 2.2 will call frombundle_checks.validate_bundle) andrun_all_probes_on_dataframes(in-memory, for unit tests). Bonus-model band defaults to 0.65 with a# TODO(PR 3.3)for per-tier calibration. Constants imported from the render module so writer + validator share a single source of truth.scripts/probe_relational_leakage.py—deterministic_relational_reconstructionlifted into the package; the script re-exports it fromleadforge.validation.relational_leakage. The function is now defined once and shared by the alpha-bundle audit (PR 1.1) and the validator (this PR).tests/scripts/test_probe_relational_leakage.pyis unchanged and still passes.tests/render/test_relational_snapshot_safe.py(14 property tests) +tests/validation/test_relational_leakage.py(23 probe tests covering clean + tampered synthetic bundles, file-based orchestrator, ergonomics).What's NOT in (deferred to PR 2.2)
leadforge/exposure/filters.pyandleadforge/api/bundle.py.BUNDLE_SCHEMA_VERSION4 → 5 and therelational_snapshot_safemanifest field.leadforge/validation/bundle_checks.py.release/.Acceptance (from the handoff)
pytest tests/render/test_relational_snapshot_safe.py tests/validation/test_relational_leakage.pypasses (14 + 23 = 37 tests).to_dataframes_snapshot_safe, passes every probe.scripts/probe_relational_leakage.pyre-uses the lifted function rather than duplicating logic — the script and the package are one implementation. (Per the handoff sign-off, this PR also takes PR 3.1's lift-into-package work, since deferring it would have meant re-implementation here.)Verification
pytest: 1001 passed (37 new + 964 existing).ruff check+ruff format --check: clean.mypy leadforge/: no issues across 80 source files.References
docs/release/v1_release_roadmap.md§"Phase 2 — Snapshot-safe relational export"docs/release/v1_release_design.md§"Snapshot-safe relational export"docs/release/v1_current_state_audit.md§"Pointer to the structural fix — PR 2.1"Test plan
pytest tests/render/test_relational_snapshot_safe.py tests/validation/test_relational_leakage.py tests/scripts/test_probe_relational_leakage.py.pytest.ruff check . && ruff format --check ..mypy leadforge/.release/{intro,intermediate,advanced}produces an emptyLeakageReport.🤖 Generated with Claude Code