Skip to content

feat(render,validation): snapshot-safe relational export + leakage validator (PR 2.1)#63

Merged
shaypal5 merged 5 commits into
mainfrom
feat/snapshot-safe-relational-export
May 5, 2026
Merged

feat(render,validation): snapshot-safe relational export + leakage validator (PR 2.1)#63
shaypal5 merged 5 commits into
mainfrom
feat/snapshot-safe-relational-export

Conversation

@shaypal5

@shaypal5 shaypal5 commented May 5, 2026

Copy link
Copy Markdown
Contributor

Summary

PR 2.1 of the v1 dataset release sequence (roadmap §"Phase 2 — Snapshot-safe relational export"). Module-level only — adds the structural fix that the Phase 1 audit (PR #62) verified empirically against, plus the matching validator. PR 2.2 will wire these through the bundle writer and bump BUNDLE_SCHEMA_VERSION 4 → 5.

What's in

  • leadforge/render/relational_snapshot_safe.pyto_dataframes_snapshot_safe(dfs, *, snapshot_day) projects the full-horizon dict onto the public-bundle shape: drops BANNED_LEAD_COLUMNS from leads, BANNED_OPP_COLUMNS from opportunities, filters event tables (touches/sessions/sales_activities/opportunities) per-lead by lead_created_at + snapshot_day, omits customers/subscriptions, passes accounts/contacts unchanged. Pure function, no I/O. leadforge/render/relational.py is untouched — the full-horizon export still backs research_instructor.
  • leadforge/validation/relational_leakage.py — five probes (probe_banned_columns, probe_banned_tables, probe_deterministic_reconstruction, probe_snapshot_window, probe_bonus_model_auc) feeding LeakageFinding/LeakageReport/RelationalLeakageError. Two orchestrators: run_all_probes (file-based, what PR 2.2 will call from bundle_checks.validate_bundle) and run_all_probes_on_dataframes (in-memory, for unit tests). Bonus-model band defaults to 0.65 with a # TODO(PR 3.3) for per-tier calibration. Constants imported from the render module so writer + validator share a single source of truth.
  • scripts/probe_relational_leakage.pydeterministic_relational_reconstruction lifted into the package; the script re-exports it from leadforge.validation.relational_leakage. The function is now defined once and shared by the alpha-bundle audit (PR 1.1) and the validator (this PR). tests/scripts/test_probe_relational_leakage.py is unchanged and still passes.
  • Teststests/render/test_relational_snapshot_safe.py (14 property tests) + tests/validation/test_relational_leakage.py (23 probe tests covering clean + tampered synthetic bundles, file-based orchestrator, ergonomics).

What's NOT in (deferred to PR 2.2)

  • Wiring through leadforge/exposure/filters.py and leadforge/api/bundle.py.
  • BUNDLE_SCHEMA_VERSION 4 → 5 and the relational_snapshot_safe manifest field.
  • Plumbing the validator into leadforge/validation/bundle_checks.py.
  • Regenerating the alpha bundles under release/.

Acceptance (from the handoff)

  • pytest tests/render/test_relational_snapshot_safe.py tests/validation/test_relational_leakage.py passes (14 + 23 = 37 tests).
  • The validator, run against a synthetic public bundle that does contain the four leakage channels, fails with a structured finding per channel.
  • The validator, run against a public bundle produced by to_dataframes_snapshot_safe, passes every probe.
  • scripts/probe_relational_leakage.py re-uses the lifted function rather than duplicating logic — the script and the package are one implementation. (Per the handoff sign-off, this PR also takes PR 3.1's lift-into-package work, since deferring it would have meant re-implementation here.)

Verification

  • pytest: 1001 passed (37 new + 964 existing).
  • ruff check + ruff format --check: clean.
  • mypy leadforge/: no issues across 80 source files.

References

  • docs/release/v1_release_roadmap.md §"Phase 2 — Snapshot-safe relational export"
  • docs/release/v1_release_design.md §"Snapshot-safe relational export"
  • docs/release/v1_current_state_audit.md §"Pointer to the structural fix — PR 2.1"
  • Phase 1 audit: docs: PR 1.1 — relational leakage audit on alpha bundles #62

Test plan

  • Targeted: pytest tests/render/test_relational_snapshot_safe.py tests/validation/test_relational_leakage.py tests/scripts/test_probe_relational_leakage.py.
  • Full suite: pytest.
  • Lint: ruff check . && ruff format --check ..
  • Types: mypy leadforge/.
  • PR 2.2 will additionally regenerate the alpha bundles and confirm that re-running the now-wired validator on release/{intro,intermediate,advanced} produces an empty LeakageReport.

🤖 Generated with Claude Code

…lidator

PR 2.1 of the v1 dataset release sequence.  Module-level only — adds
the structural fix audited against in PR 1.1, plus the matching
validator that asserts the fix is in place on any bundle claiming to
be student_public.  PR 2.2 will wire these through bundle.py /
exposure/filters.py and bump BUNDLE_SCHEMA_VERSION 4 → 5.

leadforge/render/relational_snapshot_safe.py
  to_dataframes_snapshot_safe(dfs, *, snapshot_day) projects the
  full-horizon dict from leadforge.render.relational.to_dataframes onto
  the public-bundle shape:
    - leads:          drops BANNED_LEAD_COLUMNS
                      (converted_within_90_days, conversion_timestamp).
    - opportunities:  drops BANNED_OPP_COLUMNS (close_outcome,
                      closed_at) and filters per-lead by
                      created_at <= lead_created_at + snapshot_day.
    - touches/sessions/sales_activities: filtered per-lead on each
                      timestamp column to the same boundary
                      (defence-in-depth — alpha bundles already pass
                      G4.4 because the 90d horizon bounds events).
    - customers/subscriptions: omitted entirely (BANNED_TABLES).
    - accounts/contacts: pass-through.
  Pure function, no I/O, deterministic.  leadforge/render/relational.py
  is unchanged (full-horizon export still backs research_instructor).

leadforge/validation/relational_leakage.py
  Five probes feeding LeakageFinding / LeakageReport:
    - probe_banned_columns          — Path A column scan.
    - probe_banned_tables           — customers/subscriptions absent.
    - probe_deterministic_reconstruction — paths B/C/D zero-hit on
                                      the join graph.
    - probe_snapshot_window         — event timestamps within
                                      lead_created_at + snapshot_day.
    - probe_bonus_model_auc         — 5-fold CV LR + HistGBM on
                                      n_opps / max_acv / mean_acv;
                                      default ceiling 0.65 with a
                                      TODO(PR 3.3) for per-tier
                                      calibration.
  Two orchestrators (file-based and in-memory) plus
  RelationalLeakageError carrying the LeakageReport.  Constants
  imported from the render module so writer + validator share one
  source of truth for "snapshot-safe".

scripts/probe_relational_leakage.py
  deterministic_relational_reconstruction lifted into the package
  (where PR 3.1 was already slated to land it); the script now
  re-exports it from leadforge.validation.relational_leakage.  No
  behavioural change — existing tests/scripts/test_probe_relational_leakage.py
  still passes against `probe_module.deterministic_relational_reconstruction`.

Tests
  tests/render/test_relational_snapshot_safe.py — 14 property tests:
  banned cols absent; BANNED_TABLES absent from output dict; per-lead
  snapshot-window invariant holds; accounts/contacts pass-through;
  idempotent on already-safe input; deterministic across two calls;
  no input mutation; canonical output table order; missing-optional
  tolerance; snapshot_day=0 edge; negative-day raises; missing leads
  raises; missing anchor cols raises.

  tests/validation/test_relational_leakage.py — 23 probe tests over
  a 50-lead synthetic bundle (clean + tampered): every probe silent
  on the clean bundle (orchestrator + individual); each leakage
  channel re-introduced one at a time fires the matching probe with
  the expected detail; deterministic probe does not flag Path A
  (covered by banned_columns); bonus-model fires when customers
  re-introduced; bonus-model skips cleanly when label unavailable;
  orchestrator aggregates findings across channels;
  raise_if_failing carries the report; file-based run_all_probes
  reads/exits cleanly on missing tables/leads; package and script
  exports of deterministic_relational_reconstruction agree.

All 1001 existing tests still pass; ruff + mypy clean.

Refs: docs/release/v1_release_roadmap.md §"Phase 2";
      docs/release/v1_release_design.md §"Snapshot-safe relational export";
      docs/release/v1_current_state_audit.md §"Pointer to the structural fix"

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 5, 2026 17:39
@shaypal5 shaypal5 added type: feature New capability layer: render render/ bundle and artifact output layer: validation validation/ invariants and checks labels May 5, 2026
@github-actions

This comment has been minimized.

Self-review followup to PR 2.1.  Both ``_build_anchor`` (in
``relational_snapshot_safe``) and ``probe_snapshot_window`` (in
``relational_leakage``) build a per-lead anchor by selecting
``leads[['lead_id','lead_created_at']]`` and merging it back onto each
event table.  If ``leads.lead_id`` were not unique, the left-merge
would broadcast and silently inflate event-table row counts (export)
or produce a mask whose length doesn't match the event frame
(probe).  ``deterministic_relational_reconstruction`` already
asserts the same invariant; align both new modules with it so a
duplicate-key bug surfaces with a clear ValueError instead of a
quiet wrong answer.

Adds two regression tests (one per module) that pass duplicated
leads and assert the matching ValueError.  Tests now total 1003
passing; ruff + mypy still clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces the module-level building blocks for Phase 2 “snapshot-safe relational export”: a pure transformation that projects full-horizon relational tables into the student_public shape (removing known reconstruction channels), plus a matching leakage validator with multiple probes and orchestrators. It also consolidates the deterministic reconstruction logic into a single canonical implementation and adds focused test coverage around both the exporter and validator.

Changes:

  • Added to_dataframes_snapshot_safe(..., snapshot_day=...) to enforce the public relational table contract (drop banned columns/tables and filter event tables to the snapshot window).
  • Added leadforge.validation.relational_leakage with a structured findings/report model, five probes, and both in-memory and file-based orchestrators.
  • Updated the existing leakage probe script to re-export the now-canonical deterministic reconstruction function; added comprehensive tests for exporter and validator behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
leadforge/render/relational_snapshot_safe.py Implements snapshot-safe projection (banned columns/tables + per-lead snapshot window filtering).
leadforge/validation/relational_leakage.py Adds leakage probes, report types, orchestrators, and canonical deterministic reconstruction logic.
scripts/probe_relational_leakage.py Re-exports deterministic reconstruction from the package to avoid duplicated logic.
tests/render/test_relational_snapshot_safe.py Property tests for column/table dropping, snapshot filtering, idempotence, and determinism.
tests/validation/test_relational_leakage.py Probe tests for clean vs tampered bundles plus orchestrator/report ergonomics.
.agent-plan.md Updates roadmap checklist to mark Phase 2 module work complete in PR 2.1.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/validation/relational_leakage.py
Comment thread leadforge/validation/relational_leakage.py
Self-review-driven cleanup of the structural/leakage modules from
9ebf571 / 2922eb8.  Behaviour-preserving for the structural probes;
the bonus-model probe goes from default-on to opt-in, which is the
substantive correctness change.

P0 — substantive fixes
  * Bonus-model probe is now opt-in.  Orchestrators
    (``run_all_probes`` / ``run_all_probes_on_dataframes``) gain
    ``bonus_model_max_auc: float | None = None`` and skip the probe
    unless that threshold is set.  The probe function itself now
    requires ``max_auc`` as a keyword (no ``DEFAULT_MAX_BONUS_AUC``).
    Rationale: PR 3.3 owns the calibrated per-tier band; PR 2.1
    must not default-on an uncalibrated 0.65 ceiling that would
    fail every alpha bundle when PR 2.2 wires it into validate_bundle.
  * ``_resolve_label`` rejects misaligned labels with a clear
    ``ValueError`` instead of letting them silently no-op via the
    binary-cardinality skip — a leakage validator that hides bugs
    by design defeats its own purpose.
  * ``RelationalLeakageError`` now inherits from ``LeadforgeError``
    instead of ``AssertionError`` (which gets stripped under
    ``python -O`` and is the wrong base class for a domain error).

P1 — architecture / naming
  * Contract constants (``BANNED_LEAD_COLUMNS`` /
    ``BANNED_OPP_COLUMNS`` / ``BANNED_TABLES`` /
    ``SNAPSHOT_FILTERED_TABLES``) move from the render module to the
    validator.  The validator owns the definition of "leakage";
    the writer enforces it and imports the contract.  This flips
    one render→validation import direction we had backwards.
  * ``EVENT_TABLES`` → ``SNAPSHOT_FILTERED_TABLES``.  The original
    name implied "all event tables", but ``opportunities`` is an
    entity table that we per-lead-filter by ``created_at``; the new
    name says what the constant is for.
  * ``CHANNEL_DETERMINISTIC_PATH`` → ``CHANNEL_JOIN_RECONSTRUCTION``.
    Path A is a deterministic reconstruction too — it is just a
    column read.  The probe deliberately scopes itself to
    join-graph paths (B/C/D) and delegates Path A to
    ``probe_banned_columns``; the channel constant now matches.

Tests
  * Two new tests cover the new behaviours:
      - ``test_bonus_model_probe_rejects_misaligned_label`` — the
        ``_resolve_label`` guard fires loudly on a non-lead_id index.
      - ``test_orchestrator_skips_bonus_probe_by_default`` /
        ``test_orchestrator_runs_bonus_probe_when_enabled`` —
        opt-in behaviour at the orchestrator level.
  * Existing clean-bundle tests no longer pass ``label`` to the
    structural orchestrator (it is unused without
    ``bonus_model_max_auc``); the bonus-enabled test path is
    exercised separately.

1007 tests pass (was 1003); ruff + mypy clean across leadforge/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@shaypal5

shaypal5 commented May 5, 2026

Copy link
Copy Markdown
Contributor Author

Self-review refactor pushed (df15a24)

I asked myself "what would a senior dev hate about this PR" and found enough to warrant a follow-up commit. Behaviour of the structural probes is preserved; the bonus-model probe goes from default-on to opt-in, which is the substantive change.

P0 — substantive fixes

1. Bonus-model probe is now opt-in. I had DEFAULT_MAX_BONUS_AUC = 0.65 baked into the orchestrator, but the brief explicitly framed this probe as "optional" with a "placeholder" threshold awaiting PR 3.3 calibration. Default-on would have meant every alpha bundle fails an arbitrarily-chosen ceiling the moment PR 2.2 wires the orchestrator into validate_bundle. Fixed by making bonus_model_max_auc: float | None = None the orchestrator default — the probe runs only when the caller passes a threshold.

2. _resolve_label would silently no-op on misaligned labels. If a caller passed a label Series not indexed by lead_id, reindex(features.index) produced all-NaN, then the binary-cardinality skip (y.nunique(dropna=True) < 2) returned no findings. A leakage validator that hides bugs by design is anti-purpose. Now raises ValueError if label.index.name != "lead_id".

3. RelationalLeakageError was inheriting from AssertionError — wrong base. AssertionError gets stripped under python -O, and the project already has LeadforgeError (leadforge/core/exceptions.py) as the standard domain-error root. Switched.

P1 — architecture / naming

4. Contract constants were in the wrong subpackage. BANNED_LEAD_COLUMNS / BANNED_OPP_COLUMNS / BANNED_TABLES / SNAPSHOT_FILTERED_TABLES describe what counts as leakage — that's the validator's job. Moved them from render/relational_snapshot_safe.py to validation/relational_leakage.py; the writer module now imports the contract. Fixes one render→validation dependency that was pointed the wrong way.

5. EVENT_TABLESSNAPSHOT_FILTERED_TABLES. The original name implied "all event tables", but opportunities is an entity table I just happen to per-lead-filter by created_at. New name says what the constant is for.

6. CHANNEL_DETERMINISTIC_PATHCHANNEL_JOIN_RECONSTRUCTION. Path A is a deterministic reconstruction too — it's just a column read instead of a join. The probe deliberately scopes to paths B/C/D and delegates Path A to probe_banned_columns; the channel constant now reflects that.

Tests added

  • test_bonus_model_probe_rejects_misaligned_label_resolve_label guard fires loudly on a non-lead_id index.
  • test_orchestrator_skips_bonus_probe_by_default / test_orchestrator_runs_bonus_probe_when_enabled — opt-in semantics at the orchestrator level.

Verification

1007 tests pass (was 1003); ruff + mypy clean across leadforge/.

🤖 Generated with Claude Code

@github-actions

This comment has been minimized.

Two legitimate edge cases raised on PR #63:

1. Partial labels would crash with an opaque error.
   ``y_series.reindex(features.index).astype(int)`` raises
   ``IntCastingNaNError`` when the supplied ``label`` is missing
   any ``lead_id`` present in the bundle, because reindex
   introduces NaN and ``astype(int)`` chokes.  Now the probe
   raises a clear ``ValueError`` that names the gap and tells the
   caller to either complete the label or omit it.

2. ``StratifiedKFold(n_splits=5)`` would crash on small or highly
   imbalanced bundles.  When the smaller class has fewer than 5
   members, sklearn raises ``ValueError`` from ``skf.split``.
   Now the probe sizes ``n_splits = min(5, min_class_count)``
   and skips silently (no finding, no error) when even
   ``n_splits=2`` is impossible — that is a sample-size
   constraint, not a leakage finding.

Two new tests cover the partial-label rejection, and two cover
the dynamic-n_splits / too-small-to-CV paths.  Finding messages
now report the actual fold count (e.g. ``3-fold CV AUC...``)
instead of a hard-coded ``5-fold``.

1010 tests pass (was 1007); ruff + mypy clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 5, 2026 18:02
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/render/relational_snapshot_safe.py Outdated
Comment thread leadforge/validation/relational_leakage.py
…an events

Address two new Copilot review threads on PR #63 (both legitimate):

* COPILOT-3 (relational_snapshot_safe.py:126) — ``_build_anchor``
  parsed ``lead_created_at`` with ``pd.to_datetime`` but never
  validated the result.  A NaT anchor would propagate through
  ``ts <= NaT`` -> NaN -> ``fillna(False)`` and silently drop every
  event for the affected lead.  Now coerce to NaT explicitly with
  ``errors="coerce"`` and raise ``ValueError`` (with a 5-lead
  sample of offending lead_ids) if any NaT remains.  Mirrors the
  same posture as the existing duplicate-lead_id check: structural
  defects fail loudly rather than producing a quietly wrong
  output.

* COPILOT-4 (relational_leakage.py:382) — ``probe_snapshot_window``
  had the same NaT-anchor issue, plus a related orphan-event
  blind spot: an event row whose ``lead_id`` is absent from
  ``leads`` gets a NaT cutoff after the left-merge, and the
  ``fillna(False)`` in the violation count silently treats it as
  a non-violation.  An orphan event is a structural FK violation
  AND a leakage attack surface — a tampered bundle could insert
  post-snapshot events tied to lead_ids absent from the public
  leads table to bypass the snapshot filter.  Both cases now
  raise ``ValueError`` with offending lead_id samples.

Tests added:
* ``test_nat_lead_created_at_raises`` /
  ``test_unparseable_lead_created_at_raises`` — render-side anchor.
* ``test_snapshot_window_nat_lead_created_at_raises`` /
  ``test_snapshot_window_orphan_event_raises`` — validator-side.

Did not refactor ``_build_anchor`` into a shared helper between
the two modules; PR 2.1 keeps render and validation independent
in their anchor handling, and PR 3.1 (which already plans to
unify the leakage probes) can DRY if the duplication starts to
matter.

1014 tests pass (was 1010); ruff + mypy clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 5, 2026

Copy link
Copy Markdown

pr-agent-context report:

This run includes unresolved review comments on PR #63 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: leadforge/render/relational_snapshot_safe.py
URL: https://github.com/leadforge-dev/leadforge/pull/63#discussion_r3190571467
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    _build_anchor() converts lead_created_at with pd.to_datetime() but doesn’t validate that all values are non-null / successfully parsed. If any lead has a missing created_at (NaT), downstream filtering will silently drop all events for that lead because comparisons against NaT become False, masking data-quality issues. Consider asserting no NaT after parsing (or using errors='raise' plus an explicit null check) and raising a ValueError that points to the offending lead_ids.

## COPILOT-2
Location: leadforge/validation/relational_leakage.py:408
URL: https://github.com/leadforge-dev/leadforge/pull/63#discussion_r3190571543
Root author: copilot-pull-request-reviewer

Comment:
    probe_snapshot_window() merges event tables to leads with how='left' and then counts ts > cutoff, but rows whose lead_id doesn’t exist in leads (or leads with null lead_created_at) result in NaT cutoffs and are silently treated as non-violations due to the fillna(False). This can let structurally invalid bundles pass the snapshot-window probe. Consider explicitly checking for missing anchors after the merge (e.g., merged['lead_created_at'].isna()) and raising a ValueError or emitting a dedicated finding for missing lead references / missing lead_created_at.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25394565987 attempt 1
Comment timestamp: 2026-05-05T18:25:54.737721+00:00
PR head commit: ce2b3d34a0e16be638d9854b74310566bca80203

@shaypal5 shaypal5 merged commit 4392efc into main May 5, 2026
8 checks passed
@shaypal5 shaypal5 deleted the feat/snapshot-safe-relational-export branch May 5, 2026 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

layer: render render/ bundle and artifact output layer: validation validation/ invariants and checks type: feature New capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants