fix: redact current_stage from student_public bundles#56
Merged
Conversation
Add an `is_leakage_trap` flag to FeatureSpec and an exposure-layer column-redaction pass that drops `leakage_risk and not is_leakage_trap` columns from student_public snapshots, task splits, feature dictionary, and dataset card. `current_stage` (true label leak) is now removed in student_public; `total_touches_all` (intentional pedagogical trap) is preserved via the new flag. - `schema/features.py`: add `is_leakage_trap` field; mark `total_touches_all`; export `STUDENT_PUBLIC_REDACTED_COLUMNS`. - `schema/dictionaries.py`: emit `is_leakage_trap` column in `feature_dictionary.csv`. - `exposure/filters.py`: `BundleFilter.redacted_columns`; populated for student_public. - `api/bundle.py`: drop redacted columns from snapshot before splits; thread the visible feature tuple to dataset card and dictionary writers. - `narrative/dataset_card.py`: accept a `features` arg so categories / leakage section reflect what is actually published. - `validation/bundle_checks.py`: new `_check_exposure_redaction()` enforces CLAUDE.md invariant #1 — redacted columns must not appear in the published task splits or dictionary. - `validation/invariants.py`: relax exposure-monotonicity from byte identity to subset-with-shared-content for `feature_dictionary.csv` and task splits, so redaction doesn't trip the check. Tests: extend `tests/schema/test_features.py` with five new assertions covering the trap flag, the redaction set, and `current_stage`'s role. All 880 existing tests still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- `tests/exposure/test_redaction.py` (19 tests): bundle-level proof that `current_stage` is absent from every student_public task split, that `total_touches_all` (the pedagogical trap) is preserved, that the research_instructor bundle still contains both, that the student feature dictionary differs from the instructor's by exactly the redacted set, that shared columns hold identical values across modes, and that `validate_bundle()` flags a tampered student_public bundle whose task split still carries `current_stage`. - `scripts/build_public_release.py`: drop `_FLAT_CSV_DROP_COLS` — the bundle writer now redacts at source, so the flat CSV inherits it. - `release/README.md`, `release/HF_DATASET_CARD.md`: update column counts (33 features + 1 trap + 1 target in student_public; 34 + 1 + 1 in the instructor companion) and rewrite the leakage section to distinguish stripped columns from the deliberate trap. - `.agent-plan.md`: flip "Known issue: current_stage leakage" to resolved with a link to the redaction mechanism. Bundles regenerated; all four pass `leadforge validate`; `scripts/verify_hash_determinism.py` confirms 73/73 files identical across two consecutive builds. Full test suite: 899 passing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR fixes a label leakage issue in student_public bundles by introducing an exposure-layer redaction rule that removes leakage-risk columns (except explicitly marked pedagogical traps) from published task splits and their associated metadata (feature dictionary + dataset card).
Changes:
- Add
is_leakage_traptoFeatureSpecand deriveSTUDENT_PUBLIC_REDACTED_COLUMNSfrom the canonical feature spec. - Apply redaction during bundle writing (before task splits / hashes) and thread the “visible” feature list into the feature dictionary + dataset card.
- Strengthen validation/invariants and add end-to-end tests to ensure redacted columns never appear in
student_publicoutputs.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| leadforge/schema/features.py | Adds is_leakage_trap and derives the student-public redaction set from feature specs. |
| leadforge/schema/dictionaries.py | Emits is_leakage_trap in feature_dictionary.csv. |
| leadforge/exposure/filters.py | Extends BundleFilter with redacted_columns and wires student-public defaults. |
| leadforge/api/bundle.py | Drops redacted columns pre-write and passes visible features to dictionary + dataset card rendering. |
| leadforge/narrative/dataset_card.py | Accepts a features argument so the card reflects the published column set. |
| leadforge/validation/bundle_checks.py | Adds _check_exposure_redaction() and wires it into validate_bundle(). |
| leadforge/validation/invariants.py | Updates exposure monotonicity checks to allow subset semantics for redacted outputs. |
| scripts/build_public_release.py | Removes flat-CSV-specific dropping in favor of source redaction. |
| tests/exposure/test_redaction.py | Adds end-to-end tests covering student/instructor behavior, cross-mode consistency, and validation enforcement. |
| tests/schema/test_features.py | Adds schema-level assertions for traps, redaction set, and dictionary columns. |
| release/README.md | Updates leakage section and column-count documentation to match redaction behavior. |
| release/HF_DATASET_CARD.md | Updates dataset card leakage explanation and feature/column counts. |
| .agent-plan.md | Marks the current_stage leakage issue as resolved with the chosen approach. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| |---|---|---|---| | ||
| | Leads | 5,000 | 5,000 | 5,000 | | ||
| | Features | 35 | 35 | 35 | | ||
| | Features | 33 (+ 1 trap, + 1 target) | 33 (+ 1 trap, + 1 target) | 33 (+ 1 trap, + 1 target) | |
| | Contacts | 4,200 | 4,200 | 4,200 | | ||
| | Columns | 35 (34 features + 1 target) | 35 | 35 | | ||
| | Columns (student_public) | 34 (33 features + 1 target) | 34 | 34 | | ||
| | Columns (instructor) | 35 (34 features + 1 target, +`current_stage`) | -- | -- | |
| extra_in_student = set(s_df.index) - set(i_df.index) | ||
| if extra_in_student: | ||
| errors.append( | ||
| "feature_dictionary.csv: student has rows missing from instructor: " |
| extra_in_student = set(s_df.columns) - set(i_df.columns) | ||
| if extra_in_student: | ||
| errors.append( | ||
| f"Task {rel}: student has columns missing from instructor: " |
Comment on lines
+192
to
+196
| For ``student_public`` bundles, the columns listed in the bundle filter's | ||
| ``redacted_columns`` must not appear in any task split or in the feature | ||
| dictionary. This guards CLAUDE.md invariant #1 ("no post-snapshot | ||
| features in student_public exports") against accidental regressions in | ||
| the build pipeline. |
Addresses senior-dev review of the previous two commits:
1. **Single source of truth.** Replaced the awkward
`leakage_risk + is_leakage_trap` flag pair with a single prescriptive
field `FeatureSpec.redact_in_modes: frozenset[ExposureMode]`.
`leakage_risk` reverts to purely descriptive ("post-snapshot
correlated"). `current_stage` carries `redact_in_modes={student_public}`;
`total_touches_all` keeps `leakage_risk=True` with empty `redact_in_modes`
(deliberate trap). New `redacted_columns_for(mode, features=...)`
parameterizable function replaces the frozen module-level set.
2. **Validation independent of the writer.** `_check_exposure_redaction`
now derives the expected redaction set directly from `LEAD_SNAPSHOT_FEATURES`
via `redacted_columns_for`, *not* from `BundleFilter`. A bug in the
filter no longer agrees-with-itself past the validator. Added a real
regression test that writes a real student_public bundle, mutates a
parquet to reinsert `current_stage`, and asserts validation fails.
3. **Self-describing manifest.** `manifest.json` now records
`redacted_columns: [...]`. The validator cross-checks the manifest's
declared set against the feature-spec-derived expected set; a
second new test mutates the manifest to claim nothing was redacted
and asserts validation flags the disagreement.
4. **Stricter exposure monotonicity.** `check_exposure_monotonicity`
now asserts that `instructor_columns - student_columns` *equals*
`redacted_columns_for(student_public)` — not just "is a superset".
A future column drop in student_public outside the redaction set is
now caught.
5. **Reverted feature_dictionary.csv schema.** The previous commit added
an `is_leakage_trap` column to the published CSV without flagging the
schema change. Removed: the redaction policy is package-internal and
the bundle's actual published schema is observable from the parquet
files and `manifest.redacted_columns`. CSV stays back-compat with
prior releases (6 columns: name, dtype, description, category,
is_target, leakage_risk).
6. **De-duplicated documentation.** `release/README.md` no longer
maintains a per-category feature count table by hand — that data is
in the auto-generated `dataset_card.md`. Removed the duplicated
table; added explicit caveats covering known structural-leakage
issues (event aggregates over the label window, `is_mql` zero
variance, `is_sql=False` near-deterministic for non-conversion).
7. **Documented the structural follow-up.** `.agent-plan.md` now
includes a clearly-scoped "Follow-up: structural leakage" section
covering: (a) windowed-snapshot fix for event aggregates;
(b) measured `is_sql` leakage P(conv|is_sql=False) = 0.038/0.015/0.006
across tiers; (c) `is_mql` is constant True (zero variance). Suggests
filing a tracked issue (deferred — needs user confirmation per
shared-state-action rules).
Removed: `BundleFilter.redacted_columns` (replaced by direct calls to
`redacted_columns_for(mode)` from the bundle writer).
Tests: schema tests rewritten for `redact_in_modes` semantics; redaction
tests updated for the new function-based API; +2 manifest-redaction
tests; +1 real regression test (mutated parquet) replacing the previous
manifest-string-edit test. 902 tests passing (was 899).
Hash determinism verified: 73/73 files identical across two consecutive
builds. All four release bundles regenerated; `feature_dictionary.csv`
is now back to its pre-PR 6-column schema.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor
Author
Self-review pass: 10 issues raised, 10 addressed in commit 66b96eeI took an outsider's view on the original two commits and found ten substantive problems. Concrete fixes per critique:
Verification after refactor:
|
|
pr-agent-context report: This run includes unresolved review comments on PR #56 in repository https://github.com/leadforge-dev/leadforge
For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.
After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.
# Copilot Comments
## COPILOT-1
Location: release/HF_DATASET_CARD.md
URL: https://github.com/leadforge-dev/leadforge/pull/56#discussion_r3180311658
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
The summary table’s “Features” count appears inconsistent with the bundle’s actual published schema described elsewhere (e.g., `student_public` task splits are 34 columns total = 33 non-target columns + 1 target, where the 33 includes the `total_touches_all` trap). “33 (+ 1 trap, + 1 target)” reads like 35 columns. Please clarify whether the 33 includes the trap or update the numbers/labels to match the actual Parquet column counts.
## COPILOT-2
Location: release/README.md
URL: https://github.com/leadforge-dev/leadforge/pull/56#discussion_r3180311717
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
The “Columns (instructor)” cell is confusing/inconsistent: “35 (34 features + 1 target, +`current_stage`)” suggests `current_stage` is in addition to the 34 features, but it should be included in the 34 non-target columns if the total is 35. Please rephrase to avoid double-counting (e.g., “35 (34 features incl. `current_stage` + 1 target)”).
## COPILOT-3
Location: leadforge/validation/invariants.py:174
URL: https://github.com/leadforge-dev/leadforge/pull/56#discussion_r3180311750
Root author: copilot-pull-request-reviewer
Comment:
In `check_exposure_monotonicity()`, `extra_in_student = set(s_df.index) - set(i_df.index)` represents feature names that are present in the student dictionary but *not* in the instructor dictionary. The error text currently says “student has rows missing from instructor”, which reads backwards/ambiguous; it should say that student has rows *not present* in instructor (or similar) so the violation is clear.
## COPILOT-4
Location: leadforge/validation/invariants.py:265
URL: https://github.com/leadforge-dev/leadforge/pull/56#discussion_r3180311775
Root author: copilot-pull-request-reviewer
Comment:
`extra_in_student = set(s_df.columns) - set(i_df.columns)` means the student task split has columns that are *not present* in the instructor split. The emitted message says “student has columns missing from instructor”, which is misleading (it sounds like the inverse). Please reword to something like “student has extra columns not in instructor” to match the condition being checked.
## COPILOT-5
Location: leadforge/validation/bundle_checks.py
URL: https://github.com/leadforge-dev/leadforge/pull/56#discussion_r3180311801
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
The docstring for `_check_exposure_redaction()` says it enforces “no post-snapshot features in student_public exports”, but the code explicitly allows leakage-risk features marked as pedagogical traps (e.g. `total_touches_all`). Consider updating the wording to reflect the actual contract (e.g., “no leakage-risk columns except explicit traps”), otherwise the comment/invariant reference is misleading.Run metadata: |
11 tasks
shaypal5
added a commit
that referenced
this pull request
May 4, 2026
Addresses senior-dev review of #58. Six fixes: 1. **Close the relational-table bypass.** The previous commit redacted columns only from the snapshot/task splits. Users following the README's "Option 3" (feature engineering off `tables/leads.parquet`) could trivially rejoin redacted columns by `account_id`/`contact_id`. Apply `redacted_columns_for(mode)` uniformly to every published parquet under both `tables/` and `tasks/`. In student_public bundles, `tables/leads.parquet` no longer carries `current_stage` or `is_sql`. This also closes the equivalent bypass that PR #56 left open for `current_stage`. 2. **Fully remove `is_mql`.** Previously left as a half-measure: the field was dropped from the snapshot but kept on `LeadRow` and so still appeared as a constant-True column in `tables/leads.parquet`. Now removed from the entity dataclass, `DTYPE_MAP`, `population.py`, and `engine.py`. Gone from every artifact in every mode. 3. **Multi-seed measurement for the is_sql claim.** The PR description's single-seed numbers overstated the case ("99.4% deterministic"). Re-measured across 5 seeds on full-size bundles: P(conv | is_sql=False) = 0.061 ± 0.026 (intro) / 0.020 ± 0.010 (intermediate) / 0.011 ± 0.004 (advanced). Pattern holds, framing is now honest. CHANGELOG and release/README.md updated with the new numbers. 4. **Strengthen the zero-variance test.** The strict `nunique >= 2` guard caught `is_mql` but missed the weaker "rarest class is 1 row" pattern. For low-cardinality columns on bundles ≥ 200 rows, additionally assert the rarest class has ≥ 1% of rows (skipped on tiny test fixtures where the threshold would false-positive). 5. **Add an explicit v3 schema contract.** `tests/render/test_bundle_schema_v3_contract.py` pins the exact column set per mode for both `tasks/` and `tables/leads.parquet`. The duplication is intentional: any future change touching the feature spec, LeadRow, or redaction policy must update this file *and* bump `BUNDLE_SCHEMA_VERSION`. A bare schema-changing PR that doesn't touch this file fails here. 6. **Parameterize the realism boolean test.** `test_detects_non_boolean _feature` no longer hard-codes `opportunity_created`. It picks the first non-target boolean from `LEAD_SNAPSHOT_FEATURES` at test time so future column renames don't break it for unrelated reasons. Validation knock-on: `_check_stage_distribution` (realism.py) and the seed-drift `current_stage` collection (drift.py) now skip when the column is missing — consistent with the new redaction surface. `check_exposure_monotonicity` extended to allow legitimate column differences in `tables/leads.parquet` between modes (subset of the expected redaction set). Tests: - 911 passing (was 903; +8 v3 contract tests, +1 stronger zero-variance, -1 removed `is_mql` simulation test, others updated). - ruff + mypy clean. - All 4 release bundles regenerate and validate. - `verify_hash_determinism.py`: 73/73 files identical across builds. Per-bundle column counts after this commit: student_public: task=32, leads.parquet=9 research_instructor: task=34, leads.parquet=11 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shaypal5
added a commit
that referenced
this pull request
May 4, 2026
* fix: redact is_sql, remove zero-variance is_mql (closes 2/3 of #57) Bundle schema v2 → v3. Addresses two of the three structural-leakage concerns surfaced in PR #56's self-review and tracked in issue #57. **`is_mql` removed from the canonical feature list.** Every lead is initialised at MQL stage in `simulation/population.py`, making the column constant `True` and zero-variance — no information for modelling. The `LeadRow.is_mql` field is retained on the relational `leads.parquet` table; only the snapshot/task-split column and feature-dictionary row are removed. Affects all exposure modes. **`is_sql` redacted in `student_public` mode.** Measured on the v2 bundles: P(converted | is_sql=False) ≈ 0.04 / 0.015 / 0.006 across intro / intermediate / advanced — at advanced tier essentially deterministic for the negative class. `is_sql` is added to `redact_in_modes={ExposureMode.student_public}`; the redaction machinery from PR #56 picks it up automatically. `is_sql` remains in `research_instructor` exports. **`BUNDLE_SCHEMA_VERSION` bumped to "3".** Removing `is_mql` changes the canonical feature set in every mode (not just a mode-conditional redaction), so a future external consumer pinning the v2 schema and checking for `is_mql` would silently break. Bumping is what the field is for. **New regression test.** `test_no_zero_variance_features` in `tests/exposure/test_redaction.py` reads each published student_public task split and asserts every non-ID, non-target column has at least two distinct values. Guards against the `is_mql` zero-variance regression and any future zero-variance feature slipping in. Bundle column counts (v3): - student_public/{intro,intermediate,advanced}: 32 (was 34) - research_instructor/intermediate_instructor: 34 (was 35) Verification: - 903 tests passing (was 902; +1 zero-variance regression test) - ruff + mypy clean - All 4 release bundles regenerated; all pass `validate_bundle()` - `verify_hash_determinism.py` reports 73/73 identical across builds Open follow-up: issue #57 sub-item 1 (event-aggregate features computed over the label window) — its own PR, will likely bump the schema again. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor: close redaction bypass via tables/, fully remove is_mql Addresses senior-dev review of #58. Six fixes: 1. **Close the relational-table bypass.** The previous commit redacted columns only from the snapshot/task splits. Users following the README's "Option 3" (feature engineering off `tables/leads.parquet`) could trivially rejoin redacted columns by `account_id`/`contact_id`. Apply `redacted_columns_for(mode)` uniformly to every published parquet under both `tables/` and `tasks/`. In student_public bundles, `tables/leads.parquet` no longer carries `current_stage` or `is_sql`. This also closes the equivalent bypass that PR #56 left open for `current_stage`. 2. **Fully remove `is_mql`.** Previously left as a half-measure: the field was dropped from the snapshot but kept on `LeadRow` and so still appeared as a constant-True column in `tables/leads.parquet`. Now removed from the entity dataclass, `DTYPE_MAP`, `population.py`, and `engine.py`. Gone from every artifact in every mode. 3. **Multi-seed measurement for the is_sql claim.** The PR description's single-seed numbers overstated the case ("99.4% deterministic"). Re-measured across 5 seeds on full-size bundles: P(conv | is_sql=False) = 0.061 ± 0.026 (intro) / 0.020 ± 0.010 (intermediate) / 0.011 ± 0.004 (advanced). Pattern holds, framing is now honest. CHANGELOG and release/README.md updated with the new numbers. 4. **Strengthen the zero-variance test.** The strict `nunique >= 2` guard caught `is_mql` but missed the weaker "rarest class is 1 row" pattern. For low-cardinality columns on bundles ≥ 200 rows, additionally assert the rarest class has ≥ 1% of rows (skipped on tiny test fixtures where the threshold would false-positive). 5. **Add an explicit v3 schema contract.** `tests/render/test_bundle_schema_v3_contract.py` pins the exact column set per mode for both `tasks/` and `tables/leads.parquet`. The duplication is intentional: any future change touching the feature spec, LeadRow, or redaction policy must update this file *and* bump `BUNDLE_SCHEMA_VERSION`. A bare schema-changing PR that doesn't touch this file fails here. 6. **Parameterize the realism boolean test.** `test_detects_non_boolean _feature` no longer hard-codes `opportunity_created`. It picks the first non-target boolean from `LEAD_SNAPSHOT_FEATURES` at test time so future column renames don't break it for unrelated reasons. Validation knock-on: `_check_stage_distribution` (realism.py) and the seed-drift `current_stage` collection (drift.py) now skip when the column is missing — consistent with the new redaction surface. `check_exposure_monotonicity` extended to allow legitimate column differences in `tables/leads.parquet` between modes (subset of the expected redaction set). Tests: - 911 passing (was 903; +8 v3 contract tests, +1 stronger zero-variance, -1 removed `is_mql` simulation test, others updated). - ruff + mypy clean. - All 4 release bundles regenerate and validate. - `verify_hash_determinism.py`: 73/73 files identical across builds. Per-bundle column counts after this commit: student_public: task=32, leads.parquet=9 research_instructor: task=34, leads.parquet=11 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the
current_stageleakage that was surviving intostudent_publictask splits at the 90-day horizon (where the column held terminal stagesclosed_won/closed_lost, perfectly encoding the label). Closes the "Known issue" entry under Phase 5 of.agent-plan.md.Why this approach (option 2 — exposure-layer redaction)
I considered both options listed in
.agent-plan.md:lead_scoring_intro_v*.csv), not the full relationalrelease/bundle.leakage_risk and not is_leakage_trapcolumns instudent_publicmode. Smaller blast radius (only the leaky columns disappear), generalisable rule, naturally reuses the existingBundleFiltermachinery, and preserves the deliberately included pedagogical traptotal_touches_all. The simulation, the relational tables, and every non-leakage feature are unchanged.A new
is_leakage_trapfield onFeatureSpecdistinguishes the trap (total_touches_all, intentionally included for teaching) from a true leak (current_stage, the bug). The redaction set is derived from features at import time — any futureleakage_risk=Truecolumn withoutis_leakage_trap=Trueis automatically stripped fromstudent_publicexports.is_mql/is_sqlwere reviewed at the same time per the task brief: they are point-in-time achievement flags that correlate with the label but don't deterministically encode it (is_sql=Truefor many non-converted leads who reached SQL but didn't close). Their existingleakage_risk=Falseis correct; no change needed.Changes
Schema layer
schema/features.py: newis_leakage_trap: bool = FalseonFeatureSpec; setTrueontotal_touches_all; exportSTUDENT_PUBLIC_REDACTED_COLUMNS.schema/dictionaries.py: emitis_leakage_trapcolumn infeature_dictionary.csvso consumers can tell traps from leaks.Exposure layer
exposure/filters.py:BundleFilter.redacted_columns: frozenset[str]. Populated forstudent_publicfrom the feature spec; empty forresearch_instructor.Bundle assembly
api/bundle.py: drop redacted columns from the snapshot DataFrame before writing task splits, and pass the visible feature tuple to the dictionary writer and dataset card renderer. Manifest hashes therefore reflect the published column set.narrative/dataset_card.py: accept afeaturesarg so the categories / leakage section reflects what's actually published.Validation
validation/bundle_checks.py: new_check_exposure_redaction()readsexposure_modefrom the manifest and asserts that the redacted columns are not present in any task split or in the feature dictionary. Wires CLAUDE.md invariant docs: add mandatory branch/PR workflow rules #1 intovalidate_bundle()instead of relying on hope.validation/invariants.py: relaxcheck_exposure_monotonicity()from byte identity onfeature_dictionary.csvand task splits to subset-with-shared-content semantics (student rows ⊆ instructor rows; shared columns hold identical values).Build pipeline
scripts/build_public_release.py: drop_FLAT_CSV_DROP_COLS— the bundle writer now redacts at source, so the flat CSV inherits the redaction transitively.Documentation
release/README.mdandrelease/HF_DATASET_CARD.md: update column counts (33 + trap + target instudent_public; 34 + trap + target in instructor) and rewrite the leakage section to explain stripped vs. deliberately retained..agent-plan.md: flip the "Known issue: current_stage leakage" entry to resolved.Tests
tests/exposure/test_redaction.py(19 new tests):BundleFilter.redacted_columnsmatches the derived feature-spec set;research_instructorredacts nothing; the redaction set containscurrent_stageand excludestotal_touches_all.student_public): every task split lackscurrent_stage, every task split keepstotal_touches_all, no redacted column survives in any split, the target column is intact, the feature dictionary excludescurrent_stage, includes the trap, and has the expected row count, andvalidate_bundle()produces no redaction errors.research_instructor): every task split keepscurrent_stageandtotal_touches_all; the dictionary still has all 35 rows.student_publicis correctly flagged byvalidate_bundle().tests/schema/test_features.pyextended with five new assertions covering the trap flag, the redaction set, andcurrent_stage's role.Verification
pytest: 899 passing (was 880; +19 new redaction tests).ruff check/ruff format/mypy: clean.python scripts/build_public_release.py release: all 4 bundles generated and validated. Confirmed viapyarrow.parquet.read_schema:release/{intro,intermediate,advanced}/.../train.parquet→ 34 columns, nocurrent_stage,total_touches_allpresent.release/intermediate_instructor/.../train.parquet→ 35 columns includingcurrent_stage.python scripts/verify_hash_determinism.py: 73/73 files hash-identical across two consecutive builds. Determinism preserved.Out of scope
Per the task brief: uploading to Kaggle / HuggingFace and announcing the release (manual user steps, last in Phase 5).
Test plan
pytestpasses (899/899)ruff checkandruff format --checkcleanmypy leadforge/cleanscripts/build_public_release.pyregenerates all 4 bundles, all passvalidate_bundle()scripts/verify_hash_determinism.pyreports 73/73 identicalcurrent_stageis absent fromstudent_publictask splits and present inresearch_instructorfeature_dictionary.csvrow counts: 34 instudent_public(nocurrent_stage), 35 inresearch_instructor🤖 Generated with Claude Code