feat: Milestone v4-M1 — engine changes and build pipeline#21
Merged
…ures

Engine changes for v4-M1:

1. category_latent_correlations in difficulty_profiles.yaml (intro profile): Correlate seniority→authority, revenue_band→account_fit, lead_source→engagement_propensity. Validated by spike experiment (scale 1.8, AUC 0.694).
2. population.py: apply_category_latent_correlations() shifts latent traits based on observable categories after initial sampling. Default None preserves backward compatibility.
3. snapshots.py: snapshot_day parameter for windowed aggregation. New features: touches_week_1, days_since_first_touch, expected_acv, total_touches_all (leakage trap). Default None preserves existing behavior.
4. features.py: FeatureSpec entries for all new columns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
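The category-to-latent shift described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in population.py: the function name `apply_category_boosts`, the spec shape (`target` plus a `boosts` mapping), and the use of `scale` as a multiplier on each boost are all assumptions made for the example.

```python
def apply_category_boosts(entities, correlations, scale=1.0):
    """Shift a latent trait based on an observable category, clamped to [0, 1].

    `correlations` maps an observable field to a hypothetical spec:
        {"target": <latent trait name>, "boosts": {<category value>: <delta>}}
    Passing None leaves every entity untouched (backward-compatible default).
    """
    if correlations is None:
        return entities
    for entity in entities:
        for field, spec in correlations.items():
            # Unknown or missing categories receive no boost.
            delta = spec["boosts"].get(entity.get(field), 0.0) * scale
            target = spec["target"]
            # Clamp so extreme boosts cannot leave the valid trait range.
            entity[target] = min(1.0, max(0.0, entity[target] + delta))
    return entities


contacts = [
    {"seniority": "vp", "authority": 0.5},
    {"seniority": "ic", "authority": 0.5},
]
spec = {"seniority": {"target": "authority", "boosts": {"vp": 0.2, "ic": -0.1}}}
apply_category_boosts(contacts, spec, scale=1.8)
```

Because the shift is deterministic given the sampled traits, it does not consume random numbers, so seeded runs stay reproducible in this sketch.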
…t.py

Add the v4 dataset build and validation scripts:
- build_v4_snapshot.py: full pipeline (generate → day-14 snapshot → derive binary features → rename → stratified subsample → inject MAR missingness)
- validate_v4_dataset.py: 7 mandatory checks (banned cols, deterministic groups, conversion rate, baseline AUC, leakage trap, missingness, shape) plus 2 warning checks (redundancy, low variance)

Also includes the opportunity_created feature fix in snapshots.py (tracks ANY opportunity, not just open ones) and its FeatureSpec entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
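The last pipeline step, MAR (missing at random) injection, makes the chance of a missing value depend on another observed column rather than on the value itself. The sketch below shows that idea under stated assumptions: `inject_mar_missingness`, its parameters, and the example rates are hypothetical and are not taken from build_v4_snapshot.py.

```python
import random


def inject_mar_missingness(rows, column, condition_column, rates, default_rate, seed):
    """Blank out `column` with a probability chosen by `condition_column`.

    `rates` maps a conditioning value (for example a lead source) to a
    missingness probability; any other value falls back to `default_rate`.
    Returns new row dicts, so the input rows are left unmodified.
    """
    rng = random.Random(seed)  # local RNG keeps the step reproducible
    result = []
    for row in rows:
        probability = rates.get(row[condition_column], default_rate)
        new_row = dict(row)
        if rng.random() < probability:
            new_row[column] = None
        result.append(new_row)
    return result


rows = [{"lead_source": "sdr_outbound", "web_sessions": 3} for _ in range(50)]
rows += [{"lead_source": "inbound_marketing", "web_sessions": 7} for _ in range(50)]
masked = inject_mar_missingness(
    rows, "web_sessions", "lead_source",
    rates={"sdr_outbound": 0.4}, default_rate=0.05, seed=42,
)
```

Drawing one random number per row, whether or not the value ends up masked, keeps the random stream aligned across runs with different rates.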
Add tests/render/test_snapshot_windowed.py (16 tests):
- Windowed snapshot basics (row count, counts ≤ full, None == default)
- touches_week_1, days_since_first_touch, total_touches_all
- opportunity_created, expected_acv
- Windowed determinism under same seed

Add 3 tests to tests/simulation/test_population.py:
- Category-latent correlations shift target trait mean
- Extreme boosts are clamped to [0, 1]
- Deterministic under same seed + correlations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
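The windowed properties these tests exercise (counts ≤ full, None == default) can be illustrated with a small stand-alone sketch. It assumes the window starts at each lead's creation time and uses a strict less-than cutoff; `windowed_touch_count` is a hypothetical helper, not a function from snapshots.py.

```python
from datetime import datetime, timedelta


def windowed_touch_count(lead_created_at, touch_times, snapshot_day=None):
    """Count touches visible at the snapshot.

    With snapshot_day=None every touch counts (the pre-existing behavior).
    Otherwise only touches strictly before
    lead_created_at + snapshot_day days are counted.
    """
    if snapshot_day is None:
        return len(touch_times)
    cutoff = lead_created_at + timedelta(days=snapshot_day)
    return sum(1 for t in touch_times if t < cutoff)


created = datetime(2026, 1, 1)
touches = [created + timedelta(days=d) for d in (0, 5, 14, 30)]

full = windowed_touch_count(created, touches)            # all four touches
day_14 = windowed_touch_count(created, touches, 14)      # touch at day 14 is excluded
assert day_14 <= full
```

A leakage-trap column such as total_touches_all would correspond to always using the `snapshot_day=None` path, regardless of the window applied to the other features.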
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Implements v4-M1 engine updates to support a new “intro lead scoring” dataset shape, plus adds scripts to build and validate the v4 CSV end-to-end.
Changes:
- Adds category→latent correlation boosts (configured via recipe difficulty profiles) and wires them through `Generator.generate()` into population generation.
- Extends snapshot rendering with `snapshot_day` windowing and adds new snapshot features (momentum features, `expected_acv`, leakage trap, `opportunity_created`).
- Introduces build/validation scripts and accompanying tests for the new snapshot semantics and correlation behavior.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `leadforge/simulation/population.py` | Adds optional category-latent correlation application during population build. |
| `leadforge/api/generator.py` | Loads correlations from recipe difficulty profiles and passes through to population build. |
| `leadforge/render/snapshots.py` | Adds `snapshot_day` event-windowing and computes new v4 snapshot features. |
| `leadforge/schema/features.py` | Extends the canonical feature spec with the new v4 features/leakage trap. |
| `leadforge/recipes/b2b_saas_procurement_v1/difficulty_profiles.yaml` | Adds `category_latent_correlations` config for the intro difficulty profile. |
| `scripts/build_v4_snapshot.py` | Adds an end-to-end pipeline to generate + snapshot + subsample + inject missingness into the v4 CSV. |
| `scripts/validate_v4_dataset.py` | Adds a v4 CSV validator with mandatory checks and warning checks. |
| `tests/render/test_snapshot_windowed.py` | New tests covering windowed snapshot behavior and new features. |
| `tests/simulation/test_population.py` | Adds tests for category-latent correlation shifting, clamping, and determinism. |
| `.agent-plan.md` | Updates milestone tracking/status to reflect v4-M1 progress. |
COPILOT-1: validate all _FINAL_COLUMNS present in rename_and_select()
COPILOT-2: narrow except to (FileNotFoundError, KeyError) in generator.py
COPILOT-3: validate correlation spec shape in _apply_category_latent_correlations()
COPILOT-4: add scikit-learn to [scripts] optional dependency
COPILOT-5: always compute total_touches_all from full touch table
COPILOT-6: update days_since_last_touch description to say "snapshot cutoff"
COPILOT-7: fix build_v4_snapshot.py docstring (day-14, no bundle path arg)
FAIL-1: fix check_determinism to ignore generation_timestamp in manifest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This was referenced Apr 29, 2026
1. Fix lead-source boost stacking: deduplicate by contact_id so a contact shared by N leads receives the boost exactly once, not N times. Add regression test.
2. Single source of truth for revenue band midpoints: define REVENUE_BAND_MIDPOINTS in population.py alongside _REVENUE_BANDS; import it in snapshots.py instead of duplicating.
3. Fix stale "day-21" docstring in build_v4_snapshot.py:78.
4. Fix determinism test: revert semantic manifest comparison; instead thread generation_timestamp through WorldBundle.save() and write_bundle() so the test fixture can pin it. Byte-level comparison is preserved.
5. Document snapshot_day cutoff semantics (midnight-exclusive by construction) in build_snapshot() docstring.
6. Remove sys.path.insert hack from build_v4_snapshot.py; the package must be installed.

Out-of-scope (issues opened):
- #22: validation script uses train AUC
- #23: add tests for build pipeline scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
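The stacking fix in item 1 comes down to tracking which contacts have already been adjusted. The following is a simplified sketch of that pattern using plain dicts; `apply_lead_level_boost` and the record shapes are hypothetical, and the first lead seen for a contact decides the boost, as the commit describes.

```python
def apply_lead_level_boost(leads, contact_latents, boosts, target):
    """Apply a lead-level category boost to each linked contact exactly once.

    `leads` is an ordered list of {"contact_id", "lead_source"} dicts and
    `contact_latents` maps contact_id -> {trait: value}. The first lead seen
    for a contact determines its boost; later leads are skipped.
    """
    seen_contacts = set()
    for lead in leads:
        contact_id = lead["contact_id"]
        if contact_id in seen_contacts:
            continue  # already boosted via an earlier lead
        seen_contacts.add(contact_id)
        delta = boosts.get(lead["lead_source"], 0.0)
        traits = contact_latents[contact_id]
        traits[target] = min(1.0, max(0.0, traits[target] + delta))
    return contact_latents


leads = [
    {"contact_id": "c1", "lead_source": "inbound_marketing"},
    {"contact_id": "c1", "lead_source": "inbound_marketing"},
    {"contact_id": "c2", "lead_source": "sdr_outbound"},
]
latents = {
    "c1": {"engagement_propensity": 0.5},
    "c2": {"engagement_propensity": 0.5},
}
apply_lead_level_boost(
    leads, latents, {"inbound_marketing": 0.1}, "engagement_propensity"
)
```

Without the `seen_contacts` guard, contact `c1` would receive the 0.1 boost twice, which is the regression the added test is meant to catch.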
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #21. Treat this PR as all clear unless new signals appear.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
Comment on lines +482 to +486

    else:
        # Lead-level fields (e.g. lead_source) — adjust linked contact latents.
        # Deduplicate by contact_id: use the first lead's value to avoid
        # stacking boosts when multiple leads share a contact.
        seen_contacts: set[str] = set()

Comment on lines +286 to +288

    # expected_acv: opportunity ACV where available, else revenue band midpoint.
    band_midpoint = lead_df["estimated_revenue_band"].map(REVENUE_BAND_MIDPOINTS)
    lead_df["expected_acv"] = lead_df["opportunity_estimated_acv"].fillna(band_midpoint)

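The fallback in this excerpt can be restated without pandas to make one edge case visible: a lead with no opportunity ACV and a revenue band that is absent from the midpoint table ends up with no value at all. The band labels and midpoint figures below are made up for illustration and are not the contents of REVENUE_BAND_MIDPOINTS.

```python
# Hypothetical band labels and midpoints, for illustration only.
EXAMPLE_BAND_MIDPOINTS = {
    "small": 5_000_000,
    "medium": 30_000_000,
    "large": 250_000_000,
}


def expected_acv(opportunity_acv, revenue_band, midpoints=EXAMPLE_BAND_MIDPOINTS):
    """Return the opportunity ACV when present, else the band midpoint.

    Returns None when both sources are unavailable, which mirrors the NaN
    that map() followed by fillna() would leave behind in a DataFrame.
    """
    if opportunity_acv is not None:
        return opportunity_acv
    return midpoints.get(revenue_band)


with_opportunity = expected_acv(42_000, "medium")   # opportunity value wins
band_fallback = expected_acv(None, "medium")        # falls back to the midpoint
still_missing = expected_acv(None, "unknown_band")  # no source available
```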
Comment on lines +163 to +194

        else:
            # Check source-conditional ratio
            outbound_rate = (
                df.loc[df["lead_source"] == "sdr_outbound", "web_sessions"].isna().mean()
            )
            inbound_rate = (
                df.loc[df["lead_source"] == "inbound_marketing", "web_sessions"].isna().mean()
            )
            if inbound_rate > 0 and outbound_rate / inbound_rate < 3.0:
                errors.append(
                    f"web_sessions missing ratio outbound/inbound = "
                    f"{outbound_rate / inbound_rate:.1f}x (need >3x)"
                )
            elif inbound_rate == 0 and outbound_rate > 0:
                pass  # Trivially satisfied
            elif inbound_rate == 0 and outbound_rate == 0:
                errors.append("web_sessions has no source-conditional missingness")

    # seniority must have nulls
    if "seniority" in df.columns:
        if df["seniority"].isna().sum() == 0:
            errors.append("seniority has no nulls")
        else:
            partner_rate = (
                df.loc[df["lead_source"] == "partner_referral", "seniority"].isna().mean()
            )
            other_rate = df.loc[df["lead_source"] != "partner_referral", "seniority"].isna().mean()
            if other_rate > 0 and partner_rate / other_rate < 3.0:
                errors.append(
                    f"seniority missing ratio partner/other = "
                    f"{partner_rate / other_rate:.1f}x (need >3x)"
                )

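Both branches above repeat the same pattern: compute a missing rate in two groups, then compare their ratio against 3.0 while guarding against a zero denominator. One way to factor that out is sketched below with plain Python lists instead of a DataFrame; `missing_rate` and `ratio_is_sufficient` are hypothetical helpers and are not part of validate_v4_dataset.py.

```python
def missing_rate(rows, column, predicate):
    """Fraction of rows matching `predicate` whose `column` is None."""
    subset = [row for row in rows if predicate(row)]
    if not subset:
        return 0.0
    return sum(row[column] is None for row in subset) / len(subset)


def ratio_is_sufficient(high_rate, low_rate, min_ratio=3.0):
    """True when high_rate is at least min_ratio times low_rate.

    A zero low_rate passes only if high_rate is positive, so the check never
    divides by zero and never accepts "no missingness anywhere".
    """
    if low_rate == 0:
        return high_rate > 0
    return high_rate / low_rate >= min_ratio


rows = (
    [{"lead_source": "sdr_outbound", "web_sessions": None}] * 6
    + [{"lead_source": "sdr_outbound", "web_sessions": 2}] * 4
    + [{"lead_source": "inbound_marketing", "web_sessions": None}] * 1
    + [{"lead_source": "inbound_marketing", "web_sessions": 5}] * 9
)
outbound = missing_rate(rows, "web_sessions", lambda r: r["lead_source"] == "sdr_outbound")
inbound = missing_rate(rows, "web_sessions", lambda r: r["lead_source"] == "inbound_marketing")
passes = ratio_is_sufficient(outbound, inbound)
```

The comparison uses `>=` to match the excerpt, which only reports an error when the ratio is below 3.0.
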
    any_opps = od[["lead_id"]].drop_duplicates()
    any_opps["opportunity_created"] = True

    open_opps = od[od["close_outcome"].isna()][["lead_id", "estimated_acv"]]

Comment on lines +174 to +182

    category_latent_correlations = None
    try:
        raw = load_recipe(config.recipe_id)
        recipe = Recipe.from_dict(raw)
        profiles = recipe.load_difficulty_profiles()
        profile = profiles.get(config.difficulty.value, {})
        category_latent_correlations = profile.get("category_latent_correlations")
    except (FileNotFoundError, KeyError):
        category_latent_correlations = None

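The narrowed `except` means only a missing file or a missing key falls back to `None`; any other failure still surfaces. A stand-alone sketch of that behavior follows, with the recipe loading replaced by an injected callable. `load_optional_profile_key` and `load_profiles` are hypothetical names, not part of generator.py.

```python
def load_optional_profile_key(load_profiles, difficulty, key):
    """Return profiles[difficulty][key], or None when the config is absent.

    `load_profiles` is any zero-argument callable returning a dict of
    difficulty profiles. Only FileNotFoundError and KeyError are treated as
    "not configured"; other exceptions propagate to the caller.
    """
    try:
        profiles = load_profiles()
    except (FileNotFoundError, KeyError):
        return None
    return profiles.get(difficulty, {}).get(key)


def missing_file():
    raise FileNotFoundError("difficulty_profiles.yaml")


configured = load_optional_profile_key(
    lambda: {"intro": {"category_latent_correlations": {"seniority": {}}}},
    "intro",
    "category_latent_correlations",
)
absent = load_optional_profile_key(missing_file, "intro", "category_latent_correlations")
```

Keeping the `try` body limited to the loading call is a design choice in this sketch: it avoids masking a `KeyError` raised by unrelated code further down.
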
Summary

Engine changes and build pipeline for the v4 lead scoring intro dataset.

Engine changes (backward-compatible; all existing tests still pass):
- `difficulty_profiles.yaml` (intro profile, scale 1.8): correlates observable categories (seniority, revenue_band, lead_source) with latent traits during population generation via `_apply_category_latent_correlations()` in `population.py`
- `snapshot_day` parameter on `build_snapshot()` filters events to a per-lead window while the target still covers the full 90-day horizon
- `touches_week_1`, `days_since_first_touch`, `expected_acv` (opp ACV or revenue band midpoint), `total_touches_all` (leakage trap using full horizon)
- `opportunity_created` feature: tracks ANY opportunity (not just open ones), fixing a deterministic group where `has_open_opportunity=1` → 0% conversion

Build pipeline:
- `scripts/build_v4_snapshot.py`: full pipeline (generate → day-14 snapshot → derive binary features → rename → stratified subsample (1000 rows, ~30% conversion) → inject structured MAR missingness)
- `scripts/validate_v4_dataset.py`: 7 mandatory checks (banned columns, deterministic groups, conversion rate, baseline AUC, leakage trap, missingness structure, shape) + 2 warning checks

Validated end-to-end: all 7 mandatory checks pass, LR AUC = 0.659, leakage trap boost ≥ 0.03.

Test plan
- `tests/render/test_snapshot_windowed.py` (windowed basics, new features, determinism)
- `tests/simulation/test_population.py` (category-latent correlations: shift, clamp, determinism)

🤖 Generated with Claude Code
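
The stratified subsample step in the build pipeline (1000 rows at roughly 30% conversion) can be sketched as drawing a fixed number of rows from each label group. This is an illustration under assumptions: `stratified_subsample`, the `converted` label key, and the synthetic rows are hypothetical and do not come from build_v4_snapshot.py.

```python
import random


def stratified_subsample(rows, label_key, n_total, positive_rate, seed):
    """Sample n_total rows with a fixed share of positive labels.

    Raises ValueError when either label group is too small to supply its
    share, rather than silently returning a smaller or skewed sample.
    """
    rng = random.Random(seed)
    positives = [row for row in rows if row[label_key]]
    negatives = [row for row in rows if not row[label_key]]
    n_positive = round(n_total * positive_rate)
    n_negative = n_total - n_positive
    if n_positive > len(positives) or n_negative > len(negatives):
        raise ValueError("not enough rows in one label group")
    sample = rng.sample(positives, n_positive) + rng.sample(negatives, n_negative)
    rng.shuffle(sample)  # avoid leaving the rows ordered by label
    return sample


# Synthetic population: 5000 leads, every tenth one converted.
population = [{"lead_id": i, "converted": i % 10 == 0} for i in range(5000)]
subset = stratified_subsample(population, "converted", 1000, 0.30, seed=0)
```

Oversampling the positive class this way changes the base rate relative to the generated population, which is why the validator checks the conversion rate of the final CSV separately.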