feat: Milestone v4-M1 — engine changes and build pipeline#21

Merged: shaypal5 merged 6 commits into main from feat/v4-m1-engine-and-build-pipeline on Apr 29, 2026

@shaypal5


Summary

Engine changes and build pipeline for the v4 lead scoring intro dataset.

Engine changes (backward-compatible — all existing tests still pass):

  • Category-latent correlations in difficulty_profiles.yaml (intro profile, scale 1.8) — correlates observable categories (seniority, revenue_band, lead_source) with latent traits during population generation via _apply_category_latent_correlations() in population.py
  • Windowed snapshots: snapshot_day parameter on build_snapshot() filters events to a per-lead window while the target still covers the full 90-day horizon
  • New features: touches_week_1, days_since_first_touch, expected_acv (opp ACV or revenue band midpoint), total_touches_all (leakage trap using full horizon)
  • opportunity_created feature — tracks ANY opportunity (not just open ones), fixing a deterministic group where has_open_opportunity=1 → 0% conversion
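
A minimal sketch of the windowing idea, not the actual build_snapshot() implementation: events count toward the windowed features only when they fall inside a per-lead window, while total_touches_all keeps counting over the full horizon, which is what makes it a leakage trap. The function name, the input shapes, and the exclusive day cutoff are assumptions made for illustration.

```python
def build_windowed_counts(leads, touches, snapshot_day=None):
    """Count touches per lead, optionally restricted to a per-lead window.

    leads:   {lead_id: creation_day}        (hypothetical input shape)
    touches: iterable of (lead_id, day)     (hypothetical input shape)
    """
    rows = {
        lead_id: {"touches_window": 0, "touches_week_1": 0, "total_touches_all": 0}
        for lead_id in leads
    }
    for lead_id, day in touches:
        offset = day - leads[lead_id]  # days elapsed since the lead was created
        row = rows[lead_id]
        # Always counted over the full horizon: this is the leakage trap.
        row["total_touches_all"] += 1
        # Assumed exclusive cutoff: day offsets 0..snapshot_day-1 are in the window.
        if snapshot_day is None or offset < snapshot_day:
            row["touches_window"] += 1
            if offset < 7:
                row["touches_week_1"] += 1
    return rows


rows = build_windowed_counts({"a": 0}, [("a", 1), ("a", 10), ("a", 20)], snapshot_day=14)
print(rows["a"])
```

With snapshot_day=None every touch falls inside the window, so the windowed count equals the full-horizon count, matching the "None == default" behavior described in the test plan.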

Build pipeline:

  • scripts/build_v4_snapshot.py — full pipeline: generate → day-14 snapshot → derive binary features → rename → stratified subsample (1000 rows, ~30% conversion) → inject structured MAR missingness
  • scripts/validate_v4_dataset.py — 7 mandatory checks (banned columns, deterministic groups, conversion rate, baseline AUC, leakage trap, missingness structure, shape) + 2 warning checks

Validated end-to-end: all 7 mandatory checks pass, LR AUC = 0.659, leakage trap boost ≥ 0.03.
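
The stratified subsample step (1000 rows, ~30% conversion) can be sketched as follows. This is an illustrative version under stated assumptions, not the code in scripts/build_v4_snapshot.py: the function name and the "converted" label key are hypothetical.

```python
import random


def stratified_subsample(rows, n_total, positive_rate, seed, label_key="converted"):
    """Draw n_total rows with a fixed share of positives, reproducibly."""
    rng = random.Random(seed)  # local RNG so the draw is deterministic per seed
    positives = [r for r in rows if r[label_key]]
    negatives = [r for r in rows if not r[label_key]]
    n_pos = round(n_total * positive_rate)
    n_neg = n_total - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough rows in one class for the requested mix")
    sample = rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)
    rng.shuffle(sample)  # avoid leaving the positives grouped at the front
    return sample


population = [{"lead_id": i, "converted": i % 10 == 0} for i in range(5000)]
subset = stratified_subsample(population, n_total=1000, positive_rate=0.3, seed=42)
print(len(subset), sum(r["converted"] for r in subset))
```

Sampling each class separately fixes the conversion rate exactly, rather than approximately as a single random draw would.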

Test plan

  • 16 new tests in tests/render/test_snapshot_windowed.py (windowed basics, new features, determinism)
  • 3 new tests in tests/simulation/test_population.py (category-latent correlations: shift, clamp, determinism)
  • All 609 tests pass
  • CI green

🤖 Generated with Claude Code

shaypal5 and others added 4 commits April 29, 2026 16:17
…ures

Engine changes for v4-M1:

1. category_latent_correlations in difficulty_profiles.yaml (intro profile):
   Correlate seniority→authority, revenue_band→account_fit,
   lead_source→engagement_propensity. Validated by spike experiment
   (scale 1.8, AUC 0.694).

2. population.py: apply_category_latent_correlations() shifts latent
   traits based on observable categories after initial sampling.
   Default None preserves backward compatibility.

3. snapshots.py: snapshot_day parameter for windowed aggregation.
   New features: touches_week_1, days_since_first_touch, expected_acv,
   total_touches_all (leakage trap). Default None preserves existing
   behavior.

4. features.py: FeatureSpec entries for all new columns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t.py

Add the v4 dataset build and validation scripts:

- build_v4_snapshot.py: full pipeline (generate → day-14 snapshot → derive
  binary features → rename → stratified subsample → inject MAR missingness)
- validate_v4_dataset.py: 7 mandatory checks (banned cols, deterministic
  groups, conversion rate, baseline AUC, leakage trap, missingness, shape)
  plus 2 warning checks (redundancy, low variance)
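
As a rough illustration of the last build step, missing-at-random (MAR) injection makes the chance of a value being blanked depend on another observed column. The sketch below is an assumption-laden example, not the script's code: the function name and the per-source rates are hypothetical.

```python
import random


def inject_mar_missingness(rows, column, condition_key, rates, default_rate, seed):
    """Blank out `column` with a probability that depends on `condition_key`."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        rate = rates.get(row[condition_key], default_rate)
        new_row = dict(row)  # copy so the input rows are left untouched
        if rng.random() < rate:
            new_row[column] = None
        out.append(new_row)
    return out


rows = [
    {"lead_source": "sdr_outbound", "web_sessions": 3},
    {"lead_source": "inbound_marketing", "web_sessions": 5},
] * 10
# Hypothetical rates: outbound leads lose web_sessions far more often than others.
masked = inject_mar_missingness(
    rows, "web_sessions", "lead_source", {"sdr_outbound": 0.6}, 0.1, seed=7
)
print(sum(r["web_sessions"] is None for r in masked))
```

Because the missingness depends only on an observed column (lead_source here), a validator can check it by comparing missing rates across sources.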

Also includes the opportunity_created feature fix in snapshots.py (tracks ANY
opportunity, not just open ones) and its FeatureSpec entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests/render/test_snapshot_windowed.py (16 tests):
- Windowed snapshot basics (row count, counts ≤ full, None == default)
- touches_week_1, days_since_first_touch, total_touches_all
- opportunity_created, expected_acv
- Windowed determinism under same seed

Add 3 tests to tests/simulation/test_population.py:
- Category-latent correlations shift target trait mean
- Extreme boosts are clamped to [0, 1]
- Deterministic under same seed + correlations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 29, 2026 13:45
@shaypal5 added labels on Apr 29, 2026:

  • type: feature (New capability)
  • layer: schema (schema/ entity/event contracts)
  • layer: simulation (simulation/ discrete-time engine)
  • layer: render (render/ bundle and artifact output)
  • layer: api (api/ public Python surface)
  • layer: recipes (recipes/ recipe assets and registry)
Copilot AI left a comment


Pull request overview

Implements v4-M1 engine updates to support a new “intro lead scoring” dataset shape, plus adds scripts to build and validate the v4 CSV end-to-end.

Changes:

  • Adds category→latent correlation boosts (configured via recipe difficulty profiles) and wires them through Generator.generate() into population generation.
  • Extends snapshot rendering with snapshot_day windowing and adds new snapshot features (momentum features, expected_acv, leakage trap, opportunity_created).
  • Introduces build/validation scripts and accompanying tests for the new snapshot semantics and correlation behavior.
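
The category-to-latent boost described in the first bullet can be pictured with a small sketch. It assumes an additive shift scaled by a profile-level factor and clamped to [0, 1]; the PR text only states that traits are shifted and clamped, so the additive form, the function name, and the boost values are all assumptions.

```python
def apply_category_boosts(latents, categories, boosts, scale=1.0):
    """Shift each latent trait by its category's boost, then clamp to [0, 1].

    latents:    list of floats in [0, 1]
    categories: parallel list of category labels
    boosts:     {category: shift}; categories not listed get no shift
    """
    shifted = []
    for value, category in zip(latents, categories):
        boost = boosts.get(category, 0.0) * scale
        shifted.append(min(1.0, max(0.0, value + boost)))
    return shifted


# Hypothetical seniority -> authority boosts, with the intro profile's scale of 1.8.
result = apply_category_boosts(
    [0.5, 0.9, 0.1],
    ["c_level", "c_level", "intern"],
    {"c_level": 0.2, "intern": -0.3},
    scale=1.8,
)
print(result)
```

An empty boosts mapping leaves every value unchanged, which is consistent with the backward-compatible default described in the PR.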

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| leadforge/simulation/population.py | Adds optional category-latent correlation application during population build. |
| leadforge/api/generator.py | Loads correlations from recipe difficulty profiles and passes through to population build. |
| leadforge/render/snapshots.py | Adds snapshot_day event-windowing and computes new v4 snapshot features. |
| leadforge/schema/features.py | Extends the canonical feature spec with the new v4 features/leakage trap. |
| leadforge/recipes/b2b_saas_procurement_v1/difficulty_profiles.yaml | Adds category_latent_correlations config for the intro difficulty profile. |
| scripts/build_v4_snapshot.py | Adds an end-to-end pipeline to generate + snapshot + subsample + inject missingness into the v4 CSV. |
| scripts/validate_v4_dataset.py | Adds a v4 CSV validator with mandatory checks and warning checks. |
| tests/render/test_snapshot_windowed.py | New tests covering windowed snapshot behavior and new features. |
| tests/simulation/test_population.py | Adds tests for category-latent correlation shifting, clamping, and determinism. |
| .agent-plan.md | Updates milestone tracking/status to reflect v4-M1 progress. |


Comment thread scripts/build_v4_snapshot.py Outdated
Comment thread leadforge/api/generator.py Outdated
Comment thread leadforge/simulation/population.py
Comment thread scripts/validate_v4_dataset.py
Comment thread leadforge/render/snapshots.py Outdated
Comment thread leadforge/schema/features.py Outdated
Comment thread scripts/build_v4_snapshot.py Outdated

COPILOT-1: validate all _FINAL_COLUMNS present in rename_and_select()
COPILOT-2: narrow except to (FileNotFoundError, KeyError) in generator.py
COPILOT-3: validate correlation spec shape in _apply_category_latent_correlations()
COPILOT-4: add scikit-learn to [scripts] optional dependency
COPILOT-5: always compute total_touches_all from full touch table
COPILOT-6: update days_since_last_touch description to say "snapshot cutoff"
COPILOT-7: fix build_v4_snapshot.py docstring (day-14, no bundle path arg)
FAIL-1: fix check_determinism to ignore generation_timestamp in manifest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1. Fix lead-source boost stacking: deduplicate by contact_id so a
   contact shared by N leads receives the boost exactly once, not N
   times. Add regression test.

2. Single source of truth for revenue band midpoints: define
   REVENUE_BAND_MIDPOINTS in population.py alongside _REVENUE_BANDS;
   import it in snapshots.py instead of duplicating.

3. Fix stale "day-21" docstring in build_v4_snapshot.py:78.

4. Fix determinism test: revert semantic manifest comparison; instead
   thread generation_timestamp through WorldBundle.save() and
   write_bundle() so the test fixture can pin it. Byte-level comparison
   is preserved.

5. Document snapshot_day cutoff semantics (midnight-exclusive by
   construction) in build_snapshot() docstring.

6. Remove sys.path.insert hack from build_v4_snapshot.py — package
   must be installed.
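
Item 4 above (pinning generation_timestamp so byte-level comparison still works) can be illustrated with a small sketch. It does not reproduce WorldBundle.save() or write_bundle(); the function below is a hypothetical stand-in showing only the pattern of threading an optional timestamp through to the serialized manifest.

```python
import json
from datetime import datetime, timezone


def render_manifest(payload, generation_timestamp=None):
    """Serialize a manifest; callers may pin the timestamp for reproducible bytes."""
    # Fall back to the wall clock only when no timestamp was supplied.
    stamp = generation_timestamp or datetime.now(timezone.utc).isoformat()
    manifest = dict(payload, generation_timestamp=stamp)
    # sort_keys keeps the byte layout independent of dict insertion order.
    return json.dumps(manifest, sort_keys=True).encode("utf-8")


pinned = "2026-01-01T00:00:00+00:00"
first = render_manifest({"seed": 42}, generation_timestamp=pinned)
second = render_manifest({"seed": 42}, generation_timestamp=pinned)
print(first == second)
```

Pinning the value in the test fixture keeps the comparison strict, instead of teaching the test to skip a field.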

Out-of-scope (issues opened):
- #22: validation script uses train AUC
- #23: add tests for build pipeline scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 29, 2026 18:05
@github-actions


pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #21. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.20
Trigger: commit pushed
Workflow run: 25125594658 attempt 1
Comment timestamp: 2026-04-29T18:05:51.443800+00:00
PR head commit: 87ee4373108ee5ecec7ae9a161085479dc065802

@shaypal5 shaypal5 merged commit dd64605 into main Apr 29, 2026
7 checks passed
@shaypal5 shaypal5 deleted the feat/v4-m1-engine-and-build-pipeline branch April 29, 2026 18:07

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.



Comment on lines +482 to +486
else:
    # Lead-level fields (e.g. lead_source): adjust linked contact latents.
    # Deduplicate by contact_id: use the first lead's value to avoid
    # stacking boosts when multiple leads share a contact.
    seen_contacts: set[str] = set()
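
The excerpt stops at the seen_contacts declaration. A self-contained sketch of how such a guard could be used is given below; the loop body, function name, and data shapes are assumptions, not the code in population.py.

```python
def boost_contacts_once(leads, contact_latents, boosts):
    """Apply a lead-source boost to each contact at most once.

    leads:           list of (lead_id, contact_id, lead_source) tuples
    contact_latents: {contact_id: latent value in [0, 1]}, updated in place
    boosts:          {lead_source: shift}
    """
    seen_contacts = set()
    for _lead_id, contact_id, lead_source in leads:
        if contact_id in seen_contacts:
            continue  # a later lead sharing this contact must not stack a boost
        seen_contacts.add(contact_id)
        shifted = contact_latents[contact_id] + boosts.get(lead_source, 0.0)
        contact_latents[contact_id] = min(1.0, max(0.0, shifted))
    return contact_latents


leads = [
    ("l1", "c1", "partner_referral"),
    ("l2", "c1", "partner_referral"),  # shares contact c1 with l1
    ("l3", "c2", "sdr_outbound"),
]
print(boost_contacts_once(leads, {"c1": 0.4, "c2": 0.5}, {"partner_referral": 0.1}))
```

Without the guard, contact c1 would receive the boost twice, once per lead.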
Comment on lines +286 to +288
# expected_acv: opportunity ACV where available, else revenue band midpoint.
band_midpoint = lead_df["estimated_revenue_band"].map(REVENUE_BAND_MIDPOINTS)
lead_df["expected_acv"] = lead_df["opportunity_estimated_acv"].fillna(band_midpoint)
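
The fallback logic in this excerpt can be restated in plain Python. The midpoint table below is hypothetical (the real REVENUE_BAND_MIDPOINTS values are not shown in this PR page), and the function is a sketch of the same rule rather than the snapshot code.

```python
# Hypothetical bands and midpoints, for illustration only.
HYPOTHETICAL_BAND_MIDPOINTS = {"small": 50_000.0, "large": 500_000.0}


def expected_acv(opportunity_acv, revenue_band, midpoints=HYPOTHETICAL_BAND_MIDPOINTS):
    """Prefer the opportunity's ACV; otherwise fall back to the band midpoint."""
    if opportunity_acv is not None:
        return opportunity_acv
    # An unknown band yields None, just as an unmapped key stays NaN after
    # pandas Series.map() followed by fillna().
    return midpoints.get(revenue_band)


print(expected_acv(12_000.0, "small"))
print(expected_acv(None, "large"))
print(expected_acv(None, "unlisted_band"))
```

Note that a lead with no opportunity and an unrecognized band still ends up with a missing expected_acv, so downstream checks should not assume the column is fully populated.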
Comment on lines +163 to +194
    else:
        # Check source-conditional ratio
        outbound_rate = (
            df.loc[df["lead_source"] == "sdr_outbound", "web_sessions"].isna().mean()
        )
        inbound_rate = (
            df.loc[df["lead_source"] == "inbound_marketing", "web_sessions"].isna().mean()
        )
        if inbound_rate > 0 and outbound_rate / inbound_rate < 3.0:
            errors.append(
                f"web_sessions missing ratio outbound/inbound = "
                f"{outbound_rate / inbound_rate:.1f}x (need >3x)"
            )
        elif inbound_rate == 0 and outbound_rate > 0:
            pass  # Trivially satisfied
        elif inbound_rate == 0 and outbound_rate == 0:
            errors.append("web_sessions has no source-conditional missingness")

# seniority must have nulls
if "seniority" in df.columns:
    if df["seniority"].isna().sum() == 0:
        errors.append("seniority has no nulls")
    else:
        partner_rate = (
            df.loc[df["lead_source"] == "partner_referral", "seniority"].isna().mean()
        )
        other_rate = df.loc[df["lead_source"] != "partner_referral", "seniority"].isna().mean()
        if other_rate > 0 and partner_rate / other_rate < 3.0:
            errors.append(
                f"seniority missing ratio partner/other = "
                f"{partner_rate / other_rate:.1f}x (need >3x)"
            )

any_opps = od[["lead_id"]].drop_duplicates()
any_opps["opportunity_created"] = True

open_opps = od[od["close_outcome"].isna()][["lead_id", "estimated_acv"]]
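
The distinction between the two opportunity flags in the last excerpt can be shown without pandas. This is a sketch with hypothetical record shapes: opportunity_created is set for any opportunity, while has_open_opportunity is set only when close_outcome is still missing.

```python
def opportunity_flags(opportunities):
    """Derive per-lead flags from opportunity records.

    opportunities: list of dicts with "lead_id" and "close_outcome"
                   (None while the opportunity is still open).
    """
    flags = {}
    for opp in opportunities:
        row = flags.setdefault(
            opp["lead_id"],
            {"opportunity_created": False, "has_open_opportunity": False},
        )
        row["opportunity_created"] = True  # any opportunity counts
        if opp["close_outcome"] is None:
            row["has_open_opportunity"] = True  # only unresolved ones count
    return flags


flags = opportunity_flags([
    {"lead_id": "a", "close_outcome": "won"},
    {"lead_id": "b", "close_outcome": None},
])
print(flags)
```

A lead with a closed opportunity keeps opportunity_created set while has_open_opportunity stays unset, which is how the new feature avoids the deterministic group described in the PR summary.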
Comment on lines +174 to +182
category_latent_correlations = None
try:
    raw = load_recipe(config.recipe_id)
    recipe = Recipe.from_dict(raw)
    profiles = recipe.load_difficulty_profiles()
    profile = profiles.get(config.difficulty.value, {})
    category_latent_correlations = profile.get("category_latent_correlations")
except (FileNotFoundError, KeyError):
    category_latent_correlations = None
