feat: v5 lead scoring dataset — robust validation, value-aware scoring#25
Conversation
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR adds a v5 lead scoring “intro” dataset pipeline (build, validate, baseline eval) with expanded validation checks and a value-aware scoring demonstration, without changing the simulation engine.
Changes:
- Add a v5 dataset builder script that generates a day-14 snapshot, adds the new momentum feature, caps ACV to the narrative range, and outputs a 1000×19 CSV.
- Add a v5 validator script with 10 checks including hold-out AUC/PR-AUC, multi-seed leakage-trap robustness, missingness/duplicates/shape, leakage naming, and ACV range.
- Add a quick baseline evaluation script comparing LR vs RF, showing leakage-trap delta, feature importance, and a value-aware ranking demo.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| scripts/build_v5_snapshot.py | Generates the v5 CSV from an internal bundle + day-14 snapshot with ACV capping, missingness injection, and v5 column contract. |
| scripts/validate_v5_dataset.py | Implements the v5 validation spec (10 checks) including hold-out metrics and multi-seed leakage robustness. |
| scripts/quick_baseline_eval_v5.py | Provides a runnable baseline evaluation + leakage comparison + feature importance + value-aware scoring example. |
| .agent-plan.md | Updates internal project tracking notes to reflect v5 dataset work completion. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This comment has been minimized.
This comment has been minimized.
…ripts v5 improvements over v4: - Snapshot day 10 → 14 (longer observation window, more realistic) - Leakage trap renamed to __leakage__total_touches_90d (explicit naming) - expected_acv clipped to narrative range [18k, 120k] - Added days_since_first_touch momentum feature (19 cols, up from 18) - Validator uses hold-out AUC (not in-sample), PR-AUC, Precision@K, Lift@K - Multi-seed leakage trap robustness: mean delta >= 0.03, min >= 0.015 - Duplicate check, ACV range check, missingness bounds - Baseline eval script with LR + RF, value-aware scoring demo Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7f4051d to
9391c77
Compare
This comment has been minimized.
This comment has been minimized.
COPILOT-1/6: Refactor _fit_lr_holdout and _fit_lr_auc_only to split before preprocessing. LabelEncoder, median imputation, and StandardScaler are now fit on training fold only. COPILOT-2: check_acv_range now coerces to numeric and fails explicitly when expected_acv has no usable values. COPILOT-3/4: Missingness ratio checks now handle empty lead_source slices explicitly instead of silently skipping on NaN comparisons. COPILOT-5: quick_baseline_eval_v5.py refactored to use split_and_preprocess() — same train-only preprocessing approach. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR
#25. Treat this PR as all clear unless new signals appear.Run metadata: |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for col in cat_cols: | ||
| le = LabelEncoder() | ||
| le.fit(x_train_raw[col].astype(str).fillna("__MISSING__")) | ||
| encoders[col] = le | ||
| x_train_raw[col] = le.transform(x_train_raw[col].astype(str).fillna("__MISSING__")) | ||
| # Unseen test categories get mapped to "__MISSING__" | ||
| test_vals = x_test_raw[col].astype(str).fillna("__MISSING__") | ||
| test_vals = test_vals.where(test_vals.isin(le.classes_), "__MISSING__") | ||
| # Ensure __MISSING__ is in classes (it always is since we fillna above) |
There was a problem hiding this comment.
LabelEncoder is fit on train values after fillna("MISSING"), but if the training fold has no nulls and no literal "MISSING" values, "MISSING" will not be in le.classes_. The later mapping of unseen test categories/NaNs to "MISSING" will then cause le.transform(...) to raise. Ensure the sentinel is always included in the fitted classes (or switch to an encoder that supports unknown categories explicitly).
| for col in cat_cols: | |
| le = LabelEncoder() | |
| le.fit(x_train_raw[col].astype(str).fillna("__MISSING__")) | |
| encoders[col] = le | |
| x_train_raw[col] = le.transform(x_train_raw[col].astype(str).fillna("__MISSING__")) | |
| # Unseen test categories get mapped to "__MISSING__" | |
| test_vals = x_test_raw[col].astype(str).fillna("__MISSING__") | |
| test_vals = test_vals.where(test_vals.isin(le.classes_), "__MISSING__") | |
| # Ensure __MISSING__ is in classes (it always is since we fillna above) | |
| missing_sentinel = "__MISSING__" | |
| for col in cat_cols: | |
| le = LabelEncoder() | |
| train_vals = x_train_raw[col].fillna(missing_sentinel).astype(str) | |
| test_vals = x_test_raw[col].fillna(missing_sentinel).astype(str) | |
| # Always include the sentinel in fitted classes so unseen/null test | |
| # values mapped to it can be transformed safely. | |
| le.fit(pd.concat([train_vals, pd.Series([missing_sentinel])], ignore_index=True)) | |
| encoders[col] = le | |
| x_train_raw[col] = le.transform(train_vals) | |
| # Unseen test categories get mapped to "__MISSING__" | |
| test_vals = test_vals.where(test_vals.isin(le.classes_), missing_sentinel) |
| encoders: dict[str, LabelEncoder] = {} | ||
| for col in cat_cols: | ||
| le = LabelEncoder() | ||
| le.fit(x_train_raw[col].astype(str).fillna("__MISSING__")) | ||
| encoders[col] = le |
There was a problem hiding this comment.
The local encoders dict is populated but never used. This will trigger ruff F841 (assigned but unused) and fail CI since ruff check . runs over scripts/. Either remove it or use it (e.g., for debugging output).
| encoders: dict[str, LabelEncoder] = {} | |
| for col in cat_cols: | |
| le = LabelEncoder() | |
| le.fit(x_train_raw[col].astype(str).fillna("__MISSING__")) | |
| encoders[col] = le | |
| for col in cat_cols: | |
| le = LabelEncoder() | |
| le.fit(x_train_raw[col].astype(str).fillna("__MISSING__")) |
| acv = df["expected_acv"].dropna() | ||
| print(f"{'FAIL' if errs else 'PASS'} (range: {acv.min():.0f}–{acv.max():.0f})") |
There was a problem hiding this comment.
The ACV range print path uses df["expected_acv"].dropna() without numeric coercion. If the CSV has expected_acv as strings (or mixed types), min()/max() can be wrong or raise. Use the same pd.to_numeric(..., errors="coerce") approach here as in check_acv_range() to keep the validator robust.
| acv = df["expected_acv"].dropna() | |
| print(f"{'FAIL' if errs else 'PASS'} (range: {acv.min():.0f}–{acv.max():.0f})") | |
| acv = pd.to_numeric(df["expected_acv"], errors="coerce").dropna() | |
| if acv.empty: | |
| print(f"{'FAIL' if errs else 'PASS'} (range: unavailable)") | |
| else: | |
| print(f"{'FAIL' if errs else 'PASS'} (range: {acv.min():.0f}–{acv.max():.0f})") |
| le.fit(x_train_raw[col].astype(str).fillna("__MISSING__")) | ||
| x_train_raw[col] = le.transform(x_train_raw[col].astype(str).fillna("__MISSING__")) | ||
| test_vals = x_test_raw[col].astype(str).fillna("__MISSING__") |
There was a problem hiding this comment.
Same LabelEncoder issue as in the validator: if the training fold has no nulls and no literal "MISSING" values, "MISSING" will not be in le.classes_, but unseen test categories/NaNs are mapped to "MISSING" and then transformed, which can raise. Make sure the sentinel is always part of the fitted classes (or use an encoder with handle_unknown).
| le.fit(x_train_raw[col].astype(str).fillna("__MISSING__")) | |
| x_train_raw[col] = le.transform(x_train_raw[col].astype(str).fillna("__MISSING__")) | |
| test_vals = x_test_raw[col].astype(str).fillna("__MISSING__") | |
| train_vals = x_train_raw[col].where(x_train_raw[col].notna(), "__MISSING__").astype(str) | |
| fit_vals = pd.concat([train_vals, pd.Series(["__MISSING__"])], ignore_index=True) | |
| le.fit(fit_vals) | |
| x_train_raw[col] = le.transform(train_vals) | |
| test_vals = x_test_raw[col].where(x_test_raw[col].notna(), "__MISSING__").astype(str) |
Summary
v5 lead scoring intro dataset with improved validation, value-aware scoring support, and narrative consistency.
No engine changes — this PR adds build/validate/eval scripts only.
Key improvements over v4
__leakage__total_touches_90d(fromtotal_touches_all) — explicit naming convention for pedagogyexpected_acvclipped to [$18k, $120k] pernarrative.yamldays_since_first_touchadded: second momentum feature (19 cols, up from 18)Validation results (all 10 checks pass)
New files
scripts/build_v5_snapshot.pyscripts/validate_v5_dataset.pyscripts/quick_baseline_eval_v5.pyDataset artifacts (in leadforge-datasets-private)
lead_scoring_intro_v5.csv— 1000 × 19, 30% conversionRELEASE_v5.md— full release notesBACKGROUND.md— updated with value-aware scoring sectionTest plan
python scripts/build_v5_snapshot.py /tmp/v5.csvsucceedspython scripts/validate_v5_dataset.py /tmp/v5.csvexits 0 (all 10 checks pass)python scripts/quick_baseline_eval_v5.py /tmp/v5.csvprints baseline metrics🤖 Generated with Claude Code