feat: v5 lead scoring dataset — robust validation, value-aware scoring by shaypal5 · Pull Request #25 · leadforge-dev/leadforge

shaypal5 · 2026-04-29T20:22:23Z

Summary

v5 lead scoring intro dataset with improved validation, value-aware scoring support, and narrative consistency.

No engine changes — this PR adds build/validate/eval scripts only.

Key improvements over v4

Snapshot day 10 → 14: longer observation window, more realistic for early-stage scoring
Leakage trap renamed: __leakage__total_touches_90d (from total_touches_all) — explicit naming convention for pedagogy
ACV capped to narrative range: expected_acv clipped to [$18k, $120k] per narrative.yaml
days_since_first_touch added: second momentum feature (19 cols, up from 18)
Hold-out validation: AUC/PR-AUC computed on 30% test set, not in-sample
Multi-seed leakage robustness: validates mean AUC delta >= 0.03 and min >= 0.015 across 10 seeds
New checks: duplicates, ACV range, leakage naming, Precision@K, Lift@K
Baseline eval script: LR + RF comparison, feature importance, value-aware scoring demo
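
The multi-seed leakage criterion in the list above (mean AUC delta >= 0.03 and min >= 0.015 across 10 seeds) can be sketched as a plain threshold check. This is a minimal sketch, not the validator's code: the function name and the sample deltas are hypothetical, and each delta is assumed to be the AUC with the leakage column minus the AUC without it for one seed.

```python
from statistics import mean

# Thresholds quoted in the PR description.
MEAN_DELTA_MIN = 0.03
MIN_DELTA_MIN = 0.015


def leakage_trap_is_robust(deltas: list[float]) -> bool:
    """Return True when the per-seed AUC deltas clear both thresholds.

    Assumption: each delta is AUC(with leakage column) minus
    AUC(without it), measured on a hold-out set for one random seed.
    """
    if not deltas:
        return False
    return mean(deltas) >= MEAN_DELTA_MIN and min(deltas) >= MIN_DELTA_MIN


# Illustrative values only; these are not the PR's measured deltas.
example_deltas = [0.041, 0.028, 0.035, 0.019, 0.046,
                  0.033, 0.030, 0.027, 0.038, 0.033]
print(leakage_trap_is_robust(example_deltas))
```

Checking the minimum as well as the mean keeps a single lucky seed from carrying the result.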

Validation results (all 10 checks pass)

Check	Result
Baseline hold-out AUC	0.632
PR-AUC	0.396
Precision@50	0.400 (Lift: 1.33x)
Leakage trap mean delta (10 seeds)	0.033
Leakage trap min delta	0.015
ACV range	[18,000 – 120,000]
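
For reference, the Precision@50 and Lift figures in the table are related by lift = precision@K / base rate. The sketch below assumes the base rate is the 30% conversion rate reported for the dataset (the hold-out fold may differ slightly); the function names and the toy data are hypothetical, not taken from the validator.

```python
def precision_at_k(y_true: list[int], scores: list[float], k: int) -> float:
    """Fraction of positives among the k highest-scored rows."""
    if not 0 < k <= len(y_true):
        raise ValueError("k must be between 1 and the number of rows")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def lift_at_k(y_true: list[int], scores: list[float], k: int) -> float:
    """Precision@K divided by the overall positive rate."""
    base_rate = sum(y_true) / len(y_true)
    return precision_at_k(y_true, scores, k) / base_rate


# Toy data: 3 positives out of 10 rows, scores already in descending order.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(precision_at_k(y_true, scores, 2), lift_at_k(y_true, scores, 2))

# Arithmetic check against the table: 0.400 / 0.30
print(round(0.400 / 0.30, 2))  # 1.33
```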

New files

scripts/build_v5_snapshot.py
scripts/validate_v5_dataset.py
scripts/quick_baseline_eval_v5.py

Dataset artifacts (in leadforge-datasets-private)

lead_scoring_intro_v5.csv — 1000 × 19, 30% conversion
RELEASE_v5.md — full release notes
BACKGROUND.md — updated with value-aware scoring section

Test plan

python scripts/build_v5_snapshot.py /tmp/v5.csv succeeds
python scripts/validate_v5_dataset.py /tmp/v5.csv exits 0 (all 10 checks pass)
python scripts/quick_baseline_eval_v5.py /tmp/v5.csv prints baseline metrics
CI passes (existing 609 tests — no engine changes)

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR adds a v5 lead scoring “intro” dataset pipeline (build, validate, baseline eval) with expanded validation checks and a value-aware scoring demonstration, without changing the simulation engine.

Changes:

Add a v5 dataset builder script that generates a day-14 snapshot, adds the new momentum feature, caps ACV to the narrative range, and outputs a 1000×19 CSV.
Add a v5 validator script with 10 checks including hold-out AUC/PR-AUC, multi-seed leakage-trap robustness, missingness/duplicates/shape, leakage naming, and ACV range.
Add a quick baseline evaluation script comparing LR vs RF, showing leakage-trap delta, feature importance, and a value-aware ranking demo.
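
The value-aware ranking demo itself is not shown in this thread. One common formulation, assumed here rather than confirmed by the PR, ranks leads by expected value: predicted conversion probability times expected_acv. The class, the function, and all numbers below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoredLead:
    lead_id: str
    p_convert: float      # model's predicted conversion probability
    expected_acv: float   # expected annual contract value in dollars

    @property
    def expected_value(self) -> float:
        # Assumed scoring rule: probability-weighted contract value.
        return self.p_convert * self.expected_acv


def rank_by_expected_value(leads: list[ScoredLead]) -> list[ScoredLead]:
    """Sort leads from highest to lowest expected value."""
    return sorted(leads, key=lambda lead: lead.expected_value, reverse=True)


leads = [
    ScoredLead("A", 0.60, 20_000),
    ScoredLead("B", 0.35, 100_000),
    ScoredLead("C", 0.50, 40_000),
]
ranked_ids = [lead.lead_id for lead in rank_by_expected_value(leads)]
print(ranked_ids)
```

Ranking by probability alone would put lead A first; weighting by ACV moves the larger but less likely deal B to the top.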

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File	Description
scripts/build_v5_snapshot.py	Generates the v5 CSV from an internal bundle + day-14 snapshot with ACV capping, missingness injection, and v5 column contract.
scripts/validate_v5_dataset.py	Implements the v5 validation spec (10 checks) including hold-out metrics and multi-seed leakage robustness.
scripts/quick_baseline_eval_v5.py	Provides a runnable baseline evaluation + leakage comparison + feature importance + value-aware scoring example.
.agent-plan.md	Updates internal project tracking notes to reflect v5 dataset work completion.

…ripts

v5 improvements over v4:
- Snapshot day 10 → 14 (longer observation window, more realistic)
- Leakage trap renamed to __leakage__total_touches_90d (explicit naming)
- expected_acv clipped to narrative range [18k, 120k]
- Added days_since_first_touch momentum feature (19 cols, up from 18)
- Validator uses hold-out AUC (not in-sample), PR-AUC, Precision@K, Lift@K
- Multi-seed leakage trap robustness: mean delta >= 0.03, min >= 0.015
- Duplicate check, ACV range check, missingness bounds
- Baseline eval script with LR + RF, value-aware scoring demo

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

COPILOT-1/6: Refactor _fit_lr_holdout and _fit_lr_auc_only to split before preprocessing. LabelEncoder, median imputation, and StandardScaler are now fit on training fold only.

COPILOT-2: check_acv_range now coerces to numeric and fails explicitly when expected_acv has no usable values.

COPILOT-3/4: Missingness ratio checks now handle empty lead_source slices explicitly instead of silently skipping on NaN comparisons.

COPILOT-5: quick_baseline_eval_v5.py refactored to use split_and_preprocess() — same train-only preprocessing approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
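
The "split before preprocessing" change described in this commit can be sketched as follows. This is a minimal sketch on synthetic data, not the PR's split_and_preprocess(): the array names are hypothetical, and only numeric columns are handled.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the dataset (hypothetical, not the v5 CSV).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[rng.random(X.shape) < 0.1] = np.nan  # inject some missing values
y = rng.integers(0, 2, size=200)

# 1. Split first, so no test-fold statistics can reach the preprocessors.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# 2. Fit the imputer and scaler on the training fold only.
imputer = SimpleImputer(strategy="median").fit(X_train)
scaler = StandardScaler().fit(imputer.transform(X_train))

# 3. Apply the already-fitted transforms to both folds.
X_train_prep = scaler.transform(imputer.transform(X_train))
X_test_prep = scaler.transform(imputer.transform(X_test))
```

Fitting the median and the scaling statistics on the full dataset would let test-fold information shape the features, which tends to make hold-out metrics look better than they are.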

github-actions · 2026-04-30T04:19:39Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #25. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.20
Trigger: commit pushed
Workflow run: 25147115861 attempt 1
Comment timestamp: 2026-04-30T04:18:50.284386+00:00
PR head commit: 87fab6002adf7ef8973e8a14601b9a62cfd0e0af

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Copilot · 2026-04-30T04:22:22Z

+    for col in cat_cols:
+        le = LabelEncoder()
+        le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))
+        encoders[col] = le
+        x_train_raw[col] = le.transform(x_train_raw[col].astype(str).fillna("__MISSING__"))
+        # Unseen test categories get mapped to "__MISSING__"
+        test_vals = x_test_raw[col].astype(str).fillna("__MISSING__")
+        test_vals = test_vals.where(test_vals.isin(le.classes_), "__MISSING__")
+        # Ensure __MISSING__ is in classes (it always is since we fillna above)


LabelEncoder is fit on train values after fillna("__MISSING__"), but if the training fold has no nulls and no literal "__MISSING__" values, "__MISSING__" will not be in le.classes_. The later mapping of unseen test categories/NaNs to "__MISSING__" will then cause le.transform(...) to raise. Ensure the sentinel is always included in the fitted classes (or switch to an encoder that supports unknown categories explicitly).

Suggested change

-    for col in cat_cols:
-        le = LabelEncoder()
-        le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))
-        encoders[col] = le
-        x_train_raw[col] = le.transform(x_train_raw[col].astype(str).fillna("__MISSING__"))
-        # Unseen test categories get mapped to "__MISSING__"
-        test_vals = x_test_raw[col].astype(str).fillna("__MISSING__")
-        test_vals = test_vals.where(test_vals.isin(le.classes_), "__MISSING__")
-        # Ensure __MISSING__ is in classes (it always is since we fillna above)
+    missing_sentinel = "__MISSING__"
+    for col in cat_cols:
+        le = LabelEncoder()
+        train_vals = x_train_raw[col].fillna(missing_sentinel).astype(str)
+        test_vals = x_test_raw[col].fillna(missing_sentinel).astype(str)
+        # Always include the sentinel in fitted classes so unseen/null test
+        # values mapped to it can be transformed safely.
+        le.fit(pd.concat([train_vals, pd.Series([missing_sentinel])], ignore_index=True))
+        encoders[col] = le
+        x_train_raw[col] = le.transform(train_vals)
+        # Unseen test categories get mapped to "__MISSING__"
+        test_vals = test_vals.where(test_vals.isin(le.classes_), missing_sentinel)
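
The failure mode described in this comment, and the effect of appending the sentinel before fitting, can be reproduced in isolation. The toy series below are hypothetical; only the LabelEncoder behavior and the "__MISSING__" sentinel come from the review.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

MISSING = "__MISSING__"

# Hypothetical folds: the training fold happens to contain no nulls.
train_vals = pd.Series(["web", "email", "web"])
test_vals = pd.Series(["email", "partner", None]).fillna(MISSING).astype(str)

# Failure mode: the sentinel was never seen during fit, so transforming
# values that were mapped to it raises ValueError.
le_bad = LabelEncoder().fit(train_vals)
mapped = test_vals.where(test_vals.isin(le_bad.classes_), MISSING)
try:
    le_bad.transform(mapped)
    raised = False
except ValueError:
    raised = True

# Fix: append the sentinel to the values used for fitting.
fit_vals = pd.concat([train_vals, pd.Series([MISSING])], ignore_index=True)
le_ok = LabelEncoder().fit(fit_vals)
mapped = test_vals.where(test_vals.isin(le_ok.classes_), MISSING)
test_codes = le_ok.transform(mapped)
print(raised, list(test_codes))
```

With the sentinel in the fitted classes, both the unseen category "partner" and the null value map to the same code.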

Copilot · 2026-04-30T04:22:22Z

+    encoders: dict[str, LabelEncoder] = {}
+    for col in cat_cols:
+        le = LabelEncoder()
+        le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))
+        encoders[col] = le


The local encoders dict is populated but never used. This will trigger ruff F841 (assigned but unused) and fail CI since ruff check . runs over scripts/. Either remove it or use it (e.g., for debugging output).

Suggested change

-    encoders: dict[str, LabelEncoder] = {}
-    for col in cat_cols:
-        le = LabelEncoder()
-        le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))
-        encoders[col] = le
+    for col in cat_cols:
+        le = LabelEncoder()
+        le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))

Copilot · 2026-04-30T04:22:22Z

+        acv = df["expected_acv"].dropna()
+        print(f"{'FAIL' if errs else 'PASS'} (range: {acv.min():.0f}–{acv.max():.0f})")


The ACV range print path uses df["expected_acv"].dropna() without numeric coercion. If the CSV has expected_acv as strings (or mixed types), min()/max() can be wrong or raise. Use the same pd.to_numeric(..., errors="coerce") approach here as in check_acv_range() to keep the validator robust.

Suggested change

-        acv = df["expected_acv"].dropna()
-        print(f"{'FAIL' if errs else 'PASS'} (range: {acv.min():.0f}–{acv.max():.0f})")
+        acv = pd.to_numeric(df["expected_acv"], errors="coerce").dropna()
+        if acv.empty:
+            print(f"{'FAIL' if errs else 'PASS'} (range: unavailable)")
+        else:
+            print(f"{'FAIL' if errs else 'PASS'} (range: {acv.min():.0f}–{acv.max():.0f})")

Copilot · 2026-04-30T04:22:23Z

+        le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))
+        x_train_raw[col] = le.transform(x_train_raw[col].astype(str).fillna("__MISSING__"))
+        test_vals = x_test_raw[col].astype(str).fillna("__MISSING__")


Same LabelEncoder issue as in the validator: if the training fold has no nulls and no literal "__MISSING__" values, "__MISSING__" will not be in le.classes_, but unseen test categories/NaNs are mapped to "__MISSING__" and then transformed, which can raise. Make sure the sentinel is always part of the fitted classes (or use an encoder with handle_unknown).

Suggested change

-        le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))
-        x_train_raw[col] = le.transform(x_train_raw[col].astype(str).fillna("__MISSING__"))
-        test_vals = x_test_raw[col].astype(str).fillna("__MISSING__")
+        train_vals = x_train_raw[col].where(x_train_raw[col].notna(), "__MISSING__").astype(str)
+        fit_vals = pd.concat([train_vals, pd.Series(["__MISSING__"])], ignore_index=True)
+        le.fit(fit_vals)
+        x_train_raw[col] = le.transform(train_vals)
+        test_vals = x_test_raw[col].where(x_test_raw[col].notna(), "__MISSING__").astype(str)
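
As a sketch of the alternative the comment mentions (an encoder with handle_unknown), scikit-learn's OrdinalEncoder can assign a reserved code to categories it did not see during fit. This is not what the PR adopted; the toy arrays are hypothetical, and null handling would still need to be decided separately.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single-column folds; OrdinalEncoder expects 2D input.
train = np.array([["web"], ["email"], ["web"]], dtype=object)
test = np.array([["email"], ["partner"]], dtype=object)

# Unseen categories are encoded as -1 instead of raising.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(train)

test_codes = enc.transform(test).ravel()
print(test_codes)
```

This removes the need for a sentinel string, at the cost of a code (-1) that downstream models must be able to accept.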

Copilot AI review requested due to automatic review settings April 29, 2026 20:22

shaypal5 added this to the v0.4.0 — Polished relational output + task export milestone Apr 29, 2026

shaypal5 added the type: feature (New capability) and type: docs (Documentation or narrative changes) labels Apr 29, 2026

Copilot started reviewing on behalf of shaypal5 April 29, 2026 20:22

Copilot AI reviewed Apr 29, 2026

shaypal5 mentioned this pull request Apr 30, 2026

Fix approved pr-agent-context refresh runs #26

Merged

shaypal5 force-pushed the feat/v5-lead-scoring-dataset branch from 7f4051d to 9391c77 Compare April 30, 2026 04:12

Copilot AI review requested due to automatic review settings April 30, 2026 04:18

Copilot started reviewing on behalf of shaypal5 April 30, 2026 04:19

shaypal5 merged commit d44c90b into main Apr 30, 2026
7 checks passed

Copilot AI reviewed Apr 30, 2026

shaypal5 merged 2 commits into main from feat/v5-lead-scoring-dataset
