Skip to content

feat: v5 lead scoring dataset — robust validation, value-aware scoring#25

Merged
shaypal5 merged 2 commits into
mainfrom
feat/v5-lead-scoring-dataset
Apr 30, 2026
Merged

feat: v5 lead scoring dataset — robust validation, value-aware scoring#25
shaypal5 merged 2 commits into
mainfrom
feat/v5-lead-scoring-dataset

Conversation

@shaypal5

Copy link
Copy Markdown
Contributor

Summary

v5 lead scoring intro dataset with improved validation, value-aware scoring support, and narrative consistency.

No engine changes — this PR adds build/validate/eval scripts only.

Key improvements over v4

  • Snapshot day 10 → 14: longer observation window, more realistic for early-stage scoring
  • Leakage trap renamed: __leakage__total_touches_90d (from total_touches_all) — explicit naming convention for pedagogy
  • ACV capped to narrative range: expected_acv clipped to [$18k, $120k] per narrative.yaml
  • days_since_first_touch added: second momentum feature (19 cols, up from 18)
  • Hold-out validation: AUC/PR-AUC computed on 30% test set, not in-sample
  • Multi-seed leakage robustness: validates mean AUC delta >= 0.03 and min >= 0.015 across 10 seeds
  • New checks: duplicates, ACV range, leakage naming, Precision@K, Lift@K
  • Baseline eval script: LR + RF comparison, feature importance, value-aware scoring demo

Validation results (all 10 checks pass)

Check Result
Baseline hold-out AUC 0.632
PR-AUC 0.396
Precision@50 0.400 (Lift: 1.33x)
Leakage trap mean delta (10 seeds) 0.033
Leakage trap min delta 0.015
ACV range [18,000 – 120,000]

New files

  • scripts/build_v5_snapshot.py
  • scripts/validate_v5_dataset.py
  • scripts/quick_baseline_eval_v5.py

Dataset artifacts (in leadforge-datasets-private)

  • lead_scoring_intro_v5.csv — 1000 × 19, 30% conversion
  • RELEASE_v5.md — full release notes
  • BACKGROUND.md — updated with value-aware scoring section

Test plan

  • python scripts/build_v5_snapshot.py /tmp/v5.csv succeeds
  • python scripts/validate_v5_dataset.py /tmp/v5.csv exits 0 (all 10 checks pass)
  • python scripts/quick_baseline_eval_v5.py /tmp/v5.csv prints baseline metrics
  • CI passes (existing 609 tests — no engine changes)

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 29, 2026 20:22
@shaypal5 shaypal5 added type: feature New capability type: docs Documentation or narrative changes labels Apr 29, 2026
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a v5 lead scoring “intro” dataset pipeline (build, validate, baseline eval) with expanded validation checks and a value-aware scoring demonstration, without changing the simulation engine.

Changes:

  • Add a v5 dataset builder script that generates a day-14 snapshot, adds the new momentum feature, caps ACV to the narrative range, and outputs a 1000×19 CSV.
  • Add a v5 validator script with 10 checks including hold-out AUC/PR-AUC, multi-seed leakage-trap robustness, missingness/duplicates/shape, leakage naming, and ACV range.
  • Add a quick baseline evaluation script comparing LR vs RF, showing leakage-trap delta, feature importance, and a value-aware ranking demo.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
scripts/build_v5_snapshot.py Generates the v5 CSV from an internal bundle + day-14 snapshot with ACV capping, missingness injection, and v5 column contract.
scripts/validate_v5_dataset.py Implements the v5 validation spec (10 checks) including hold-out metrics and multi-seed leakage robustness.
scripts/quick_baseline_eval_v5.py Provides a runnable baseline evaluation + leakage comparison + feature importance + value-aware scoring example.
.agent-plan.md Updates internal project tracking notes to reflect v5 dataset work completion.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/validate_v5_dataset.py Outdated
Comment thread scripts/validate_v5_dataset.py Outdated
Comment thread scripts/validate_v5_dataset.py Outdated
Comment thread scripts/validate_v5_dataset.py Outdated
Comment thread scripts/quick_baseline_eval_v5.py Outdated
Comment thread scripts/validate_v5_dataset.py Outdated
@github-actions

This comment has been minimized.

…ripts

v5 improvements over v4:
- Snapshot day 10 → 14 (longer observation window, more realistic)
- Leakage trap renamed to __leakage__total_touches_90d (explicit naming)
- expected_acv clipped to narrative range [18k, 120k]
- Added days_since_first_touch momentum feature (19 cols, up from 18)
- Validator uses hold-out AUC (not in-sample), PR-AUC, Precision@K, Lift@K
- Multi-seed leakage trap robustness: mean delta >= 0.03, min >= 0.015
- Duplicate check, ACV range check, missingness bounds
- Baseline eval script with LR + RF, value-aware scoring demo

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shaypal5 shaypal5 force-pushed the feat/v5-lead-scoring-dataset branch from 7f4051d to 9391c77 Compare April 30, 2026 04:12
@github-actions

This comment has been minimized.

COPILOT-1/6: Refactor _fit_lr_holdout and _fit_lr_auc_only to split
before preprocessing. LabelEncoder, median imputation, and
StandardScaler are now fit on training fold only.

COPILOT-2: check_acv_range now coerces to numeric and fails explicitly
when expected_acv has no usable values.

COPILOT-3/4: Missingness ratio checks now handle empty lead_source
slices explicitly instead of silently skipping on NaN comparisons.

COPILOT-5: quick_baseline_eval_v5.py refactored to use
split_and_preprocess() — same train-only preprocessing approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 30, 2026 04:18
@github-actions

Copy link
Copy Markdown

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR
#25. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.20
Trigger: commit pushed
Workflow run: 25147115861 attempt 1
Comment timestamp: 2026-04-30T04:18:50.284386+00:00
PR head commit: 87fab6002adf7ef8973e8a14601b9a62cfd0e0af

@shaypal5 shaypal5 merged commit d44c90b into main Apr 30, 2026
7 checks passed

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +107 to +115
for col in cat_cols:
le = LabelEncoder()
le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))
encoders[col] = le
x_train_raw[col] = le.transform(x_train_raw[col].astype(str).fillna("__MISSING__"))
# Unseen test categories get mapped to "__MISSING__"
test_vals = x_test_raw[col].astype(str).fillna("__MISSING__")
test_vals = test_vals.where(test_vals.isin(le.classes_), "__MISSING__")
# Ensure __MISSING__ is in classes (it always is since we fillna above)

Copilot AI Apr 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LabelEncoder is fit on train values after fillna("MISSING"), but if the training fold has no nulls and no literal "MISSING" values, "MISSING" will not be in le.classes_. The later mapping of unseen test categories/NaNs to "MISSING" will then cause le.transform(...) to raise. Ensure the sentinel is always included in the fitted classes (or switch to an encoder that supports unknown categories explicitly).

Suggested change
for col in cat_cols:
le = LabelEncoder()
le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))
encoders[col] = le
x_train_raw[col] = le.transform(x_train_raw[col].astype(str).fillna("__MISSING__"))
# Unseen test categories get mapped to "__MISSING__"
test_vals = x_test_raw[col].astype(str).fillna("__MISSING__")
test_vals = test_vals.where(test_vals.isin(le.classes_), "__MISSING__")
# Ensure __MISSING__ is in classes (it always is since we fillna above)
missing_sentinel = "__MISSING__"
for col in cat_cols:
le = LabelEncoder()
train_vals = x_train_raw[col].fillna(missing_sentinel).astype(str)
test_vals = x_test_raw[col].fillna(missing_sentinel).astype(str)
# Always include the sentinel in fitted classes so unseen/null test
# values mapped to it can be transformed safely.
le.fit(pd.concat([train_vals, pd.Series([missing_sentinel])], ignore_index=True))
encoders[col] = le
x_train_raw[col] = le.transform(train_vals)
# Unseen test categories get mapped to "__MISSING__"
test_vals = test_vals.where(test_vals.isin(le.classes_), missing_sentinel)

Copilot uses AI. Check for mistakes.
Comment on lines +106 to +110
encoders: dict[str, LabelEncoder] = {}
for col in cat_cols:
le = LabelEncoder()
le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))
encoders[col] = le

Copilot AI Apr 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The local encoders dict is populated but never used. This will trigger ruff F841 (assigned but unused) and fail CI since ruff check . runs over scripts/. Either remove it or use it (e.g., for debugging output).

Suggested change
encoders: dict[str, LabelEncoder] = {}
for col in cat_cols:
le = LabelEncoder()
le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))
encoders[col] = le
for col in cat_cols:
le = LabelEncoder()
le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))

Copilot uses AI. Check for mistakes.
Comment on lines +510 to +511
acv = df["expected_acv"].dropna()
print(f"{'FAIL' if errs else 'PASS'} (range: {acv.min():.0f}–{acv.max():.0f})")

Copilot AI Apr 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ACV range print path uses df["expected_acv"].dropna() without numeric coercion. If the CSV has expected_acv as strings (or mixed types), min()/max() can be wrong or raise. Use the same pd.to_numeric(..., errors="coerce") approach here as in check_acv_range() to keep the validator robust.

Suggested change
acv = df["expected_acv"].dropna()
print(f"{'FAIL' if errs else 'PASS'} (range: {acv.min():.0f}{acv.max():.0f})")
acv = pd.to_numeric(df["expected_acv"], errors="coerce").dropna()
if acv.empty:
print(f"{'FAIL' if errs else 'PASS'} (range: unavailable)")
else:
print(f"{'FAIL' if errs else 'PASS'} (range: {acv.min():.0f}{acv.max():.0f})")

Copilot uses AI. Check for mistakes.
Comment on lines +52 to +54
le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))
x_train_raw[col] = le.transform(x_train_raw[col].astype(str).fillna("__MISSING__"))
test_vals = x_test_raw[col].astype(str).fillna("__MISSING__")

Copilot AI Apr 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same LabelEncoder issue as in the validator: if the training fold has no nulls and no literal "MISSING" values, "MISSING" will not be in le.classes_, but unseen test categories/NaNs are mapped to "MISSING" and then transformed, which can raise. Make sure the sentinel is always part of the fitted classes (or use an encoder with handle_unknown).

Suggested change
le.fit(x_train_raw[col].astype(str).fillna("__MISSING__"))
x_train_raw[col] = le.transform(x_train_raw[col].astype(str).fillna("__MISSING__"))
test_vals = x_test_raw[col].astype(str).fillna("__MISSING__")
train_vals = x_train_raw[col].where(x_train_raw[col].notna(), "__MISSING__").astype(str)
fit_vals = pd.concat([train_vals, pd.Series(["__MISSING__"])], ignore_index=True)
le.fit(fit_vals)
x_train_raw[col] = le.transform(train_vals)
test_vals = x_test_raw[col].where(x_test_raw[col].notna(), "__MISSING__").astype(str)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type: docs Documentation or narrative changes type: feature New capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants