Validation script uses train AUC — bounds are not meaningful for generalization

## Problem

`scripts/validate_v4_dataset.py` fits a logistic regression on the full 1000 rows and evaluates AUC on the *same* 1000 rows (train AUC). The 0.65–0.90 bounds and the leakage trap boost threshold (0.03) are calibrated against this inflated metric, so passing them says little about out-of-sample performance.

Train AUC depends on feature count and sample size, not just signal strength: the more parameters the model fits relative to the number of rows, the more noise it absorbs in-sample. With 17 features (several categorical, expanding the effective dimensionality) and only 1000 rows, train AUC > 0.65 is almost guaranteed even with weak signal.
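
The inflation can be illustrated with a minimal sketch on synthetic data. This is not the v4 dataset or the actual validation script: the function name `train_vs_cv_auc`, the pure-noise Gaussian features, and the seed are all assumptions chosen for illustration. The labels are independent of the features, so any AUC above 0.5 on the training rows comes from fitting noise.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score


def train_vs_cv_auc(n_rows=1000, n_features=17, seed=0):
    """Compare train AUC with 5-fold CV AUC on pure-noise data (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Features and labels are drawn independently: there is no real signal.
    X = rng.normal(size=(n_rows, n_features))
    y = rng.integers(0, 2, size=n_rows)

    # Train AUC: fit and score on the same rows.
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    train_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])

    # Cross-validated AUC: each fold is scored by a model that never saw it.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    fold_aucs = cross_val_score(
        LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="roc_auc"
    )
    return train_auc, fold_aucs.mean()


train_auc, cv_auc = train_vs_cv_auc()
print(f"train AUC = {train_auc:.3f}, 5-fold CV AUC = {cv_auc:.3f}")
```

Raising `n_features` (a rough stand-in for expanded categorical columns) or lowering `n_rows` should widen the gap between the two numbers, which is the dependence described above.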

## Suggested fix

Replace train AUC with k-fold cross-validated AUC (e.g. 5-fold stratified). This requires:
1. Switching `_fit_lr` to use `cross_val_score` (with `scoring="roc_auc"`) or manual fold logic
2. Recalibrating `AUC_LOWER`, `AUC_UPPER`, and `AUC_TRAP_BOOST` against the new metric
3. Updating `docs/v4/validation_spec.md` with the new bounds
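
Step 1 could look roughly like the sketch below. The real signature and preprocessing of `_fit_lr` are not shown in this issue, so `fit_lr_cv_auc`, its parameters, and the `StandardScaler` step are assumptions, and the data in the usage lines is synthetic. Wrapping the preprocessing in a pipeline means it is refit inside each fold rather than on the full dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_lr_cv_auc(X, y, n_splits=5, seed=0):
    """Return (mean, std) of stratified k-fold AUC for a logistic regression.

    Hypothetical replacement for the train-AUC computation; X is assumed to be
    an already-encoded numeric matrix.
    """
    # Scaling lives inside the pipeline so each fold fits it on its own training part.
    pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_aucs = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
    return float(np.mean(fold_aucs)), float(np.std(fold_aucs))


# Usage on synthetic data (not the v4 dataset).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=17, n_informative=5, random_state=0
)
mean_auc, std_auc = fit_lr_cv_auc(X_demo, y_demo)
print(f"CV AUC = {mean_auc:.3f} +/- {std_auc:.3f}")
```

Returning the fold standard deviation alongside the mean may help with step 2, since the recalibrated bounds and the trap boost threshold can then be judged against fold-to-fold noise. Fixing the seed keeps the folds identical between the runs with and without the leakage trap feature, so the boost is compared on the same splits.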

## Context

Identified in self-review of PR #21. The current check still catches gross failures, so this is not blocking v4 release.
