The build pipeline functions in scripts/build_v5_snapshot.py (e.g., subsample(), inject_missingness(), derive_binary_features(), cap_expected_acv(), rename_and_select(), boost_leakage_trap()) contain testable logic that should live in a proper module under leadforge/ (e.g., leadforge/pipelines/build_v5.py).
The script itself should become a thin CLI wrapper that imports from the package, similar to how scripts/validate_lead_scoring_dataset.py already delegates to leadforge.validation.lead_scoring.
This would:
- Eliminate the importlib hack needed to test these functions (see tests/scripts/test_build_v5_snapshot.py)
- Make the functions importable by other code
- Keep scripts/ as thin CLI entry points
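
A minimal sketch of the proposed split, to make the target shape concrete. Everything here is an assumption for illustration: the signature of `subsample()`, the `--n` and `--seed` flags, and the placeholder data are hypothetical and not taken from the actual script. In the real layout the function would live in the package module and the script would import it; both are shown in one block only so the sketch runs on its own.

```python
import argparse
import random


# Stand-in for logic that would move to leadforge/pipelines/build_v5.py.
# Hypothetical signature: the real subsample() may take different arguments.
def subsample(rows, n, seed=0):
    """Return up to n rows, chosen reproducibly without replacement."""
    if n >= len(rows):
        return list(rows)
    return random.Random(seed).sample(rows, n)


# What would remain in scripts/build_v5_snapshot.py: argument parsing and
# delegation only, with no pipeline logic of its own.
def main(argv=None):
    parser = argparse.ArgumentParser(description="Build the v5 snapshot (sketch).")
    parser.add_argument("--n", type=int, default=3)        # hypothetical flag
    parser.add_argument("--seed", type=int, default=0)     # hypothetical flag
    args = parser.parse_args(argv)

    rows = list(range(10))  # placeholder for the loaded dataset
    kept = subsample(rows, args.n, seed=args.seed)
    print(f"kept {len(kept)} of {len(rows)} rows")
    return 0

# The real script would end with the usual guard:
#   if __name__ == "__main__":
#       raise SystemExit(main())
```

With this shape, tests can import the pipeline functions from the package directly and call `main([...])` with an explicit argument list, so no path-based module loading is needed.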
Identified during review of PR #28.