Problem
scripts/build_v4_snapshot.py (213 lines) and scripts/validate_v4_dataset.py (334 lines) have no automated tests. The validation script is itself a form of integration test for the generated CSV, but individual functions like subsample(), inject_missingness(), and the validation checks themselves are untested in isolation.
Suggested scope
- Unit tests for subsample() edge cases (insufficient positives/negatives)
- Property tests for inject_missingness() (rates within expected bounds)
- Unit tests for each validation check function
- Integration test: build_v4_dataset() → validate() returns 0
These scripts live in scripts/ (not leadforge/), so they need explicit test discovery or a test helper that imports them.
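One possible shape for the import helper is sketched below. It loads a script by file path using the standard library's importlib, so scripts/ does not need to be a package or be on sys.path. The name load_script is hypothetical and not part of the repo. The sketch also assumes the scripts guard their entry point with `if __name__ == "__main__":`, since otherwise importing them would run the full build.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType
from typing import Optional


def load_script(path: Path, module_name: Optional[str] = None) -> ModuleType:
    """Import a standalone script file as a module (hypothetical test helper)."""
    path = Path(path)
    name = module_name or path.stem
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load script from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so code that looks itself up in sys.modules
    # (e.g. dataclasses, pickle) can resolve the module.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialised module behind on failure.
        sys.modules.pop(name, None)
        raise
    return module
```

A test module could then call, for example, `build = load_script(Path("scripts/build_v4_snapshot.py"))` and exercise `build.subsample` directly. The alternative is adding scripts/ to the test path in the test configuration, which is simpler but makes every file in scripts/ importable by bare name.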
Context
Identified in self-review of PR #21. The scripts will stabilize further in v4-M2 (release), making that a natural point to add tests.