diff --git a/.agent-plan.md b/.agent-plan.md index 89a3dfe..6ee4d2d 100644 --- a/.agent-plan.md +++ b/.agent-plan.md @@ -50,8 +50,12 @@ Goal: ship a best-in-class educational synthetic CRM lead-scoring dataset family - [x] PR 5.1: `scripts/generate_cover_image.py` (new) — deterministic Pillow + DejaVu Sans (bundled with matplotlib) renderer producing `release/dataset-cover-image.png` at 1280×640 (well above the 560×280 minimum, 2:1 aspect for Kaggle's header crop). Three-tier card design surfacing the cross-seed median conversion rate + LR AUC for each tier, pinned from `release/validation/validation_report.md`. Byte-identical re-runs guarded by `tests/scripts/test_generate_cover_image.py`. - [x] PR 5.1: Upload-dir assembly under `release/kaggle/` uses relative symlinks for the heavy bundle directories + cover image + LICENSE, plus a real file copy for `README.md` (rewritten on the way in so its `../` links and tree diagram render correctly on the Kaggle dataset page). `_validate_kaggle_dir_safe` refuses to assemble into `cwd` / `release_dir` / its parent / the filesystem anchor. `release/kaggle/*` is gitignored except for `dataset-metadata.json` itself — only the metadata is committed; the upload tree is regenerated on demand. - [x] PR 5.1: 19 new tests (`tests/scripts/test_package_kaggle_release.py` × 15, `tests/scripts/test_generate_cover_image.py` × 4): every Kaggle field constraint, schema field order parity for CSV + parquet, README rewriting (tree + `../` + validation report links), unsafe-kaggle-dir rejection, CLI rc=2 on missing release dir, byte-determinism (audit-artifact-sync), and committed-metadata-matches-fresh-regeneration sync check. 1194/1194 tests pass; ruff + mypy clean; `scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.65` exits 0 on every public tier; `scripts/verify_hash_determinism.py` PASS 67/67; `scripts/validate_release_candidate.py --no-rebuild` exits 0; `BUNDLE_SCHEMA_VERSION` unchanged at 5 (this PR doesn't touch the bundle shape). -- [ ] PR 5.2: `scripts/package_hf_release.py` → `release/huggingface/README.md` with YAML configs/default/pretty_name/tags -- [ ] PR 5.2: Local `load_dataset()` smoke test; Kaggle dry-run package validation +- [x] PR 5.2: `scripts/package_hf_release.py` (new) — Hugging Face release packager. Reads each public tier's `manifest.json` under `release/`, emits `release/huggingface/README.md` with YAML frontmatter satisfying G12.1 (`pretty_name`, `license: mit`, `language: [en]`, `task_categories: [tabular-classification]`, `size_categories: [1K