feat: Milestone 10 — CLI generate/inspect/validate commands#16
Conversation
- generate: parses all flags, calls Generator.from_recipe().generate(), writes bundle via .save(); supports --override, --difficulty, population size overrides - inspect: reads manifest.json, prints summary (recipe, seed, mode, tables with row counts, task splits, metadata presence) - validate: checks manifest presence, required files, table row counts, SHA-256 hashes, task split integrity, FK constraints, leakage detection (unexpected columns in task splits) - 22 CLI tests (smoke, generate integration for both modes, inspect output assertions, validate pass/fail/corrupt/missing-file scenarios) - 562 total tests passing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
Pass n_accounts/n_contacts/n_leads as explicit keyword args to gen.generate() instead of **dict unpacking, which mypy could not reconcile with the typed signature. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Implements the Milestone 10 CLI surface area by wiring generate, inspect, and validate to real bundle I/O, plus integration tests covering the new commands.
Changes:
- Implement
leadforge generateend-to-end (parse options → generate bundle → save to disk). - Add
inspectsummary output based onmanifest.json. - Add
validatechecks for required artifacts, row counts, hashes, FK integrity, and leakage; expand CLI test suite accordingly.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
leadforge/cli/commands/generate.py |
Wires CLI flags to Generator.from_recipe(...).generate() and bundle.save(out). |
leadforge/cli/commands/inspect.py |
Reads manifest.json and prints a human-readable bundle summary. |
leadforge/cli/commands/validate.py |
Adds on-disk bundle validation: files present, hashes/row counts, FK checks, leakage checks. |
tests/test_cli.py |
Replaces stubs with integration tests for generate/inspect/validate behavior. |
.agent-plan.md |
Updates milestone status/plan notes to reflect M10 completion and M11 next steps. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This comment has been minimized.
This comment has been minimized.
…sha256 - COPILOT-1/4: inspect.py and validate.py now use load_json() from core.serialization with LeadforgeError → clean typer.Exit(1) - COPILOT-2/3: confirmed Typer CliRunner always mixes stderr into result.output; assertions are already correct (resolved as-is) - COPILOT-5: opened #17 for Parquet metadata optimization (out-of-scope for v1 bundle sizes) - COPILOT-6: moved _sha256 to core.hashing.file_sha256(); both render/manifests.py and cli/commands/validate.py now import from there Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…pand tests
Addresses all 13 items from self-review:
generate.py:
1. Remove double population-count pass (was in both from_recipe and generate)
2. Wrap from_recipe/generate/load_yaml with try/except → clean CLI errors
3. Validate --override path exists before loading
inspect.py:
4. Defensive manifest parsing — handle non-dict tables/tasks entries
5. Check bundle_path is a directory, not a file
validate.py → validation/bundle_checks.py:
6. FK checks now report "FK check skipped" when a table is missing
instead of silently passing
7. Leakage check no longer re-reads train.parquet (reads schema only)
8. Leakage check now covers all splits (train, valid, test)
13. Extract all validation logic to validation/bundle_checks.py;
CLI command is now a thin shell
tests:
9. Add warning comment to module-scoped bundle_dir fixture
10. Add tests for --override (valid file + missing file)
11. Add test for --difficulty flag
12. Consolidate 5 inspect invocations into 1 test with multiple asserts
+ Add file-instead-of-dir tests for inspect and validate
+ Assert FK-skip message in missing-table validate test
+ New test_bundle_checks.py with 5 unit tests for validation module
568 tests passing; ruff + mypy clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
- generate: reject --override files that don't contain a YAML mapping - bundle_checks: guard tables/tasks manifest fields with isinstance checks; report "Malformed manifest" instead of crashing on non-dict Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR
#16. Treat this PR as all clear unless new signals appear.Run metadata: |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| child_vals = set(child_df[fk.child_column].dropna()) | ||
| parent_vals = set(parent_df[fk.parent_column].dropna()) | ||
| orphans = child_vals - parent_vals | ||
| if orphans: | ||
| sample = list(orphans)[:3] | ||
| errors.append( | ||
| f"FK violation: {fk.child_table}.{fk.child_column} → " | ||
| f"{fk.parent_table}.{fk.parent_column}: " | ||
| f"{len(orphans)} orphan(s), e.g. {sample}" |
There was a problem hiding this comment.
FK validation builds Python set(...)s from entire FK/PK columns (dropna()), which can be very memory-heavy for large bundles. Consider using vectorized membership (e.g., child_series[~child_series.isin(parent_series)] with parent_series.unique()) or chunking to keep peak memory bounded.
| child_vals = set(child_df[fk.child_column].dropna()) | |
| parent_vals = set(parent_df[fk.parent_column].dropna()) | |
| orphans = child_vals - parent_vals | |
| if orphans: | |
| sample = list(orphans)[:3] | |
| errors.append( | |
| f"FK violation: {fk.child_table}.{fk.child_column} → " | |
| f"{fk.parent_table}.{fk.parent_column}: " | |
| f"{len(orphans)} orphan(s), e.g. {sample}" | |
| child_series = child_df[fk.child_column].dropna() | |
| parent_values = parent_df[fk.parent_column].dropna().unique() | |
| orphan_rows = child_series[~child_series.isin(parent_values)] | |
| orphan_values = orphan_rows.unique() | |
| if len(orphan_values) > 0: | |
| sample = orphan_values[:3].tolist() | |
| errors.append( | |
| f"FK violation: {fk.child_table}.{fk.child_column} → " | |
| f"{fk.parent_table}.{fk.parent_column}: " | |
| f"{len(orphan_values)} orphan(s), e.g. {sample}" |
| if isinstance(tables, dict): | ||
| for name, info in tables.items(): | ||
| row_count = _safe_get(info, "row_count", "?") | ||
| typer.echo(f" {name:25s} {row_count:>8} rows") |
There was a problem hiding this comment.
row_count = _safe_get(info, 'row_count', '?') may return None (or another non-string) if the manifest is malformed, and f"{row_count:>8}" will raise TypeError for values like None. To keep inspect robust against partially-corrupt manifests, coerce to string before formatting (e.g., f"{str(row_count):>8}") or guard None explicitly.
| typer.echo(f" {name:25s} {row_count:>8} rows") | |
| typer.echo(f" {name:25s} {str(row_count):>8} rows") |
| split_path = root / f"tasks/{task_id}/{split}.parquet" | ||
| if split_path.exists(): | ||
| actual_columns = set(pd.read_parquet(split_path, columns=[]).columns) | ||
| extra = actual_columns - expected_columns | ||
| if extra: | ||
| errors.append( |
There was a problem hiding this comment.
Leakage detection behavior isn’t covered by tests yet (e.g., a task split Parquet containing an extra column should trigger a validation error). Adding a targeted test case would help prevent regressions once the column-detection logic is fixed.
| split_path = root / f"tasks/{task_id}/{split}.parquet" | ||
| if split_path.exists(): | ||
| actual_columns = set(pd.read_parquet(split_path, columns=[]).columns) | ||
| extra = actual_columns - expected_columns |
There was a problem hiding this comment.
Leakage detection is currently ineffective: pd.read_parquet(split_path, columns=[]) returns a DataFrame with zero columns, so actual_columns will always be empty and extra will never be detected. Read the Parquet schema/metadata to get column names (e.g., via pyarrow.parquet.ParquetFile(...).schema.names) or read the file without columns=[]/with nrows=0-style approach if supported.
| manifest = load_json(bundle_root / "manifest.json") | ||
| errors: list[str] = [] | ||
| errors.extend(_check_required_files(bundle_root)) | ||
| tables, table_errors = _check_tables(bundle_root, manifest) | ||
| errors.extend(table_errors) |
There was a problem hiding this comment.
validate_bundle assumes load_json() returns a dict; if manifest.json parses to a non-object (e.g., list/str), this will raise at manifest.get(...) and crash the CLI (it only catches LeadforgeError). Consider validating isinstance(manifest, dict) up front and returning a clear error like "Malformed manifest: root is not a JSON object".
| errors.append(f"Missing table file: {rel_path}") | ||
| continue | ||
|
|
||
| df = pd.read_parquet(abs_path) |
There was a problem hiding this comment.
pd.read_parquet(abs_path) can raise (corrupt/unsupported parquet, IO errors). Right now that would abort validation instead of returning an error list. Wrap the read in a try/except and append an error like "Failed to read table parquet: ..." so validate_bundle can report failures gracefully.
| df = pd.read_parquet(abs_path) | |
| try: | |
| df = pd.read_parquet(abs_path) | |
| except Exception as exc: | |
| errors.append(f"Failed to read table parquet {rel_path}: {exc}") | |
| continue |
| errors.append(f"Missing task file: {rel_path}") | ||
| continue | ||
|
|
||
| df = pd.read_parquet(abs_path) |
There was a problem hiding this comment.
Same issue for task splits: pd.read_parquet(abs_path) may raise and currently would crash validation. Consider catching exceptions per split and recording an error so a single bad split file doesn't prevent reporting other problems.
| df = pd.read_parquet(abs_path) | |
| try: | |
| df = pd.read_parquet(abs_path) | |
| except Exception as exc: | |
| errors.append(f"Task {task_id}/{split}: failed to read parquet file: {exc}") | |
| continue |
Fold the brutal self-review's findings back into the PR before review. Bugs: - (#1) run_packager validate→write order — both packagers wrote README/metadata on validation failure, leaving corrupt artifacts on disk that would silently get committed. Gated on `errors == ()`; added no-write tests for both packagers. - (#2) Instructor README inlined the public 3-tier README into a 1-tier dataset card. Replaced with a dedicated `INSTRUCTOR_BODY` constant that links to the public dataset and describes only the instructor-specific additions (full-horizon tables, hidden DAG, latent registry, mechanism summary). - (#3) validate_upload_dir_safe also blocks strict descendants of release_dir; `--huggingface-dir release/intro` would otherwise rmtree the intro bundle. Architecture: - (#5) Finished shared-primitives extraction: SOURCE_TREE_BLOCK, validate_readme_substitution, replace_file, replace_dir, load_manifest now live in scripts/_release_common.py. Both packagers reduced to imports. - (#6) Replaced 60-line hand-rolled YAML renderer with yaml.safe_dump + a 4-line _IndentedDumper subclass. - (#7) Removed dead --owner / --dataset-slug CLI flags. - (#8) assemble_upload_dir now takes rendered_readme and writes it. - (#9) build_config_for_tier made pure (no I/O); cheap manifest-stat preflight via _assert_tier_dir_exists. - (#10) --default-config with --variant=instructor errors loudly. CI: - (#4) Added [publish] extra (datasets>=2.14, kaggle>=1.6) so the gated G12.3 / G12.4 / G11.3 tests install in one line. Cleanups: visual cruft (#13–#16), test cruft (#17 — unused tmp_path, dead tag_lines), em-dash YAML round-trip parametrised for the instructor pretty_name. Verification: 1223 tests pass + 5 gated skips; ruff + mypy clean; hash determinism PASS 67/67; leakage probes 0/3 reconstruct on every tier; validate_release_candidate --no-rebuild exits 0. release/{kaggle,huggingface,huggingface-instructor}/dataset-metadata .json|README.md regenerated; audit-artifact-sync tests guard them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* PR 5.2: HuggingFace release packager + load_dataset smoke test
Add `scripts/package_hf_release.py` to generate `release/huggingface/README.md`
with G12.1-compliant YAML frontmatter (pretty_name, license, language,
task_categories, size_categories, tags, three configs with `default: true`
on intermediate per G12.2), inlining the rewritten `release/README.md`
body with HF-specific link rewrites. `--variant=instructor` packages the
companion repo (G12.4) from `release/intermediate_instructor/` into a
separate `release/huggingface-instructor/` upload tree. G12.3 covered
by a parametrised `load_dataset()` smoke test gated on the optional
`datasets` SDK.
Extract shared release-packaging primitives (link rewriter, dir-safety
guard, cover-image validator) into `scripts/_release_common.py`; refactor
the Kaggle packager to import them. `release/kaggle/dataset-metadata.json`
is byte-stable across the refactor.
Delete the legacy `release/HF_DATASET_CARD.md` stub — superseded by the
generated card. Gitignore `release/huggingface{,-instructor}/*` except
the committed README.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* PR 5.2 self-review fixes (Kaggle + HF packagers)
Fold the brutal self-review's findings back into the PR before review.
Bugs:
- (#1) run_packager validate→write order — both packagers wrote
README/metadata on validation failure, leaving corrupt artifacts on
disk that would silently get committed. Gated on `errors == ()`;
added no-write tests for both packagers.
- (#2) Instructor README inlined the public 3-tier README into a
1-tier dataset card. Replaced with a dedicated `INSTRUCTOR_BODY`
constant that links to the public dataset and describes only the
instructor-specific additions (full-horizon tables, hidden DAG,
latent registry, mechanism summary).
- (#3) validate_upload_dir_safe also blocks strict descendants of
release_dir; `--huggingface-dir release/intro` would otherwise
rmtree the intro bundle.
Architecture:
- (#5) Finished shared-primitives extraction: SOURCE_TREE_BLOCK,
validate_readme_substitution, replace_file, replace_dir,
load_manifest now live in scripts/_release_common.py. Both
packagers reduced to imports.
- (#6) Replaced 60-line hand-rolled YAML renderer with yaml.safe_dump
+ a 4-line _IndentedDumper subclass.
- (#7) Removed dead --owner / --dataset-slug CLI flags.
- (#8) assemble_upload_dir now takes rendered_readme and writes it.
- (#9) build_config_for_tier made pure (no I/O); cheap manifest-stat
preflight via _assert_tier_dir_exists.
- (#10) --default-config with --variant=instructor errors loudly.
CI:
- (#4) Added [publish] extra (datasets>=2.14, kaggle>=1.6) so the
gated G12.3 / G12.4 / G11.3 tests install in one line.
Cleanups: visual cruft (#13–#16), test cruft (#17 — unused tmp_path,
dead tag_lines), em-dash YAML round-trip parametrised for the
instructor pretty_name.
Verification: 1223 tests pass + 5 gated skips; ruff + mypy clean;
hash determinism PASS 67/67; leakage probes 0/3 reconstruct on every
tier; validate_release_candidate --no-rebuild exits 0.
release/{kaggle,huggingface,huggingface-instructor}/dataset-metadata
.json|README.md regenerated; audit-artifact-sync tests guard them.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* PR 5.2 Copilot-review fixes (Kaggle + HF packagers)
Fold Copilot's two real findings on the self-review revision back in.
COPILOT-1 — validate_upload_dir_safe was only invoked inside
assemble_upload_dir, which --dry-run skips. A dry-run with
--huggingface-dir release (or .) would write the README into the
unsafe path BEFORE the safety net fired. Hoist the check into
run_packager (both packagers) so it runs before any mkdir or write;
the inner assemble_upload_dir call stays as defence-in-depth for
direct callers. New tests: dry-run with unsafe upload-dir raises
without writing; the same path through main() returns rc=2.
COPILOT-2 — Cover-image path resolution was inconsistent:
validate_cover_image used cover_image as passed, while
assemble_upload_dir did a separate ``release_dir / cover_image.name``
fallback. Diverged for bare-basename inputs (false validation
failures) and two-paths-sharing-a-basename (assembler shadowing the
explicit path). Added resolve_cover_image_path() to
_release_common.py (explicit-wins, release-dir fallback);
run_packager calls it once and threads the resolved path through
validation, the metadata's image field, and assembly. New
tests/scripts/test_release_common.py covers the four resolution
branches; new packager-side tests confirm bare-basename success +
metadata field plumbing.
COPILOT-3 — outdated; already addressed by self-review fix #8 in
commit f2fc4a2. Resolved as already treated; no code change.
Verification: 1232/1232 tests pass + 5 gated skips; ruff + mypy
clean; hash determinism PASS 67/67; leakage probes rc=0 on every
tier; validate_release_candidate --no-rebuild exits 0;
BUNDLE_SCHEMA_VERSION unchanged at 5.
release/{kaggle,huggingface,huggingface-instructor}/* artifacts
regenerated byte-identically.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
generate: fully wired end-to-end — parses all CLI flags (--recipe,--seed,--mode,--out,--difficulty,--n-accounts/contacts/leads,--horizon-days,--override), callsGenerator.from_recipe().generate(), writes bundle via.save()inspect: readsmanifest.jsonand prints human-readable summary — recipe, seed, mode, difficulty, motif family, all tables with row counts, task splits, metadata presencevalidate: checks manifest presence, required files (dataset_card.md,feature_dictionary.csv), table row counts vs manifest, SHA-256 hash integrity, task split file presence/counts/hashes, FK constraint integrity across all 10 relationships, leakage detection (unexpected columns in task splits)Test plan
🤖 Generated with Claude Code