feat: Milestone 10 — CLI generate/inspect/validate commands by shaypal5 · Pull Request #16 · leadforge-dev/leadforge

shaypal5 · 2026-04-29T05:58:20Z

Summary

generate: fully wired end-to-end — parses all CLI flags (--recipe, --seed, --mode, --out, --difficulty, --n-accounts/contacts/leads, --horizon-days, --override), calls Generator.from_recipe().generate(), writes bundle via .save()
inspect: reads manifest.json and prints human-readable summary — recipe, seed, mode, difficulty, motif family, all tables with row counts, task splits, metadata presence
validate: checks manifest presence, required files (dataset_card.md, feature_dictionary.csv), table row counts vs manifest, SHA-256 hash integrity, task split file presence/counts/hashes, FK constraint integrity across all 10 relationships, leakage detection (unexpected columns in task splits)
22 CLI tests: smoke tests, generate integration (both modes), inspect output assertions, validate pass/fail/corrupt-manifest/missing-table scenarios
Completes v0.4.0 (Milestones 7–10)

Test plan

22 CLI tests pass (generate, inspect, validate)
Full suite: 562 tests passing
Ruff lint + format clean

🤖 Generated with Claude Code

- generate: parses all flags, calls Generator.from_recipe().generate(), writes bundle via .save(); supports --override, --difficulty, population size overrides - inspect: reads manifest.json, prints summary (recipe, seed, mode, tables with row counts, task splits, metadata presence) - validate: checks manifest presence, required files, table row counts, SHA-256 hashes, task split integrity, FK constraints, leakage detection (unexpected columns in task splits) - 22 CLI tests (smoke, generate integration for both modes, inspect output assertions, validate pass/fail/corrupt/missing-file scenarios) - 562 total tests passing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Pass n_accounts/n_contacts/n_leads as explicit keyword args to gen.generate() instead of **dict unpacking, which mypy could not reconcile with the typed signature. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Implements the Milestone 10 CLI surface area by wiring generate, inspect, and validate to real bundle I/O, plus integration tests covering the new commands.

Changes:

Implement leadforge generate end-to-end (parse options → generate bundle → save to disk).
Add inspect summary output based on manifest.json.
Add validate checks for required artifacts, row counts, hashes, FK integrity, and leakage; expand CLI test suite accordingly.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`leadforge/cli/commands/generate.py`	Wires CLI flags to `Generator.from_recipe(...).generate()` and `bundle.save(out)`.
`leadforge/cli/commands/inspect.py`	Reads `manifest.json` and prints a human-readable bundle summary.
`leadforge/cli/commands/validate.py`	Adds on-disk bundle validation: files present, hashes/row counts, FK checks, leakage checks.
`tests/test_cli.py`	Replaces stubs with integration tests for generate/inspect/validate behavior.
`.agent-plan.md`	Updates milestone status/plan notes to reflect M10 completion and M11 next steps.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…sha256 - COPILOT-1/4: inspect.py and validate.py now use load_json() from core.serialization with LeadforgeError → clean typer.Exit(1) - COPILOT-2/3: confirmed Typer CliRunner always mixes stderr into result.output; assertions are already correct (resolved as-is) - COPILOT-5: opened #17 for Parquet metadata optimization (out-of-scope for v1 bundle sizes) - COPILOT-6: moved _sha256 to core.hashing.file_sha256(); both render/manifests.py and cli/commands/validate.py now import from there Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…pand tests Addresses all 13 items from self-review: generate.py: 1. Remove double population-count pass (was in both from_recipe and generate) 2. Wrap from_recipe/generate/load_yaml with try/except → clean CLI errors 3. Validate --override path exists before loading inspect.py: 4. Defensive manifest parsing — handle non-dict tables/tasks entries 5. Check bundle_path is a directory, not a file validate.py → validation/bundle_checks.py: 6. FK checks now report "FK check skipped" when a table is missing instead of silently passing 7. Leakage check no longer re-reads train.parquet (reads schema only) 8. Leakage check now covers all splits (train, valid, test) 13. Extract all validation logic to validation/bundle_checks.py; CLI command is now a thin shell tests: 9. Add warning comment to module-scoped bundle_dir fixture 10. Add tests for --override (valid file + missing file) 11. Add test for --difficulty flag 12. Consolidate 5 inspect invocations into 1 test with multiple asserts + Add file-instead-of-dir tests for inspect and validate + Assert FK-skip message in missing-table validate test + New test_bundle_checks.py with 5 unit tests for validation module 568 tests passing; ruff + mypy clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- generate: reject --override files that don't contain a YAML mapping - bundle_checks: guard tables/tasks manifest fields with isinstance checks; report "Malformed manifest" instead of crashing on non-dict Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-29T07:16:18Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR
#16. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.20
Trigger: commit pushed
Workflow run: 25095854426 attempt 1
Comment timestamp: 2026-04-29T07:15:27.386511+00:00
PR head commit: 99ff9ecd9c4e95a911ab7b986e05e424f07d6711

Copilot

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-29T07:21:22Z

+        child_vals = set(child_df[fk.child_column].dropna())
+        parent_vals = set(parent_df[fk.parent_column].dropna())
+        orphans = child_vals - parent_vals
+        if orphans:
+            sample = list(orphans)[:3]
+            errors.append(
+                f"FK violation: {fk.child_table}.{fk.child_column} → "
+                f"{fk.parent_table}.{fk.parent_column}: "
+                f"{len(orphans)} orphan(s), e.g. {sample}"


FK validation builds Python set(...)s from entire FK/PK columns (dropna()), which can be very memory-heavy for large bundles. Consider using vectorized membership (e.g., child_series[~child_series.isin(parent_series)] with parent_series.unique()) or chunking to keep peak memory bounded.

Suggested change

child_vals = set(child_df[fk.child_column].dropna())

parent_vals = set(parent_df[fk.parent_column].dropna())

orphans = child_vals - parent_vals

if orphans:

sample = list(orphans)[:3]

errors.append(

f"FK violation: {fk.child_table}.{fk.child_column} → "

f"{fk.parent_table}.{fk.parent_column}: "

f"{len(orphans)} orphan(s), e.g. {sample}"

child_series = child_df[fk.child_column].dropna()

parent_values = parent_df[fk.parent_column].dropna().unique()

orphan_rows = child_series[~child_series.isin(parent_values)]

orphan_values = orphan_rows.unique()

if len(orphan_values) > 0:

sample = orphan_values[:3].tolist()

errors.append(

f"FK violation: {fk.child_table}.{fk.child_column} → "

f"{fk.parent_table}.{fk.parent_column}: "

f"{len(orphan_values)} orphan(s), e.g. {sample}"

Copilot · 2026-04-29T07:21:22Z

+    if isinstance(tables, dict):
+        for name, info in tables.items():
+            row_count = _safe_get(info, "row_count", "?")
+            typer.echo(f"  {name:25s}  {row_count:>8} rows")


row_count = _safe_get(info, 'row_count', '?') may return None (or another non-string) if the manifest is malformed, and f"{row_count:>8}" will raise TypeError for values like None. To keep inspect robust against partially-corrupt manifests, coerce to string before formatting (e.g., f"{str(row_count):>8}") or guard None explicitly.

Suggested change

typer.echo(f" {name:25s} {row_count:>8} rows")

typer.echo(f" {name:25s} {str(row_count):>8} rows")

Copilot · 2026-04-29T07:21:22Z

+            split_path = root / f"tasks/{task_id}/{split}.parquet"
+            if split_path.exists():
+                actual_columns = set(pd.read_parquet(split_path, columns=[]).columns)
+                extra = actual_columns - expected_columns
+                if extra:
+                    errors.append(


Leakage detection behavior isn’t covered by tests yet (e.g., a task split Parquet containing an extra column should trigger a validation error). Adding a targeted test case would help prevent regressions once the column-detection logic is fixed.

Copilot · 2026-04-29T07:21:22Z

+            split_path = root / f"tasks/{task_id}/{split}.parquet"
+            if split_path.exists():
+                actual_columns = set(pd.read_parquet(split_path, columns=[]).columns)
+                extra = actual_columns - expected_columns


Leakage detection is currently ineffective: pd.read_parquet(split_path, columns=[]) returns a DataFrame with zero columns, so actual_columns will always be empty and extra will never be detected. Read the Parquet schema/metadata to get column names (e.g., via pyarrow.parquet.ParquetFile(...).schema.names) or read the file without columns=[]/with nrows=0-style approach if supported.

Copilot · 2026-04-29T07:21:23Z

+    manifest = load_json(bundle_root / "manifest.json")
+    errors: list[str] = []
+    errors.extend(_check_required_files(bundle_root))
+    tables, table_errors = _check_tables(bundle_root, manifest)
+    errors.extend(table_errors)


validate_bundle assumes load_json() returns a dict; if manifest.json parses to a non-object (e.g., list/str), this will raise at manifest.get(...) and crash the CLI (it only catches LeadforgeError). Consider validating isinstance(manifest, dict) up front and returning a clear error like "Malformed manifest: root is not a JSON object".

Copilot · 2026-04-29T07:21:23Z

+            errors.append(f"Missing table file: {rel_path}")
+            continue
+
+        df = pd.read_parquet(abs_path)


pd.read_parquet(abs_path) can raise (corrupt/unsupported parquet, IO errors). Right now that would abort validation instead of returning an error list. Wrap the read in a try/except and append an error like "Failed to read table parquet: ..." so validate_bundle can report failures gracefully.

Suggested change

df = pd.read_parquet(abs_path)

try:

df = pd.read_parquet(abs_path)

except Exception as exc:

errors.append(f"Failed to read table parquet {rel_path}: {exc}")

continue

Copilot · 2026-04-29T07:21:23Z

+                errors.append(f"Missing task file: {rel_path}")
+                continue
+
+            df = pd.read_parquet(abs_path)


Same issue for task splits: pd.read_parquet(abs_path) may raise and currently would crash validation. Consider catching exceptions per split and recording an error so a single bad split file doesn't prevent reporting other problems.

Suggested change

df = pd.read_parquet(abs_path)

try:

df = pd.read_parquet(abs_path)

except Exception as exc:

errors.append(f"Task {task_id}/{split}: failed to read parquet file: {exc}")

continue

Fold the brutal self-review's findings back into the PR before review. Bugs: - (#1) run_packager validate→write order — both packagers wrote README/metadata on validation failure, leaving corrupt artifacts on disk that would silently get committed. Gated on `errors == ()`; added no-write tests for both packagers. - (#2) Instructor README inlined the public 3-tier README into a 1-tier dataset card. Replaced with a dedicated `INSTRUCTOR_BODY` constant that links to the public dataset and describes only the instructor-specific additions (full-horizon tables, hidden DAG, latent registry, mechanism summary). - (#3) validate_upload_dir_safe also blocks strict descendants of release_dir; `--huggingface-dir release/intro` would otherwise rmtree the intro bundle. Architecture: - (#5) Finished shared-primitives extraction: SOURCE_TREE_BLOCK, validate_readme_substitution, replace_file, replace_dir, load_manifest now live in scripts/_release_common.py. Both packagers reduced to imports. - (#6) Replaced 60-line hand-rolled YAML renderer with yaml.safe_dump + a 4-line _IndentedDumper subclass. - (#7) Removed dead --owner / --dataset-slug CLI flags. - (#8) assemble_upload_dir now takes rendered_readme and writes it. - (#9) build_config_for_tier made pure (no I/O); cheap manifest-stat preflight via _assert_tier_dir_exists. - (#10) --default-config with --variant=instructor errors loudly. CI: - (#4) Added [publish] extra (datasets>=2.14, kaggle>=1.6) so the gated G12.3 / G12.4 / G11.3 tests install in one line. Cleanups: visual cruft (#13–#16), test cruft (#17 — unused tmp_path, dead tag_lines), em-dash YAML round-trip parametrised for the instructor pretty_name. Verification: 1223 tests pass + 5 gated skips; ruff + mypy clean; hash determinism PASS 67/67; leakage probes 0/3 reconstruct on every tier; validate_release_candidate --no-rebuild exits 0. release/{kaggle,huggingface,huggingface-instructor}/dataset-metadata .json|README.md regenerated; audit-artifact-sync tests guard them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* PR 5.2: HuggingFace release packager + load_dataset smoke test Add `scripts/package_hf_release.py` to generate `release/huggingface/README.md` with G12.1-compliant YAML frontmatter (pretty_name, license, language, task_categories, size_categories, tags, three configs with `default: true` on intermediate per G12.2), inlining the rewritten `release/README.md` body with HF-specific link rewrites. `--variant=instructor` packages the companion repo (G12.4) from `release/intermediate_instructor/` into a separate `release/huggingface-instructor/` upload tree. G12.3 covered by a parametrised `load_dataset()` smoke test gated on the optional `datasets` SDK. Extract shared release-packaging primitives (link rewriter, dir-safety guard, cover-image validator) into `scripts/_release_common.py`; refactor the Kaggle packager to import them. `release/kaggle/dataset-metadata.json` is byte-stable across the refactor. Delete the legacy `release/HF_DATASET_CARD.md` stub — superseded by the generated card. Gitignore `release/huggingface{,-instructor}/*` except the committed README. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * PR 5.2 self-review fixes (Kaggle + HF packagers) Fold the brutal self-review's findings back into the PR before review. Bugs: - (#1) run_packager validate→write order — both packagers wrote README/metadata on validation failure, leaving corrupt artifacts on disk that would silently get committed. Gated on `errors == ()`; added no-write tests for both packagers. - (#2) Instructor README inlined the public 3-tier README into a 1-tier dataset card. Replaced with a dedicated `INSTRUCTOR_BODY` constant that links to the public dataset and describes only the instructor-specific additions (full-horizon tables, hidden DAG, latent registry, mechanism summary). - (#3) validate_upload_dir_safe also blocks strict descendants of release_dir; `--huggingface-dir release/intro` would otherwise rmtree the intro bundle. Architecture: - (#5) Finished shared-primitives extraction: SOURCE_TREE_BLOCK, validate_readme_substitution, replace_file, replace_dir, load_manifest now live in scripts/_release_common.py. Both packagers reduced to imports. - (#6) Replaced 60-line hand-rolled YAML renderer with yaml.safe_dump + a 4-line _IndentedDumper subclass. - (#7) Removed dead --owner / --dataset-slug CLI flags. - (#8) assemble_upload_dir now takes rendered_readme and writes it. - (#9) build_config_for_tier made pure (no I/O); cheap manifest-stat preflight via _assert_tier_dir_exists. - (#10) --default-config with --variant=instructor errors loudly. CI: - (#4) Added [publish] extra (datasets>=2.14, kaggle>=1.6) so the gated G12.3 / G12.4 / G11.3 tests install in one line. Cleanups: visual cruft (#13–#16), test cruft (#17 — unused tmp_path, dead tag_lines), em-dash YAML round-trip parametrised for the instructor pretty_name. Verification: 1223 tests pass + 5 gated skips; ruff + mypy clean; hash determinism PASS 67/67; leakage probes 0/3 reconstruct on every tier; validate_release_candidate --no-rebuild exits 0. release/{kaggle,huggingface,huggingface-instructor}/dataset-metadata .json|README.md regenerated; audit-artifact-sync tests guard them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * PR 5.2 Copilot-review fixes (Kaggle + HF packagers) Fold Copilot's two real findings on the self-review revision back in. COPILOT-1 — validate_upload_dir_safe was only invoked inside assemble_upload_dir, which --dry-run skips. A dry-run with --huggingface-dir release (or .) would write the README into the unsafe path BEFORE the safety net fired. Hoist the check into run_packager (both packagers) so it runs before any mkdir or write; the inner assemble_upload_dir call stays as defence-in-depth for direct callers. New tests: dry-run with unsafe upload-dir raises without writing; the same path through main() returns rc=2. COPILOT-2 — Cover-image path resolution was inconsistent: validate_cover_image used cover_image as passed, while assemble_upload_dir did a separate ``release_dir / cover_image.name`` fallback. Diverged for bare-basename inputs (false validation failures) and two-paths-sharing-a-basename (assembler shadowing the explicit path). Added resolve_cover_image_path() to _release_common.py (explicit-wins, release-dir fallback); run_packager calls it once and threads the resolved path through validation, the metadata's image field, and assembly. New tests/scripts/test_release_common.py covers the four resolution branches; new packager-side tests confirm bare-basename success + metadata field plumbing. COPILOT-3 — outdated; already addressed by self-review fix #8 in commit f2fc4a2. Resolved as already treated; no code change. Verification: 1232/1232 tests pass + 5 gated skips; ruff + mypy clean; hash determinism PASS 67/67; leakage probes rc=0 on every tier; validate_release_candidate --no-rebuild exits 0; BUNDLE_SCHEMA_VERSION unchanged at 5. release/{kaggle,huggingface,huggingface-instructor}/* artifacts regenerated byte-identically. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 29, 2026 05:58

shaypal5 added this to the v0.4.0 — Polished relational output + task export milestone Apr 29, 2026

shaypal5 added type: feature New capability layer: cli cli/ command-line interface labels Apr 29, 2026

Copilot started reviewing on behalf of shaypal5 April 29, 2026 05:58 View session

This comment has been minimized.

Sign in to view

fix: resolve mypy arg-type error in generate command

4fe7c7e

Pass n_accounts/n_contacts/n_leads as explicit keyword args to gen.generate() instead of **dict unpacking, which mypy could not reconcile with the typed signature. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI reviewed Apr 29, 2026

View reviewed changes

Comment thread leadforge/cli/commands/inspect.py

Comment thread tests/test_cli.py

Comment thread tests/test_cli.py

Comment thread leadforge/cli/commands/validate.py Outdated

Comment thread leadforge/cli/commands/validate.py Outdated

Comment thread leadforge/cli/commands/validate.py Outdated

This comment has been minimized.

Sign in to view

shaypal5 mentioned this pull request Apr 29, 2026

perf: use Parquet metadata for row counts in validate command #17

Closed

Copilot AI review requested due to automatic review settings April 29, 2026 07:01

Copilot started reviewing on behalf of shaypal5 April 29, 2026 07:01 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed Apr 29, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

Copilot AI review requested due to automatic review settings April 29, 2026 07:15

Copilot started reviewing on behalf of shaypal5 April 29, 2026 07:15 View session

shaypal5 merged commit 301d773 into main Apr 29, 2026
7 checks passed

shaypal5 deleted the feat/milestone-10-cli-commands branch April 29, 2026 07:16

Copilot AI reviewed Apr 29, 2026

View reviewed changes

shaypal5 mentioned this pull request May 6, 2026

PR 5.2: HuggingFace release packager + load_dataset smoke test #72

Merged

4 tasks

	typer.echo(f" {name:25s} {row_count:>8} rows")
	typer.echo(f" {name:25s} {str(row_count):>8} rows")

Conversation

shaypal5 commented Apr 29, 2026

Summary

Test plan

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

github-actions Bot commented Apr 29, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants