Skip to content

feat: Milestone 10 — CLI generate/inspect/validate commands#16

Merged
shaypal5 merged 5 commits into
mainfrom
feat/milestone-10-cli-commands
Apr 29, 2026
Merged

feat: Milestone 10 — CLI generate/inspect/validate commands#16
shaypal5 merged 5 commits into
mainfrom
feat/milestone-10-cli-commands

Conversation

@shaypal5

Copy link
Copy Markdown
Contributor

Summary

  • generate: fully wired end-to-end — parses all CLI flags (--recipe, --seed, --mode, --out, --difficulty, --n-accounts/contacts/leads, --horizon-days, --override), calls Generator.from_recipe().generate(), writes bundle via .save()
  • inspect: reads manifest.json and prints human-readable summary — recipe, seed, mode, difficulty, motif family, all tables with row counts, task splits, metadata presence
  • validate: checks manifest presence, required files (dataset_card.md, feature_dictionary.csv), table row counts vs manifest, SHA-256 hash integrity, task split file presence/counts/hashes, FK constraint integrity across all 10 relationships, leakage detection (unexpected columns in task splits)
  • 22 CLI tests: smoke tests, generate integration (both modes), inspect output assertions, validate pass/fail/corrupt-manifest/missing-table scenarios
  • Completes v0.4.0 (Milestones 7–10)

Test plan

  • 22 CLI tests pass (generate, inspect, validate)
  • Full suite: 562 tests passing
  • Ruff lint + format clean

🤖 Generated with Claude Code

- generate: parses all flags, calls Generator.from_recipe().generate(),
  writes bundle via .save(); supports --override, --difficulty, population
  size overrides
- inspect: reads manifest.json, prints summary (recipe, seed, mode,
  tables with row counts, task splits, metadata presence)
- validate: checks manifest presence, required files, table row counts,
  SHA-256 hashes, task split integrity, FK constraints, leakage detection
  (unexpected columns in task splits)
- 22 CLI tests (smoke, generate integration for both modes, inspect
  output assertions, validate pass/fail/corrupt/missing-file scenarios)
- 562 total tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 29, 2026 05:58
@shaypal5 shaypal5 added type: feature New capability layer: cli cli/ command-line interface labels Apr 29, 2026
@github-actions

This comment has been minimized.

Pass n_accounts/n_contacts/n_leads as explicit keyword args to
gen.generate() instead of **dict unpacking, which mypy could not
reconcile with the typed signature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the Milestone 10 CLI surface area by wiring generate, inspect, and validate to real bundle I/O, plus integration tests covering the new commands.

Changes:

  • Implement leadforge generate end-to-end (parse options → generate bundle → save to disk).
  • Add inspect summary output based on manifest.json.
  • Add validate checks for required artifacts, row counts, hashes, FK integrity, and leakage; expand CLI test suite accordingly.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
leadforge/cli/commands/generate.py Wires CLI flags to Generator.from_recipe(...).generate() and bundle.save(out).
leadforge/cli/commands/inspect.py Reads manifest.json and prints a human-readable bundle summary.
leadforge/cli/commands/validate.py Adds on-disk bundle validation: files present, hashes/row counts, FK checks, leakage checks.
tests/test_cli.py Replaces stubs with integration tests for generate/inspect/validate behavior.
.agent-plan.md Updates milestone status/plan notes to reflect M10 completion and M11 next steps.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/cli/commands/inspect.py
Comment thread tests/test_cli.py
Comment thread tests/test_cli.py
Comment thread leadforge/cli/commands/validate.py Outdated
Comment thread leadforge/cli/commands/validate.py Outdated
Comment thread leadforge/cli/commands/validate.py Outdated
@github-actions

This comment has been minimized.

…sha256

- COPILOT-1/4: inspect.py and validate.py now use load_json() from
  core.serialization with LeadforgeError → clean typer.Exit(1)
- COPILOT-2/3: confirmed Typer CliRunner always mixes stderr into
  result.output; assertions are already correct (resolved as-is)
- COPILOT-5: opened #17 for Parquet metadata optimization (out-of-scope
  for v1 bundle sizes)
- COPILOT-6: moved _sha256 to core.hashing.file_sha256(); both
  render/manifests.py and cli/commands/validate.py now import from there

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 29, 2026 07:01
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/cli/commands/validate.py Outdated
Comment thread leadforge/cli/commands/validate.py Outdated
Comment thread leadforge/cli/commands/generate.py Outdated
Comment thread leadforge/cli/commands/inspect.py Outdated
Comment thread leadforge/cli/commands/validate.py Outdated
Comment thread leadforge/cli/commands/validate.py Outdated
…pand tests

Addresses all 13 items from self-review:

generate.py:
 1. Remove double population-count pass (was in both from_recipe and generate)
 2. Wrap from_recipe/generate/load_yaml with try/except → clean CLI errors
 3. Validate --override path exists before loading

inspect.py:
 4. Defensive manifest parsing — handle non-dict tables/tasks entries
 5. Check bundle_path is a directory, not a file

validate.py → validation/bundle_checks.py:
 6. FK checks now report "FK check skipped" when a table is missing
    instead of silently passing
 7. Leakage check no longer re-reads train.parquet (reads schema only)
 8. Leakage check now covers all splits (train, valid, test)
13. Extract all validation logic to validation/bundle_checks.py;
    CLI command is now a thin shell

tests:
 9. Add warning comment to module-scoped bundle_dir fixture
10. Add tests for --override (valid file + missing file)
11. Add test for --difficulty flag
12. Consolidate 5 inspect invocations into 1 test with multiple asserts
 +  Add file-instead-of-dir tests for inspect and validate
 +  Assert FK-skip message in missing-table validate test
 +  New test_bundle_checks.py with 5 unit tests for validation module

568 tests passing; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

- generate: reject --override files that don't contain a YAML mapping
- bundle_checks: guard tables/tasks manifest fields with isinstance
  checks; report "Malformed manifest" instead of crashing on non-dict

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 29, 2026 07:15
@github-actions

Copy link
Copy Markdown

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR
#16. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.20
Trigger: commit pushed
Workflow run: 25095854426 attempt 1
Comment timestamp: 2026-04-29T07:15:27.386511+00:00
PR head commit: 99ff9ecd9c4e95a911ab7b986e05e424f07d6711

@shaypal5 shaypal5 merged commit 301d773 into main Apr 29, 2026
7 checks passed
@shaypal5 shaypal5 deleted the feat/milestone-10-cli-commands branch April 29, 2026 07:16

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +137 to +145
child_vals = set(child_df[fk.child_column].dropna())
parent_vals = set(parent_df[fk.parent_column].dropna())
orphans = child_vals - parent_vals
if orphans:
sample = list(orphans)[:3]
errors.append(
f"FK violation: {fk.child_table}.{fk.child_column} → "
f"{fk.parent_table}.{fk.parent_column}: "
f"{len(orphans)} orphan(s), e.g. {sample}"

Copilot AI Apr 29, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FK validation builds Python set(...)s from entire FK/PK columns (dropna()), which can be very memory-heavy for large bundles. Consider using vectorized membership (e.g., child_series[~child_series.isin(parent_series)] with parent_series.unique()) or chunking to keep peak memory bounded.

Suggested change
child_vals = set(child_df[fk.child_column].dropna())
parent_vals = set(parent_df[fk.parent_column].dropna())
orphans = child_vals - parent_vals
if orphans:
sample = list(orphans)[:3]
errors.append(
f"FK violation: {fk.child_table}.{fk.child_column} → "
f"{fk.parent_table}.{fk.parent_column}: "
f"{len(orphans)} orphan(s), e.g. {sample}"
child_series = child_df[fk.child_column].dropna()
parent_values = parent_df[fk.parent_column].dropna().unique()
orphan_rows = child_series[~child_series.isin(parent_values)]
orphan_values = orphan_rows.unique()
if len(orphan_values) > 0:
sample = orphan_values[:3].tolist()
errors.append(
f"FK violation: {fk.child_table}.{fk.child_column} → "
f"{fk.parent_table}.{fk.parent_column}: "
f"{len(orphan_values)} orphan(s), e.g. {sample}"

Copilot uses AI. Check for mistakes.
if isinstance(tables, dict):
for name, info in tables.items():
row_count = _safe_get(info, "row_count", "?")
typer.echo(f" {name:25s} {row_count:>8} rows")

Copilot AI Apr 29, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

row_count = _safe_get(info, 'row_count', '?') may return None (or another non-string) if the manifest is malformed, and f"{row_count:>8}" will raise TypeError for values like None. To keep inspect robust against partially-corrupt manifests, coerce to string before formatting (e.g., f"{str(row_count):>8}") or guard None explicitly.

Suggested change
typer.echo(f" {name:25s} {row_count:>8} rows")
typer.echo(f" {name:25s} {str(row_count):>8} rows")

Copilot uses AI. Check for mistakes.
Comment on lines +160 to +165
split_path = root / f"tasks/{task_id}/{split}.parquet"
if split_path.exists():
actual_columns = set(pd.read_parquet(split_path, columns=[]).columns)
extra = actual_columns - expected_columns
if extra:
errors.append(

Copilot AI Apr 29, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leakage detection behavior isn’t covered by tests yet (e.g., a task split Parquet containing an extra column should trigger a validation error). Adding a targeted test case would help prevent regressions once the column-detection logic is fixed.

Copilot uses AI. Check for mistakes.
Comment on lines +160 to +163
split_path = root / f"tasks/{task_id}/{split}.parquet"
if split_path.exists():
actual_columns = set(pd.read_parquet(split_path, columns=[]).columns)
extra = actual_columns - expected_columns

Copilot AI Apr 29, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leakage detection is currently ineffective: pd.read_parquet(split_path, columns=[]) returns a DataFrame with zero columns, so actual_columns will always be empty and extra will never be detected. Read the Parquet schema/metadata to get column names (e.g., via pyarrow.parquet.ParquetFile(...).schema.names) or read the file without columns=[]/with nrows=0-style approach if supported.

Copilot uses AI. Check for mistakes.
Comment on lines +32 to +36
manifest = load_json(bundle_root / "manifest.json")
errors: list[str] = []
errors.extend(_check_required_files(bundle_root))
tables, table_errors = _check_tables(bundle_root, manifest)
errors.extend(table_errors)

Copilot AI Apr 29, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_bundle assumes load_json() returns a dict; if manifest.json parses to a non-object (e.g., list/str), this will raise at manifest.get(...) and crash the CLI (it only catches LeadforgeError). Consider validating isinstance(manifest, dict) up front and returning a clear error like "Malformed manifest: root is not a JSON object".

Copilot uses AI. Check for mistakes.
errors.append(f"Missing table file: {rel_path}")
continue

df = pd.read_parquet(abs_path)

Copilot AI Apr 29, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pd.read_parquet(abs_path) can raise (corrupt/unsupported parquet, IO errors). Right now that would abort validation instead of returning an error list. Wrap the read in a try/except and append an error like "Failed to read table parquet: ..." so validate_bundle can report failures gracefully.

Suggested change
df = pd.read_parquet(abs_path)
try:
df = pd.read_parquet(abs_path)
except Exception as exc:
errors.append(f"Failed to read table parquet {rel_path}: {exc}")
continue

Copilot uses AI. Check for mistakes.
errors.append(f"Missing task file: {rel_path}")
continue

df = pd.read_parquet(abs_path)

Copilot AI Apr 29, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue for task splits: pd.read_parquet(abs_path) may raise and currently would crash validation. Consider catching exceptions per split and recording an error so a single bad split file doesn't prevent reporting other problems.

Suggested change
df = pd.read_parquet(abs_path)
try:
df = pd.read_parquet(abs_path)
except Exception as exc:
errors.append(f"Task {task_id}/{split}: failed to read parquet file: {exc}")
continue

Copilot uses AI. Check for mistakes.
shaypal5 added a commit that referenced this pull request May 6, 2026
Fold the brutal self-review's findings back into the PR before review.

Bugs:
- (#1) run_packager validate→write order — both packagers wrote
  README/metadata on validation failure, leaving corrupt artifacts on
  disk that would silently get committed.  Gated on `errors == ()`;
  added no-write tests for both packagers.
- (#2) Instructor README inlined the public 3-tier README into a
  1-tier dataset card.  Replaced with a dedicated `INSTRUCTOR_BODY`
  constant that links to the public dataset and describes only the
  instructor-specific additions (full-horizon tables, hidden DAG,
  latent registry, mechanism summary).
- (#3) validate_upload_dir_safe also blocks strict descendants of
  release_dir; `--huggingface-dir release/intro` would otherwise
  rmtree the intro bundle.

Architecture:
- (#5) Finished shared-primitives extraction: SOURCE_TREE_BLOCK,
  validate_readme_substitution, replace_file, replace_dir,
  load_manifest now live in scripts/_release_common.py.  Both
  packagers reduced to imports.
- (#6) Replaced 60-line hand-rolled YAML renderer with yaml.safe_dump
  + a 4-line _IndentedDumper subclass.
- (#7) Removed dead --owner / --dataset-slug CLI flags.
- (#8) assemble_upload_dir now takes rendered_readme and writes it.
- (#9) build_config_for_tier made pure (no I/O); cheap manifest-stat
  preflight via _assert_tier_dir_exists.
- (#10) --default-config with --variant=instructor errors loudly.

CI:
- (#4) Added [publish] extra (datasets>=2.14, kaggle>=1.6) so the
  gated G12.3 / G12.4 / G11.3 tests install in one line.

Cleanups: visual cruft (#13#16), test cruft (#17 — unused tmp_path,
dead tag_lines), em-dash YAML round-trip parametrised for the
instructor pretty_name.

Verification: 1223 tests pass + 5 gated skips; ruff + mypy clean;
hash determinism PASS 67/67; leakage probes 0/3 reconstruct on every
tier; validate_release_candidate --no-rebuild exits 0.
release/{kaggle,huggingface,huggingface-instructor}/dataset-metadata
.json|README.md regenerated; audit-artifact-sync tests guard them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 6, 2026
* PR 5.2: HuggingFace release packager + load_dataset smoke test

Add `scripts/package_hf_release.py` to generate `release/huggingface/README.md`
with G12.1-compliant YAML frontmatter (pretty_name, license, language,
task_categories, size_categories, tags, three configs with `default: true`
on intermediate per G12.2), inlining the rewritten `release/README.md`
body with HF-specific link rewrites.  `--variant=instructor` packages the
companion repo (G12.4) from `release/intermediate_instructor/` into a
separate `release/huggingface-instructor/` upload tree.  G12.3 covered
by a parametrised `load_dataset()` smoke test gated on the optional
`datasets` SDK.

Extract shared release-packaging primitives (link rewriter, dir-safety
guard, cover-image validator) into `scripts/_release_common.py`; refactor
the Kaggle packager to import them.  `release/kaggle/dataset-metadata.json`
is byte-stable across the refactor.

Delete the legacy `release/HF_DATASET_CARD.md` stub — superseded by the
generated card.  Gitignore `release/huggingface{,-instructor}/*` except
the committed README.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* PR 5.2 self-review fixes (Kaggle + HF packagers)

Fold the brutal self-review's findings back into the PR before review.

Bugs:
- (#1) run_packager validate→write order — both packagers wrote
  README/metadata on validation failure, leaving corrupt artifacts on
  disk that would silently get committed.  Gated on `errors == ()`;
  added no-write tests for both packagers.
- (#2) Instructor README inlined the public 3-tier README into a
  1-tier dataset card.  Replaced with a dedicated `INSTRUCTOR_BODY`
  constant that links to the public dataset and describes only the
  instructor-specific additions (full-horizon tables, hidden DAG,
  latent registry, mechanism summary).
- (#3) validate_upload_dir_safe also blocks strict descendants of
  release_dir; `--huggingface-dir release/intro` would otherwise
  rmtree the intro bundle.

Architecture:
- (#5) Finished shared-primitives extraction: SOURCE_TREE_BLOCK,
  validate_readme_substitution, replace_file, replace_dir,
  load_manifest now live in scripts/_release_common.py.  Both
  packagers reduced to imports.
- (#6) Replaced 60-line hand-rolled YAML renderer with yaml.safe_dump
  + a 4-line _IndentedDumper subclass.
- (#7) Removed dead --owner / --dataset-slug CLI flags.
- (#8) assemble_upload_dir now takes rendered_readme and writes it.
- (#9) build_config_for_tier made pure (no I/O); cheap manifest-stat
  preflight via _assert_tier_dir_exists.
- (#10) --default-config with --variant=instructor errors loudly.

CI:
- (#4) Added [publish] extra (datasets>=2.14, kaggle>=1.6) so the
  gated G12.3 / G12.4 / G11.3 tests install in one line.

Cleanups: visual cruft (#13#16), test cruft (#17 — unused tmp_path,
dead tag_lines), em-dash YAML round-trip parametrised for the
instructor pretty_name.

Verification: 1223 tests pass + 5 gated skips; ruff + mypy clean;
hash determinism PASS 67/67; leakage probes 0/3 reconstruct on every
tier; validate_release_candidate --no-rebuild exits 0.
release/{kaggle,huggingface,huggingface-instructor}/dataset-metadata
.json|README.md regenerated; audit-artifact-sync tests guard them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* PR 5.2 Copilot-review fixes (Kaggle + HF packagers)

Fold Copilot's two real findings on the self-review revision back in.

COPILOT-1 — validate_upload_dir_safe was only invoked inside
assemble_upload_dir, which --dry-run skips.  A dry-run with
--huggingface-dir release (or .) would write the README into the
unsafe path BEFORE the safety net fired.  Hoist the check into
run_packager (both packagers) so it runs before any mkdir or write;
the inner assemble_upload_dir call stays as defence-in-depth for
direct callers.  New tests: dry-run with unsafe upload-dir raises
without writing; the same path through main() returns rc=2.

COPILOT-2 — Cover-image path resolution was inconsistent:
validate_cover_image used cover_image as passed, while
assemble_upload_dir did a separate ``release_dir / cover_image.name``
fallback.  Diverged for bare-basename inputs (false validation
failures) and two-paths-sharing-a-basename (assembler shadowing the
explicit path).  Added resolve_cover_image_path() to
_release_common.py (explicit-wins, release-dir fallback);
run_packager calls it once and threads the resolved path through
validation, the metadata's image field, and assembly.  New
tests/scripts/test_release_common.py covers the four resolution
branches; new packager-side tests confirm bare-basename success +
metadata field plumbing.

COPILOT-3 — outdated; already addressed by self-review fix #8 in
commit f2fc4a2.  Resolved as already treated; no code change.

Verification: 1232/1232 tests pass + 5 gated skips; ruff + mypy
clean; hash determinism PASS 67/67; leakage probes rc=0 on every
tier; validate_release_candidate --no-rebuild exits 0;
BUNDLE_SCHEMA_VERSION unchanged at 5.
release/{kaggle,huggingface,huggingface-instructor}/* artifacts
regenerated byte-identically.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

layer: cli cli/ command-line interface type: feature New capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants