feat: Milestone 4 — world structure layer (node types, graph, motifs, rewiring, sampler)#8

Merged
shaypal5 merged 5 commits into main from feat/milestone-4-world-structure
Apr 25, 2026

@shaypal5

Contributor

Summary

Implements the hidden-world variability mechanism (§11 of the architecture spec): the layer that makes leadforge more than a fixed-DGP simulator.

  • structure/node_types.py: NodeType enum (9 semantic categories); ROOT_ELIGIBLE, REQUIRES_PARENT, LEAF_ONLY constraint sets
  • structure/graph.py: NodeSpec/EdgeSpec dataclasses + WorldGraph wrapping networkx.DiGraph; validates acyclicity, node-type legality, nondegeneracy, outcome reachability; exports to JSON and GraphML
  • structure/motifs.py: MotifFamily frozen dataclass; all 5 v1 families (fit_dominant, intent_dominant, sales_execution_sensitive, demo_trial_mediated, buying_committee_friction); get_motif_family() registry
  • structure/rewiring.py: rewire(motif, rng): optional-node dropping (p=0.4), edge-weight jitter (±0.15), optional latent-confounder injection (p=0.35); fully deterministic given seed
  • structure/sampler.py: sample_hidden_graph(seed, motif_family_name=None): selects or pins motif family, applies rewiring, validates, retries up to 20 times
  • pyproject.toml: adds networkx>=3.2 and numpy>=1.26; mypy override for networkx

132 new tests covering node types, graph validation, all 5 motif families, rewiring invariants (determinism, required-node preservation, edge weight bounds, variability across seeds), and sampler property tests (30 seeds × all families). 327 total tests passing.

Test plan

  • ruff + mypy clean
  • All 327 tests pass
  • sample_hidden_graph(seed=42) is deterministic across repeated calls
  • Different seeds produce different motif families
  • All rewired graphs pass WorldGraph structural validation (property test: 20 seeds × 5 families × 10 seeds)
  • GraphML and JSON exports are parseable

🤖 Generated with Claude Code

… rewiring, sampler)

Implements the hidden-world variability mechanism (§11 of architecture spec):

leadforge/structure/node_types.py
- NodeType enum with 9 semantic categories (global_context, account_latent,
  contact_latent, lead_state, engagement_state, sales_process_state,
  observable_feature_source, outcome, post_conversion_state)
- ROOT_ELIGIBLE, REQUIRES_PARENT, LEAF_ONLY constraint sets

leadforge/structure/graph.py
- NodeSpec / EdgeSpec dataclasses for graph construction
- WorldGraph: wraps networkx.DiGraph; validates acyclicity, node-type legality
  (REQUIRES_PARENT, LEAF_ONLY), nondegeneracy, and outcome reachability
- Exports: to_dict(), to_json(), to_graphml()
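The validation rules above can be illustrated without networkx (where `networkx.is_directed_acyclic_graph` would cover the acyclicity step directly). The stdlib-only sketch below is a stand-in, not WorldGraph: `validate_structure` and its inputs are hypothetical, the constraint sets are illustrative, "reachable" is taken to mean reachable from at least one root-eligible node (an assumption), and the nondegeneracy check is omitted because its exact definition is not given here.

```python
from collections import deque


class GraphValidationError(ValueError):
    """Raised when a candidate hidden graph breaks a structural rule."""


# ASSUMPTION: illustrative constraint sets keyed by node-type string.
ROOT_ELIGIBLE = {"global_context", "account_latent", "contact_latent"}
REQUIRES_PARENT = {"outcome", "post_conversion_state"}
LEAF_ONLY = {"post_conversion_state"}


def validate_structure(node_types: dict[str, str], edges: list[tuple[str, str]]) -> None:
    """Hypothetical checker: acyclicity, type legality, outcome reachability."""
    children: dict[str, list[str]] = {n: [] for n in node_types}
    indegree = {n: 0 for n in node_types}
    for src, dst in edges:
        if src not in node_types or dst not in node_types:
            raise GraphValidationError(f"edge {src}->{dst} references an unknown node")
        children[src].append(dst)
        indegree[dst] += 1

    # Acyclicity via Kahn's algorithm: in a DAG every node is eventually popped.
    remaining = dict(indegree)
    queue = deque(n for n, d in remaining.items() if d == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    if visited != len(node_types):
        raise GraphValidationError("graph contains a cycle")

    # Node-type legality.
    for node, ntype in node_types.items():
        if ntype in REQUIRES_PARENT and indegree[node] == 0:
            raise GraphValidationError(f"{node} ({ntype}) needs at least one parent")
        if ntype in LEAF_ONLY and children[node]:
            raise GraphValidationError(f"{node} ({ntype}) must be a leaf")

    # Outcome reachability: breadth-first search from every root-eligible node.
    frontier = deque(n for n, t in node_types.items() if t in ROOT_ELIGIBLE)
    reached = set(frontier)
    while frontier:
        for child in children[frontier.popleft()]:
            if child not in reached:
                reached.add(child)
                frontier.append(child)
    for node, ntype in node_types.items():
        if ntype == "outcome" and node not in reached:
            raise GraphValidationError(f"outcome {node} is unreachable from any root")
```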

leadforge/structure/motifs.py
- MotifFamily frozen dataclass (canonical_nodes, canonical_edges, optional_node_ids)
- All 5 v1 families: fit_dominant, intent_dominant, sales_execution_sensitive,
  demo_trial_mediated, buying_committee_friction
- get_motif_family() lookup; ALL_MOTIF_FAMILIES / MOTIF_FAMILY_NAMES registry

leadforge/structure/rewiring.py
- rewire(motif, rng): optional-node dropping (p=0.4), edge-weight jitter (±0.15),
  optional latent-confounder injection (p=0.35)
- All perturbations deterministic given seed; outcome nodes always retain a parent
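A rough sketch of the three perturbations, under stated assumptions: it swaps the NumPy Generator for the stdlib random.Random, represents a motif as a hypothetical `Skeleton` (node types, weighted edges, optional ids), clamps jittered weights to [-1, 1], and gives the injected confounder two arbitrary existing children with arbitrary weights. Only the probabilities and jitter width come from the commit message.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Skeleton:
    """Hypothetical stand-in for a motif: node types, weighted edges, optional ids."""

    node_types: dict[str, str]
    weights: dict[tuple[str, str], float]
    optional: frozenset[str] = field(default_factory=frozenset)


def rewire_skeleton(
    motif: Skeleton,
    rng: random.Random,
    p_drop: float = 0.4,
    jitter: float = 0.15,
    p_confounder: float = 0.35,
) -> Skeleton:
    """Return a perturbed copy of motif; the input is never mutated."""
    nodes = dict(motif.node_types)
    weights = dict(motif.weights)

    # 1. Drop each optional node with probability p_drop, unless that would
    #    leave some outcome node without any incoming edge.
    for node in sorted(motif.optional):
        if rng.random() >= p_drop:
            continue
        kept = {edge: w for edge, w in weights.items() if node not in edge}
        outcomes = [n for n, t in nodes.items() if t == "outcome" and n != node]
        if all(any(dst == o for _, dst in kept) for o in outcomes):
            nodes.pop(node)
            weights = kept

    # 2. Jitter every surviving weight by U(-jitter, jitter), clamped to [-1, 1].
    weights = {
        edge: max(-1.0, min(1.0, w + rng.uniform(-jitter, jitter)))
        for edge, w in sorted(weights.items())
    }

    # 3. ASSUMPTION: the confounder is a new root with edges to two existing
    #    nodes. A node with only outgoing edges cannot introduce a cycle.
    if rng.random() < p_confounder and len(nodes) >= 2:
        for target in rng.sample(sorted(nodes), 2):
            weights[("latent_confounder", target)] = rng.uniform(-0.5, 0.5)
        nodes["latent_confounder"] = "account_latent"

    return Skeleton(nodes, weights, motif.optional & frozenset(nodes))
```

Iterating over sorted keys is what makes the draw order, and therefore the result, a pure function of the seed.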

leadforge/structure/sampler.py
- sample_hidden_graph(seed, motif_family_name=None): pins or randomly selects
  motif, applies rewiring, validates via WorldGraph, retries up to 20 times
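The pin-or-draw and retry logic can be sketched as a small loop. `sample_with_retries` and its `build` callback (standing in for "rewire, then validate") are hypothetical; the family names and the 20-attempt cap come from the text, and catching only the validation error type mirrors the narrowing made in a later commit on this PR.

```python
from __future__ import annotations

import random
from typing import Any, Callable


class GraphValidationError(ValueError):
    """Stand-in for the validator's error type."""


MOTIF_FAMILY_NAMES = (
    "fit_dominant",
    "intent_dominant",
    "sales_execution_sensitive",
    "demo_trial_mediated",
    "buying_committee_friction",
)
MAX_ATTEMPTS = 20


def sample_with_retries(
    seed: int,
    build: Callable[[str, random.Random], Any],
    motif_family_name: str | None = None,
) -> Any:
    """Hypothetical sampler loop: pin or draw a family, then retry build()."""
    rng = random.Random(seed)
    if motif_family_name is None:
        family = rng.choice(MOTIF_FAMILY_NAMES)
    elif motif_family_name in MOTIF_FAMILY_NAMES:
        family = motif_family_name
    else:
        raise KeyError(f"unknown motif family: {motif_family_name!r}")

    last_error: GraphValidationError | None = None
    for _ in range(MAX_ATTEMPTS):
        try:
            # build() is expected to rewire with rng and then validate; a retry
            # continues from the advanced rng state, so runs stay reproducible.
            return build(family, rng)
        except GraphValidationError as exc:
            # Only validation failures are retried; TypeError, KeyError and
            # other programming errors propagate on the first attempt.
            last_error = exc
    raise RuntimeError(f"no valid graph after {MAX_ATTEMPTS} attempts") from last_error
```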

pyproject.toml: add networkx>=3.2 and numpy>=1.26 as core deps; mypy override
for networkx.

132 new tests (node_types, graph, motifs, rewiring, sampler); 327 total passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 22, 2026 14:03
shaypal5 added the type: feature (New capability) and layer: structure (structure/ motifs, graph, rewiring) labels Apr 22, 2026

Copilot AI left a comment

Pull request overview

Implements the “world structure” layer for leadforge by introducing a typed hidden-world DAG (WorldGraph) seeded from motif families, plus stochastic rewiring and a deterministic sampler entry point.

Changes:

  • Add WorldGraph (networkx-backed) with structural validation and JSON/GraphML export.
  • Add v1 motif family registry + stochastic rewiring rules + sample_hidden_graph() sampler.
  • Add a new tests/structure/ suite covering node types, motifs, rewiring invariants, sampler properties, and exports.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| leadforge/structure/node_types.py | Adds NodeType enum and constraint sets for graph legality checks. |
| leadforge/structure/graph.py | Introduces NodeSpec/EdgeSpec + WorldGraph validation and export helpers. |
| leadforge/structure/motifs.py | Defines 5 canonical motif families and a name-based registry lookup. |
| leadforge/structure/rewiring.py | Implements stochastic rewiring (optional drops, weight jitter, optional confounder). |
| leadforge/structure/sampler.py | Implements deterministic sampling with retries and optional motif pinning. |
| tests/structure/test_node_types.py | Adds unit tests for node type definitions/constraint sets. |
| tests/structure/test_graph.py | Adds validation and export tests for WorldGraph. |
| tests/structure/test_motifs.py | Adds motif registry + invariant tests across all families. |
| tests/structure/test_rewiring.py | Adds determinism/variability and invariants tests for rewiring. |
| tests/structure/test_sampler.py | Adds sampler determinism/property/export smoke tests. |
| tests/structure/__init__.py | Introduces the new structure test package. |
| pyproject.toml | Adds networkx/numpy dependencies and mypy override for networkx imports. |
| .agent-plan.md | Updates milestone tracking notes (Milestone 4 complete / Milestone 5 next). |

Comment thread leadforge/structure/graph.py
Comment thread leadforge/structure/graph.py
Comment thread leadforge/structure/graph.py
Comment thread leadforge/structure/sampler.py Outdated
Comment thread leadforge/structure/sampler.py
Comment thread leadforge/structure/sampler.py
Comment thread tests/structure/test_graph.py Outdated
Comment thread leadforge/structure/graph.py
Comment thread leadforge/structure/rewiring.py Outdated
Comment thread tests/structure/test_sampler.py
CI:
- node_types.py: replace str+Enum mixin with StrEnum (UP042, Python 3.11+)

graph.py:
- Guard reserved node attribute keys (node_type, label) in NodeSpec.metadata
  at WorldGraph construction; raise GraphValidationError on collision
- Guard reserved edge attribute key (weight) in EdgeSpec.metadata; callers
  must use EdgeSpec.weight field directly
- Add edge weight range validation ([-1, 1]) at construction time
- Include node/edge metadata in to_dict() / to_json() output
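The construction-time guards in this list amount to a set-intersection check and a range check. A minimal sketch with hypothetical helpers (`check_node_metadata`, `check_edge`) and a stand-in `EdgeSpec`; the reserved keys and the [-1, 1] range are the ones stated above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


class GraphValidationError(ValueError):
    """Stand-in for the PR's validation error type."""


RESERVED_NODE_KEYS = frozenset({"node_type", "label"})
RESERVED_EDGE_KEYS = frozenset({"weight"})


@dataclass(frozen=True)
class EdgeSpec:  # hypothetical stand-in, not the PR's class
    source: str
    target: str
    weight: float
    metadata: dict[str, Any] = field(default_factory=dict)


def check_node_metadata(node_id: str, metadata: dict[str, Any]) -> None:
    """Reject metadata keys that would overwrite built-in node attributes."""
    clash = RESERVED_NODE_KEYS.intersection(metadata)
    if clash:
        raise GraphValidationError(
            f"node {node_id!r} uses reserved metadata keys: {sorted(clash)}"
        )


def check_edge(edge: EdgeSpec) -> None:
    """Reject a 'weight' metadata key and weights outside [-1, 1]."""
    if RESERVED_EDGE_KEYS.intersection(edge.metadata):
        raise GraphValidationError(
            f"edge {edge.source}->{edge.target}: set EdgeSpec.weight, not metadata"
        )
    if not -1.0 <= edge.weight <= 1.0:
        raise GraphValidationError(f"edge weight {edge.weight} is outside [-1, 1]")
```

Without the reserved-key guard, a metadata entry named "label" would silently overwrite the label attribute when metadata is merged into the networkx node's attribute dict.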

sampler.py:
- Fix module docstring: "pinned by name" not "deterministically from the recipe"
- Add seed validation: reject bool and negative ints (consistent with RNGRoot)
- Narrow retry exception catch from broad Exception to GraphValidationError so
  programmer errors (TypeError, KeyError) surface immediately
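Rejecting bool needs an explicit check, because bool is a subclass of int in Python and isinstance(True, int) is true. A sketch of the seed rule with a hypothetical `validate_seed` helper; raising TypeError for non-integers and ValueError for negatives is an assumption, since the exception types RNGRoot actually uses are not shown in this PR.

```python
def validate_seed(seed: object) -> int:
    """Hypothetical check: accept only non-negative, non-bool ints."""
    # bool must be rejected first: isinstance(True, int) is True.
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError(f"seed must be a non-negative int, got {type(seed).__name__}")
    if seed < 0:
        raise ValueError(f"seed must be non-negative, got {seed}")
    return seed
```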

rewiring.py:
- Remove unimplemented "swapping one optional node for an alternate proxy"
  bullet from the permitted-variability docstring

tests:
- test_graph.py: rename test_unreachable_outcome_raises →
  test_outcome_reachable_from_different_root_passes; add reserved-key and
  weight-range validation tests
- test_sampler.py: add edge weight comparison to test_same_seed_same_graph;
  add test_bool_seed_raises and test_negative_seed_raises

Resolves COPILOT-1 through COPILOT-10 (COPILOT-3 resolved as irrelevant).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@shaypal5 shaypal5 left a comment

Contributor Author

Trigger Copilot review

Copilot AI left a comment

Pull request overview

Adds the “world structure” layer that generates a validated hidden causal DAG from motif families, with deterministic seed-based sampling and stochastic rewiring, plus tests and dependency updates.

Changes:

  • Introduces WorldGraph (NetworkX-backed DAG) with structural validation and JSON/GraphML export.
  • Defines NodeType constraints and 5 canonical MotifFamily templates, plus stochastic rewire() rules.
  • Adds sample_hidden_graph() entrypoint and comprehensive structure-layer tests; updates dependencies.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| leadforge/structure/node_types.py | Adds NodeType enum + constraint sets for graph validation. |
| leadforge/structure/graph.py | Implements WorldGraph construction, validation, and export utilities. |
| leadforge/structure/motifs.py | Defines 5 v1 motif families and a registry/lookup API. |
| leadforge/structure/rewiring.py | Implements stochastic rewiring (optional drops, weight jitter, confounder injection). |
| leadforge/structure/sampler.py | Implements deterministic motif selection + retry loop to sample a valid hidden graph. |
| tests/structure/test_node_types.py | Tests NodeType definitions and constraint-set invariants. |
| tests/structure/test_graph.py | Tests WorldGraph validation rules and export surfaces. |
| tests/structure/test_motifs.py | Tests motif registry correctness and motif skeleton invariants. |
| tests/structure/test_rewiring.py | Tests rewiring determinism, invariants, and variability across seeds. |
| tests/structure/test_sampler.py | Tests sampler contract, determinism, pinning, and export smoke tests. |
| pyproject.toml | Adds networkx/numpy deps and mypy override for networkx. |
| .agent-plan.md | Updates milestone tracking notes to reflect completion of Milestone 4. |

Comment thread tests/structure/test_sampler.py
Comment thread leadforge/structure/graph.py Outdated
Comment thread leadforge/structure/graph.py Outdated
Comment thread leadforge/structure/rewiring.py
Comment thread leadforge/structure/graph.py
Comment thread tests/structure/test_motifs.py
Comment thread tests/structure/test_graph.py
Comment thread leadforge/structure/rewiring.py Outdated
Comment thread leadforge/structure/motifs.py
graph.py:
- Fix NodeSpec.metadata docstring: clarify that non-primitive values are
  JSON-encoded in GraphML export rather than "Serialised as JSON" (which
  implied it was already done)
- Add _make_graphml_safe() helper: JSON-encodes non-primitive attribute
  values (dict/list) into a '<key>_json' string attribute so that
  networkx.generate_graphml() never raises TypeError
- Rewrite to_graphml() to build an exportable copy of the graph through
  _make_graphml_safe() before calling generate_graphml
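A minimal sketch of the attribute-flattening idea, using a hypothetical public `make_graphml_safe` rather than the PR's private helper. It also folds in the str() fallback and collision-safe key suffix that a later commit on this PR describes; which Python types count as GraphML-native primitives is an assumption.

```python
import json
from typing import Any

# ASSUMPTION: value types written as-is; everything else gets JSON-encoded.
_PRIMITIVES = (str, int, float, bool)


def make_graphml_safe(attrs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of attrs in which every non-primitive value is JSON-encoded.

    The encoded value is stored under '<key>_json'; if that key is already
    taken, a numeric suffix is appended until the key is free.
    """
    safe: dict[str, Any] = {}
    for key, value in attrs.items():
        if isinstance(value, _PRIMITIVES):
            safe[key] = value
            continue
        new_key = f"{key}_json"
        n = 1
        while new_key in attrs or new_key in safe:
            new_key = f"{key}_json_{n}"
            n += 1
        # default=str is the fallback for values json cannot encode natively.
        safe[new_key] = json.dumps(value, default=str, sort_keys=True)
    return safe
```

Checking candidate keys against both the input and the output keeps a generated key from clobbering a user attribute that already has that name.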

rewiring.py:
- Replace copy.copy() with explicit NodeSpec/EdgeSpec construction using
  deepcopy(metadata); shallow copy aliased the canonical motif's metadata
  dicts so post-rewiring mutations would have corrupted the frozen motif
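The aliasing problem is easy to reproduce: copy.copy() creates a new object but keeps references to the same field values, so the copy and the original share one metadata dict. Shown here with a hypothetical minimal `NodeSpec`, not the PR's class.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class NodeSpec:  # hypothetical minimal stand-in, not the PR's class
    node_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


canonical = NodeSpec("fit", {"tags": ["icp"]})

# copy.copy() makes a new NodeSpec whose metadata is the same dict object.
shallow = copy.copy(canonical)
assert shallow.metadata is canonical.metadata

# Explicit construction with deepcopy(metadata) gives the copy its own dict.
independent = NodeSpec(canonical.node_id, copy.deepcopy(canonical.metadata))
independent.metadata["tags"].append("rewired")
assert canonical.metadata == {"tags": ["icp"]}  # canonical spec untouched

# Mutating through the shallow copy leaks straight into the canonical spec.
shallow.metadata["tags"].append("leaked")
assert canonical.metadata == {"tags": ["icp", "leaked"]}
```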

Six E501 complaints from Copilot are false positives: the CI Lint & format
check already passes (SUCCESS) on this commit, so those threads were resolved as stale.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Implements the Milestone 4 “world structure” layer for leadforge: a validated hidden-world DAG representation plus motif templates and a deterministic sampler that rewires motifs to create seed-driven structural variability.

Changes:

  • Added WorldGraph (NetworkX-backed) with structural validation and JSON/GraphML exports.
  • Defined node-type taxonomy + 5 canonical motif families, plus a stochastic rewire() step and sample_hidden_graph() entry point.
  • Added comprehensive test suite for node types, motifs, rewiring invariants, and sampler properties; updated dependencies for NetworkX/NumPy.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| leadforge/structure/node_types.py | Adds NodeType enum and constraint sets for validation. |
| leadforge/structure/graph.py | Introduces WorldGraph, validation rules, and export utilities. |
| leadforge/structure/motifs.py | Defines MotifFamily and the 5 v1 canonical motif templates + registry. |
| leadforge/structure/rewiring.py | Adds stochastic rewiring rules (optional drops, jitter, confounder injection). |
| leadforge/structure/sampler.py | Adds sample_hidden_graph() with motif selection + validation retries. |
| tests/structure/test_node_types.py | Tests node-type enum/constraint invariants. |
| tests/structure/test_graph.py | Tests graph validation and export round-trips/smoke checks. |
| tests/structure/test_motifs.py | Tests motif registry and per-motif structural invariants. |
| tests/structure/test_rewiring.py | Tests rewiring determinism, invariants, and variability across seeds. |
| tests/structure/test_sampler.py | Tests sampler contract, determinism, pinned families, and exports. |
| tests/structure/__init__.py | Adds test package marker for structure tests. |
| pyproject.toml | Adds networkx/numpy deps and a mypy override for NetworkX. |
| .agent-plan.md | Updates project plan/status to reflect Milestone 4 completion and Milestone 5 next steps. |

Comment thread leadforge/structure/sampler.py
Comment thread leadforge/structure/graph.py Outdated
Comment thread leadforge/structure/graph.py
Comment thread leadforge/structure/rewiring.py Outdated

graph.py:
- Freeze NodeSpec and EdgeSpec dataclasses so field reassignment on
  canonical motif specs raises FrozenInstanceError; metadata dicts are
  still mutable by Python's type system but rewiring already deepcopies
  them, so motif integrity is preserved end-to-end
- Fix _make_graphml_safe docstring: clarify that ALL non-primitive values
  (None, tuples, enums, dicts, lists, …) are JSON-encoded, not just
  dict/list as the previous text implied
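Freezing a dataclass blocks attribute reassignment but does nothing for the objects those attributes point to, which is the caveat noted above. Demonstrated with a hypothetical stand-in `EdgeSpec`:

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any


@dataclass(frozen=True)
class EdgeSpec:  # hypothetical stand-in, not the PR's class
    source: str
    target: str
    weight: float
    metadata: dict[str, Any] = field(default_factory=dict)


edge = EdgeSpec("fit", "conversion", 0.6, {"note": "canonical"})

try:
    edge.weight = 0.9  # type: ignore[misc]
except FrozenInstanceError:
    reassigned = False  # frozen: field reassignment is rejected
else:
    reassigned = True

# The frozen flag does not reach inside the dict, so this still succeeds.
edge.metadata["note"] = "mutated"
```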

COPILOT-1 (RNGRoot integration in sampler): deferred to issue #9 — the
right substream name and API shape depend on how Generator calls the
sampler, which is defined in Milestone 5.

COPILOT-4 (E501 on EdgeSpec comprehension): resolved as a false positive;
CI Lint & format passes (SUCCESS) on this commit.

FAIL-1 (startup_failure on pr-agent-context-refresh): expected
bot-triggered approval-gate behaviour; no code change required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Implements Milestone 4’s “world structure layer” for leadforge: a validated hidden-world DAG representation plus motif templates, stochastic rewiring, and a deterministic sampler used by the simulation layer.

Changes:

  • Added WorldGraph (NetworkX-backed) with structural validation and JSON/GraphML export.
  • Added v1 motif family registry + deterministic rewiring rules and a sample_hidden_graph() entry point.
  • Added comprehensive test suite for node types, graph validation/export, motifs, rewiring, and sampler behavior; updated dependencies to include networkx and numpy.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| leadforge/structure/node_types.py | Defines NodeType and constraint sets used by graph validation and motifs. |
| leadforge/structure/graph.py | Introduces WorldGraph, validation logic, and export functions. |
| leadforge/structure/motifs.py | Adds five canonical motif families and a name→family registry. |
| leadforge/structure/rewiring.py | Implements stochastic rewiring (optional drops, weight jitter, confounder injection). |
| leadforge/structure/sampler.py | Provides deterministic sampling with retries and optional motif pinning. |
| tests/structure/test_node_types.py | Verifies node-type enum and constraint-set invariants. |
| tests/structure/test_graph.py | Tests graph construction/validation failures and export contracts. |
| tests/structure/test_motifs.py | Tests motif registry and canonical skeleton invariants. |
| tests/structure/test_rewiring.py | Tests rewiring determinism, bounds, and validity properties. |
| tests/structure/test_sampler.py | Tests sampler determinism, pinning, validation properties, and export smoke tests. |
| pyproject.toml | Adds networkx/numpy deps and mypy override for NetworkX. |
| .agent-plan.md | Updates milestone tracking documentation. |

Comment thread leadforge/structure/graph.py
Comment thread leadforge/structure/graph.py
Comment thread leadforge/structure/graph.py
Comment thread leadforge/structure/sampler.py
- NodeSpec/EdgeSpec: wrap metadata in MappingProxyType via __post_init__
  so canonical motif skeletons are truly immutable (COPILOT-2a/2b)
- _make_graphml_safe: add str() fallback for non-JSON-serialisable values
  and use collision-safe suffix-key generation (COPILOT-3)
- sample_hidden_graph: derive NumPy seed from RNGRoot(seed).child("hidden_graph")
  to align with repo RNG convention (COPILOT-1)
- rewiring: replace deepcopy(metadata) with dict() since MappingProxyType
  is already immutable and deepcopy cannot pickle it
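Assigning a field inside `__post_init__` of a frozen dataclass has to go through object.__setattr__, because the generated `__setattr__` raises. A sketch of the pattern, using a hypothetical stand-in `NodeSpec`; copying the caller's dict before wrapping it is an added assumption that keeps later edits to that dict from leaking into the spec.

```python
import copy
from collections.abc import Mapping
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any


@dataclass(frozen=True)
class NodeSpec:  # hypothetical stand-in, not the PR's class
    node_id: str
    node_type: str
    metadata: Mapping[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Frozen dataclasses reject normal assignment, so bypass it once here.
        # ASSUMPTION: copy the caller's dict first so later edits don't leak in.
        object.__setattr__(self, "metadata", MappingProxyType(dict(self.metadata)))


source = {"source": "firmographics"}
spec = NodeSpec("fit", "account_latent", source)
source["source"] = "edited later"  # does not affect spec.metadata

try:
    spec.metadata["source"] = "other"  # type: ignore[index]
    write_blocked = False
except TypeError:  # mappingproxy does not support item assignment
    write_blocked = True

try:
    copy.deepcopy(spec.metadata)
    deepcopy_failed = False
except TypeError:  # mappingproxy objects cannot be pickled, so deepcopy fails
    deepcopy_failed = True

# dict() takes a plain, mutable snapshot that rewiring is free to modify.
rewired_meta = dict(spec.metadata)
rewired_meta["jittered"] = True
```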

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@github-actions

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #8. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.20
Trigger: commit pushed
Workflow run: 24822284839 attempt 1
Comment timestamp: 2026-04-23T07:17:56.070512+00:00
PR head commit: b5623d2779387987a430cf1a5c8471cdf9365878

@shaypal5 shaypal5 requested a review from Copilot April 23, 2026 07:57

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@shaypal5 shaypal5 merged commit 6fd7137 into main Apr 25, 2026
9 checks passed
@shaypal5 shaypal5 deleted the feat/milestone-4-world-structure branch April 25, 2026 06:23
shaypal5 added a commit that referenced this pull request May 6, 2026
Fold the brutal self-review's findings back into the PR before review.

Bugs:
- (#1) run_packager validate→write order — both packagers wrote
  README/metadata on validation failure, leaving corrupt artifacts on
  disk that would silently get committed.  Gated on `errors == ()`;
  added no-write tests for both packagers.
- (#2) Instructor README inlined the public 3-tier README into a
  1-tier dataset card.  Replaced with a dedicated `INSTRUCTOR_BODY`
  constant that links to the public dataset and describes only the
  instructor-specific additions (full-horizon tables, hidden DAG,
  latent registry, mechanism summary).
- (#3) validate_upload_dir_safe also blocks strict descendants of
  release_dir; `--huggingface-dir release/intro` would otherwise
  rmtree the intro bundle.
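Fix (#1) is an ordering rule: run every validator first, and create directories or write files only once the error tuple is empty. A minimal sketch with a hypothetical `package()` helper and a caller-supplied validator; the real run_packager does considerably more than this.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def package(
    out_dir: Path,
    readme: str,
    validate: Callable[[str], tuple[str, ...]],
) -> tuple[str, ...]:
    """Hypothetical packager step: validate first, write only on success."""
    errors = validate(readme)
    if errors != ():
        # Nothing has touched the filesystem yet, so a failed run leaves no
        # half-written README or metadata behind.
        return errors
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "README.md").write_text(readme, encoding="utf-8")
    return errors
```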

Architecture:
- (#5) Finished shared-primitives extraction: SOURCE_TREE_BLOCK,
  validate_readme_substitution, replace_file, replace_dir,
  load_manifest now live in scripts/_release_common.py.  Both
  packagers reduced to imports.
- (#6) Replaced 60-line hand-rolled YAML renderer with yaml.safe_dump
  + a 4-line _IndentedDumper subclass.
- (#7) Removed dead --owner / --dataset-slug CLI flags.
- (#8) assemble_upload_dir now takes rendered_readme and writes it.
- (#9) build_config_for_tier made pure (no I/O); cheap manifest-stat
  preflight via _assert_tier_dir_exists.
- (#10) --default-config with --variant=instructor errors loudly.

CI:
- (#4) Added [publish] extra (datasets>=2.14, kaggle>=1.6) so the
  gated G12.3 / G12.4 / G11.3 tests install in one line.

Cleanups: visual cruft (#13#16), test cruft (#17 — unused tmp_path,
dead tag_lines), em-dash YAML round-trip parametrised for the
instructor pretty_name.

Verification: 1223 tests pass + 5 gated skips; ruff + mypy clean;
hash determinism PASS 67/67; leakage probes 0/3 reconstruct on every
tier; validate_release_candidate --no-rebuild exits 0.
release/{kaggle,huggingface,huggingface-instructor}/dataset-metadata.json|README.md
regenerated; audit-artifact-sync tests guard them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 6, 2026
Fold Copilot's two real findings on the self-review revision back in.

COPILOT-1 — validate_upload_dir_safe was only invoked inside
assemble_upload_dir, which --dry-run skips.  A dry-run with
--huggingface-dir release (or .) would write the README into the
unsafe path BEFORE the safety net fired.  Hoist the check into
run_packager (both packagers) so it runs before any mkdir or write;
the inner assemble_upload_dir call stays as defence-in-depth for
direct callers.  New tests: dry-run with unsafe upload-dir raises
without writing; the same path through main() returns rc=2.

COPILOT-2 — Cover-image path resolution was inconsistent:
validate_cover_image used cover_image as passed, while
assemble_upload_dir did a separate ``release_dir / cover_image.name``
fallback.  Diverged for bare-basename inputs (false validation
failures) and two-paths-sharing-a-basename (assembler shadowing the
explicit path).  Added resolve_cover_image_path() to
_release_common.py (explicit-wins, release-dir fallback);
run_packager calls it once and threads the resolved path through
validation, the metadata's image field, and assembly.  New
tests/scripts/test_release_common.py covers the four resolution
branches; new packager-side tests confirm bare-basename success +
metadata field plumbing.
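The "explicit-wins, release-dir fallback" rule can be sketched as below. The name `resolve_cover_image`, its signature, and the behaviour when neither file exists (returning the explicit path so validation reports it as missing) are assumptions, not the PR's resolve_cover_image_path.

```python
from pathlib import Path


def resolve_cover_image(cover_image: Path, release_dir: Path) -> Path:
    """Hypothetical resolver: explicit path wins, release_dir is the fallback."""
    if cover_image.is_file():
        # The caller's path exists, so use it even if release_dir holds a
        # different file with the same basename.
        return cover_image
    fallback = release_dir / cover_image.name
    if fallback.is_file():
        # e.g. a bare basename like "cover.png" that lives in release_dir.
        return fallback
    # ASSUMPTION: return the explicit path unchanged so a later validation
    # step reports it as missing, instead of raising here.
    return cover_image
```

Resolving once and threading the single result through validation, the metadata image field, and assembly is what keeps those three steps from disagreeing.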

COPILOT-3 — outdated; already addressed by self-review fix #8 in
commit f2fc4a2.  Resolved as already treated; no code change.

Verification: 1232/1232 tests pass + 5 gated skips; ruff + mypy
clean; hash determinism PASS 67/67; leakage probes rc=0 on every
tier; validate_release_candidate --no-rebuild exits 0;
BUNDLE_SCHEMA_VERSION unchanged at 5.
release/{kaggle,huggingface,huggingface-instructor}/* artifacts
regenerated byte-identically.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 6, 2026
* PR 5.2: HuggingFace release packager + load_dataset smoke test

Add `scripts/package_hf_release.py` to generate `release/huggingface/README.md`
with G12.1-compliant YAML frontmatter (pretty_name, license, language,
task_categories, size_categories, tags, three configs with `default: true`
on intermediate per G12.2), inlining the rewritten `release/README.md`
body with HF-specific link rewrites.  `--variant=instructor` packages the
companion repo (G12.4) from `release/intermediate_instructor/` into a
separate `release/huggingface-instructor/` upload tree.  G12.3 covered
by a parametrised `load_dataset()` smoke test gated on the optional
`datasets` SDK.

Extract shared release-packaging primitives (link rewriter, dir-safety
guard, cover-image validator) into `scripts/_release_common.py`; refactor
the Kaggle packager to import them.  `release/kaggle/dataset-metadata.json`
is byte-stable across the refactor.

Delete the legacy `release/HF_DATASET_CARD.md` stub — superseded by the
generated card.  Gitignore `release/huggingface{,-instructor}/*` except
the committed README.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* PR 5.2 self-review fixes (Kaggle + HF packagers)

* PR 5.2 Copilot-review fixes (Kaggle + HF packagers)

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Labels

layer: structure (structure/ motifs, graph, rewiring)
type: feature (New capability)

Projects

None yet

2 participants