feat: Milestone 5 — population generation and latent state initialisation by shaypal5 · Pull Request #10 · leadforge-dev/leadforge

shaypal5 · 2026-04-27T14:03:50Z

Summary

- Adds leadforge/simulation/population.py with build_population() as the single entry point
- Generates accounts, contacts, and leads with all observable fields, FK consistency guaranteed (lead.account_id == contact.account_id)
- Initialises LatentState with 8 hidden traits across 3 entity types (account: fit, budget readiness, process maturity; contact: problem awareness, authority, responsiveness, engagement propensity; lead: sales friction)
- Motif-family biases in _MOTIF_LATENT_BIAS shift latent means to create structurally coherent worlds (e.g. fit_dominant raises latent_account_fit mean)
- All randomness via RNGRoot named substreams — fully deterministic given (config.seed, world_graph.motif_family)
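
For readers unfamiliar with the pattern, the latent initialisation described in the summary can be sketched roughly as follows. This is an illustration only, not the code in population.py: `named_substream`, `sample_trait`, `init_account_fit`, the stream name, the base mean, the standard deviation, and the bias values are all hypothetical. Only the trait name latent_account_fit and the motif families fit_dominant and buying_committee_friction come from this PR.

```python
import hashlib
import random

# Hypothetical bias table; the real _MOTIF_LATENT_BIAS values are not shown in this PR.
MOTIF_LATENT_BIAS = {
    "fit_dominant": {"latent_account_fit": 0.15},
    "buying_committee_friction": {"latent_account_fit": -0.05},
}


def named_substream(seed: int, name: str) -> random.Random:
    """Derive a reproducible RNG from a root seed and a stream name.

    Stand-in for RNGRoot, whose actual API is not shown in this PR.
    """
    digest = hashlib.sha256(f"{seed}:{name}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def sample_trait(rng: random.Random, base_mean: float, bias: float,
                 sigma: float = 0.15) -> float:
    """Draw from a Gaussian with a shifted mean, then clip to [0, 1]."""
    value = rng.gauss(base_mean + bias, sigma)
    return min(1.0, max(0.0, value))


def init_account_fit(seed: int, motif_family: str, n_accounts: int) -> list[float]:
    """Sample latent_account_fit for every account from one named substream."""
    rng = named_substream(seed, "latent/account_fit")
    bias = MOTIF_LATENT_BIAS.get(motif_family, {}).get("latent_account_fit", 0.0)
    return [sample_trait(rng, 0.5, bias) for _ in range(n_accounts)]
```

Because each trait draws from its own named stream, adding a new trait later would not perturb the values already generated for existing traits, which is one common motivation for this design.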

Test plan

🤖 Generated with Claude Code

…tion

Implements build_population() in leadforge/simulation/population.py:
- AccountRow generation: industry, region, employee/revenue/maturity bands, account created_at spread 30-730 days before world base date
- ContactRow generation: persona-driven title/role/buyer_role, conditional account FK, contact created_at anchored to parent account
- LeadRow generation: GTM-weighted lead_source, rep assignment from internal pool, lead_created_at within 30-day base window; initial stage = mql
- LatentState: 8 hidden traits across 3 entity types, all in [0,1], sampled from clipped Gaussians with motif-family-aware mean biases
- FK invariant: lead.account_id always equals contact.account_id
- All randomness via RNGRoot named substreams — fully deterministic

26 tests: counts, determinism, FK integrity, latent range/completeness, motif bias properties (fit_dominant vs buying_committee_friction), and observable field validity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
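
The FK invariant in the commit message is easiest to guarantee by construction: copy the lead's account_id from its contact rather than sampling it separately. The sketch below assumes heavily reduced row types; the real AccountRow, ContactRow, and LeadRow carry many more fields, and `build_leads` and `fk_violations` are hypothetical helper names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactRow:  # hypothetical, reduced to the key columns
    contact_id: str
    account_id: str


@dataclass(frozen=True)
class LeadRow:  # hypothetical, reduced to the key columns
    lead_id: str
    contact_id: str
    account_id: str


def build_leads(contacts: list[ContactRow], n_leads: int,
                rng: random.Random) -> list[LeadRow]:
    """Create leads whose account_id is copied from the chosen contact."""
    leads = []
    for i in range(n_leads):
        contact = rng.choice(contacts)
        leads.append(LeadRow(f"lead_{i:05d}", contact.contact_id, contact.account_id))
    return leads


def fk_violations(contacts: list[ContactRow], leads: list[LeadRow]) -> list[str]:
    """Return ids of leads that break lead.account_id == contact.account_id."""
    by_id = {c.contact_id: c for c in contacts}
    return [
        lead.lead_id
        for lead in leads
        if lead.contact_id not in by_id
        or by_id[lead.contact_id].account_id != lead.account_id
    ]
```

A test in the spirit of the FK integrity tests mentioned above would then simply assert that `fk_violations(...)` returns an empty list.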

Copilot

Pull request overview

Introduces a new simulation-layer entry point to generate an initial “world population” (accounts/contacts/leads) alongside per-entity latent traits, with deterministic randomness derived from RNGRoot substreams and motif-family-dependent latent mean shifts.

Changes:

Added leadforge/simulation/population.py implementing build_population() plus account/contact/lead generation and LatentState initialization.
Added tests/simulation/test_population.py with coverage for counts, determinism, FK integrity, observable field validity, latent trait completeness/ranges, and motif-bias properties.
Updated .agent-plan.md to mark Milestone 5 complete and advance the plan to Milestone 6.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.

File	Description
leadforge/simulation/population.py	Adds deterministic population + latent-state generation with motif-family biasing.
tests/simulation/test_population.py	Adds comprehensive tests validating population structure, determinism, and trait constraints.
.agent-plan.md	Updates milestone tracking and next-task breakdown.


- Docstring: correct determinism contract to include narrative and world_graph.motif_family (COPILOT-1)
- build_population: add _validate_narrative() up-front guard that raises InvalidConfigError for empty industries, geographies, personas, or channels (COPILOT-2)
- _channel_weights: fall back to uniform distribution when all GTM shares sum to zero, preventing random.choices ValueError (COPILOT-3)
- 5 new tests covering all three fixes (363 total passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
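
The COPILOT-3 fix addresses a real standard-library behaviour: on Python 3.9 and later, `random.choices` raises ValueError when the supplied weights sum to zero. A minimal sketch of the uniform fallback follows. The function names and the dict-shaped input are assumptions, and a plain ValueError stands in for the project's InvalidConfigError on the empty-channel case.

```python
import random


def channel_weights(gtm_shares: dict[str, float]) -> tuple[list[str], list[float]]:
    """Return channels and sampling weights, uniform if all shares are zero."""
    channels = list(gtm_shares)
    if not channels:
        # The PR raises InvalidConfigError up front; ValueError is a stand-in here.
        raise ValueError("at least one channel is required")
    weights = [max(0.0, gtm_shares[c]) for c in channels]
    if sum(weights) <= 0:
        # random.choices would raise ValueError on an all-zero weight list.
        weights = [1.0] * len(channels)
    return channels, weights


def pick_lead_source(gtm_shares: dict[str, float], rng: random.Random) -> str:
    """Draw one lead source according to the (possibly defaulted) weights."""
    channels, weights = channel_weights(gtm_shares)
    return rng.choices(channels, weights=weights, k=1)[0]
```

Falling back to a uniform draw keeps generation running on a degenerate config, while the up-front narrative validation still rejects configs that have no channels at all.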

github-actions · 2026-04-28T03:21:18Z

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #10. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.20
Trigger: commit pushed
Workflow run: 25032051451 attempt 1
Comment timestamp: 2026-04-28T03:20:30.335940+00:00
PR head commit: 0559a83d15f375cdfa53f9d4abb3b38ed15e91e4


* PR 5.2: HuggingFace release packager + load_dataset smoke test

Add `scripts/package_hf_release.py` to generate `release/huggingface/README.md` with G12.1-compliant YAML frontmatter (pretty_name, license, language, task_categories, size_categories, tags, three configs with `default: true` on intermediate per G12.2), inlining the rewritten `release/README.md` body with HF-specific link rewrites. `--variant=instructor` packages the companion repo (G12.4) from `release/intermediate_instructor/` into a separate `release/huggingface-instructor/` upload tree. G12.3 covered by a parametrised `load_dataset()` smoke test gated on the optional `datasets` SDK.

Extract shared release-packaging primitives (link rewriter, dir-safety guard, cover-image validator) into `scripts/_release_common.py`; refactor the Kaggle packager to import them. `release/kaggle/dataset-metadata.json` is byte-stable across the refactor.

Delete the legacy `release/HF_DATASET_CARD.md` stub — superseded by the generated card. Gitignore `release/huggingface{,-instructor}/*` except the committed README.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* PR 5.2 self-review fixes (Kaggle + HF packagers)

Fold the brutal self-review's findings back into the PR before review.

Bugs:
- (#1) run_packager validate→write order — both packagers wrote README/metadata on validation failure, leaving corrupt artifacts on disk that would silently get committed. Gated on `errors == ()`; added no-write tests for both packagers.
- (#2) Instructor README inlined the public 3-tier README into a 1-tier dataset card. Replaced with a dedicated `INSTRUCTOR_BODY` constant that links to the public dataset and describes only the instructor-specific additions (full-horizon tables, hidden DAG, latent registry, mechanism summary).
- (#3) validate_upload_dir_safe also blocks strict descendants of release_dir; `--huggingface-dir release/intro` would otherwise rmtree the intro bundle.

Architecture:
- (#5) Finished shared-primitives extraction: SOURCE_TREE_BLOCK, validate_readme_substitution, replace_file, replace_dir, load_manifest now live in scripts/_release_common.py. Both packagers reduced to imports.
- (#6) Replaced 60-line hand-rolled YAML renderer with yaml.safe_dump + a 4-line _IndentedDumper subclass.
- (#7) Removed dead --owner / --dataset-slug CLI flags.
- (#8) assemble_upload_dir now takes rendered_readme and writes it.
- (#9) build_config_for_tier made pure (no I/O); cheap manifest-stat preflight via _assert_tier_dir_exists.
- (#10) --default-config with --variant=instructor errors loudly.

CI:
- (#4) Added [publish] extra (datasets>=2.14, kaggle>=1.6) so the gated G12.3 / G12.4 / G11.3 tests install in one line.

Cleanups: visual cruft (#13–#16), test cruft (#17 — unused tmp_path, dead tag_lines), em-dash YAML round-trip parametrised for the instructor pretty_name.

Verification: 1223 tests pass + 5 gated skips; ruff + mypy clean; hash determinism PASS 67/67; leakage probes 0/3 reconstruct on every tier; validate_release_candidate --no-rebuild exits 0. release/{kaggle,huggingface,huggingface-instructor}/dataset-metadata.json|README.md regenerated; audit-artifact-sync tests guard them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* PR 5.2 Copilot-review fixes (Kaggle + HF packagers)

Fold Copilot's two real findings on the self-review revision back in.

COPILOT-1 — validate_upload_dir_safe was only invoked inside assemble_upload_dir, which --dry-run skips. A dry-run with --huggingface-dir release (or .) would write the README into the unsafe path BEFORE the safety net fired. Hoist the check into run_packager (both packagers) so it runs before any mkdir or write; the inner assemble_upload_dir call stays as defence-in-depth for direct callers. New tests: dry-run with unsafe upload-dir raises without writing; the same path through main() returns rc=2.

COPILOT-2 — Cover-image path resolution was inconsistent: validate_cover_image used cover_image as passed, while assemble_upload_dir did a separate ``release_dir / cover_image.name`` fallback. Diverged for bare-basename inputs (false validation failures) and two-paths-sharing-a-basename (assembler shadowing the explicit path). Added resolve_cover_image_path() to _release_common.py (explicit-wins, release-dir fallback); run_packager calls it once and threads the resolved path through validation, the metadata's image field, and assembly. New tests/scripts/test_release_common.py covers the four resolution branches; new packager-side tests confirm bare-basename success + metadata field plumbing.

COPILOT-3 — outdated; already addressed by self-review fix #8 in commit f2fc4a2. Resolved as already treated; no code change.

Verification: 1232/1232 tests pass + 5 gated skips; ruff + mypy clean; hash determinism PASS 67/67; leakage probes rc=0 on every tier; validate_release_candidate --no-rebuild exits 0; BUNDLE_SCHEMA_VERSION unchanged at 5. release/{kaggle,huggingface,huggingface-instructor}/* artifacts regenerated byte-identically.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
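
The "explicit-wins, release-dir fallback" rule from COPILOT-2 might look something like the sketch below. Only the function name resolve_cover_image_path and the `release_dir / cover_image.name` fallback come from the commit message. The signature and the behaviour when neither candidate exists are guesses, and the four resolution branches covered by the real tests are not reproduced here.

```python
from pathlib import Path


def resolve_cover_image_path(cover_image: Path, release_dir: Path) -> Path:
    """Resolve the cover image once so validation and assembly agree.

    Assumed rule: an explicit path that exists wins; otherwise fall back
    to a file with the same basename inside release_dir.
    """
    if cover_image.is_file():
        return cover_image
    fallback = release_dir / cover_image.name
    if fallback.is_file():
        return fallback
    # Neither exists: hand back the explicit path so a later validation
    # step can report the name the user actually typed (an assumption).
    return cover_image
```

Resolving the path in one place and passing the result to every consumer avoids the divergence described above, where validation and assembly each applied their own lookup.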

Copilot AI review requested due to automatic review settings April 27, 2026 14:03

shaypal5 added this to the v0.3.0 — Motif variability + exposure modes milestone Apr 27, 2026

shaypal5 added the "type: feature" (New capability) and "layer: simulation" (simulation/ discrete-time engine) labels Apr 27, 2026

Copilot started reviewing on behalf of shaypal5 April 27, 2026 14:04


Copilot AI reviewed Apr 27, 2026


Comment thread leadforge/simulation/population.py Outdated

Comment thread leadforge/simulation/population.py

Comment thread leadforge/simulation/population.py


shaypal5 merged commit 2bc3566 into main Apr 28, 2026
5 checks passed

shaypal5 deleted the feat/milestone-5-population-generation branch April 28, 2026 03:26

shaypal5 mentioned this pull request May 6, 2026

PR 5.2: HuggingFace release packager + load_dataset smoke test #72

Merged

4 tasks


shaypal5 merged 2 commits into main from feat/milestone-5-population-generation
