feat: Milestone 5 — population generation and latent state initialisation #10
Merged
…tion

Implements build_population() in leadforge/simulation/population.py:

- AccountRow generation: industry, region, employee/revenue/maturity bands, account created_at spread 30-730 days before world base date
- ContactRow generation: persona-driven title/role/buyer_role, conditional account FK, contact created_at anchored to parent account
- LeadRow generation: GTM-weighted lead_source, rep assignment from internal pool, lead_created_at within 30-day base window; initial stage = mql
- LatentState: 8 hidden traits across 3 entity types, all in [0,1], sampled from clipped Gaussians with motif-family-aware mean biases
- FK invariant: lead.account_id always equals contact.account_id
- All randomness via RNGRoot named substreams; fully deterministic

26 tests: counts, determinism, FK integrity, latent range/completeness, motif bias properties (fit_dominant vs buying_committee_friction), and observable field validity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
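The latent-trait sampling described in this commit message can be pictured with a minimal sketch. It assumes a simple additive bias on the Gaussian mean followed by clipping into [0, 1]; the table `MOTIF_LATENT_BIAS`, the function `sample_trait`, and all numeric values are hypothetical illustrations, not the contents of `population.py`.

```python
import random

# Hypothetical bias table: per-motif shifts added to a trait's base mean.
MOTIF_LATENT_BIAS = {
    "fit_dominant": {"latent_account_fit": 0.15},
    "buying_committee_friction": {"latent_account_fit": -0.05},
}


def sample_trait(rng, trait, motif_family, base_mean=0.5, sigma=0.15):
    """Draw one latent trait from a Gaussian, then clip it into [0, 1]."""
    shift = MOTIF_LATENT_BIAS.get(motif_family, {}).get(trait, 0.0)
    value = rng.gauss(base_mean + shift, sigma)
    return min(1.0, max(0.0, value))


# Assumed usage: a seeded generator makes the draws reproducible.
rng = random.Random(42)
samples = [
    sample_trait(rng, "latent_account_fit", "fit_dominant") for _ in range(1000)
]
```

Clipping keeps every trait inside the stated range, at the cost of a little probability mass piling up at 0 and 1 when the shifted mean sits near a boundary.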
Pull request overview
Introduces a new simulation-layer entry point to generate an initial “world population” (accounts/contacts/leads) alongside per-entity latent traits, with deterministic randomness derived from RNGRoot substreams and motif-family-dependent latent mean shifts.
Changes:
- Added `leadforge/simulation/population.py` implementing `build_population()` plus account/contact/lead generation and `LatentState` initialization.
- Added `tests/simulation/test_population.py` with coverage for counts, determinism, FK integrity, observable field validity, latent trait completeness/ranges, and motif-bias properties.
- Updated `.agent-plan.md` to mark Milestone 5 complete and advance the plan to Milestone 6.
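For readers unfamiliar with named RNG substreams, the following is a rough sketch of the general idea, not the project's `RNGRoot`. The class `SeedRoot`, its `substream` method, and the hashing scheme are all assumptions made for illustration.

```python
import hashlib
import random


class SeedRoot:
    """Hypothetical stand-in for a root RNG that hands out named substreams."""

    def __init__(self, seed: int) -> None:
        self.seed = seed

    def substream(self, name: str) -> random.Random:
        # Derive an independent child seed from (root seed, stream name).
        digest = hashlib.sha256(f"{self.seed}:{name}".encode("utf-8")).digest()
        return random.Random(int.from_bytes(digest[:8], "big"))


def draw(seed: int, name: str, n: int = 3) -> list:
    """Take n draws from the substream called `name` under the given seed."""
    rng = SeedRoot(seed).substream(name)
    return [rng.random() for _ in range(n)]


same_twice = draw(7, "accounts") == draw(7, "accounts")
differs_by_name = draw(7, "accounts") != draw(7, "contacts")
differs_by_seed = draw(7, "accounts") != draw(8, "accounts")
```

Because each stream is keyed by name, adding draws to one generator (say, contacts) does not disturb the sequence another generator (say, accounts) sees, which is what makes the output stable under refactoring.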
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| leadforge/simulation/population.py | Adds deterministic population + latent-state generation with motif-family biasing. |
| tests/simulation/test_population.py | Adds comprehensive tests validating population structure, determinism, and trait constraints. |
| .agent-plan.md | Updates milestone tracking and next-task breakdown. |
- Docstring: correct determinism contract to include narrative and world_graph.motif_family (COPILOT-1)
- build_population: add _validate_narrative() up-front guard that raises InvalidConfigError for empty industries, geographies, personas, or channels (COPILOT-2)
- _channel_weights: fall back to uniform distribution when all GTM shares sum to zero, preventing random.choices ValueError (COPILOT-3)
- 5 new tests covering all three fixes (363 total passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
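The COPILOT-2 and COPILOT-3 fixes follow a common defensive pattern, sketched below under assumptions. The function `channel_weights` and its input shape are hypothetical, and a plain `ValueError` stands in for the project's `InvalidConfigError`. The underlying fact is that `random.choices` raises `ValueError` when the weights total zero.

```python
import random


def channel_weights(shares: dict) -> tuple:
    """Return (names, weights), falling back to uniform when shares sum to zero."""
    if not shares:
        # Up-front guard: an empty channel list cannot be sampled at all.
        raise ValueError("at least one channel is required")
    names = list(shares)
    weights = [max(0.0, float(shares[name])) for name in names]
    if sum(weights) <= 0:
        # All-zero shares would make random.choices raise; use uniform instead.
        weights = [1.0] * len(names)
    return names, weights


names, weights = channel_weights({"inbound": 0.0, "outbound": 0.0})
lead_source = random.Random(0).choices(names, weights=weights, k=1)[0]
```

Validating once at the entry point keeps the per-row sampling code free of special cases.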
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #10. Treat this PR as all clear unless new signals appear.
shaypal5 added a commit that referenced this pull request May 6, 2026
Fold the brutal self-review's findings back into the PR before review.

Bugs:
- (#1) run_packager validate→write order: both packagers wrote README/metadata on validation failure, leaving corrupt artifacts on disk that would silently get committed. Gated on `errors == ()`; added no-write tests for both packagers.
- (#2) Instructor README inlined the public 3-tier README into a 1-tier dataset card. Replaced with a dedicated `INSTRUCTOR_BODY` constant that links to the public dataset and describes only the instructor-specific additions (full-horizon tables, hidden DAG, latent registry, mechanism summary).
- (#3) validate_upload_dir_safe also blocks strict descendants of release_dir; `--huggingface-dir release/intro` would otherwise rmtree the intro bundle.

Architecture:
- (#5) Finished shared-primitives extraction: SOURCE_TREE_BLOCK, validate_readme_substitution, replace_file, replace_dir, load_manifest now live in scripts/_release_common.py. Both packagers reduced to imports.
- (#6) Replaced 60-line hand-rolled YAML renderer with yaml.safe_dump + a 4-line _IndentedDumper subclass.
- (#7) Removed dead --owner / --dataset-slug CLI flags.
- (#8) assemble_upload_dir now takes rendered_readme and writes it.
- (#9) build_config_for_tier made pure (no I/O); cheap manifest-stat preflight via _assert_tier_dir_exists.
- (#10) --default-config with --variant=instructor errors loudly.

CI:
- (#4) Added [publish] extra (datasets>=2.14, kaggle>=1.6) so the gated G12.3 / G12.4 / G11.3 tests install in one line.

Cleanups: visual cruft (#13–#16), test cruft (#17: unused tmp_path, dead tag_lines), em-dash YAML round-trip parametrised for the instructor pretty_name.

Verification: 1223 tests pass + 5 gated skips; ruff + mypy clean; hash determinism PASS 67/67; leakage probes 0/3 reconstruct on every tier; validate_release_candidate --no-rebuild exits 0.

release/{kaggle,huggingface,huggingface-instructor}/dataset-metadata.json|README.md regenerated; audit-artifact-sync tests guard them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 6, 2026
* PR 5.2: HuggingFace release packager + load_dataset smoke test
Add `scripts/package_hf_release.py` to generate `release/huggingface/README.md`
with G12.1-compliant YAML frontmatter (pretty_name, license, language,
task_categories, size_categories, tags, three configs with `default: true`
on intermediate per G12.2), inlining the rewritten `release/README.md`
body with HF-specific link rewrites. `--variant=instructor` packages the
companion repo (G12.4) from `release/intermediate_instructor/` into a
separate `release/huggingface-instructor/` upload tree. G12.3 covered
by a parametrised `load_dataset()` smoke test gated on the optional
`datasets` SDK.
Extract shared release-packaging primitives (link rewriter, dir-safety
guard, cover-image validator) into `scripts/_release_common.py`; refactor
the Kaggle packager to import them. `release/kaggle/dataset-metadata.json`
is byte-stable across the refactor.
Delete the legacy `release/HF_DATASET_CARD.md` stub — superseded by the
generated card. Gitignore `release/huggingface{,-instructor}/*` except
the committed README.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* PR 5.2 self-review fixes (Kaggle + HF packagers)
Fold the brutal self-review's findings back into the PR before review.
Bugs:
- (#1) run_packager validate→write order — both packagers wrote
README/metadata on validation failure, leaving corrupt artifacts on
disk that would silently get committed. Gated on `errors == ()`;
added no-write tests for both packagers.
- (#2) Instructor README inlined the public 3-tier README into a
1-tier dataset card. Replaced with a dedicated `INSTRUCTOR_BODY`
constant that links to the public dataset and describes only the
instructor-specific additions (full-horizon tables, hidden DAG,
latent registry, mechanism summary).
- (#3) validate_upload_dir_safe also blocks strict descendants of
release_dir; `--huggingface-dir release/intro` would otherwise
rmtree the intro bundle.
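The validate-then-write ordering from bug (#1) can be illustrated with a small sketch. Only the name `run_packager` and the `errors == ()` gate come from the message above; the `validate` helper, the metadata shape, and the file contents are hypothetical.

```python
import tempfile
from pathlib import Path


def validate(metadata: dict) -> tuple:
    """Hypothetical validator: returns a tuple of error strings, empty when OK."""
    errors = []
    if not metadata.get("title"):
        errors.append("missing title")
    return tuple(errors)


def run_packager(metadata: dict, out_dir: Path) -> tuple:
    """Validate first; touch the disk only when validation returned no errors."""
    errors = validate(metadata)
    if errors == ():
        out_dir.mkdir(parents=True, exist_ok=True)
        readme = out_dir / "README.md"
        readme.write_text(f"# {metadata['title']}\n", encoding="utf-8")
    return errors


# Assumed usage inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    bad_dir = Path(tmp) / "bad"
    bad_errors = run_packager({}, bad_dir)
    bad_dir_created = bad_dir.exists()
    good_dir = Path(tmp) / "good"
    good_errors = run_packager({"title": "demo"}, good_dir)
    good_readme = (good_dir / "README.md").read_text(encoding="utf-8")
```

A no-write test then only needs to assert that the output directory is untouched after a failed validation.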
Architecture:
- (#5) Finished shared-primitives extraction: SOURCE_TREE_BLOCK,
validate_readme_substitution, replace_file, replace_dir,
load_manifest now live in scripts/_release_common.py. Both
packagers reduced to imports.
- (#6) Replaced 60-line hand-rolled YAML renderer with yaml.safe_dump
+ a 4-line _IndentedDumper subclass.
- (#7) Removed dead --owner / --dataset-slug CLI flags.
- (#8) assemble_upload_dir now takes rendered_readme and writes it.
- (#9) build_config_for_tier made pure (no I/O); cheap manifest-stat
preflight via _assert_tier_dir_exists.
- (#10) --default-config with --variant=instructor errors loudly.
CI:
- (#4) Added [publish] extra (datasets>=2.14, kaggle>=1.6) so the
gated G12.3 / G12.4 / G11.3 tests install in one line.
Cleanups: visual cruft (#13–#16), test cruft (#17 — unused tmp_path,
dead tag_lines), em-dash YAML round-trip parametrised for the
instructor pretty_name.
Verification: 1223 tests pass + 5 gated skips; ruff + mypy clean;
hash determinism PASS 67/67; leakage probes 0/3 reconstruct on every
tier; validate_release_candidate --no-rebuild exits 0.
release/{kaggle,huggingface,huggingface-instructor}/dataset-metadata.json|README.md
regenerated; audit-artifact-sync tests guard them.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* PR 5.2 Copilot-review fixes (Kaggle + HF packagers)
Fold Copilot's two real findings on the self-review revision back in.
COPILOT-1 — validate_upload_dir_safe was only invoked inside
assemble_upload_dir, which --dry-run skips. A dry-run with
--huggingface-dir release (or .) would write the README into the
unsafe path BEFORE the safety net fired. Hoist the check into
run_packager (both packagers) so it runs before any mkdir or write;
the inner assemble_upload_dir call stays as defence-in-depth for
direct callers. New tests: dry-run with unsafe upload-dir raises
without writing; the same path through main() returns rc=2.
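The hoisting described in COPILOT-1 might look roughly like the sketch below. It assumes that a dry run still writes the README but skips assembly, and it treats an upload directory as unsafe only when it equals `release_dir` or is one of its ancestors. The helper `is_unsafe_upload_dir` is hypothetical, and the real `validate_upload_dir_safe` applies further rules (such as the descendant check) that are omitted here.

```python
import tempfile
from pathlib import Path


def is_unsafe_upload_dir(upload_dir: Path, release_dir: Path) -> bool:
    """Hypothetical rule: unsafe if upload_dir is release_dir or an ancestor."""
    upload, release = upload_dir.resolve(), release_dir.resolve()
    return upload == release or upload in release.parents


def run_packager(upload_dir: Path, release_dir: Path, dry_run: bool) -> list:
    # The safety check runs first, before any mkdir or write, in every mode.
    if is_unsafe_upload_dir(upload_dir, release_dir):
        raise ValueError(f"unsafe upload dir: {upload_dir}")
    upload_dir.mkdir(parents=True, exist_ok=True)
    (upload_dir / "README.md").write_text("# card\n", encoding="utf-8")
    written = ["README.md"]
    if not dry_run:
        # Stand-in for the assembly step that a dry run skips.
        (upload_dir / "data.bin").write_bytes(b"")
        written.append("data.bin")
    return written


with tempfile.TemporaryDirectory() as tmp:
    release = Path(tmp) / "release"
    release.mkdir()
    try:
        run_packager(release, release, dry_run=True)
        rejected = False
    except ValueError:
        rejected = True
    readme_leaked = (release / "README.md").exists()
    safe_result = run_packager(Path(tmp) / "upload", release, dry_run=True)
```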
COPILOT-2 — Cover-image path resolution was inconsistent:
validate_cover_image used cover_image as passed, while
assemble_upload_dir did a separate ``release_dir / cover_image.name``
fallback. Diverged for bare-basename inputs (false validation
failures) and two-paths-sharing-a-basename (assembler shadowing the
explicit path). Added resolve_cover_image_path() to
_release_common.py (explicit-wins, release-dir fallback);
run_packager calls it once and threads the resolved path through
validation, the metadata's image field, and assembly. New
tests/scripts/test_release_common.py covers the four resolution
branches; new packager-side tests confirm bare-basename success +
metadata field plumbing.
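An "explicit-wins, release-dir fallback" resolver could be sketched as follows. The function name matches the one mentioned in COPILOT-2, but the body is an assumption and covers three illustrative cases rather than the four branches the real tests exercise.

```python
import tempfile
from pathlib import Path


def resolve_cover_image_path(cover_image: Path, release_dir: Path) -> Path:
    """Sketch: prefer the path as given, else look for its basename in release_dir."""
    if cover_image.exists():
        return cover_image  # explicit path wins
    fallback = release_dir / cover_image.name
    if fallback.exists():
        return fallback  # bare basename found inside the release dir
    return cover_image  # unresolved: leave it for validation to report


with tempfile.TemporaryDirectory() as tmp:
    release = Path(tmp) / "release"
    other = Path(tmp) / "other"
    release.mkdir()
    other.mkdir()
    (release / "cover_sketch.png").write_bytes(b"")
    (other / "cover_sketch.png").write_bytes(b"")
    explicit = resolve_cover_image_path(other / "cover_sketch.png", release)
    explicit_wins = explicit == other / "cover_sketch.png"
    bare = resolve_cover_image_path(Path("cover_sketch.png"), release)
    fallback_used = bare == release / "cover_sketch.png"
    missing = resolve_cover_image_path(Path("nope_sketch.png"), release)
    missing_passthrough = missing == Path("nope_sketch.png")
```

Resolving once and passing the result to validation, metadata, and assembly means all three see the same file.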
COPILOT-3 — outdated; already addressed by self-review fix #8 in
commit f2fc4a2. Resolved as already treated; no code change.
Verification: 1232/1232 tests pass + 5 gated skips; ruff + mypy
clean; hash determinism PASS 67/67; leakage probes rc=0 on every
tier; validate_release_candidate --no-rebuild exits 0;
BUNDLE_SCHEMA_VERSION unchanged at 5.
release/{kaggle,huggingface,huggingface-instructor}/* artifacts
regenerated byte-identically.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary

- `leadforge/simulation/population.py` with `build_population()` as the single entry point
- FK invariant (`lead.account_id == contact.account_id`)
- `LatentState` with 8 hidden traits across 3 entity types (account: fit, budget readiness, process maturity; contact: problem awareness, authority, responsiveness, engagement propensity; lead: sales friction)
- `_MOTIF_LATENT_BIAS` shifts latent means to create structurally coherent worlds (e.g. `fit_dominant` raises the `latent_account_fit` mean)
- `RNGRoot` named substreams: fully deterministic given `(config.seed, world_graph.motif_family)`

Test plan

- `tests/simulation/test_population.py`
- `n_accounts` / `n_contacts` / `n_leads`
- `fit_dominant` > `buying_committee_friction` for `latent_account_fit` across 15 seeds

🤖 Generated with Claude Code
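The motif-bias property in the test plan can be approximated by a seed-sweep check like the one below. This is a toy model under stated assumptions: the `BIAS` values, the sample size, and `mean_account_fit` are hypothetical and do not call the real `build_population()`.

```python
import random
from statistics import mean

# Hypothetical mean shifts for latent_account_fit under each motif family.
BIAS = {"fit_dominant": 0.15, "buying_committee_friction": -0.05}


def mean_account_fit(seed: int, motif_family: str, n: int = 200) -> float:
    """Average of n clipped-Gaussian draws for one seed and motif family."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.5 + BIAS[motif_family], 0.15) for _ in range(n)]
    return mean(min(1.0, max(0.0, d)) for d in draws)


# Compare the two motif families seed by seed, across 15 seeds.
wins = sum(
    mean_account_fit(seed, "fit_dominant")
    > mean_account_fit(seed, "buying_committee_friction")
    for seed in range(15)
)
```

Checking the inequality per seed, rather than once on a pooled sample, guards against a bias that only shows up for a lucky seed.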