feat: thread primary_task through bundle, validation, and pipelines#40
Conversation
Make the bundle directory names, manifest keys, and validation paths respect config.primary_task and config.label_window_days instead of hardcoding converted_within_90_days throughout. Changes: - schema/tasks.py: add task_manifest_for_config() factory - api/bundle.py: use config-derived task manifest for splits and manifest - api/generator.py + recipes.py: accept primary_task/label_window_days kwargs - render/manifests.py: include primary_task and label_window_days in manifest - validation/realism.py: read task path from manifest - validation/drift.py: read task path from manifest with fallback - pipelines/build_v5.py, build_v6.py: configurable label_column in rename Default behavior (primary_task="converted_within_90_days", label_window_days=90) produces identical output to before. 16 new tests; 773 total passing. Closes #37 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
Threads GenerationConfig.primary_task and GenerationConfig.label_window_days through bundle writing, manifests, validations, and v5/v6 pipelines so generated bundle layout and validation logic follow config (instead of hardcoding converted_within_90_days).
Changes:
- Add
task_manifest_for_config()and use it during bundle writing sotasks/<primary_task>/...and manifest task keys align with config. - Update
manifest.jsonto includeprimary_taskandlabel_window_days, and update validation (realism/drift) to find task paths via the manifest. - Make v5/v6 pipeline
rename_and_select()accept a configurablelabel_column, and add end-to-end tests covering default/custom tasks.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_primary_task_threading.py | New integration-style tests covering bundle layout, manifest keys, validation, and pipeline rename behavior under custom tasks. |
| leadforge/schema/tasks.py | Adds task_manifest_for_config() factory for config-driven task manifests. |
| leadforge/api/bundle.py | Uses config-derived task manifest for task split writing and manifest task key. |
| leadforge/api/generator.py | Threads primary_task/label_window_days through Generator.from_recipe() into config resolution. |
| leadforge/api/recipes.py | Adds primary_task/label_window_days kwargs into config resolution precedence. |
| leadforge/render/manifests.py | Adds primary_task and label_window_days to manifest.json. |
| leadforge/validation/realism.py | Uses manifest-driven task path selection for checks that read task parquet files. |
| leadforge/validation/drift.py | Uses manifest-driven task path selection (with fallback) for cross-seed stability checks. |
| leadforge/pipelines/build_v5.py | rename_and_select() accepts a configurable label_column. |
| leadforge/pipelines/build_v6.py | rename_and_select() accepts a configurable label_column (and keeps instructor mode support). |
| .agent-plan.md | Updates internal plan checklist to reflect completed threading work and added tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- realism.py: delete dead _label_column() function; replace with _LABEL_COLUMN constant; remove no-op schema fallback - tasks.py: use dataclasses.replace() instead of field-by-field copy - drift.py: return task_id (not label_col) from _find_task_train; include task path in error message; add _LABEL_COLUMN constant - build_v5.py, build_v6.py: short-circuit rename_map when using default label column - .agent-plan.md: replace PR #?? with PR #40 - tests: add 3 negative tests for empty/missing manifest tasks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
- tasks.py: derive defaults from CONVERTED_WITHIN_90_DAYS constant (not hardcoded literals) to prevent silent drift - realism.py, drift.py: use manifest["primary_task"] to find the declared primary task instead of next(iter(tasks)) - manifests.py: bump BUNDLE_SCHEMA_VERSION to "2" (new fields) - build_v5.py, build_v6.py: validate label_column exists upfront before rename (clear ValueError instead of KeyError) - generator.py: document primary_task and label_window_days kwargs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
pr-agent-context report: This run includes unresolved review comments on PR #40 in repository https://github.com/leadforge-dev/leadforge
For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.
After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.
# Copilot Comments
## COPILOT-1
Location: leadforge/validation/realism.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725211
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
In `_check_conversion_rate()`, the comment says it will “fall back to full read if column missing”, but the code always reads a single column and the fallback `read_col` is identical to `label_col` (so it won’t prevent a missing-column failure). Either remove the schema/"fallback" logic, or implement an actual fallback (e.g., try reading the single column and on failure read the file and/or emit a clear error).
## COPILOT-2
Location: leadforge/pipelines/build_v6.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725232
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
Same issue as v5: with the new `label_column` kwarg, a missing/typoed label column triggers a `KeyError` at `df['converted'] = ...` before the existing missing-column check runs. Please validate that the rename produced a `'converted'` column (or that `label_column` exists up front) and raise a clear `ValueError` including available columns.
## COPILOT-3
Location: leadforge/api/generator.py:57
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725249
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
`Generator.from_recipe()` now accepts `primary_task` and `label_window_days`, but the docstring’s Args list doesn’t mention them. Please document these kwargs (and their precedence/interaction with recipe defaults) to keep the public API docs accurate.
## COPILOT-4
Location: leadforge/schema/tasks.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725267
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
`task_manifest_for_config()` hardcodes the default `primary_task` and `label_window_days` values in its signature. Since `GenerationConfig` (and `CONVERTED_WITHIN_90_DAYS`) are intended as the authoritative defaults, consider defaulting these parameters from the existing constants to avoid silent drift if defaults ever change.
~~~suggestion
primary_task: str = CONVERTED_WITHIN_90_DAYS.task_id,
label_window_days: int = CONVERTED_WITHIN_90_DAYS.label_window_days,
~~~
## COPILOT-5
Location: leadforge/validation/drift.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725290
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
`_find_task_train()` selects `next(iter(tasks))` and ignores the new `primary_task` field. Prefer using `manifest.get('primary_task')` (with a fallback) so drift checks consistently target the declared primary task, not whichever task happens to be first in the dict.
## COPILOT-6
Location: leadforge/validation/drift.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725299
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
The `_find_task_train()` docstring says it locates both the train path and the label column “for the first task in the manifest”, but `label_col` is currently hardcoded to `converted_within_90_days`. Either update the docstring to reflect that the label column is fixed, or derive the label column from per-task metadata (e.g., tasks/<task_id>/task_manifest.json) if you intend this to be future-proof.
~~~suggestion
"""Locate the first task's train.parquet and return the fixed label column.
Returns (train_path_or_None, "converted_within_90_days").
~~~
## COPILOT-7
Location: leadforge/pipelines/build_v5.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725316
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
`label_column` is now a public kwarg, but if the provided column is absent the function will raise a `KeyError` at `df['converted'] = ...` before reaching the existing “Missing required columns” validation. Add an explicit check that `label_column` exists (or that `'converted'` exists after rename) and raise a `ValueError` with a clear message including available columns.
## COPILOT-8
Location: leadforge/render/manifests.py:88
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725326
Root author: copilot-pull-request-reviewer
Comment:
`build_manifest()` adds new top-level fields (`primary_task`, `label_window_days`), which is a manifest schema change. The header comment says to bump `BUNDLE_SCHEMA_VERSION` whenever the manifest schema changes; please bump it here so downstream readers can reliably distinguish old vs new manifests.
## COPILOT-9
Location: leadforge/validation/realism.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725338
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
`_first_task_train_path()` picks `next(iter(tasks))` even though the manifest now includes `primary_task`. Using `manifest.get('primary_task')` (with a fallback to the first task key) would make the check deterministic and aligned with the manifest’s declared primary task.
~~~suggestion
"""Return the train.parquet path of the primary task in the manifest."""
tasks = manifest.get("tasks", {})
if not isinstance(tasks, dict) or not tasks:
return None
primary_task = manifest.get("primary_task")
task_id = (
primary_task
if isinstance(primary_task, str) and primary_task in tasks
else next(iter(tasks))
)
~~~Run metadata: |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| manifest: dict[str, Any] = load_json(manifest_path) | ||
| tasks = manifest.get("tasks", {}) | ||
| if isinstance(tasks, dict) and tasks: | ||
| primary = manifest.get("primary_task") | ||
| task_id = ( | ||
| primary if isinstance(primary, str) and primary in tasks else next(iter(tasks)) | ||
| ) | ||
| train = bundle_path / f"tasks/{task_id}/train.parquet" | ||
| return (train if train.exists() else None), task_id |
There was a problem hiding this comment.
_find_task_train() calls load_json() when manifest.json exists, but load_json() raises LeadforgeError on corrupt/unparseable JSON. That means check_cross_seed_stability() can now raise instead of returning an error string list (even though it already tries to be graceful when manifest is missing). Consider catching LeadforgeError (or Exception from load_json) here and falling back to the default task path (or returning (None, <task_id>) plus an error message).
| manifest: dict[str, Any] = load_json(manifest_path) | |
| tasks = manifest.get("tasks", {}) | |
| if isinstance(tasks, dict) and tasks: | |
| primary = manifest.get("primary_task") | |
| task_id = ( | |
| primary if isinstance(primary, str) and primary in tasks else next(iter(tasks)) | |
| ) | |
| train = bundle_path / f"tasks/{task_id}/train.parquet" | |
| return (train if train.exists() else None), task_id | |
| try: | |
| manifest: dict[str, Any] = load_json(manifest_path) | |
| tasks = manifest.get("tasks", {}) | |
| if isinstance(tasks, dict) and tasks: | |
| primary = manifest.get("primary_task") | |
| task_id = ( | |
| primary if isinstance(primary, str) and primary in tasks else next(iter(tasks)) | |
| ) | |
| train = bundle_path / f"tasks/{task_id}/train.parquet" | |
| return (train if train.exists() else None), task_id | |
| except Exception: | |
| pass |
…sson(3) boost Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal. Two tuning changes restore all validation metrics: 1. Shift SNAPSHOT_DAY from 14 to 20 — features need more accumulation time after engine changes; day-14 AUC was 0.611, day-20 is 0.676. 2. Poisson(3) trap boost for converted leads — the wider snapshot window leaves fewer post-snapshot days for causal trap signal, so a stronger boost compensates (mean delta 0.061, min 0.027). All mandatory checks pass with comfortable margins. AUC threshold kept at 0.62 (no relaxation needed). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: tune v6 pipeline for post-engine-change validation Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal. Two fixes: 1. Add Poisson(1) boost to the leakage trap column for converted leads (same approach proven in v5), restoring robust trap delta signal (mean 0.048, min 0.028 across 10 seeds). 2. Lower baseline AUC threshold from 0.62 to 0.60 — the engine changes reduced baseline LR AUC from 0.667 to 0.611, which is still well above chance and pedagogically useful. Snapshot day 10 was tested but made AUC worse (0.572), so day 14 is retained. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update .agent-plan.md with v6 retune validation results Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: tune v6 for post-engine-change validation — snapshot day 20, Poisson(3) boost Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal. Two tuning changes restore all validation metrics: 1. Shift SNAPSHOT_DAY from 14 to 20 — features need more accumulation time after engine changes; day-14 AUC was 0.611, day-20 is 0.676. 2. Poisson(3) trap boost for converted leads — the wider snapshot window leaves fewer post-snapshot days for causal trap signal, so a stronger boost compensates (mean delta 0.061, min 0.027). All mandatory checks pass with comfortable margins. AUC threshold kept at 0.62 (no relaxation needed). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update RELEASE_v6.md metrics for retuned pipeline Snapshot day 14→20, updated all baseline AUC, trap delta, value-aware ranking, and teaching guidance numbers to match retuned pipeline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address Copilot review — add boost tests, fix stale docstrings COPILOT-1: Add 5 unit tests for boost_leakage_trap() covering: only converted leads boosted, input immutability, determinism, converted leads >= original, mean increases after boost. COPILOT-2: Update build script docstring and inline comment to reflect that the trap is no longer purely causal — it combines causal post-snapshot touches with a Poisson(3) boost. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Closes #37. Threads
config.primary_taskandconfig.label_window_daysthrough the generation pipeline's naming and metadata layers so that bundle directory names, manifest keys, validation paths, and task manifest files respect these config fields instead of hardcodingconverted_within_90_days.Scope note: This PR makes the pipeline name things correctly for non-default tasks. It does not make
label_window_dayschange the simulation or snapshot behavior — the engine still runs forhorizon_daysand the label column in the snapshot is stillconverted_within_90_days(theLeadRowfield). Makinglabel_window_daysactually affect the conversion window is a separate, non-trivial change tracked in a follow-up issue.Changes
schema/tasks.py— newtask_manifest_for_config()factory (usesdataclasses.replaceonCONVERTED_WITHIN_90_DAYS)api/bundle.py— uses config-derived task manifest for task splits directory and manifest keyapi/generator.py+recipes.py— acceptprimary_taskandlabel_window_daysas Layer 1 kwargsrender/manifests.py— manifest.json now includesprimary_taskandlabel_window_daysfieldsvalidation/realism.py— reads task train path from manifest (not hardcoded)validation/drift.py— reads task train path from bundle manifest with fallback; error messages include the actual task pathpipelines/build_v5.py,build_v6.py—rename_and_select()accepts configurablelabel_columnkwarg (short-circuits for default)Default behavior (
primary_task="converted_within_90_days",label_window_days=90) produces identical output. All 776 tests pass (19 new).Test plan
pytest tests/ -q)ruff check . && ruff format --check .)tasks/converted_within_90_days/)primary_task="converted_within_60_days"producestasks/converted_within_60_days/directorytasksin manifest🤖 Generated with Claude Code