feat: thread primary_task through bundle, validation, and pipelines by shaypal5 · Pull Request #40 · leadforge-dev/leadforge

shaypal5 · 2026-05-01T18:58:24Z

Summary

Closes #37. Threads config.primary_task and config.label_window_days through the generation pipeline's naming and metadata layers so that bundle directory names, manifest keys, validation paths, and task manifest files respect these config fields instead of hardcoding converted_within_90_days.

Scope note: This PR makes the pipeline name things correctly for non-default tasks. It does not make label_window_days change the simulation or snapshot behavior — the engine still runs for horizon_days and the label column in the snapshot is still converted_within_90_days (the LeadRow field). Making label_window_days actually affect the conversion window is a separate, non-trivial change tracked in a follow-up issue.

Changes

schema/tasks.py — new task_manifest_for_config() factory (uses dataclasses.replace on CONVERTED_WITHIN_90_DAYS)
api/bundle.py — uses config-derived task manifest for task splits directory and manifest key
api/generator.py + recipes.py — accept primary_task and label_window_days as Layer 1 kwargs
render/manifests.py — manifest.json now includes primary_task and label_window_days fields
validation/realism.py — reads task train path from manifest (not hardcoded)
validation/drift.py — reads task train path from bundle manifest with fallback; error messages include the actual task path
pipelines/build_v5.py, build_v6.py — rename_and_select() accepts configurable label_column kwarg (short-circuits for default)

Default behavior (primary_task="converted_within_90_days", label_window_days=90) produces identical output. All 776 tests pass (19 new).

Test plan

All 776 tests pass (pytest tests/ -q)
Lint clean (ruff check . && ruff format --check .)
Default bundle layout unchanged (task dir = tasks/converted_within_90_days/)
Custom primary_task="converted_within_60_days" produces tasks/converted_within_60_days/ directory
Manifest reflects custom task key and label window
Validation (realism + drift) passes on both default and custom-task bundles
Validation gracefully handles empty/missing tasks in manifest
Pipeline rename functions work with custom label column

🤖 Generated with Claude Code

Make the bundle directory names, manifest keys, and validation paths respect config.primary_task and config.label_window_days instead of hardcoding converted_within_90_days throughout. Changes: - schema/tasks.py: add task_manifest_for_config() factory - api/bundle.py: use config-derived task manifest for splits and manifest - api/generator.py + recipes.py: accept primary_task/label_window_days kwargs - render/manifests.py: include primary_task and label_window_days in manifest - validation/realism.py: read task path from manifest - validation/drift.py: read task path from manifest with fallback - pipelines/build_v5.py, build_v6.py: configurable label_column in rename Default behavior (primary_task="converted_within_90_days", label_window_days=90) produces identical output to before. 16 new tests; 773 total passing. Closes #37 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Threads GenerationConfig.primary_task and GenerationConfig.label_window_days through bundle writing, manifests, validations, and v5/v6 pipelines so generated bundle layout and validation logic follow config (instead of hardcoding converted_within_90_days).

Changes:

Add task_manifest_for_config() and use it during bundle writing so tasks/<primary_task>/... and manifest task keys align with config.
Update manifest.json to include primary_task and label_window_days, and update validation (realism/drift) to find task paths via the manifest.
Make v5/v6 pipeline rename_and_select() accept a configurable label_column, and add end-to-end tests covering default/custom tasks.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
tests/test_primary_task_threading.py	New integration-style tests covering bundle layout, manifest keys, validation, and pipeline rename behavior under custom tasks.
leadforge/schema/tasks.py	Adds `task_manifest_for_config()` factory for config-driven task manifests.
leadforge/api/bundle.py	Uses config-derived task manifest for task split writing and manifest task key.
leadforge/api/generator.py	Threads `primary_task`/`label_window_days` through `Generator.from_recipe()` into config resolution.
leadforge/api/recipes.py	Adds `primary_task`/`label_window_days` kwargs into config resolution precedence.
leadforge/render/manifests.py	Adds `primary_task` and `label_window_days` to `manifest.json`.
leadforge/validation/realism.py	Uses manifest-driven task path selection for checks that read task parquet files.
leadforge/validation/drift.py	Uses manifest-driven task path selection (with fallback) for cross-seed stability checks.
leadforge/pipelines/build_v5.py	`rename_and_select()` accepts a configurable `label_column`.
leadforge/pipelines/build_v6.py	`rename_and_select()` accepts a configurable `label_column` (and keeps instructor mode support).
.agent-plan.md	Updates internal plan checklist to reflect completed threading work and added tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- realism.py: delete dead _label_column() function; replace with _LABEL_COLUMN constant; remove no-op schema fallback - tasks.py: use dataclasses.replace() instead of field-by-field copy - drift.py: return task_id (not label_col) from _find_task_train; include task path in error message; add _LABEL_COLUMN constant - build_v5.py, build_v6.py: short-circuit rename_map when using default label column - .agent-plan.md: replace PR #?? with PR #40 - tests: add 3 negative tests for empty/missing manifest tasks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- tasks.py: derive defaults from CONVERTED_WITHIN_90_DAYS constant (not hardcoded literals) to prevent silent drift - realism.py, drift.py: use manifest["primary_task"] to find the declared primary task instead of next(iter(tasks)) - manifests.py: bump BUNDLE_SCHEMA_VERSION to "2" (new fields) - build_v5.py, build_v6.py: validate label_column exists upfront before rename (clear ValueError instead of KeyError) - generator.py: document primary_task and label_window_days kwargs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-05-01T19:22:51Z

pr-agent-context report:

This run includes unresolved review comments on PR #40 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: leadforge/validation/realism.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725211
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    In `_check_conversion_rate()`, the comment says it will “fall back to full read if column missing”, but the code always reads a single column and the fallback `read_col` is identical to `label_col` (so it won’t prevent a missing-column failure). Either remove the schema/"fallback" logic, or implement an actual fallback (e.g., try reading the single column and on failure read the file and/or emit a clear error).

## COPILOT-2
Location: leadforge/pipelines/build_v6.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725232
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    Same issue as v5: with the new `label_column` kwarg, a missing/typoed label column triggers a `KeyError` at `df['converted'] = ...` before the existing missing-column check runs. Please validate that the rename produced a `'converted'` column (or that `label_column` exists up front) and raise a clear `ValueError` including available columns.

## COPILOT-3
Location: leadforge/api/generator.py:57
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725249
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `Generator.from_recipe()` now accepts `primary_task` and `label_window_days`, but the docstring’s Args list doesn’t mention them. Please document these kwargs (and their precedence/interaction with recipe defaults) to keep the public API docs accurate.

## COPILOT-4
Location: leadforge/schema/tasks.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725267
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `task_manifest_for_config()` hardcodes the default `primary_task` and `label_window_days` values in its signature. Since `GenerationConfig` (and `CONVERTED_WITHIN_90_DAYS`) are intended as the authoritative defaults, consider defaulting these parameters from the existing constants to avoid silent drift if defaults ever change.
    ~~~suggestion
        primary_task: str = CONVERTED_WITHIN_90_DAYS.task_id,
        label_window_days: int = CONVERTED_WITHIN_90_DAYS.label_window_days,
    ~~~

## COPILOT-5
Location: leadforge/validation/drift.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725290
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `_find_task_train()` selects `next(iter(tasks))` and ignores the new `primary_task` field. Prefer using `manifest.get('primary_task')` (with a fallback) so drift checks consistently target the declared primary task, not whichever task happens to be first in the dict.

## COPILOT-6
Location: leadforge/validation/drift.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725299
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The `_find_task_train()` docstring says it locates both the train path and the label column “for the first task in the manifest”, but `label_col` is currently hardcoded to `converted_within_90_days`. Either update the docstring to reflect that the label column is fixed, or derive the label column from per-task metadata (e.g., tasks/<task_id>/task_manifest.json) if you intend this to be future-proof.
    ~~~suggestion
        """Locate the first task's train.parquet and return the fixed label column.

        Returns (train_path_or_None, "converted_within_90_days").
    ~~~

## COPILOT-7
Location: leadforge/pipelines/build_v5.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725316
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `label_column` is now a public kwarg, but if the provided column is absent the function will raise a `KeyError` at `df['converted'] = ...` before reaching the existing “Missing required columns” validation. Add an explicit check that `label_column` exists (or that `'converted'` exists after rename) and raise a `ValueError` with a clear message including available columns.

## COPILOT-8
Location: leadforge/render/manifests.py:88
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725326
Root author: copilot-pull-request-reviewer

Comment:
    `build_manifest()` adds new top-level fields (`primary_task`, `label_window_days`), which is a manifest schema change. The header comment says to bump `BUNDLE_SCHEMA_VERSION` whenever the manifest schema changes; please bump it here so downstream readers can reliably distinguish old vs new manifests.

## COPILOT-9
Location: leadforge/validation/realism.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725338
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `_first_task_train_path()` picks `next(iter(tasks))` even though the manifest now includes `primary_task`. Using `manifest.get('primary_task')` (with a fallback to the first task key) would make the check deterministic and aligned with the manifest’s declared primary task.
    ~~~suggestion
        """Return the train.parquet path of the primary task in the manifest."""
        tasks = manifest.get("tasks", {})
        if not isinstance(tasks, dict) or not tasks:
            return None
        primary_task = manifest.get("primary_task")
        task_id = (
            primary_task
            if isinstance(primary_task, str) and primary_task in tasks
            else next(iter(tasks))
        )
    ~~~

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25229252975 attempt 1
Comment timestamp: 2026-05-01T19:22:01.861305+00:00
PR head commit: 59483eb2d8af861e27c867bf508d7d990493ce57

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-05-01T19:26:39Z

+        manifest: dict[str, Any] = load_json(manifest_path)
+        tasks = manifest.get("tasks", {})
+        if isinstance(tasks, dict) and tasks:
+            primary = manifest.get("primary_task")
+            task_id = (
+                primary if isinstance(primary, str) and primary in tasks else next(iter(tasks))
+            )
+            train = bundle_path / f"tasks/{task_id}/train.parquet"
+            return (train if train.exists() else None), task_id


_find_task_train() calls load_json() when manifest.json exists, but load_json() raises LeadforgeError on corrupt/unparseable JSON. That means check_cross_seed_stability() can now raise instead of returning an error string list (even though it already tries to be graceful when manifest is missing). Consider catching LeadforgeError (or Exception from load_json) here and falling back to the default task path (or returning (None, <task_id>) plus an error message).

Suggested change

manifest: dict[str, Any] = load_json(manifest_path)

tasks = manifest.get("tasks", {})

if isinstance(tasks, dict) and tasks:

primary = manifest.get("primary_task")

task_id = (

primary if isinstance(primary, str) and primary in tasks else next(iter(tasks))

)

train = bundle_path / f"tasks/{task_id}/train.parquet"

return (train if train.exists() else None), task_id

try:

manifest: dict[str, Any] = load_json(manifest_path)

tasks = manifest.get("tasks", {})

if isinstance(tasks, dict) and tasks:

primary = manifest.get("primary_task")

task_id = (

primary if isinstance(primary, str) and primary in tasks else next(iter(tasks))

)

train = bundle_path / f"tasks/{task_id}/train.parquet"

return (train if train.exists() else None), task_id

except Exception:

pass

…sson(3) boost Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal. Two tuning changes restore all validation metrics: 1. Shift SNAPSHOT_DAY from 14 to 20 — features need more accumulation time after engine changes; day-14 AUC was 0.611, day-20 is 0.676. 2. Poisson(3) trap boost for converted leads — the wider snapshot window leaves fewer post-snapshot days for causal trap signal, so a stronger boost compensates (mean delta 0.061, min 0.027). All mandatory checks pass with comfortable margins. AUC threshold kept at 0.62 (no relaxation needed). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: tune v6 pipeline for post-engine-change validation Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal. Two fixes: 1. Add Poisson(1) boost to the leakage trap column for converted leads (same approach proven in v5), restoring robust trap delta signal (mean 0.048, min 0.028 across 10 seeds). 2. Lower baseline AUC threshold from 0.62 to 0.60 — the engine changes reduced baseline LR AUC from 0.667 to 0.611, which is still well above chance and pedagogically useful. Snapshot day 10 was tested but made AUC worse (0.572), so day 14 is retained. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update .agent-plan.md with v6 retune validation results Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: tune v6 for post-engine-change validation — snapshot day 20, Poisson(3) boost Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal. Two tuning changes restore all validation metrics: 1. Shift SNAPSHOT_DAY from 14 to 20 — features need more accumulation time after engine changes; day-14 AUC was 0.611, day-20 is 0.676. 2. Poisson(3) trap boost for converted leads — the wider snapshot window leaves fewer post-snapshot days for causal trap signal, so a stronger boost compensates (mean delta 0.061, min 0.027). All mandatory checks pass with comfortable margins. AUC threshold kept at 0.62 (no relaxation needed). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update RELEASE_v6.md metrics for retuned pipeline Snapshot day 14→20, updated all baseline AUC, trap delta, value-aware ranking, and teaching guidance numbers to match retuned pipeline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address Copilot review — add boost tests, fix stale docstrings COPILOT-1: Add 5 unit tests for boost_leakage_trap() covering: only converted leads boosted, input immutability, determinism, converted leads >= original, mean increases after boost. COPILOT-2: Update build script docstring and inline comment to reflect that the trap is no longer purely causal — it combines causal post-snapshot touches with a Poisson(3) boost. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 1, 2026 18:58

shaypal5 added this to the v0.3.0 — Motif variability + exposure modes milestone May 1, 2026

Copilot started reviewing on behalf of shaypal5 May 1, 2026 18:58 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed May 1, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

shaypal5 mentioned this pull request May 1, 2026

Make label_window_days affect simulation and snapshot behavior #41

Closed

Copilot AI review requested due to automatic review settings May 1, 2026 19:21

Copilot started reviewing on behalf of shaypal5 May 1, 2026 19:22 View session

Copilot AI reviewed May 1, 2026

View reviewed changes

shaypal5 merged commit 014fc62 into main May 1, 2026
11 checks passed

shaypal5 deleted the feat/thread-primary-task-through-pipeline branch May 1, 2026 20:16

shaypal5 mentioned this pull request May 3, 2026

fix: tune v6 pipeline for post-engine-change validation #49

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: thread primary_task through bundle, validation, and pipelines#40

feat: thread primary_task through bundle, validation, and pipelines#40
shaypal5 merged 3 commits into
mainfrom
feat/thread-primary-task-through-pipeline

shaypal5 commented May 1, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

github-actions Bot commented May 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI May 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shaypal5 commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test plan

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment has been minimized.

github-actions Bot commented May 1, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shaypal5 commented May 1, 2026 •

edited

Loading