Skip to content

feat: thread primary_task through bundle, validation, and pipelines#40

Merged
shaypal5 merged 3 commits into
mainfrom
feat/thread-primary-task-through-pipeline
May 1, 2026
Merged

feat: thread primary_task through bundle, validation, and pipelines#40
shaypal5 merged 3 commits into
mainfrom
feat/thread-primary-task-through-pipeline

Conversation

@shaypal5

@shaypal5 shaypal5 commented May 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes #37. Threads config.primary_task and config.label_window_days through the generation pipeline's naming and metadata layers so that bundle directory names, manifest keys, validation paths, and task manifest files respect these config fields instead of hardcoding converted_within_90_days.

Scope note: This PR makes the pipeline name things correctly for non-default tasks. It does not make label_window_days change the simulation or snapshot behavior — the engine still runs for horizon_days and the label column in the snapshot is still converted_within_90_days (the LeadRow field). Making label_window_days actually affect the conversion window is a separate, non-trivial change tracked in a follow-up issue.

Changes

  • schema/tasks.py — new task_manifest_for_config() factory (uses dataclasses.replace on CONVERTED_WITHIN_90_DAYS)
  • api/bundle.py — uses config-derived task manifest for task splits directory and manifest key
  • api/generator.py + recipes.py — accept primary_task and label_window_days as Layer 1 kwargs
  • render/manifests.py — manifest.json now includes primary_task and label_window_days fields
  • validation/realism.py — reads task train path from manifest (not hardcoded)
  • validation/drift.py — reads task train path from bundle manifest with fallback; error messages include the actual task path
  • pipelines/build_v5.py, build_v6.pyrename_and_select() accepts configurable label_column kwarg (short-circuits for default)

Default behavior (primary_task="converted_within_90_days", label_window_days=90) produces identical output. All 776 tests pass (19 new).

Test plan

  • All 776 tests pass (pytest tests/ -q)
  • Lint clean (ruff check . && ruff format --check .)
  • Default bundle layout unchanged (task dir = tasks/converted_within_90_days/)
  • Custom primary_task="converted_within_60_days" produces tasks/converted_within_60_days/ directory
  • Manifest reflects custom task key and label window
  • Validation (realism + drift) passes on both default and custom-task bundles
  • Validation gracefully handles empty/missing tasks in manifest
  • Pipeline rename functions work with custom label column

🤖 Generated with Claude Code

Make the bundle directory names, manifest keys, and validation paths
respect config.primary_task and config.label_window_days instead of
hardcoding converted_within_90_days throughout.

Changes:
- schema/tasks.py: add task_manifest_for_config() factory
- api/bundle.py: use config-derived task manifest for splits and manifest
- api/generator.py + recipes.py: accept primary_task/label_window_days kwargs
- render/manifests.py: include primary_task and label_window_days in manifest
- validation/realism.py: read task path from manifest
- validation/drift.py: read task path from manifest with fallback
- pipelines/build_v5.py, build_v6.py: configurable label_column in rename

Default behavior (primary_task="converted_within_90_days", label_window_days=90)
produces identical output to before. 16 new tests; 773 total passing.

Closes #37

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 1, 2026 18:58
@shaypal5 shaypal5 added type: feature New capability layer: core core/ primitives (RNG, IDs, models, exceptions) layer: schema schema/ entity/event contracts layer: render render/ bundle and artifact output layer: validation validation/ invariants and checks layer: recipes recipes/ recipe assets and registry labels May 1, 2026
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Threads GenerationConfig.primary_task and GenerationConfig.label_window_days through bundle writing, manifests, validations, and v5/v6 pipelines so generated bundle layout and validation logic follow config (instead of hardcoding converted_within_90_days).

Changes:

  • Add task_manifest_for_config() and use it during bundle writing so tasks/<primary_task>/... and manifest task keys align with config.
  • Update manifest.json to include primary_task and label_window_days, and update validation (realism/drift) to find task paths via the manifest.
  • Make v5/v6 pipeline rename_and_select() accept a configurable label_column, and add end-to-end tests covering default/custom tasks.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tests/test_primary_task_threading.py New integration-style tests covering bundle layout, manifest keys, validation, and pipeline rename behavior under custom tasks.
leadforge/schema/tasks.py Adds task_manifest_for_config() factory for config-driven task manifests.
leadforge/api/bundle.py Uses config-derived task manifest for task split writing and manifest task key.
leadforge/api/generator.py Threads primary_task/label_window_days through Generator.from_recipe() into config resolution.
leadforge/api/recipes.py Adds primary_task/label_window_days kwargs into config resolution precedence.
leadforge/render/manifests.py Adds primary_task and label_window_days to manifest.json.
leadforge/validation/realism.py Uses manifest-driven task path selection for checks that read task parquet files.
leadforge/validation/drift.py Uses manifest-driven task path selection (with fallback) for cross-seed stability checks.
leadforge/pipelines/build_v5.py rename_and_select() accepts a configurable label_column.
leadforge/pipelines/build_v6.py rename_and_select() accepts a configurable label_column (and keeps instructor mode support).
.agent-plan.md Updates internal plan checklist to reflect completed threading work and added tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/validation/realism.py Outdated
Comment thread leadforge/pipelines/build_v6.py Outdated
Comment thread leadforge/api/generator.py
Comment thread leadforge/schema/tasks.py Outdated
Comment thread leadforge/validation/drift.py Outdated
Comment thread leadforge/validation/drift.py Outdated
Comment thread leadforge/pipelines/build_v5.py Outdated
Comment thread leadforge/render/manifests.py
Comment thread leadforge/validation/realism.py Outdated
- realism.py: delete dead _label_column() function; replace with
  _LABEL_COLUMN constant; remove no-op schema fallback
- tasks.py: use dataclasses.replace() instead of field-by-field copy
- drift.py: return task_id (not label_col) from _find_task_train;
  include task path in error message; add _LABEL_COLUMN constant
- build_v5.py, build_v6.py: short-circuit rename_map when using
  default label column
- .agent-plan.md: replace PR #?? with PR #40
- tests: add 3 negative tests for empty/missing manifest tasks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

- tasks.py: derive defaults from CONVERTED_WITHIN_90_DAYS constant
  (not hardcoded literals) to prevent silent drift
- realism.py, drift.py: use manifest["primary_task"] to find the
  declared primary task instead of next(iter(tasks))
- manifests.py: bump BUNDLE_SCHEMA_VERSION to "2" (new fields)
- build_v5.py, build_v6.py: validate label_column exists upfront
  before rename (clear ValueError instead of KeyError)
- generator.py: document primary_task and label_window_days kwargs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 1, 2026 19:21
@github-actions

github-actions Bot commented May 1, 2026

Copy link
Copy Markdown

pr-agent-context report:

This run includes unresolved review comments on PR #40 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: leadforge/validation/realism.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725211
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    In `_check_conversion_rate()`, the comment says it will “fall back to full read if column missing”, but the code always reads a single column and the fallback `read_col` is identical to `label_col` (so it won’t prevent a missing-column failure). Either remove the schema/"fallback" logic, or implement an actual fallback (e.g., try reading the single column and on failure read the file and/or emit a clear error).

## COPILOT-2
Location: leadforge/pipelines/build_v6.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725232
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    Same issue as v5: with the new `label_column` kwarg, a missing/typoed label column triggers a `KeyError` at `df['converted'] = ...` before the existing missing-column check runs. Please validate that the rename produced a `'converted'` column (or that `label_column` exists up front) and raise a clear `ValueError` including available columns.

## COPILOT-3
Location: leadforge/api/generator.py:57
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725249
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `Generator.from_recipe()` now accepts `primary_task` and `label_window_days`, but the docstring’s Args list doesn’t mention them. Please document these kwargs (and their precedence/interaction with recipe defaults) to keep the public API docs accurate.

## COPILOT-4
Location: leadforge/schema/tasks.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725267
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `task_manifest_for_config()` hardcodes the default `primary_task` and `label_window_days` values in its signature. Since `GenerationConfig` (and `CONVERTED_WITHIN_90_DAYS`) are intended as the authoritative defaults, consider defaulting these parameters from the existing constants to avoid silent drift if defaults ever change.
    ~~~suggestion
        primary_task: str = CONVERTED_WITHIN_90_DAYS.task_id,
        label_window_days: int = CONVERTED_WITHIN_90_DAYS.label_window_days,
    ~~~

## COPILOT-5
Location: leadforge/validation/drift.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725290
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `_find_task_train()` selects `next(iter(tasks))` and ignores the new `primary_task` field. Prefer using `manifest.get('primary_task')` (with a fallback) so drift checks consistently target the declared primary task, not whichever task happens to be first in the dict.

## COPILOT-6
Location: leadforge/validation/drift.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725299
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The `_find_task_train()` docstring says it locates both the train path and the label column “for the first task in the manifest”, but `label_col` is currently hardcoded to `converted_within_90_days`. Either update the docstring to reflect that the label column is fixed, or derive the label column from per-task metadata (e.g., tasks/<task_id>/task_manifest.json) if you intend this to be future-proof.
    ~~~suggestion
        """Locate the first task's train.parquet and return the fixed label column.

        Returns (train_path_or_None, "converted_within_90_days").
    ~~~

## COPILOT-7
Location: leadforge/pipelines/build_v5.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725316
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `label_column` is now a public kwarg, but if the provided column is absent the function will raise a `KeyError` at `df['converted'] = ...` before reaching the existing “Missing required columns” validation. Add an explicit check that `label_column` exists (or that `'converted'` exists after rename) and raise a `ValueError` with a clear message including available columns.

## COPILOT-8
Location: leadforge/render/manifests.py:88
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725326
Root author: copilot-pull-request-reviewer

Comment:
    `build_manifest()` adds new top-level fields (`primary_task`, `label_window_days`), which is a manifest schema change. The header comment says to bump `BUNDLE_SCHEMA_VERSION` whenever the manifest schema changes; please bump it here so downstream readers can reliably distinguish old vs new manifests.

## COPILOT-9
Location: leadforge/validation/realism.py
URL: https://github.com/leadforge-dev/leadforge/pull/40#discussion_r3174725338
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `_first_task_train_path()` picks `next(iter(tasks))` even though the manifest now includes `primary_task`. Using `manifest.get('primary_task')` (with a fallback to the first task key) would make the check deterministic and aligned with the manifest’s declared primary task.
    ~~~suggestion
        """Return the train.parquet path of the primary task in the manifest."""
        tasks = manifest.get("tasks", {})
        if not isinstance(tasks, dict) or not tasks:
            return None
        primary_task = manifest.get("primary_task")
        task_id = (
            primary_task
            if isinstance(primary_task, str) and primary_task in tasks
            else next(iter(tasks))
        )
    ~~~

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25229252975 attempt 1
Comment timestamp: 2026-05-01T19:22:01.861305+00:00
PR head commit: 59483eb2d8af861e27c867bf508d7d990493ce57

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +28 to +36
manifest: dict[str, Any] = load_json(manifest_path)
tasks = manifest.get("tasks", {})
if isinstance(tasks, dict) and tasks:
primary = manifest.get("primary_task")
task_id = (
primary if isinstance(primary, str) and primary in tasks else next(iter(tasks))
)
train = bundle_path / f"tasks/{task_id}/train.parquet"
return (train if train.exists() else None), task_id

Copilot AI May 1, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_find_task_train() calls load_json() when manifest.json exists, but load_json() raises LeadforgeError on corrupt/unparseable JSON. That means check_cross_seed_stability() can now raise instead of returning an error string list (even though it already tries to be graceful when manifest is missing). Consider catching LeadforgeError (or Exception from load_json) here and falling back to the default task path (or returning (None, <task_id>) plus an error message).

Suggested change
manifest: dict[str, Any] = load_json(manifest_path)
tasks = manifest.get("tasks", {})
if isinstance(tasks, dict) and tasks:
primary = manifest.get("primary_task")
task_id = (
primary if isinstance(primary, str) and primary in tasks else next(iter(tasks))
)
train = bundle_path / f"tasks/{task_id}/train.parquet"
return (train if train.exists() else None), task_id
try:
manifest: dict[str, Any] = load_json(manifest_path)
tasks = manifest.get("tasks", {})
if isinstance(tasks, dict) and tasks:
primary = manifest.get("primary_task")
task_id = (
primary if isinstance(primary, str) and primary in tasks else next(iter(tasks))
)
train = bundle_path / f"tasks/{task_id}/train.parquet"
return (train if train.exists() else None), task_id
except Exception:
pass

Copilot uses AI. Check for mistakes.
@shaypal5 shaypal5 merged commit 014fc62 into main May 1, 2026
11 checks passed
@shaypal5 shaypal5 deleted the feat/thread-primary-task-through-pipeline branch May 1, 2026 20:16
shaypal5 added a commit that referenced this pull request May 3, 2026
…sson(3) boost

Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal.
Two tuning changes restore all validation metrics:

1. Shift SNAPSHOT_DAY from 14 to 20 — features need more accumulation
   time after engine changes; day-14 AUC was 0.611, day-20 is 0.676.

2. Poisson(3) trap boost for converted leads — the wider snapshot
   window leaves fewer post-snapshot days for causal trap signal,
   so a stronger boost compensates (mean delta 0.061, min 0.027).

All mandatory checks pass with comfortable margins. AUC threshold
kept at 0.62 (no relaxation needed).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 3, 2026
* fix: tune v6 pipeline for post-engine-change validation

Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal.
Two fixes:

1. Add Poisson(1) boost to the leakage trap column for converted leads
   (same approach proven in v5), restoring robust trap delta signal
   (mean 0.048, min 0.028 across 10 seeds).

2. Lower baseline AUC threshold from 0.62 to 0.60 — the engine changes
   reduced baseline LR AUC from 0.667 to 0.611, which is still well
   above chance and pedagogically useful. Snapshot day 10 was tested
   but made AUC worse (0.572), so day 14 is retained.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update .agent-plan.md with v6 retune validation results

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: tune v6 for post-engine-change validation — snapshot day 20, Poisson(3) boost

Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal.
Two tuning changes restore all validation metrics:

1. Shift SNAPSHOT_DAY from 14 to 20 — features need more accumulation
   time after engine changes; day-14 AUC was 0.611, day-20 is 0.676.

2. Poisson(3) trap boost for converted leads — the wider snapshot
   window leaves fewer post-snapshot days for causal trap signal,
   so a stronger boost compensates (mean delta 0.061, min 0.027).

All mandatory checks pass with comfortable margins. AUC threshold
kept at 0.62 (no relaxation needed).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update RELEASE_v6.md metrics for retuned pipeline

Snapshot day 14→20, updated all baseline AUC, trap delta, value-aware
ranking, and teaching guidance numbers to match retuned pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review — add boost tests, fix stale docstrings

COPILOT-1: Add 5 unit tests for boost_leakage_trap() covering:
only converted leads boosted, input immutability, determinism,
converted leads >= original, mean increases after boost.

COPILOT-2: Update build script docstring and inline comment to
reflect that the trap is no longer purely causal — it combines
causal post-snapshot touches with a Poisson(3) boost.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

layer: core core/ primitives (RNG, IDs, models, exceptions) layer: recipes recipes/ recipe assets and registry layer: render render/ bundle and artifact output layer: schema schema/ entity/event contracts layer: validation validation/ invariants and checks type: feature New capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Thread primary_task through bundle writing, validation, and pipeline

2 participants