feat: generalize dataset card prose for non-conversion tasks by shaypal5 · Pull Request #42 · leadforge-dev/leadforge

shaypal5 · 2026-05-01T20:22:35Z

Closes #38

Summary

Add label_description field to TaskManifest — human-readable prose for the dataset card's primary task section
render_dataset_card() now accepts an optional task_manifest kwarg; uses label_description when provided, falls back to hardcoded conversion prose otherwise
write_bundle() threads the TaskManifest (already constructed for task splits) to render_dataset_card()
Default case (converted_within_90_days) produces equivalent output to pre-change behavior

Test plan

4 new tests: custom label description, default equivalence, no-manifest fallback, empty-description fallback
All 780 tests pass
Lint clean

🤖 Generated with Claude Code

…#38) Add label_description field to TaskManifest so the dataset card's primary task section uses task-specific prose instead of hardcoded conversion language. Thread TaskManifest from write_bundle() to render_dataset_card(). Default case (converted_within_90_days) produces equivalent output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

This PR generalizes the dataset card’s “Primary task” prose so it’s no longer hardcoded to conversion/closed_won, enabling non-conversion primary tasks to provide accurate, task-specific label definitions via TaskManifest.

Changes:

Add label_description to TaskManifest and include it in TaskManifest.to_dict().
Extend render_dataset_card() to accept an optional task_manifest and use its label_description when present.
Thread the TaskManifest created in write_bundle() through to render_dataset_card(), and add tests covering custom/default/fallback behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`leadforge/schema/tasks.py`	Adds `label_description` to `TaskManifest` and populates it for default task + factory.
`leadforge/narrative/dataset_card.py`	Accepts optional `task_manifest` and uses it to render task-specific label definition prose.
`leadforge/api/bundle.py`	Passes the constructed `TaskManifest` into `render_dataset_card()`.
`tests/schema/test_tasks.py`	Updates `TaskManifest.to_dict()` key expectations to include `label_description`.
`tests/narrative/test_dataset_card.py`	Adds coverage ensuring label prose selection behaves as intended.
`.agent-plan.md`	Tracks the work items and test additions for this PR.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

shaypal5

Self-review — senior dev hat on

This PR moves in the right direction but has several design issues that should be addressed before merge.

1. `label_description` is near-duplicate of `description`

TaskManifest already has a description field containing a full human-readable sentence. Now we add label_description — a second prose field that says almost the same thing in slightly different words. Compare:

description: "Predict whether a lead converts (closed_won event) within 90 days of the snapshot anchor date…"
label_description: "A lead is considered converted if a closed_won event is recorded within 90 days of the lead's snapshot anchor date…"

These are the same information rephrased. Rather than adding a new field, the dataset card should just use description directly (it's already there and already task-specific). If the phrasing needs to differ for the card context, adjust description itself or format it differently at the render site — don't carry two parallel prose strings that will inevitably drift.

2. `task_manifest_for_config()` generates conversion-specific prose for any task name

The factory function hardcodes "A lead is considered converted if a closed_won event…" regardless of what primary_task is. Call task_manifest_for_config("churned_within_60_days", 60) and you get label_description talking about closed_won conversion. The whole point of this issue was to stop hardcoding conversion prose — but the factory still does it. The only way to get correct prose is to construct a TaskManifest manually, which defeats the factory's purpose.

3. Conversion-specific fallback in `render_dataset_card` is a trap

When task_manifest is None or has empty label_description, the renderer falls back to hardcoded conversion prose using cfg.label_window_days. This means:

A caller using a non-conversion task who forgets to pass task_manifest gets silently wrong prose (talks about closed_won).
If task_manifest.label_window_days differs from cfg.label_window_days (unlikely but possible), the fallback uses the wrong window value (Copilot caught this one too).

The fallback should either be task-agnostic or should not exist — if we require a TaskManifest, just require it.

4. `label_description` leaks into `task_manifest.json` via `to_dict()`

This is a presentation-layer concern (how to phrase the dataset card) being serialized into the data schema (task_manifest.json). Downstream consumers parsing the manifest now see a new field that's purely about rendering prose. This should either stay out of to_dict() or be a conscious schema decision documented in the manifest spec.

5. Missing docstring updates (Copilot caught this)

TaskManifest class docstring doesn't list label_description in its Attributes section.
render_dataset_card() docstring doesn't describe the new task_manifest parameter or its precedence logic.

Suggested fix

Drop label_description entirely. Instead:

Use TaskManifest.description directly in the dataset card as the label definition text.
Make render_dataset_card require task_manifest (or keep it optional with a truly generic fallback that doesn't mention closed_won).
Adjust the description text on CONVERTED_WITHIN_90_DAYS if needed so it reads well in the card context.
Fix task_manifest_for_config() so it produces a generic description when the task name isn't the default conversion task.

… card Address self-review findings: - Remove label_description field (near-duplicate of description) - Use TaskManifest.description directly as dataset card label definition - Make task_manifest_for_config() produce generic prose for non-default tasks instead of hardcoding conversion-specific language for all tasks - Use task-agnostic fallback when no TaskManifest is provided - Update docstrings for TaskManifest and render_dataset_card() Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-05-01T20:37:44Z

pr-agent-context report:

This run includes unresolved review comments on PR #42 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: leadforge/schema/tasks.py
URL: https://github.com/leadforge-dev/leadforge/pull/42#discussion_r3175052483
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `TaskManifest` gained the `label_description` field, but the class docstring’s Attributes list doesn’t mention it (and `label_window_days` is still described in conversion-specific terms). Please update the docstring so the public schema docs match the dataclass fields and are task-agnostic.

## COPILOT-2
Location: leadforge/narrative/dataset_card.py:21
URL: https://github.com/leadforge-dev/leadforge/pull/42#discussion_r3175052506
Root author: copilot-pull-request-reviewer

Comment:
    `render_dataset_card()` now accepts an optional `task_manifest`, but the docstring doesn’t describe this parameter or the precedence rules (use `task_manifest.label_description` when non-empty, otherwise fall back). Please document the new argument so callers know how to use it correctly.

## COPILOT-3
Location: leadforge/narrative/dataset_card.py
URL: https://github.com/leadforge-dev/leadforge/pull/42#discussion_r3175052521
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    When a `task_manifest` is passed but `label_description` is empty, the fallback prose uses `cfg.label_window_days`. This can produce an inconsistent dataset card if the manifest’s window differs from the config. Consider building the fallback using `task_manifest.label_window_days` (and potentially `task_manifest.task_id`) whenever a manifest is provided, reserving `cfg.*` only for the `task_manifest is None` case.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25232077307 attempt 1
Comment timestamp: 2026-05-01T20:36:57.467272+00:00
PR head commit: d5f19ba461e2cf40a028698e8ab0d68bc7ad63a3

Copilot AI review requested due to automatic review settings May 1, 2026 20:22

shaypal5 added type: feature New capability layer: narrative narrative/ vertical story layer labels May 1, 2026

shaypal5 added this to the v0.3.0 — Motif variability + exposure modes milestone May 1, 2026

Copilot started reviewing on behalf of shaypal5 May 1, 2026 20:23 View session

This comment has been minimized.

Sign in to view

Copilot AI reviewed May 1, 2026

View reviewed changes

Comment thread leadforge/schema/tasks.py Outdated

Comment thread leadforge/narrative/dataset_card.py

Comment thread leadforge/narrative/dataset_card.py Outdated

shaypal5 commented May 1, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

shaypal5 merged commit a40b3e5 into main May 1, 2026
7 checks passed

shaypal5 deleted the 38-generalize-dataset-card-prose branch May 1, 2026 20:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: generalize dataset card prose for non-conversion tasks#42

feat: generalize dataset card prose for non-conversion tasks#42
shaypal5 merged 2 commits into
mainfrom
38-generalize-dataset-card-prose

shaypal5 commented May 1, 2026

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

shaypal5 left a comment

Uh oh!

This comment has been minimized.

github-actions Bot commented May 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shaypal5 commented May 1, 2026

Summary

Test plan

Uh oh!

This comment has been minimized.

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

shaypal5 left a comment

Choose a reason for hiding this comment

Self-review — senior dev hat on

1. label_description is near-duplicate of description

2. task_manifest_for_config() generates conversion-specific prose for any task name

3. Conversion-specific fallback in render_dataset_card is a trap

4. label_description leaks into task_manifest.json via to_dict()

5. Missing docstring updates (Copilot caught this)

Suggested fix

Uh oh!

This comment has been minimized.

github-actions Bot commented May 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. `label_description` is near-duplicate of `description`

2. `task_manifest_for_config()` generates conversion-specific prose for any task name

3. Conversion-specific fallback in `render_dataset_card` is a trap

4. `label_description` leaks into `task_manifest.json` via `to_dict()`