Skip to content

feat: generalize dataset card prose for non-conversion tasks#42

Merged
shaypal5 merged 2 commits into
mainfrom
38-generalize-dataset-card-prose
May 1, 2026
Merged

feat: generalize dataset card prose for non-conversion tasks#42
shaypal5 merged 2 commits into
mainfrom
38-generalize-dataset-card-prose

Conversation

@shaypal5

@shaypal5 shaypal5 commented May 1, 2026

Copy link
Copy Markdown
Contributor

Closes #38

Summary

  • Add label_description field to TaskManifest — human-readable prose for the dataset card's primary task section
  • render_dataset_card() now accepts an optional task_manifest kwarg; uses label_description when provided, falls back to hardcoded conversion prose otherwise
  • write_bundle() threads the TaskManifest (already constructed for task splits) to render_dataset_card()
  • Default case (converted_within_90_days) produces equivalent output to pre-change behavior

Test plan

  • 4 new tests: custom label description, default equivalence, no-manifest fallback, empty-description fallback
  • All 780 tests pass
  • Lint clean

🤖 Generated with Claude Code

…#38)

Add label_description field to TaskManifest so the dataset card's primary
task section uses task-specific prose instead of hardcoded conversion
language. Thread TaskManifest from write_bundle() to render_dataset_card().
Default case (converted_within_90_days) produces equivalent output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 1, 2026 20:22
@shaypal5 shaypal5 added type: feature New capability layer: narrative narrative/ vertical story layer labels May 1, 2026
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR generalizes the dataset card’s “Primary task” prose so it’s no longer hardcoded to conversion/closed_won, enabling non-conversion primary tasks to provide accurate, task-specific label definitions via TaskManifest.

Changes:

  • Add label_description to TaskManifest and include it in TaskManifest.to_dict().
  • Extend render_dataset_card() to accept an optional task_manifest and use its label_description when present.
  • Thread the TaskManifest created in write_bundle() through to render_dataset_card(), and add tests covering custom/default/fallback behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
leadforge/schema/tasks.py Adds label_description to TaskManifest and populates it for default task + factory.
leadforge/narrative/dataset_card.py Accepts optional task_manifest and uses it to render task-specific label definition prose.
leadforge/api/bundle.py Passes the constructed TaskManifest into render_dataset_card().
tests/schema/test_tasks.py Updates TaskManifest.to_dict() key expectations to include label_description.
tests/narrative/test_dataset_card.py Adds coverage ensuring label prose selection behaves as intended.
.agent-plan.md Tracks the work items and test additions for this PR.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/schema/tasks.py Outdated
Comment thread leadforge/narrative/dataset_card.py
Comment thread leadforge/narrative/dataset_card.py Outdated

@shaypal5 shaypal5 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-review — senior dev hat on

This PR moves in the right direction but has several design issues that should be addressed before merge.

1. label_description is near-duplicate of description

TaskManifest already has a description field containing a full human-readable sentence. Now we add label_description — a second prose field that says almost the same thing in slightly different words. Compare:

  • description: "Predict whether a lead converts (closed_won event) within 90 days of the snapshot anchor date…"
  • label_description: "A lead is considered converted if a closed_won event is recorded within 90 days of the lead's snapshot anchor date…"

These are the same information rephrased. Rather than adding a new field, the dataset card should just use description directly (it's already there and already task-specific). If the phrasing needs to differ for the card context, adjust description itself or format it differently at the render site — don't carry two parallel prose strings that will inevitably drift.

2. task_manifest_for_config() generates conversion-specific prose for any task name

The factory function hardcodes "A lead is considered converted if a closed_won event…" regardless of what primary_task is. Call task_manifest_for_config("churned_within_60_days", 60) and you get label_description talking about closed_won conversion. The whole point of this issue was to stop hardcoding conversion prose — but the factory still does it. The only way to get correct prose is to construct a TaskManifest manually, which defeats the factory's purpose.

3. Conversion-specific fallback in render_dataset_card is a trap

When task_manifest is None or has empty label_description, the renderer falls back to hardcoded conversion prose using cfg.label_window_days. This means:

  • A caller using a non-conversion task who forgets to pass task_manifest gets silently wrong prose (talks about closed_won).
  • If task_manifest.label_window_days differs from cfg.label_window_days (unlikely but possible), the fallback uses the wrong window value (Copilot caught this one too).

The fallback should either be task-agnostic or should not exist — if we require a TaskManifest, just require it.

4. label_description leaks into task_manifest.json via to_dict()

This is a presentation-layer concern (how to phrase the dataset card) being serialized into the data schema (task_manifest.json). Downstream consumers parsing the manifest now see a new field that's purely about rendering prose. This should either stay out of to_dict() or be a conscious schema decision documented in the manifest spec.

5. Missing docstring updates (Copilot caught this)

  • TaskManifest class docstring doesn't list label_description in its Attributes section.
  • render_dataset_card() docstring doesn't describe the new task_manifest parameter or its precedence logic.

Suggested fix

Drop label_description entirely. Instead:

  1. Use TaskManifest.description directly in the dataset card as the label definition text.
  2. Make render_dataset_card require task_manifest (or keep it optional with a truly generic fallback that doesn't mention closed_won).
  3. Adjust the description text on CONVERTED_WITHIN_90_DAYS if needed so it reads well in the card context.
  4. Fix task_manifest_for_config() so it produces a generic description when the task name isn't the default conversion task.

@github-actions

This comment has been minimized.

… card

Address self-review findings:
- Remove label_description field (near-duplicate of description)
- Use TaskManifest.description directly as dataset card label definition
- Make task_manifest_for_config() produce generic prose for non-default tasks
  instead of hardcoding conversion-specific language for all tasks
- Use task-agnostic fallback when no TaskManifest is provided
- Update docstrings for TaskManifest and render_dataset_card()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 1, 2026

Copy link
Copy Markdown

pr-agent-context report:

This run includes unresolved review comments on PR #42 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: leadforge/schema/tasks.py
URL: https://github.com/leadforge-dev/leadforge/pull/42#discussion_r3175052483
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `TaskManifest` gained the `label_description` field, but the class docstring’s Attributes list doesn’t mention it (and `label_window_days` is still described in conversion-specific terms). Please update the docstring so the public schema docs match the dataclass fields and are task-agnostic.

## COPILOT-2
Location: leadforge/narrative/dataset_card.py:21
URL: https://github.com/leadforge-dev/leadforge/pull/42#discussion_r3175052506
Root author: copilot-pull-request-reviewer

Comment:
    `render_dataset_card()` now accepts an optional `task_manifest`, but the docstring doesn’t describe this parameter or the precedence rules (use `task_manifest.label_description` when non-empty, otherwise fall back). Please document the new argument so callers know how to use it correctly.

## COPILOT-3
Location: leadforge/narrative/dataset_card.py
URL: https://github.com/leadforge-dev/leadforge/pull/42#discussion_r3175052521
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    When a `task_manifest` is passed but `label_description` is empty, the fallback prose uses `cfg.label_window_days`. This can produce an inconsistent dataset card if the manifest’s window differs from the config. Consider building the fallback using `task_manifest.label_window_days` (and potentially `task_manifest.task_id`) whenever a manifest is provided, reserving `cfg.*` only for the `task_manifest is None` case.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25232077307 attempt 1
Comment timestamp: 2026-05-01T20:36:57.467272+00:00
PR head commit: d5f19ba461e2cf40a028698e8ab0d68bc7ad63a3

@shaypal5 shaypal5 merged commit a40b3e5 into main May 1, 2026
7 checks passed
@shaypal5 shaypal5 deleted the 38-generalize-dataset-card-prose branch May 1, 2026 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

layer: narrative narrative/ vertical story layer type: feature New capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Generalize dataset card template prose for non-conversion tasks

2 participants