feat: generalize dataset card prose for non-conversion tasks#42
Conversation
…#38) Add label_description field to TaskManifest so the dataset card's primary task section uses task-specific prose instead of hardcoded conversion language. Thread TaskManifest from write_bundle() to render_dataset_card(). Default case (converted_within_90_days) produces equivalent output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR generalizes the dataset card’s “Primary task” prose so it’s no longer hardcoded to conversion/closed_won, enabling non-conversion primary tasks to provide accurate, task-specific label definitions via TaskManifest.
Changes:
- Add
label_descriptiontoTaskManifestand include it inTaskManifest.to_dict(). - Extend
render_dataset_card()to accept an optionaltask_manifestand use itslabel_descriptionwhen present. - Thread the
TaskManifestcreated inwrite_bundle()through torender_dataset_card(), and add tests covering custom/default/fallback behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
leadforge/schema/tasks.py |
Adds label_description to TaskManifest and populates it for default task + factory. |
leadforge/narrative/dataset_card.py |
Accepts optional task_manifest and uses it to render task-specific label definition prose. |
leadforge/api/bundle.py |
Passes the constructed TaskManifest into render_dataset_card(). |
tests/schema/test_tasks.py |
Updates TaskManifest.to_dict() key expectations to include label_description. |
tests/narrative/test_dataset_card.py |
Adds coverage ensuring label prose selection behaves as intended. |
.agent-plan.md |
Tracks the work items and test additions for this PR. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
shaypal5
left a comment
There was a problem hiding this comment.
Self-review — senior dev hat on
This PR moves in the right direction but has several design issues that should be addressed before merge.
1. label_description is near-duplicate of description
TaskManifest already has a description field containing a full human-readable sentence. Now we add label_description — a second prose field that says almost the same thing in slightly different words. Compare:
description: "Predict whether a lead converts (closed_won event) within 90 days of the snapshot anchor date…"label_description: "A lead is considered converted if aclosed_wonevent is recorded within 90 days of the lead's snapshot anchor date…"
These are the same information rephrased. Rather than adding a new field, the dataset card should just use description directly (it's already there and already task-specific). If the phrasing needs to differ for the card context, adjust description itself or format it differently at the render site — don't carry two parallel prose strings that will inevitably drift.
2. task_manifest_for_config() generates conversion-specific prose for any task name
The factory function hardcodes "A lead is considered converted if a closed_won event…" regardless of what primary_task is. Call task_manifest_for_config("churned_within_60_days", 60) and you get label_description talking about closed_won conversion. The whole point of this issue was to stop hardcoding conversion prose — but the factory still does it. The only way to get correct prose is to construct a TaskManifest manually, which defeats the factory's purpose.
3. Conversion-specific fallback in render_dataset_card is a trap
When task_manifest is None or has empty label_description, the renderer falls back to hardcoded conversion prose using cfg.label_window_days. This means:
- A caller using a non-conversion task who forgets to pass
task_manifestgets silently wrong prose (talks aboutclosed_won). - If
task_manifest.label_window_daysdiffers fromcfg.label_window_days(unlikely but possible), the fallback uses the wrong window value (Copilot caught this one too).
The fallback should either be task-agnostic or should not exist — if we require a TaskManifest, just require it.
4. label_description leaks into task_manifest.json via to_dict()
This is a presentation-layer concern (how to phrase the dataset card) being serialized into the data schema (task_manifest.json). Downstream consumers parsing the manifest now see a new field that's purely about rendering prose. This should either stay out of to_dict() or be a conscious schema decision documented in the manifest spec.
5. Missing docstring updates (Copilot caught this)
TaskManifestclass docstring doesn't listlabel_descriptionin its Attributes section.render_dataset_card()docstring doesn't describe the newtask_manifestparameter or its precedence logic.
Suggested fix
Drop label_description entirely. Instead:
- Use
TaskManifest.descriptiondirectly in the dataset card as the label definition text. - Make
render_dataset_cardrequiretask_manifest(or keep it optional with a truly generic fallback that doesn't mentionclosed_won). - Adjust the
descriptiontext onCONVERTED_WITHIN_90_DAYSif needed so it reads well in the card context. - Fix
task_manifest_for_config()so it produces a generic description when the task name isn't the default conversion task.
This comment has been minimized.
This comment has been minimized.
… card Address self-review findings: - Remove label_description field (near-duplicate of description) - Use TaskManifest.description directly as dataset card label definition - Make task_manifest_for_config() produce generic prose for non-default tasks instead of hardcoding conversion-specific language for all tasks - Use task-agnostic fallback when no TaskManifest is provided - Update docstrings for TaskManifest and render_dataset_card() Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
pr-agent-context report: This run includes unresolved review comments on PR #42 in repository https://github.com/leadforge-dev/leadforge
For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.
After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.
# Copilot Comments
## COPILOT-1
Location: leadforge/schema/tasks.py
URL: https://github.com/leadforge-dev/leadforge/pull/42#discussion_r3175052483
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
`TaskManifest` gained the `label_description` field, but the class docstring’s Attributes list doesn’t mention it (and `label_window_days` is still described in conversion-specific terms). Please update the docstring so the public schema docs match the dataclass fields and are task-agnostic.
## COPILOT-2
Location: leadforge/narrative/dataset_card.py:21
URL: https://github.com/leadforge-dev/leadforge/pull/42#discussion_r3175052506
Root author: copilot-pull-request-reviewer
Comment:
`render_dataset_card()` now accepts an optional `task_manifest`, but the docstring doesn’t describe this parameter or the precedence rules (use `task_manifest.label_description` when non-empty, otherwise fall back). Please document the new argument so callers know how to use it correctly.
## COPILOT-3
Location: leadforge/narrative/dataset_card.py
URL: https://github.com/leadforge-dev/leadforge/pull/42#discussion_r3175052521
Status: outdated
Root author: copilot-pull-request-reviewer
Comment:
When a `task_manifest` is passed but `label_description` is empty, the fallback prose uses `cfg.label_window_days`. This can produce an inconsistent dataset card if the manifest’s window differs from the config. Consider building the fallback using `task_manifest.label_window_days` (and potentially `task_manifest.task_id`) whenever a manifest is provided, reserving `cfg.*` only for the `task_manifest is None` case.Run metadata: |
Closes #38
Summary
label_descriptionfield toTaskManifest— human-readable prose for the dataset card's primary task sectionrender_dataset_card()now accepts an optionaltask_manifestkwarg; useslabel_descriptionwhen provided, falls back to hardcoded conversion prose otherwisewrite_bundle()threads theTaskManifest(already constructed for task splits) torender_dataset_card()converted_within_90_days) produces equivalent output to pre-change behaviorTest plan
🤖 Generated with Claude Code