diff --git a/.agent-plan.md b/.agent-plan.md
index 3c29ba0..89a3dfe 100644
--- a/.agent-plan.md
+++ b/.agent-plan.md
@@ -46,10 +46,12 @@ Goal: ship a best-in-class educational synthetic CRM lead-scoring dataset family
 - [x] PR 4.1: `release/README.md` (substantial rewrite) — release-grade dataset card per Datasheets-for-Datasets / Data Cards Playbook checklist (G10.1). New sections: macro framing paragraph (2024–2026 SaaS context, recommendation #19), simulation simplifications (modelled / approximate / not modelled, per chatgpt v2 §2.6), calibration documentation linking to `release/validation/validation_report.md`, public-vs-instructor redaction policy with concrete column lists citing `BANNED_LEAD_COLUMNS` / `BANNED_OPP_COLUMNS` / `BANNED_TABLES` / `SNAPSHOT_FILTERED_TABLES` from `leadforge/validation/leakage_probes.py`, intended-use vs out-of-scope-use, known limitations (G7.4.4 GBM−LR sign finding, weak channel signal from the Phase 4 audit, flat AUC across tiers, small cohort-shift gap), composition section per Datasheets format, adversarial-framing pointer (placeholder link to `docs/release/break_me_guide.md` that lands in PR 6.3), and a maintenance plan. Every realism / calibration / difficulty claim in the card is anchored to `validation_report.md` per G10.6. `BUNDLE_SCHEMA_VERSION` unchanged at 5 (documentation-only PR); 1167/1167 tests pass; ruff + mypy clean; `scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.65` exits 0 on every public tier; `scripts/verify_hash_determinism.py` PASS 67/67; `scripts/validate_release_candidate.py --no-rebuild` exits 0.
 
 ### Phase 5 — Platform packaging
-- [ ] `scripts/package_kaggle_release.py` → `release/kaggle/dataset-metadata.json`
-- [ ] `scripts/package_hf_release.py` → `release/huggingface/README.md` with YAML configs/default/pretty_name/tags
-- [ ] `release/dataset-cover-image.png` (≥560×280)
-- [ ] Local `load_dataset()` smoke test; Kaggle dry-run package validation
+- [x] PR 5.1: `scripts/package_kaggle_release.py` (new) — Kaggle release packager. Reads each public tier's `manifest.json` + `feature_dictionary.csv` + flat CSV header under `release/`, emits `release/kaggle/dataset-metadata.json` validated against G11.1 (title 6-50 chars, subtitle 20-80 chars, slug 3-50 chars, single MIT license, `expectedUpdateFrequency=never`, image filename, `resources[].schema.fields` in column order for every tabular resource). Schema fields cover both flat CSVs (driven by `feature_dictionary.csv`) and parquet files (driven by `pyarrow.parquet.read_schema`). The metadata's `description` field inlines `release/README.md` with three Kaggle-specific rewrites: source-repo tree diagram → upload-tree diagram, `](../foo)` → GitHub blob URL via regex, `](validation/validation_report.md)` → GitHub blob URL. Default `id` follows Kaggle's actual `<owner>/<slug>` schema (`leadforge/leadforge-lead-scoring-v1`), so PR 7.2's publish script does not have to splice in a username at upload time. CLI: `--release-dir`, `--kaggle-dir`, `--tier`, `--user-slug`, `--dataset-slug`, `--cover-image`, `--dry-run`, `--print`. Exit codes: 0 pass / 1 validation failure / 2 pre-flight error.
+- [x] PR 5.1: `scripts/generate_cover_image.py` (new) — deterministic Pillow + DejaVu Sans (bundled with matplotlib) renderer producing `release/dataset-cover-image.png` at 1280×640 (well above the 560×280 minimum, 2:1 aspect for Kaggle's header crop). Three-tier card design surfacing the cross-seed median conversion rate + LR AUC for each tier, pinned from `release/validation/validation_report.md`. Byte-identical re-runs guarded by `tests/scripts/test_generate_cover_image.py`.
+- [x] PR 5.1: Upload-dir assembly under `release/kaggle/` uses relative symlinks for the heavy bundle directories + cover image + LICENSE, plus a real file copy for `README.md` (rewritten on the way in so its `../` links and tree diagram render correctly on the Kaggle dataset page). `_validate_kaggle_dir_safe` refuses to assemble into `cwd` / `release_dir` / its parent / the filesystem anchor. `release/kaggle/*` is gitignored except for `dataset-metadata.json` itself — only the metadata is committed; the upload tree is regenerated on demand.
+- [x] PR 5.1: 19 new tests (`tests/scripts/test_package_kaggle_release.py` × 15, `tests/scripts/test_generate_cover_image.py` × 4): every Kaggle field constraint, schema field order parity for CSV + parquet, README rewriting (tree + `../` + validation report links), unsafe-kaggle-dir rejection, CLI rc=2 on missing release dir, byte-determinism (audit-artifact-sync), and committed-metadata-matches-fresh-regeneration sync check. 1194/1194 tests pass; ruff + mypy clean; `scripts/probe_relational_leakage.py release/{intro,intermediate,advanced} --max-accuracy 0.65` exits 0 on every public tier; `scripts/verify_hash_determinism.py` PASS 67/67; `scripts/validate_release_candidate.py --no-rebuild` exits 0; `BUNDLE_SCHEMA_VERSION` unchanged at 5 (this PR doesn't touch the bundle shape).
+- [ ] PR 5.2: `scripts/package_hf_release.py` → `release/huggingface/README.md` with YAML configs/default/pretty_name/tags
+- [ ] PR 5.2: Local `load_dataset()` smoke test; Kaggle dry-run package validation
 
 ### Phase 6 — Notebook sequence + adversarial framing
 - [ ] `release/notebooks/{02_relational_feature_engineering,03_leakage_and_time_windows,04_lift_calibration_value_ranking}.ipynb`
diff --git a/.gitignore b/.gitignore
index cd71ed6..35abd8e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -218,3 +218,9 @@ release/intermediate_instructor/
 release/LICENSE
 release/_determinism/
 release/_release_quality/
+
+# Generated Kaggle upload tree (PR 5.1) — only dataset-metadata.json is
+# committed; the rest is reassembled on demand via
+# scripts/package_kaggle_release.py from release/{intro,intermediate,advanced}/.
+release/kaggle/*
+!release/kaggle/dataset-metadata.json
diff --git a/release/dataset-cover-image.png b/release/dataset-cover-image.png
new file mode 100644
index 0000000..912bb43
Binary files /dev/null and b/release/dataset-cover-image.png differ
diff --git a/release/kaggle/dataset-metadata.json b/release/kaggle/dataset-metadata.json
new file mode 100644
index 0000000..2f4b9b2
--- /dev/null
+++ b/release/kaggle/dataset-metadata.json
@@ -0,0 +1,2572 @@
+{
+  "collaborators": [],
+  "description": "# LeadForge: Synthetic B2B Lead Scoring Dataset (`leadforge-lead-scoring-v1`)\n\nA relational, reproducible, three-tier synthetic CRM dataset family for\nteaching lead scoring at scale. Generated by\n[leadforge](https://github.com/leadforge-dev/leadforge), an\nopen-source Python framework for synthetic CRM/funnel data. The\nframework version is decoupled from the dataset version: the package\nstays at `1.x`; the dataset is published under the explicit `…-v1`\ntag.\n\n## Why lead scoring matters in 2024–2026\n\nMid-market SaaS vendors entered 2024–2026 with growth slowing and\ncustomer-acquisition costs rising[^macro], so predicting *which* leads\nconvert within a fixed window has moved from a marketing nicety to a\nsurvival skill. This dataset teaches that skill on a relational\nsubstrate, with the realistic confusions (snapshot-window discipline,\nleakage traps, channel signal weaker than vendor blogs imply) that\nstudents will hit when they finally get hands on real CRM data.\n\n[^macro]: Macroeconomic framing summarised in\n[`docs/external_review/summaries/gemini_v2_summary.md`](https://github.com/leadforge-dev/leadforge/blob/main/docs/external_review/summaries/gemini_v2_summary.md)\n(median public-SaaS growth 30%→25% from 2023 to 2025; New CAC Ratio\nrose materially in 2024).\n\n## What's inside\n\n```\n.\n├── intro/ intermediate/ advanced/    # student_public bundles, one per difficulty tier\n│   ├── manifest.json                 # provenance + file hashes\n│   ├── dataset_card.md               # auto-rendered per-bundle card\n│   ├── feature_dictionary.csv        # authoritative column spec\n│   ├── lead_scoring.csv              # flat convenience CSV (all splits)\n│   ├── tables/*.parquet              # 7 snapshot-safe relational tables\n│   └── tasks/converted_within_90_days/{train,valid,test}.parquet\n├── dataset-metadata.json             # Kaggle dataset metadata\n├── dataset-cover-image.png           # Kaggle cover image\n├── README.md 
                        # Kaggle package README\n└── LICENSE\n```\n\n`student_public` bundles ship the snapshot-safe relational view;\n`research_instructor` companions ship the full-horizon view plus the\nhidden causal structure (DAG, latent registry, mechanism summary)\nunder `metadata/`. The full layout is documented in each bundle's\n`manifest.json`.\n\n## Quick start\n\n```python\nimport pandas as pd\n\n# Flat CSV\ndf = pd.read_csv(\"intermediate/lead_scoring.csv\")\n\n# Parquet task splits (recommended)\ntrain = pd.read_parquet(\"intermediate/tasks/converted_within_90_days/train.parquet\")\ntest  = pd.read_parquet(\"intermediate/tasks/converted_within_90_days/test.parquet\")\n\n# Relational tables (feature engineering — example)\nleads   = pd.read_parquet(\"intermediate/tables/leads.parquet\")\ntouches = pd.read_parquet(\"intermediate/tables/touches.parquet\")\nmy_touch_count = (\n    touches.groupby(\"lead_id\").size().rename(\"my_touch_count\").reset_index()\n)\nfeatures = leads.merge(my_touch_count, on=\"lead_id\", how=\"left\")\n\n# Reproduce from source\n# pip install leadforge\n# leadforge generate --recipe b2b_saas_procurement_v1 --seed 42 \\\n#                    --mode student_public --difficulty intermediate --out my_bundle\n```\n\nThe label `converted_within_90_days` resolves over a 90-day window;\nengagement features (`touch_count`, `session_count`, etc.) are\ncomputed strictly over events on days `[0, 30]`. The deliberate\nexception is `total_touches_all`, the leakage trap — flagged\n`leakage_risk=True` in `feature_dictionary.csv`. 
Drop it from your\nfeature set unless you're demonstrating leakage detection.\n\n## Dataset summary\n\n| | Intro | Intermediate | Advanced |\n|---|---|---|---|\n| Leads | 5,000 | 5,000 | 5,000 |\n| Accounts | 1,500 | 1,500 | 1,500 |\n| Contacts | 4,200 | 4,200 | 4,200 |\n| Snapshot columns | 32 / 34* | 32 / 34* | 32 / 34* |\n| Target | `converted_within_90_days` | `converted_within_90_days` | `converted_within_90_days` |\n| Conversion rate (recipe band) | 24–61% | 12–31% | 4–12% |\n| Conversion rate (median, seeds 42–46) | 42.67% | 21.60% | 8.40% |\n| Signal strength | 0.90 | 0.70 | 0.50 |\n| Noise scale | 0.10 | 0.30 | 0.55 |\n| Missing rate | 2% | 8% | 18% |\n\n\\* `student_public` / `research_instructor`. Difficulty is modulated\nby the simulation engine — signal strength on latent-trait weights,\nGaussian noise on float features, MCAR missingness, outlier rate —\nnot post-hoc label flipping.\n\n## The scenario\n\n**Veridian Technologies** is a fictional Series B startup (Austin, US)\nselling **Veridian Procure**, a procurement / AP automation SaaS, to\nmid-market firms (200–2,000 employees) in the US and UK. The funnel\nruns through inbound marketing (45%), SDR outbound (35%), and\npartner referrals (20%); four personas drive deals (VP Finance, AP\nManager, IT Director, Procurement Manager). **Task:** predict whether\na lead converts (`closed_won`) within 90 days. ACV bands are\n$18k–$120k. See\n[`docs/release/generation_method.md`](https://github.com/leadforge-dev/leadforge/blob/main/docs/release/generation_method.md)\nfor the full DGP, and the deeper \"what's modelled / approximate / not\nmodelled\" breakdown that this README only summarises.\n\n## Public vs instructor: what's redacted\n\nFiltering happens **during rendering**, not during simulation. 
The\nredaction contract is single-sourced in\n[`leadforge/validation/leakage_probes.py`](https://github.com/leadforge-dev/leadforge/blob/main/leadforge/validation/leakage_probes.py);\nthe snapshot-safe writer and the validator import the same constants,\nso they cannot drift apart.\n\n| Source-of-truth constant | Public bundle treatment |\n|---|---|\n| `BANNED_LEAD_COLUMNS = (\"converted_within_90_days\", \"conversion_timestamp\")` | Dropped from `tables/leads.parquet` |\n| `BANNED_OPP_COLUMNS = (\"close_outcome\", \"closed_at\")` | Dropped from `tables/opportunities.parquet` |\n| `BANNED_TABLES = (\"customers\", \"subscriptions\")` | Omitted from public bundles |\n| `SNAPSHOT_FILTERED_TABLES` (touches, sessions, sales_activities, opportunities) | Filtered per-lead by `lead_created_at + snapshot_day` |\n| Snapshot redaction (`current_stage`, `is_sql`) | Stripped from `tasks/` splits and `tables/leads.parquet` |\n| `total_touches_all` (deliberate trap) | **Retained in both modes**; flagged `leakage_risk=True` |\n\nEach bundle's `manifest.json` records `relational_snapshot_safe`,\n`redacted_columns`, and `snapshot_day`, so the bundle is\nself-describing.\n\n## Calibration\n\nEvery realism / calibration / difficulty claim in this README is\nbacked by\n[`validation/validation_report.md`](https://github.com/leadforge-dev/leadforge/blob/main/release/validation/validation_report.md),\nregenerated by\n[`scripts/validate_release_candidate.py`](https://github.com/leadforge-dev/leadforge/blob/main/scripts/validate_release_candidate.py)\nwith bands declared in\n[`docs/release/v1_acceptance_gates_bands.yaml`](https://github.com/leadforge-dev/leadforge/blob/main/docs/release/v1_acceptance_gates_bands.yaml).\nHeadline cross-seed medians (seeds 42–46):\n\n| Tier | LR AUC | AP | P@100 | Brier |\n|---|---|---|---|---|\n| intro | 0.879 | 0.761 | 0.80 | 0.130 |\n| intermediate | 0.886 | 0.575 | 0.59 | 0.110 |\n| advanced | 0.886 | 0.351 | 0.34 | 0.061 |\n\nAP, P@100, conversion-rate, 
and lift orderings hold across the\nintended difficulty axis (intro > intermediate > advanced).\n\n## Intended uses\n\n- Teaching baseline lead-scoring on a flat snapshot.\n- Teaching relational feature engineering against snapshot-safe tables.\n- Teaching leakage detection (the `total_touches_all` trap is\n  designed to be discoverable).\n- Teaching calibration, lift, P@K, value-aware ranking\n  (`expected_acv × P(convert)`), and cohort-shift evaluation.\n- Comparing model families under a controlled DGP.\n\n## Out-of-scope uses\n\n- **Production lead scoring.** The company, product, and customers are\n  fictional.\n- **Vendor benchmarking / paper baselines.** Difficulty tiers are\n  calibrated for pedagogy, not cross-paper comparability.\n- **Causal-inference research that requires recovery of the true DGP.**\n  The instructor companion exposes the hidden graph for teaching, not\n  designed counterfactuals.\n- **Demographic / fairness research.** v1 does not model protected\n  attributes.\n\n## Known limitations\n\n- **Difficulty signal on raw AUC is flat.** LR AUC is ~0.88 across\n  every tier. Difficulty is visible in AP, P@K, Brier, and value\n  capture. Treat AUC as a sanity check, not a difficulty signal.\n- **GBM does not consistently beat LR (gate G7.4.4).** GBM−LR AUC delta\n  is slightly negative in every tier (intro −0.0045, intermediate\n  −0.0072, advanced −0.0133); v1's snapshot is dominated by linear\n  features. v2 will inject non-linear interactions in the simulator.\n- **Channel signal is weak.** Per\n  [`docs/release/channel_signal_audit.md`](https://github.com/leadforge-dev/leadforge/blob/main/docs/release/channel_signal_audit.md),\n  out-of-sample univariate AUC of `lead_source` is ≈0.50–0.52 across\n  all tiers and the per-channel rate spread is ≤0.05. 
The simulator\n  does not encode channel-conditional probabilities; channel-conditional\n  encoding is post-v1 work.\n- **Cohort-shift degradation is small.** v1 has no time-of-year drift\n  baked in; the cohort-shift gate (G6.4) is informational and will\n  bite in v2.\n\n## Composition\n\n- **Entities.** Accounts, contacts, leads, touches, sessions,\n  sales_activities, opportunities (public); plus customers and\n  subscriptions (instructor only). Per-row counts per bundle live in\n  `manifest.json`.\n- **Features.** 32 public columns grouped by analytical role in\n  [`docs/release/feature_dictionary.md`](https://github.com/leadforge-dev/leadforge/blob/main/docs/release/feature_dictionary.md);\n  the per-bundle `feature_dictionary.csv` is the authoritative\n  machine-readable spec.\n- **Label.** `converted_within_90_days` (boolean), event-derived from\n  the simulator. Never sampled directly.\n- **Splits.** 70/15/15 train/valid/test, deterministic given seed;\n  recorded in `tasks/converted_within_90_days/task_manifest.json`.\n- **Provenance.** Recipe `b2b_saas_procurement_v1`, seed 42, package\n  version stamped in `manifest.json`.\n\n## Maintenance, adversarial framing, license\n\nWe *want* the dataset to be broken. Issue templates ship under\n`.github/ISSUE_TEMPLATE/` (Phase 6); the break-me guide lands as\n`docs/release/break_me_guide.md` (PR 6.3). Once Phase 6 ships,\n`docs/release/v2_decision_log.md` will track every accepted finding\nand the design call that came from it. File issues at\n[leadforge-dev/leadforge](https://github.com/leadforge-dev/leadforge);\nPRs welcome.\n\n| Field | Value |\n|---|---|\n| Generator | leadforge `1.0.0+` |\n| Recipe | `b2b_saas_procurement_v1` |\n| Canonical seed | 42 (cross-seed sweep: 42–46) |\n| Bundle schema version | 5 |\n| Format | Parquet (canonical) + CSV (convenience) |\n| License | MIT — see [LICENSE](LICENSE) |\n\nVerify integrity with `leadforge validate <bundle_dir>`; every file\nis hashed in `manifest.json`.\n",
+  "expectedUpdateFrequency": "never",
+  "id": "leadforge/leadforge-lead-scoring-v1",
+  "image": "dataset-cover-image.png",
+  "isPrivate": true,
+  "keywords": [
+    "b2b",
+    "classification",
+    "crm",
+    "education",
+    "lead-scoring",
+    "saas",
+    "synthetic-data",
+    "tabular"
+  ],
+  "licenses": [
+    {
+      "name": "MIT"
+    }
+  ],
+  "resources": [
+    {
+      "description": "Intro tier flat CSV (all splits concatenated, label retained, snapshot_day=30). The `split` column distinguishes train/valid/test rows.",
+      "path": "intro/lead_scoring.csv",
+      "schema": {
+        "fields": [
+          {
+            "description": "Task-split membership: one of `train`, `valid`, `test`. Matches the per-row split assignment in `tasks/converted_within_90_days/`.",
+            "name": "split",
+            "type": "string"
+          },
+          {
+            "description": "Opaque account identifier.",
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "description": "Industry vertical of the buying organization.",
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "description": "Geographic region of the account's headquarters.",
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "description": "Banded employee headcount of the account.",
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "description": "Banded estimated annual revenue of the account.",
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "description": "Banded internal process maturity score (latent).",
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "description": "Opaque contact identifier.",
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "description": "Functional area of the primary contact (e.g. finance, ops).",
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "description": "Seniority band of the primary contact.",
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "description": "Buyer role classification (economic_buyer, champion, etc.).",
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "description": "Opaque lead identifier.",
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "description": "ISO-8601 timestamp when the lead was created.",
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "description": "Origination source of the lead (e.g. inbound_form, sdr_outbound).",
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "description": "Marketing channel responsible for the first recorded touch.",
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "description": "Total number of marketing/sales touches recorded before snapshot.",
+            "name": "touch_count",
+            "type": "integer"
+          },
+          {
+            "description": "Number of inbound touches before snapshot.",
+            "name": "inbound_touch_count",
+            "type": "integer"
+          },
+          {
+            "description": "Number of outbound touches before snapshot.",
+            "name": "outbound_touch_count",
+            "type": "integer"
+          },
+          {
+            "description": "Number of web/trial sessions recorded before snapshot.",
+            "name": "session_count",
+            "type": "integer"
+          },
+          {
+            "description": "Cumulative pricing page views across all sessions before snapshot.",
+            "name": "pricing_page_views",
+            "type": "integer"
+          },
+          {
+            "description": "Cumulative demo page views across all sessions before snapshot.",
+            "name": "demo_page_views",
+            "type": "integer"
+          },
+          {
+            "description": "Sum of session durations (seconds) before snapshot.",
+            "name": "total_session_duration_seconds",
+            "type": "integer"
+          },
+          {
+            "description": "Number of touches in the first 7 days after lead creation.",
+            "name": "touches_week_1",
+            "type": "integer"
+          },
+          {
+            "description": "Number of touches in the last 7 days before snapshot cutoff.",
+            "name": "touches_last_7_days",
+            "type": "integer"
+          },
+          {
+            "description": "Days between first touch and snapshot cutoff (NaN if no touches).",
+            "name": "days_since_first_touch",
+            "type": "number"
+          },
+          {
+            "description": "Number of sales activities logged before snapshot.",
+            "name": "activity_count",
+            "type": "integer"
+          },
+          {
+            "description": "Days elapsed between most recent touch and snapshot cutoff.",
+            "name": "days_since_last_touch",
+            "type": "number"
+          },
+          {
+            "description": "Whether any opportunity was created by snapshot date (open or closed).",
+            "name": "opportunity_created",
+            "type": "boolean"
+          },
+          {
+            "description": "Whether an open opportunity existed at snapshot date.",
+            "name": "has_open_opportunity",
+            "type": "boolean"
+          },
+          {
+            "description": "Estimated ACV of the most recent open opportunity (NaN if none).",
+            "name": "opportunity_estimated_acv",
+            "type": "number"
+          },
+          {
+            "description": "Expected ACV: opportunity ACV if available by snapshot, else revenue band midpoint heuristic (NaN if neither available).",
+            "name": "expected_acv",
+            "type": "number"
+          },
+          {
+            "description": "Total touches over full 90-day window. LEAKAGE TRAP: uses post-snapshot data. Included for pedagogical purposes only.",
+            "name": "total_touches_all",
+            "type": "integer"
+          },
+          {
+            "description": "Label: True if a closed_won event occurred within 90 days of the snapshot anchor date. Derived from simulated events.",
+            "name": "converted_within_90_days",
+            "type": "boolean"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intro tier feature dictionary (canonical column spec).",
+      "path": "intro/feature_dictionary.csv"
+    },
+    {
+      "description": "Intro tier train split for `converted_within_90_days` (3,500 rows).",
+      "path": "intro/tasks/converted_within_90_days/train.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "touch_count",
+            "type": "number"
+          },
+          {
+            "name": "inbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "outbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "session_count",
+            "type": "number"
+          },
+          {
+            "name": "pricing_page_views",
+            "type": "number"
+          },
+          {
+            "name": "demo_page_views",
+            "type": "number"
+          },
+          {
+            "name": "total_session_duration_seconds",
+            "type": "number"
+          },
+          {
+            "name": "touches_week_1",
+            "type": "number"
+          },
+          {
+            "name": "touches_last_7_days",
+            "type": "number"
+          },
+          {
+            "name": "days_since_first_touch",
+            "type": "number"
+          },
+          {
+            "name": "activity_count",
+            "type": "number"
+          },
+          {
+            "name": "days_since_last_touch",
+            "type": "number"
+          },
+          {
+            "name": "opportunity_created",
+            "type": "boolean"
+          },
+          {
+            "name": "has_open_opportunity",
+            "type": "boolean"
+          },
+          {
+            "name": "opportunity_estimated_acv",
+            "type": "number"
+          },
+          {
+            "name": "expected_acv",
+            "type": "number"
+          },
+          {
+            "name": "total_touches_all",
+            "type": "number"
+          },
+          {
+            "name": "converted_within_90_days",
+            "type": "boolean"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intro tier valid split for `converted_within_90_days` (750 rows).",
+      "path": "intro/tasks/converted_within_90_days/valid.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "touch_count",
+            "type": "number"
+          },
+          {
+            "name": "inbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "outbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "session_count",
+            "type": "number"
+          },
+          {
+            "name": "pricing_page_views",
+            "type": "number"
+          },
+          {
+            "name": "demo_page_views",
+            "type": "number"
+          },
+          {
+            "name": "total_session_duration_seconds",
+            "type": "number"
+          },
+          {
+            "name": "touches_week_1",
+            "type": "number"
+          },
+          {
+            "name": "touches_last_7_days",
+            "type": "number"
+          },
+          {
+            "name": "days_since_first_touch",
+            "type": "number"
+          },
+          {
+            "name": "activity_count",
+            "type": "number"
+          },
+          {
+            "name": "days_since_last_touch",
+            "type": "number"
+          },
+          {
+            "name": "opportunity_created",
+            "type": "boolean"
+          },
+          {
+            "name": "has_open_opportunity",
+            "type": "boolean"
+          },
+          {
+            "name": "opportunity_estimated_acv",
+            "type": "number"
+          },
+          {
+            "name": "expected_acv",
+            "type": "number"
+          },
+          {
+            "name": "total_touches_all",
+            "type": "number"
+          },
+          {
+            "name": "converted_within_90_days",
+            "type": "boolean"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intro tier test split for `converted_within_90_days` (750 rows).",
+      "path": "intro/tasks/converted_within_90_days/test.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "touch_count",
+            "type": "number"
+          },
+          {
+            "name": "inbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "outbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "session_count",
+            "type": "number"
+          },
+          {
+            "name": "pricing_page_views",
+            "type": "number"
+          },
+          {
+            "name": "demo_page_views",
+            "type": "number"
+          },
+          {
+            "name": "total_session_duration_seconds",
+            "type": "number"
+          },
+          {
+            "name": "touches_week_1",
+            "type": "number"
+          },
+          {
+            "name": "touches_last_7_days",
+            "type": "number"
+          },
+          {
+            "name": "days_since_first_touch",
+            "type": "number"
+          },
+          {
+            "name": "activity_count",
+            "type": "number"
+          },
+          {
+            "name": "days_since_last_touch",
+            "type": "number"
+          },
+          {
+            "name": "opportunity_created",
+            "type": "boolean"
+          },
+          {
+            "name": "has_open_opportunity",
+            "type": "boolean"
+          },
+          {
+            "name": "opportunity_estimated_acv",
+            "type": "number"
+          },
+          {
+            "name": "expected_acv",
+            "type": "number"
+          },
+          {
+            "name": "total_touches_all",
+            "type": "number"
+          },
+          {
+            "name": "converted_within_90_days",
+            "type": "boolean"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intro tier `accounts` relational table (1,500 rows) — snapshot-safe.",
+      "path": "intro/tables/accounts.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "company_name",
+            "type": "string"
+          },
+          {
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "name": "created_at",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intro tier `contacts` relational table (4,200 rows) — snapshot-safe.",
+      "path": "intro/tables/contacts.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "job_title",
+            "type": "string"
+          },
+          {
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "name": "email_domain_type",
+            "type": "string"
+          },
+          {
+            "name": "created_at",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intro tier `leads` relational table (5,000 rows) — snapshot-safe.",
+      "path": "intro/tables/leads.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "owner_rep_id",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intro tier `touches` relational table (38,561 rows) — snapshot-safe.",
+      "path": "intro/tables/touches.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "touch_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "touch_timestamp",
+            "type": "string"
+          },
+          {
+            "name": "touch_type",
+            "type": "string"
+          },
+          {
+            "name": "touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "touch_direction",
+            "type": "string"
+          },
+          {
+            "name": "campaign_id",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intro tier `sessions` relational table (10,171 rows) — snapshot-safe.",
+      "path": "intro/tables/sessions.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "session_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "session_timestamp",
+            "type": "string"
+          },
+          {
+            "name": "session_type",
+            "type": "string"
+          },
+          {
+            "name": "page_views",
+            "type": "integer"
+          },
+          {
+            "name": "pricing_page_views",
+            "type": "integer"
+          },
+          {
+            "name": "demo_page_views",
+            "type": "integer"
+          },
+          {
+            "name": "session_duration_seconds",
+            "type": "integer"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intro tier `sales_activities` relational table (21,358 rows) — snapshot-safe.",
+      "path": "intro/tables/sales_activities.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "activity_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "rep_id",
+            "type": "string"
+          },
+          {
+            "name": "activity_timestamp",
+            "type": "string"
+          },
+          {
+            "name": "activity_type",
+            "type": "string"
+          },
+          {
+            "name": "activity_outcome",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intro tier `opportunities` relational table (4,426 rows) — snapshot-safe.",
+      "path": "intro/tables/opportunities.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "opportunity_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "created_at",
+            "type": "string"
+          },
+          {
+            "name": "stage",
+            "type": "string"
+          },
+          {
+            "name": "estimated_acv",
+            "type": "integer"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intro tier auto-rendered dataset card.",
+      "path": "intro/dataset_card.md"
+    },
+    {
+      "description": "Intro tier provenance manifest (recipe, seed, package version, file hashes, snapshot_day, redaction contract).",
+      "path": "intro/manifest.json"
+    },
+    {
+      "description": "Intermediate tier flat CSV (all splits concatenated, label retained, snapshot_day=30). The `split` column distinguishes train/valid/test rows.",
+      "path": "intermediate/lead_scoring.csv",
+      "schema": {
+        "fields": [
+          {
+            "description": "Task-split membership: one of `train`, `valid`, `test`. Matches the per-row split assignment in `tasks/converted_within_90_days/`.",
+            "name": "split",
+            "type": "string"
+          },
+          {
+            "description": "Opaque account identifier.",
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "description": "Industry vertical of the buying organization.",
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "description": "Geographic region of the account's headquarters.",
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "description": "Banded employee headcount of the account.",
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "description": "Banded estimated annual revenue of the account.",
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "description": "Banded internal process maturity score (latent).",
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "description": "Opaque contact identifier.",
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "description": "Functional area of the primary contact (e.g. finance, ops).",
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "description": "Seniority band of the primary contact.",
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "description": "Buyer role classification (economic_buyer, champion, etc.).",
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "description": "Opaque lead identifier.",
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "description": "ISO-8601 timestamp when the lead was created.",
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "description": "Origination source of the lead (e.g. inbound_form, sdr_outbound).",
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "description": "Marketing channel responsible for the first recorded touch.",
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "description": "Total number of marketing/sales touches recorded before snapshot.",
+            "name": "touch_count",
+            "type": "integer"
+          },
+          {
+            "description": "Number of inbound touches before snapshot.",
+            "name": "inbound_touch_count",
+            "type": "integer"
+          },
+          {
+            "description": "Number of outbound touches before snapshot.",
+            "name": "outbound_touch_count",
+            "type": "integer"
+          },
+          {
+            "description": "Number of web/trial sessions recorded before snapshot.",
+            "name": "session_count",
+            "type": "integer"
+          },
+          {
+            "description": "Cumulative pricing page views across all sessions before snapshot.",
+            "name": "pricing_page_views",
+            "type": "integer"
+          },
+          {
+            "description": "Cumulative demo page views across all sessions before snapshot.",
+            "name": "demo_page_views",
+            "type": "integer"
+          },
+          {
+            "description": "Sum of session durations (seconds) before snapshot.",
+            "name": "total_session_duration_seconds",
+            "type": "integer"
+          },
+          {
+            "description": "Number of touches in the first 7 days after lead creation.",
+            "name": "touches_week_1",
+            "type": "integer"
+          },
+          {
+            "description": "Number of touches in the last 7 days before snapshot cutoff.",
+            "name": "touches_last_7_days",
+            "type": "integer"
+          },
+          {
+            "description": "Days between first touch and snapshot cutoff (NaN if no touches).",
+            "name": "days_since_first_touch",
+            "type": "number"
+          },
+          {
+            "description": "Number of sales activities logged before snapshot.",
+            "name": "activity_count",
+            "type": "integer"
+          },
+          {
+            "description": "Days elapsed between most recent touch and snapshot cutoff.",
+            "name": "days_since_last_touch",
+            "type": "number"
+          },
+          {
+            "description": "Whether any opportunity was created by snapshot date (open or closed).",
+            "name": "opportunity_created",
+            "type": "boolean"
+          },
+          {
+            "description": "Whether an open opportunity existed at snapshot date.",
+            "name": "has_open_opportunity",
+            "type": "boolean"
+          },
+          {
+            "description": "Estimated ACV of the most recent open opportunity (NaN if none).",
+            "name": "opportunity_estimated_acv",
+            "type": "number"
+          },
+          {
+            "description": "Expected ACV: opportunity ACV if available by snapshot, else revenue band midpoint heuristic (NaN if neither available).",
+            "name": "expected_acv",
+            "type": "number"
+          },
+          {
+            "description": "Total touches over full 90-day window. LEAKAGE TRAP: uses post-snapshot data. Included for pedagogical purposes only.",
+            "name": "total_touches_all",
+            "type": "integer"
+          },
+          {
+            "description": "Label: True if a closed_won event occurred within 90 days of the snapshot anchor date. Derived from simulated events.",
+            "name": "converted_within_90_days",
+            "type": "boolean"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intermediate tier feature dictionary (canonical column spec).",
+      "path": "intermediate/feature_dictionary.csv"
+    },
+    {
+      "description": "Intermediate tier train split for `converted_within_90_days` (3,500 rows).",
+      "path": "intermediate/tasks/converted_within_90_days/train.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "touch_count",
+            "type": "number"
+          },
+          {
+            "name": "inbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "outbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "session_count",
+            "type": "number"
+          },
+          {
+            "name": "pricing_page_views",
+            "type": "number"
+          },
+          {
+            "name": "demo_page_views",
+            "type": "number"
+          },
+          {
+            "name": "total_session_duration_seconds",
+            "type": "number"
+          },
+          {
+            "name": "touches_week_1",
+            "type": "number"
+          },
+          {
+            "name": "touches_last_7_days",
+            "type": "number"
+          },
+          {
+            "name": "days_since_first_touch",
+            "type": "number"
+          },
+          {
+            "name": "activity_count",
+            "type": "number"
+          },
+          {
+            "name": "days_since_last_touch",
+            "type": "number"
+          },
+          {
+            "name": "opportunity_created",
+            "type": "boolean"
+          },
+          {
+            "name": "has_open_opportunity",
+            "type": "boolean"
+          },
+          {
+            "name": "opportunity_estimated_acv",
+            "type": "number"
+          },
+          {
+            "name": "expected_acv",
+            "type": "number"
+          },
+          {
+            "name": "total_touches_all",
+            "type": "number"
+          },
+          {
+            "name": "converted_within_90_days",
+            "type": "boolean"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intermediate tier valid split for `converted_within_90_days` (750 rows).",
+      "path": "intermediate/tasks/converted_within_90_days/valid.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "touch_count",
+            "type": "number"
+          },
+          {
+            "name": "inbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "outbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "session_count",
+            "type": "number"
+          },
+          {
+            "name": "pricing_page_views",
+            "type": "number"
+          },
+          {
+            "name": "demo_page_views",
+            "type": "number"
+          },
+          {
+            "name": "total_session_duration_seconds",
+            "type": "number"
+          },
+          {
+            "name": "touches_week_1",
+            "type": "number"
+          },
+          {
+            "name": "touches_last_7_days",
+            "type": "number"
+          },
+          {
+            "name": "days_since_first_touch",
+            "type": "number"
+          },
+          {
+            "name": "activity_count",
+            "type": "number"
+          },
+          {
+            "name": "days_since_last_touch",
+            "type": "number"
+          },
+          {
+            "name": "opportunity_created",
+            "type": "boolean"
+          },
+          {
+            "name": "has_open_opportunity",
+            "type": "boolean"
+          },
+          {
+            "name": "opportunity_estimated_acv",
+            "type": "number"
+          },
+          {
+            "name": "expected_acv",
+            "type": "number"
+          },
+          {
+            "name": "total_touches_all",
+            "type": "number"
+          },
+          {
+            "name": "converted_within_90_days",
+            "type": "boolean"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intermediate tier test split for `converted_within_90_days` (750 rows).",
+      "path": "intermediate/tasks/converted_within_90_days/test.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "touch_count",
+            "type": "number"
+          },
+          {
+            "name": "inbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "outbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "session_count",
+            "type": "number"
+          },
+          {
+            "name": "pricing_page_views",
+            "type": "number"
+          },
+          {
+            "name": "demo_page_views",
+            "type": "number"
+          },
+          {
+            "name": "total_session_duration_seconds",
+            "type": "number"
+          },
+          {
+            "name": "touches_week_1",
+            "type": "number"
+          },
+          {
+            "name": "touches_last_7_days",
+            "type": "number"
+          },
+          {
+            "name": "days_since_first_touch",
+            "type": "number"
+          },
+          {
+            "name": "activity_count",
+            "type": "number"
+          },
+          {
+            "name": "days_since_last_touch",
+            "type": "number"
+          },
+          {
+            "name": "opportunity_created",
+            "type": "boolean"
+          },
+          {
+            "name": "has_open_opportunity",
+            "type": "boolean"
+          },
+          {
+            "name": "opportunity_estimated_acv",
+            "type": "number"
+          },
+          {
+            "name": "expected_acv",
+            "type": "number"
+          },
+          {
+            "name": "total_touches_all",
+            "type": "number"
+          },
+          {
+            "name": "converted_within_90_days",
+            "type": "boolean"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intermediate tier `accounts` relational table (1,500 rows) — snapshot-safe.",
+      "path": "intermediate/tables/accounts.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "company_name",
+            "type": "string"
+          },
+          {
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "name": "created_at",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intermediate tier `contacts` relational table (4,200 rows) — snapshot-safe.",
+      "path": "intermediate/tables/contacts.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "job_title",
+            "type": "string"
+          },
+          {
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "name": "email_domain_type",
+            "type": "string"
+          },
+          {
+            "name": "created_at",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intermediate tier `leads` relational table (5,000 rows) — snapshot-safe.",
+      "path": "intermediate/tables/leads.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "owner_rep_id",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intermediate tier `touches` relational table (38,724 rows) — snapshot-safe.",
+      "path": "intermediate/tables/touches.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "touch_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "touch_timestamp",
+            "type": "string"
+          },
+          {
+            "name": "touch_type",
+            "type": "string"
+          },
+          {
+            "name": "touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "touch_direction",
+            "type": "string"
+          },
+          {
+            "name": "campaign_id",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intermediate tier `sessions` relational table (10,012 rows) — snapshot-safe.",
+      "path": "intermediate/tables/sessions.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "session_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "session_timestamp",
+            "type": "string"
+          },
+          {
+            "name": "session_type",
+            "type": "string"
+          },
+          {
+            "name": "page_views",
+            "type": "integer"
+          },
+          {
+            "name": "pricing_page_views",
+            "type": "integer"
+          },
+          {
+            "name": "demo_page_views",
+            "type": "integer"
+          },
+          {
+            "name": "session_duration_seconds",
+            "type": "integer"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intermediate tier `sales_activities` relational table (20,679 rows) — snapshot-safe.",
+      "path": "intermediate/tables/sales_activities.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "activity_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "rep_id",
+            "type": "string"
+          },
+          {
+            "name": "activity_timestamp",
+            "type": "string"
+          },
+          {
+            "name": "activity_type",
+            "type": "string"
+          },
+          {
+            "name": "activity_outcome",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intermediate tier `opportunities` relational table (4,255 rows) — snapshot-safe.",
+      "path": "intermediate/tables/opportunities.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "opportunity_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "created_at",
+            "type": "string"
+          },
+          {
+            "name": "stage",
+            "type": "string"
+          },
+          {
+            "name": "estimated_acv",
+            "type": "integer"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Intermediate tier auto-rendered dataset card.",
+      "path": "intermediate/dataset_card.md"
+    },
+    {
+      "description": "Intermediate tier provenance manifest (recipe, seed, package version, file hashes, snapshot_day, redaction contract).",
+      "path": "intermediate/manifest.json"
+    },
+    {
+      "description": "Advanced tier flat CSV (all splits concatenated, label retained, snapshot_day=30). The `split` column distinguishes train/valid/test rows.",
+      "path": "advanced/lead_scoring.csv",
+      "schema": {
+        "fields": [
+          {
+            "description": "Task-split membership: one of `train`, `valid`, `test`. Matches the per-row split assignment in `tasks/converted_within_90_days/`.",
+            "name": "split",
+            "type": "string"
+          },
+          {
+            "description": "Opaque account identifier.",
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "description": "Industry vertical of the buying organization.",
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "description": "Geographic region of the account's headquarters.",
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "description": "Banded employee headcount of the account.",
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "description": "Banded estimated annual revenue of the account.",
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "description": "Banded internal process maturity score (latent).",
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "description": "Opaque contact identifier.",
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "description": "Functional area of the primary contact (e.g. finance, ops).",
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "description": "Seniority band of the primary contact.",
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "description": "Buyer role classification (economic_buyer, champion, etc.).",
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "description": "Opaque lead identifier.",
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "description": "ISO-8601 timestamp when the lead was created.",
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "description": "Origination source of the lead (e.g. inbound_form, sdr_outbound).",
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "description": "Marketing channel responsible for the first recorded touch.",
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "description": "Total number of marketing/sales touches recorded before snapshot.",
+            "name": "touch_count",
+            "type": "integer"
+          },
+          {
+            "description": "Number of inbound touches before snapshot.",
+            "name": "inbound_touch_count",
+            "type": "integer"
+          },
+          {
+            "description": "Number of outbound touches before snapshot.",
+            "name": "outbound_touch_count",
+            "type": "integer"
+          },
+          {
+            "description": "Number of web/trial sessions recorded before snapshot.",
+            "name": "session_count",
+            "type": "integer"
+          },
+          {
+            "description": "Cumulative pricing page views across all sessions before snapshot.",
+            "name": "pricing_page_views",
+            "type": "integer"
+          },
+          {
+            "description": "Cumulative demo page views across all sessions before snapshot.",
+            "name": "demo_page_views",
+            "type": "integer"
+          },
+          {
+            "description": "Sum of session durations (seconds) before snapshot.",
+            "name": "total_session_duration_seconds",
+            "type": "integer"
+          },
+          {
+            "description": "Number of touches in the first 7 days after lead creation.",
+            "name": "touches_week_1",
+            "type": "integer"
+          },
+          {
+            "description": "Number of touches in the last 7 days before snapshot cutoff.",
+            "name": "touches_last_7_days",
+            "type": "integer"
+          },
+          {
+            "description": "Days between first touch and snapshot cutoff (NaN if no touches).",
+            "name": "days_since_first_touch",
+            "type": "number"
+          },
+          {
+            "description": "Number of sales activities logged before snapshot.",
+            "name": "activity_count",
+            "type": "integer"
+          },
+          {
+            "description": "Days elapsed between most recent touch and snapshot cutoff.",
+            "name": "days_since_last_touch",
+            "type": "number"
+          },
+          {
+            "description": "Whether any opportunity was created by snapshot date (open or closed).",
+            "name": "opportunity_created",
+            "type": "boolean"
+          },
+          {
+            "description": "Whether an open opportunity existed at snapshot date.",
+            "name": "has_open_opportunity",
+            "type": "boolean"
+          },
+          {
+            "description": "Estimated ACV of the most recent open opportunity (NaN if none).",
+            "name": "opportunity_estimated_acv",
+            "type": "number"
+          },
+          {
+            "description": "Expected ACV: opportunity ACV if available by snapshot, else revenue band midpoint heuristic (NaN if neither available).",
+            "name": "expected_acv",
+            "type": "number"
+          },
+          {
+            "description": "Total touches over full 90-day window. LEAKAGE TRAP: uses post-snapshot data. Included for pedagogical purposes only.",
+            "name": "total_touches_all",
+            "type": "integer"
+          },
+          {
+            "description": "Label: True if a closed_won event occurred within 90 days of the snapshot anchor date. Derived from simulated events.",
+            "name": "converted_within_90_days",
+            "type": "boolean"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Advanced tier feature dictionary (canonical column spec).",
+      "path": "advanced/feature_dictionary.csv"
+    },
+    {
+      "description": "Advanced tier train split for `converted_within_90_days` (3,500 rows).",
+      "path": "advanced/tasks/converted_within_90_days/train.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "touch_count",
+            "type": "number"
+          },
+          {
+            "name": "inbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "outbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "session_count",
+            "type": "number"
+          },
+          {
+            "name": "pricing_page_views",
+            "type": "number"
+          },
+          {
+            "name": "demo_page_views",
+            "type": "number"
+          },
+          {
+            "name": "total_session_duration_seconds",
+            "type": "number"
+          },
+          {
+            "name": "touches_week_1",
+            "type": "number"
+          },
+          {
+            "name": "touches_last_7_days",
+            "type": "number"
+          },
+          {
+            "name": "days_since_first_touch",
+            "type": "number"
+          },
+          {
+            "name": "activity_count",
+            "type": "number"
+          },
+          {
+            "name": "days_since_last_touch",
+            "type": "number"
+          },
+          {
+            "name": "opportunity_created",
+            "type": "boolean"
+          },
+          {
+            "name": "has_open_opportunity",
+            "type": "boolean"
+          },
+          {
+            "name": "opportunity_estimated_acv",
+            "type": "number"
+          },
+          {
+            "name": "expected_acv",
+            "type": "number"
+          },
+          {
+            "name": "total_touches_all",
+            "type": "number"
+          },
+          {
+            "name": "converted_within_90_days",
+            "type": "boolean"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Advanced tier valid split for `converted_within_90_days` (750 rows).",
+      "path": "advanced/tasks/converted_within_90_days/valid.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "touch_count",
+            "type": "number"
+          },
+          {
+            "name": "inbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "outbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "session_count",
+            "type": "number"
+          },
+          {
+            "name": "pricing_page_views",
+            "type": "number"
+          },
+          {
+            "name": "demo_page_views",
+            "type": "number"
+          },
+          {
+            "name": "total_session_duration_seconds",
+            "type": "number"
+          },
+          {
+            "name": "touches_week_1",
+            "type": "number"
+          },
+          {
+            "name": "touches_last_7_days",
+            "type": "number"
+          },
+          {
+            "name": "days_since_first_touch",
+            "type": "number"
+          },
+          {
+            "name": "activity_count",
+            "type": "number"
+          },
+          {
+            "name": "days_since_last_touch",
+            "type": "number"
+          },
+          {
+            "name": "opportunity_created",
+            "type": "boolean"
+          },
+          {
+            "name": "has_open_opportunity",
+            "type": "boolean"
+          },
+          {
+            "name": "opportunity_estimated_acv",
+            "type": "number"
+          },
+          {
+            "name": "expected_acv",
+            "type": "number"
+          },
+          {
+            "name": "total_touches_all",
+            "type": "number"
+          },
+          {
+            "name": "converted_within_90_days",
+            "type": "boolean"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Advanced tier test split for `converted_within_90_days` (750 rows).",
+      "path": "advanced/tasks/converted_within_90_days/test.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "touch_count",
+            "type": "number"
+          },
+          {
+            "name": "inbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "outbound_touch_count",
+            "type": "number"
+          },
+          {
+            "name": "session_count",
+            "type": "number"
+          },
+          {
+            "name": "pricing_page_views",
+            "type": "number"
+          },
+          {
+            "name": "demo_page_views",
+            "type": "number"
+          },
+          {
+            "name": "total_session_duration_seconds",
+            "type": "number"
+          },
+          {
+            "name": "touches_week_1",
+            "type": "number"
+          },
+          {
+            "name": "touches_last_7_days",
+            "type": "number"
+          },
+          {
+            "name": "days_since_first_touch",
+            "type": "number"
+          },
+          {
+            "name": "activity_count",
+            "type": "number"
+          },
+          {
+            "name": "days_since_last_touch",
+            "type": "number"
+          },
+          {
+            "name": "opportunity_created",
+            "type": "boolean"
+          },
+          {
+            "name": "has_open_opportunity",
+            "type": "boolean"
+          },
+          {
+            "name": "opportunity_estimated_acv",
+            "type": "number"
+          },
+          {
+            "name": "expected_acv",
+            "type": "number"
+          },
+          {
+            "name": "total_touches_all",
+            "type": "number"
+          },
+          {
+            "name": "converted_within_90_days",
+            "type": "boolean"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Advanced tier `accounts` relational table (1,500 rows) — snapshot-safe.",
+      "path": "advanced/tables/accounts.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "company_name",
+            "type": "string"
+          },
+          {
+            "name": "industry",
+            "type": "string"
+          },
+          {
+            "name": "region",
+            "type": "string"
+          },
+          {
+            "name": "employee_band",
+            "type": "string"
+          },
+          {
+            "name": "estimated_revenue_band",
+            "type": "string"
+          },
+          {
+            "name": "process_maturity_band",
+            "type": "string"
+          },
+          {
+            "name": "created_at",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Advanced tier `contacts` relational table (4,200 rows) — snapshot-safe.",
+      "path": "advanced/tables/contacts.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "job_title",
+            "type": "string"
+          },
+          {
+            "name": "role_function",
+            "type": "string"
+          },
+          {
+            "name": "seniority",
+            "type": "string"
+          },
+          {
+            "name": "buyer_role",
+            "type": "string"
+          },
+          {
+            "name": "email_domain_type",
+            "type": "string"
+          },
+          {
+            "name": "created_at",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Advanced tier `leads` relational table (5,000 rows) — snapshot-safe.",
+      "path": "advanced/tables/leads.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "contact_id",
+            "type": "string"
+          },
+          {
+            "name": "account_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_created_at",
+            "type": "string"
+          },
+          {
+            "name": "lead_source",
+            "type": "string"
+          },
+          {
+            "name": "first_touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "owner_rep_id",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Advanced tier `touches` relational table (38,208 rows) — snapshot-safe.",
+      "path": "advanced/tables/touches.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "touch_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "touch_timestamp",
+            "type": "string"
+          },
+          {
+            "name": "touch_type",
+            "type": "string"
+          },
+          {
+            "name": "touch_channel",
+            "type": "string"
+          },
+          {
+            "name": "touch_direction",
+            "type": "string"
+          },
+          {
+            "name": "campaign_id",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Advanced tier `sessions` relational table (9,942 rows) — snapshot-safe.",
+      "path": "advanced/tables/sessions.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "session_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "session_timestamp",
+            "type": "string"
+          },
+          {
+            "name": "session_type",
+            "type": "string"
+          },
+          {
+            "name": "page_views",
+            "type": "integer"
+          },
+          {
+            "name": "pricing_page_views",
+            "type": "integer"
+          },
+          {
+            "name": "demo_page_views",
+            "type": "integer"
+          },
+          {
+            "name": "session_duration_seconds",
+            "type": "integer"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Advanced tier `sales_activities` relational table (19,995 rows) — snapshot-safe.",
+      "path": "advanced/tables/sales_activities.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "activity_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "rep_id",
+            "type": "string"
+          },
+          {
+            "name": "activity_timestamp",
+            "type": "string"
+          },
+          {
+            "name": "activity_type",
+            "type": "string"
+          },
+          {
+            "name": "activity_outcome",
+            "type": "string"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Advanced tier `opportunities` relational table (4,004 rows) — snapshot-safe.",
+      "path": "advanced/tables/opportunities.parquet",
+      "schema": {
+        "fields": [
+          {
+            "name": "opportunity_id",
+            "type": "string"
+          },
+          {
+            "name": "lead_id",
+            "type": "string"
+          },
+          {
+            "name": "created_at",
+            "type": "string"
+          },
+          {
+            "name": "stage",
+            "type": "string"
+          },
+          {
+            "name": "estimated_acv",
+            "type": "integer"
+          }
+        ]
+      }
+    },
+    {
+      "description": "Advanced tier auto-rendered dataset card.",
+      "path": "advanced/dataset_card.md"
+    },
+    {
+      "description": "Advanced tier provenance manifest (recipe, seed, package version, file hashes, snapshot_day, redaction contract).",
+      "path": "advanced/manifest.json"
+    }
+  ],
+  "subtitle": "Three-tier synthetic CRM funnel for leakage-aware lead scoring",
+  "title": "LeadForge: Synthetic B2B Lead Scoring (v1)",
+  "userSpecifiedSources": [
+    {
+      "title": "leadforge source repository",
+      "url": "https://github.com/leadforge-dev/leadforge"
+    },
+    {
+      "title": "v1 release validation report",
+      "url": "https://github.com/leadforge-dev/leadforge/tree/main/release/validation"
+    }
+  ]
+}
diff --git a/scripts/generate_cover_image.py b/scripts/generate_cover_image.py
new file mode 100644
index 0000000..5cff167
--- /dev/null
+++ b/scripts/generate_cover_image.py
@@ -0,0 +1,245 @@
+#!/usr/bin/env python3
+"""Generate the Kaggle cover image for ``leadforge-lead-scoring-v1``.
+
+The cover image is rendered programmatically rather than hand-designed
+or licensed so that:
+
+* re-running this script on the same machine produces byte-identical
+  output, guarded by ``test_render_cover_is_byte_deterministic`` —
+  enough for local regression detection;
+* the source-of-truth for what the image *says* sits in version
+  control, not in a designer's file or a stock-photo licence;
+* there is no licensing question.
+
+**Cross-platform byte equality is NOT guaranteed.** The committed
+``release/dataset-cover-image.png`` was rendered on whichever machine
+last ran this script; Pillow + FreeType produce slightly different
+glyph rasterisation between macOS and Linux (different FreeType
+versions, different font-hinting tables).  The committed PNG is
+therefore one valid render — checked into git so a fresh clone has a
+usable cover image without first running this script — not a
+hash-locked artefact.  Tests assert dimensions and per-machine
+determinism, not committed-vs-fresh byte equality.
+
+Output: ``release/dataset-cover-image.png`` at 1280 × 640 px (2:1
+aspect, well above Kaggle's 560 × 280 minimum, with a 1:1 thumbnail
+crop centred on the headline). Pillow is a hard dependency of matplotlib
+(already a dev / scripts extra), so this script does not require any
+new dependency.
+
+Headline metrics — conversion rates and LR AUC values — are pinned
+literals sourced from the cross-seed medians (seeds 42–46) reported in
+``release/validation/validation_report.md``. They are not recomputed
+at render time: the cover image is intentionally a documentation-grade
+artefact that lags by one validation cycle, not a live metric panel.
+"""
+
+from __future__ import annotations
+
+import argparse
+import sys
+from collections.abc import Sequence
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Final
+
+import matplotlib.font_manager as fm
+from PIL import Image, ImageDraw, ImageFont
+
+# ---------------------------------------------------------------------------
+# Layout constants (pixels)
+# ---------------------------------------------------------------------------
+
+CANVAS_WIDTH: Final[int] = 1280
+CANVAS_HEIGHT: Final[int] = 640
+LEFT_MARGIN: Final[int] = 80
+
+#: Background — deep navy.
+BACKGROUND: Final[tuple[int, int, int]] = (13, 27, 42)
+#: Card background — slightly lighter navy.
+CARD_BACKGROUND: Final[tuple[int, int, int]] = (27, 38, 59)
+#: Primary text colour — pure white.
+TEXT_PRIMARY: Final[tuple[int, int, int]] = (255, 255, 255)
+#: Secondary text colour — pale steel.
+TEXT_SECONDARY: Final[tuple[int, int, int]] = (200, 220, 240)
+
+DEFAULT_OUT_PATH: Final[Path] = Path("release/dataset-cover-image.png")
+
+
+@dataclass(frozen=True)
+class TierBadge:
+    """Per-tier headline shown on the cover."""
+
+    name: str
+    conversion_rate_pct: str
+    lr_auc: float
+    accent: tuple[int, int, int]
+
+
+#: Cross-seed medians (seeds 42–46) from
+#: ``release/validation/validation_report.md`` — pinned literals so the
+#: cover image is reproducible without reading the report at render
+#: time.
+TIER_BADGES: Final[tuple[TierBadge, ...]] = (
+    TierBadge("Intro", "42.7%", 0.879, (76, 175, 80)),
+    TierBadge("Intermediate", "21.6%", 0.886, (255, 152, 0)),
+    TierBadge("Advanced", "8.4%", 0.886, (244, 67, 54)),
+)
+
+
+# ---------------------------------------------------------------------------
+# Font loading
+# ---------------------------------------------------------------------------
+
+
+def _find_font(family: str, *, weight: str = "normal") -> Path:
+    """Locate a font file via matplotlib's font manager.
+
+    matplotlib bundles DejaVu Sans, so this resolves to a usable font
+    file in any environment where matplotlib is installed (the
+    ``[scripts]`` and ``[dev]`` extras both pull it in).  On a given
+    machine, the same font bytes and FreeType build yield identical
+    glyph rasters and therefore byte-identical PNG output.
+    """
+
+    prop = fm.FontProperties(family=family, weight=weight)
+    return Path(fm.findfont(prop, fallback_to_default=False))
+
+
+# ---------------------------------------------------------------------------
+# Drawing
+# ---------------------------------------------------------------------------
+
+
+def _draw_title_block(draw: ImageDraw.ImageDraw, font_paths: dict[str, Path]) -> None:
+    """Render the title, tagline, and subtitle text block."""
+
+    title_font = ImageFont.truetype(str(font_paths["bold"]), 96)
+    draw.text((LEFT_MARGIN, 88), "LeadForge", font=title_font, fill=TEXT_PRIMARY)
+
+    tagline_font = ImageFont.truetype(str(font_paths["regular"]), 40)
+    draw.text(
+        (LEFT_MARGIN, 208),
+        "Synthetic B2B Lead Scoring · v1",
+        font=tagline_font,
+        fill=TEXT_SECONDARY,
+    )
+
+    subtitle_font = ImageFont.truetype(str(font_paths["regular"]), 24)
+    draw.text(
+        (LEFT_MARGIN, 280),
+        "5,000 leads · 3 difficulty tiers · 90-day conversion · MIT",
+        font=subtitle_font,
+        fill=TEXT_SECONDARY,
+    )
+
+
+def _draw_tier_card(
+    draw: ImageDraw.ImageDraw,
+    *,
+    badge: TierBadge,
+    box: tuple[int, int, int, int],
+    font_paths: dict[str, Path],
+) -> None:
+    """Render one tier card inside ``box`` (left, top, right, bottom)."""
+
+    left, top, right, bottom = box
+    draw.rectangle((left, top, right, bottom), fill=CARD_BACKGROUND)
+    # Coloured accent stripe down the left edge.
+    draw.rectangle((left, top, left + 8, bottom), fill=badge.accent)
+
+    name_font = ImageFont.truetype(str(font_paths["bold"]), 36)
+    draw.text((left + 32, top + 24), badge.name, font=name_font, fill=TEXT_PRIMARY)
+
+    body_font = ImageFont.truetype(str(font_paths["regular"]), 22)
+    draw.text(
+        (left + 32, top + 80),
+        f"Conversion: {badge.conversion_rate_pct}",
+        font=body_font,
+        fill=TEXT_SECONDARY,
+    )
+    draw.text(
+        (left + 32, top + 116),
+        f"LR AUC: {badge.lr_auc:.3f}",
+        font=body_font,
+        fill=TEXT_SECONDARY,
+    )
+
+
+def render_cover(badges: Sequence[TierBadge] = TIER_BADGES) -> Image.Image:
+    """Render the cover image as a fresh ``PIL.Image`` instance."""
+
+    image = Image.new("RGB", (CANVAS_WIDTH, CANVAS_HEIGHT), BACKGROUND)
+    draw = ImageDraw.Draw(image)
+
+    font_paths = {
+        "regular": _find_font("DejaVu Sans", weight="normal"),
+        "bold": _find_font("DejaVu Sans", weight="bold"),
+    }
+
+    _draw_title_block(draw, font_paths)
+
+    # Equal-width cards (one per badge) across the lower part of the canvas.
+    card_top = 400
+    card_bottom = 580
+    card_count = len(badges)
+    gap = 40
+    available = CANVAS_WIDTH - 2 * LEFT_MARGIN
+    card_width = (available - gap * (card_count - 1)) // card_count
+    for i, badge in enumerate(badges):
+        left = LEFT_MARGIN + i * (card_width + gap)
+        right = left + card_width
+        _draw_tier_card(
+            draw,
+            badge=badge,
+            box=(left, card_top, right, card_bottom),
+            font_paths=font_paths,
+        )
+
+    return image
+
+
+def write_cover(path: Path, image: Image.Image | None = None) -> Path:
+    """Write the cover image to ``path``, rendering it first if ``image`` is ``None``.
+
+    Pillow's PNG writer is byte-deterministic given the same input
+    image and the same encoder settings; pinning ``optimize=False``
+    and a fixed ``compress_level`` keeps those settings constant from
+    run to run on a given machine.
+    """
+
+    if image is None:
+        image = render_cover()
+    path.parent.mkdir(parents=True, exist_ok=True)
+    image.save(path, format="PNG", optimize=False, compress_level=6)
+    return path
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+
+
+def _parse_args(argv: Sequence[str] | None) -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Generate the deterministic Kaggle cover image for leadforge-lead-scoring-v1.",
+    )
+    parser.add_argument(
+        "--out",
+        type=Path,
+        default=DEFAULT_OUT_PATH,
+        help="output PNG path (default: %(default)s)",
+    )
+    return parser.parse_args(argv)
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    args = _parse_args(argv)
+    out_path: Path = args.out
+    write_cover(out_path)
+    print(f"wrote {out_path}", file=sys.stderr)
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/scripts/package_kaggle_release.py b/scripts/package_kaggle_release.py
new file mode 100644
index 0000000..96065a5
--- /dev/null
+++ b/scripts/package_kaggle_release.py
@@ -0,0 +1,1111 @@
+#!/usr/bin/env python3
+"""Package the ``leadforge-lead-scoring-v1`` family for Kaggle.
+
+PR 5.1 — first of two PRs in Phase 5 (Platform packaging) of the v1
+release roadmap.  This script:
+
+1. Reads each public tier's ``manifest.json`` + ``feature_dictionary.csv``
+   + flat CSV header under ``release/`` and assembles a Kaggle
+   ``dataset-metadata.json`` that satisfies G11.1 of
+   ``docs/release/v1_acceptance_gates.md`` — title length, subtitle
+   length, slug length, single licence, ``expectedUpdateFrequency``
+   from the approved set, image filename, and
+   ``resources[].schema.fields`` listed **in column order** for every
+   tabular resource (CSV via the feature dictionary; parquet via the
+   Arrow schema).
+2. Validates the cover image at ``release/dataset-cover-image.png``
+   (≥ 560 × 280 per G11.2; generated by
+   ``scripts/generate_cover_image.py``).
+3. Writes ``release/kaggle/dataset-metadata.json`` deterministically:
+   the same release input produces a byte-identical metadata file
+   (audit-artifact-sync pattern; guarded by
+   ``tests/scripts/test_package_kaggle_release.py``).
+4. Optionally assembles a Kaggle-CLI-shaped upload directory under
+   ``release/kaggle/`` as real-file copies of the per-tier bundles
+   plus a rewritten copy of ``release/README.md`` whose directory
+   diagram and ``../`` links resolve correctly when read on the
+   Kaggle dataset page.  An earlier draft used symlinks; we copy
+   instead because Kaggle's CLI walks the upload directory with
+   ``followlinks=False`` in some versions.
+
+The actual ``kaggle datasets create`` upload lives in PR 7.2; this
+script is intentionally publish-free.  ``--dry-run`` validates and
+writes the metadata without touching the upload-dir layout, handy when
+iterating on its shape; the default mode also assembles the upload tree.
+
+Failed validation exits with rc=1; pre-flight errors (missing release
+dir, missing tier, missing cover image, unsafe ``--kaggle-dir``)
+exit with rc=2.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import re
+import shutil
+import sys
+from collections.abc import Sequence
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any, Final
+
+import pandas as pd
+import pyarrow as pa
+import pyarrow.parquet as pq
+from PIL import Image
+
+# ---------------------------------------------------------------------------
+# Kaggle field constraints (chatgpt v2 §19, verified from official docs)
+# ---------------------------------------------------------------------------
+
+TITLE_LEN_RANGE: Final[tuple[int, int]] = (6, 50)
+SUBTITLE_LEN_RANGE: Final[tuple[int, int]] = (20, 80)
+SLUG_LEN_RANGE: Final[tuple[int, int]] = (3, 50)
+
+#: Allowed values for ``expectedUpdateFrequency`` (Kaggle CLI rejects
+#: anything else).
+APPROVED_UPDATE_FREQUENCIES: Final[tuple[str, ...]] = (
+    "never",
+    "annually",
+    "quarterly",
+    "monthly",
+    "weekly",
+    "daily",
+    "hourly",
+)
+
+#: Cover-image minimum dimensions per G11.2: 560 × 280 minimum, with
+#: 2:1 header / 1:1 thumbnail crops in mind.
+COVER_IMAGE_MIN_WIDTH: Final[int] = 560
+COVER_IMAGE_MIN_HEIGHT: Final[int] = 280
+
+#: Allowed cover-image extensions per Kaggle docs.
+ALLOWED_COVER_IMAGE_SUFFIXES: Final[tuple[str, ...]] = (
+    ".png",
+    ".jpg",
+    ".jpeg",
+    ".webp",
+)
+
+#: Slug pattern — Kaggle dataset slugs are lowercase alphanumeric with
+#: hyphens.  Boundary chars must be alphanumeric so the slug never
+#: starts or ends with a hyphen.
+SLUG_PATTERN: Final[re.Pattern[str]] = re.compile(r"^[a-z0-9][a-z0-9-]*[a-z0-9]$")
+
+# ---------------------------------------------------------------------------
+# Release-specific defaults (G1.2 dataset slug + G11 metadata content)
+# ---------------------------------------------------------------------------
+
+DEFAULT_USER_SLUG: Final[str] = "leadforge"
+DEFAULT_DATASET_SLUG: Final[str] = "leadforge-lead-scoring-v1"
+
+DEFAULT_TITLE: Final[str] = "LeadForge: Synthetic B2B Lead Scoring (v1)"
+DEFAULT_SUBTITLE: Final[str] = "Three-tier synthetic CRM funnel for leakage-aware lead scoring"
+
+DEFAULT_KEYWORDS: Final[tuple[str, ...]] = (
+    "b2b",
+    "classification",
+    "crm",
+    "education",
+    "lead-scoring",
+    "saas",
+    "synthetic-data",
+    "tabular",
+)
+
+
+@dataclass(frozen=True)
+class UserSource:
+    """One entry under ``userSpecifiedSources``.
+
+    Defined alongside the constants so ``DEFAULT_USER_SOURCES`` below
+    can reference it without a forward declaration; the rest of the
+    metadata dataclasses live further down in their own section.
+    """
+
+    title: str
+    url: str
+
+
+DEFAULT_USER_SOURCES: Final[tuple[UserSource, ...]] = (
+    UserSource(
+        title="leadforge source repository",
+        url="https://github.com/leadforge-dev/leadforge",
+    ),
+    UserSource(
+        title="v1 release validation report",
+        url="https://github.com/leadforge-dev/leadforge/tree/main/release/validation",
+    ),
+)
+
+DEFAULT_LICENSE_NAME: Final[str] = "MIT"
+DEFAULT_UPDATE_FREQUENCY: Final[str] = "never"
+
+DEFAULT_TIERS: Final[tuple[str, ...]] = ("intro", "intermediate", "advanced")
+DEFAULT_TASK: Final[str] = "converted_within_90_days"
+
+DEFAULT_RELEASE_DIR: Final[Path] = Path("release")
+DEFAULT_KAGGLE_DIR: Final[Path] = Path("release/kaggle")
+DEFAULT_COVER_IMAGE: Final[Path] = Path("release/dataset-cover-image.png")
+
+#: Top-level files at ``release/`` that ship to Kaggle alongside the
+#: bundles.  README.md is rewritten on the way in (see
+#: :func:`_kaggle_readme_text`); LICENSE is taken verbatim.
+TOP_LEVEL_DOCS: Final[tuple[str, ...]] = ("README.md", "LICENSE")
+
+#: Tables that may appear in a public bundle, in canonical render
+#: order.  ``customers`` and ``subscriptions`` are intentionally
+#: absent — their presence in a public bundle would itself be leakage
+#: (PR 2.2).
+BUNDLE_TABLES: Final[tuple[str, ...]] = (
+    "accounts",
+    "contacts",
+    "leads",
+    "touches",
+    "sessions",
+    "sales_activities",
+    "opportunities",
+)
+
+#: Mapping from feature_dictionary.csv ``dtype`` (see
+#: ``leadforge/schema/dictionaries.py``) to a Frictionless Data
+#: Package ``schema.fields[].type`` token, which Kaggle's resource
+#: schema uses.
+DTYPE_TO_FRICTIONLESS: Final[dict[str, str]] = {
+    "string": "string",
+    "Int64": "integer",
+    "Float64": "number",
+    "boolean": "boolean",
+}
+
+#: Description for the ``split`` column, which is present in the flat
+#: CSV but not in the feature dictionary (it tracks task-split
+#: membership rather than describing a feature).
+SPLIT_COLUMN_DESCRIPTION: Final[str] = (
+    "Task-split membership: one of `train`, `valid`, `test`. "
+    "Matches the per-row split assignment in `tasks/converted_within_90_days/`."
+)
+
+# ---------------------------------------------------------------------------
+# Description / README rewriting
+# ---------------------------------------------------------------------------
+
+GITHUB_BLOB_BASE: Final[str] = "https://github.com/leadforge-dev/leadforge/blob/main"
+
+#: The "What's inside" tree diagram in ``release/README.md``.  The
+#: published README on Kaggle should describe the *upload* layout
+#: (which has dataset-metadata.json + cover image at the top, no
+#: instructor companion, no notebooks/validation siblings), not the
+#: source-repo layout — we substitute the block on the way out.
+KAGGLE_TREE_BLOCK: Final[str] = """```
+release/
+├── intro/ intermediate/ advanced/    # student_public bundles, one per difficulty tier
+│   ├── manifest.json                 # provenance + file hashes
+│   ├── dataset_card.md               # auto-rendered per-bundle card
+│   ├── feature_dictionary.csv        # authoritative column spec
+│   ├── lead_scoring.csv              # flat convenience CSV (all splits)
+│   ├── tables/*.parquet              # 7 snapshot-safe relational tables
+│   └── tasks/converted_within_90_days/{train,valid,test}.parquet
+├── intermediate_instructor/          # research companion: full-horizon tables + metadata/
+├── notebooks/01_baseline_lead_scoring.ipynb
+└── validation/                       # validation_report.{json,md} + figures
+```"""
+
+KAGGLE_UPLOAD_TREE_BLOCK: Final[str] = """```
+.
+├── intro/ intermediate/ advanced/    # student_public bundles, one per difficulty tier
+│   ├── manifest.json                 # provenance + file hashes
+│   ├── dataset_card.md               # auto-rendered per-bundle card
+│   ├── feature_dictionary.csv        # authoritative column spec
+│   ├── lead_scoring.csv              # flat convenience CSV (all splits)
+│   ├── tables/*.parquet              # 7 snapshot-safe relational tables
+│   └── tasks/converted_within_90_days/{train,valid,test}.parquet
+├── dataset-metadata.json             # Kaggle dataset metadata
+├── dataset-cover-image.png           # Kaggle cover image
+├── README.md                         # Kaggle package README
+└── LICENSE
+```"""
+
+#: Inline relative link ``](../foo)`` → ``](GITHUB_BLOB_BASE/foo)``
+#: for any markdown link that escapes the bundle root.
+_PARENT_RELATIVE_LINK: Final[re.Pattern[str]] = re.compile(r"\]\(\.\./([^)]+)\)")
+
+#: The README points at ``validation/validation_report.md`` (a path
+#: that lives under ``release/`` but not under the Kaggle upload
+#: directory).  Rewrite to a GitHub blob URL so the link works on
+#: Kaggle.
+_VALIDATION_REPORT_LINK: Final[str] = "](validation/validation_report.md)"
+_VALIDATION_REPORT_URL: Final[str] = (
+    f"]({GITHUB_BLOB_BASE}/release/validation/validation_report.md)"
+)
+
+
+def _kaggle_readme_text(readme: str) -> str:
+    """Apply the Kaggle-specific rewrites to a copy of the release README.
+
+    Rewrites:
+
+    1. Source-repo tree diagram → upload-tree diagram (the published
+       README should describe what the *user* sees on Kaggle, not the
+       source repo layout).
+    2. ``](../foo)`` → ``]({GITHUB_BLOB_BASE}/foo)`` (markdown links
+       that escape the bundle root resolve to the source repo on
+       GitHub).
+    3. ``](validation/validation_report.md)`` → blob URL (the
+       validation report does not ship to Kaggle; readers click
+       through to GitHub).
+    """
+
+    text = readme.replace(KAGGLE_TREE_BLOCK, KAGGLE_UPLOAD_TREE_BLOCK)
+    text = _PARENT_RELATIVE_LINK.sub(rf"]({GITHUB_BLOB_BASE}/\1)", text)
+    text = text.replace(_VALIDATION_REPORT_LINK, _VALIDATION_REPORT_URL)
+    return text
+
+
+# ---------------------------------------------------------------------------
+# Dataclasses — one per top-level metadata block
+# ---------------------------------------------------------------------------
+
+
+@dataclass(frozen=True)
+class FieldDescriptor:
+    """One column entry inside ``resources[].schema.fields``."""
+
+    name: str
+    type: str
+    description: str | None = None
+
+
+@dataclass(frozen=True)
+class ResourceSchema:
+    """Frictionless-style schema declaration for a tabular resource."""
+
+    fields: tuple[FieldDescriptor, ...]
+
+
+@dataclass(frozen=True)
+class Resource:
+    """One entry under ``resources``.
+
+    ``schema`` is set to ``None`` for non-tabular resources (markdown,
+    JSON manifests).  The renderer drops ``None`` values and
+    ``description=None`` field-level entries so the JSON stays clean.
+    """
+
+    path: str
+    description: str
+    schema: ResourceSchema | None = None
+
+
+@dataclass(frozen=True)
+class LicenseSpec:
+    """One entry under ``licenses``.  Kaggle requires exactly one."""
+
+    name: str
+
+
+# ``UserSource`` is defined above (next to ``DEFAULT_USER_SOURCES``) so
+# the constant can reference it without forward-declaration tricks.
+
+
+@dataclass(frozen=True)
+class DatasetMetadata:
+    """Top-level Kaggle metadata payload.
+
+    These dataclasses are typed records, not invariants — construction
+    is unchecked.  Callers MUST run :func:`validate_metadata` before
+    relying on the metadata being well-formed; that function is the
+    authoritative gate for every Kaggle field constraint.  Doing the
+    validation eagerly in ``__post_init__`` would prevent tests from
+    constructing deliberately bad payloads to exercise the validator,
+    which is why the discipline lives in the validator instead.
+    """
+
+    title: str
+    id: str
+    subtitle: str
+    description: str
+    isPrivate: bool  # noqa: N815 — Kaggle field name is camelCase
+    licenses: tuple[LicenseSpec, ...]
+    keywords: tuple[str, ...]
+    collaborators: tuple[str, ...]
+    expectedUpdateFrequency: str  # noqa: N815 — Kaggle field name
+    userSpecifiedSources: tuple[UserSource, ...]  # noqa: N815 — Kaggle field name
+    image: str
+    resources: tuple[Resource, ...] = field(default_factory=tuple)
+
+
+# ---------------------------------------------------------------------------
+# Validation
+# ---------------------------------------------------------------------------
+
+
+@dataclass(frozen=True)
+class ValidationError:
+    """One field-level validation failure."""
+
+    field: str
+    message: str
+
+
+def _validate_length(name: str, value: str, lo: int, hi: int) -> ValidationError | None:
+    n = len(value)
+    if n < lo or n > hi:
+        return ValidationError(
+            field=name,
+            message=f"length {n} outside Kaggle range [{lo}, {hi}]",
+        )
+    return None
+
+
+def _validate_slug(slug: str, *, field_name: str) -> ValidationError | None:
+    err = _validate_length(field_name, slug, *SLUG_LEN_RANGE)
+    if err is not None:
+        return err
+    if not SLUG_PATTERN.fullmatch(slug):
+        return ValidationError(
+            field=field_name,
+            message=f"slug {slug!r} must be lowercase alphanumeric with hyphens",
+        )
+    return None
+
+
+def _validate_id(value: str) -> list[ValidationError]:
+    """Validate the ``user/slug`` id field.
+
+    Kaggle's actual ``dataset-metadata.json`` schema uses
+    ``<owner>/<slug>``; the slug-only short form some tooling accepts
+    is rejected here so the artefact is upload-ready without
+    publish-time fixup.
+    """
+
+    errors: list[ValidationError] = []
+    if "/" not in value:
+        errors.append(ValidationError(field="id", message=f"id {value!r} missing 'user/' prefix"))
+        return errors
+    user, slug = value.split("/", 1)
+    if not user:
+        errors.append(ValidationError(field="id", message="user prefix is empty"))
+    if not slug:
+        errors.append(ValidationError(field="id", message="slug is empty"))
+        return errors
+    slug_err = _validate_slug(slug, field_name="id (slug)")
+    if slug_err is not None:
+        errors.append(slug_err)
+    return errors
+
+
+def validate_metadata(metadata: DatasetMetadata) -> list[ValidationError]:
+    """Run every Kaggle-side check against a built ``DatasetMetadata``."""
+
+    errors: list[ValidationError] = []
+
+    title_err = _validate_length("title", metadata.title, *TITLE_LEN_RANGE)
+    if title_err is not None:
+        errors.append(title_err)
+
+    subtitle_err = _validate_length("subtitle", metadata.subtitle, *SUBTITLE_LEN_RANGE)
+    if subtitle_err is not None:
+        errors.append(subtitle_err)
+
+    errors.extend(_validate_id(metadata.id))
+
+    if len(metadata.licenses) != 1:
+        errors.append(
+            ValidationError(
+                field="licenses",
+                message=f"expected exactly one entry, got {len(metadata.licenses)}",
+            )
+        )
+
+    if metadata.expectedUpdateFrequency not in APPROVED_UPDATE_FREQUENCIES:
+        errors.append(
+            ValidationError(
+                field="expectedUpdateFrequency",
+                message=(
+                    f"{metadata.expectedUpdateFrequency!r} not in approved values "
+                    f"{APPROVED_UPDATE_FREQUENCIES}"
+                ),
+            )
+        )
+
+    image_suffix = Path(metadata.image).suffix.lower()
+    if image_suffix not in ALLOWED_COVER_IMAGE_SUFFIXES:
+        errors.append(
+            ValidationError(
+                field="image",
+                message=(
+                    f"image extension {image_suffix!r} not in allowed Kaggle suffixes "
+                    f"{ALLOWED_COVER_IMAGE_SUFFIXES}"
+                ),
+            )
+        )
+
+    if not metadata.resources:
+        errors.append(
+            ValidationError(field="resources", message="must contain at least one resource")
+        )
+
+    for i, res in enumerate(metadata.resources):
+        if res.schema is None:
+            continue  # non-tabular resource; schema is optional
+        if not res.schema.fields:
+            errors.append(
+                ValidationError(
+                    field=f"resources[{i}].schema.fields",
+                    message="must contain at least one field when schema is declared",
+                )
+            )
+            continue
+        for j, fd in enumerate(res.schema.fields):
+            if not fd.name or not fd.type:
+                errors.append(
+                    ValidationError(
+                        field=f"resources[{i}].schema.fields[{j}]",
+                        message="each field must declare both name and type",
+                    )
+                )
+
+    return errors
+
+
+def validate_cover_image(path: Path) -> list[ValidationError]:
+    """Validate that ``path`` exists and meets Kaggle's dimension floor."""
+
+    errors: list[ValidationError] = []
+    if not path.exists():
+        errors.append(
+            ValidationError(
+                field="cover_image",
+                message=f"cover image not found at {path}",
+            )
+        )
+        return errors
+    with Image.open(path) as img:
+        width, height = img.size
+    if width < COVER_IMAGE_MIN_WIDTH or height < COVER_IMAGE_MIN_HEIGHT:
+        errors.append(
+            ValidationError(
+                field="cover_image",
+                message=(
+                    f"cover image {width}x{height} below Kaggle minimum "
+                    f"{COVER_IMAGE_MIN_WIDTH}x{COVER_IMAGE_MIN_HEIGHT}"
+                ),
+            )
+        )
+    return errors
+
+
+# ---------------------------------------------------------------------------
+# Bundle reading + resource building
+# ---------------------------------------------------------------------------
+
+
+def _load_feature_dictionary(path: Path) -> dict[str, FieldDescriptor]:
+    """Load ``feature_dictionary.csv`` keyed by column name."""
+
+    df = pd.read_csv(path)
+    descriptors: dict[str, FieldDescriptor] = {}
+    for _, row in df.iterrows():
+        dtype = str(row["dtype"])
+        frictionless_type = DTYPE_TO_FRICTIONLESS.get(dtype)
+        if frictionless_type is None:
+            raise ValueError(
+                f"feature_dictionary.csv at {path}: dtype {dtype!r} not mapped to a "
+                f"Frictionless Data Package type ({sorted(DTYPE_TO_FRICTIONLESS)!r})"
+            )
+        name = str(row["name"])
+        descriptors[name] = FieldDescriptor(
+            name=name,
+            type=frictionless_type,
+            description=str(row["description"]).strip(),
+        )
+    return descriptors
+
+
+def _flat_csv_fields(
+    flat_csv_path: Path, feature_dict: dict[str, FieldDescriptor]
+) -> tuple[FieldDescriptor, ...]:
+    """Build ``schema.fields`` for a flat CSV in CSV column order."""
+
+    columns = list(pd.read_csv(flat_csv_path, nrows=0).columns)
+    fields: list[FieldDescriptor] = []
+    for col in columns:
+        name = str(col)
+        if name == "split":
+            fields.append(
+                FieldDescriptor(name=name, type="string", description=SPLIT_COLUMN_DESCRIPTION)
+            )
+            continue
+        descriptor = feature_dict.get(name)
+        if descriptor is None:
+            raise ValueError(
+                f"flat CSV at {flat_csv_path}: column {name!r} is not present in "
+                f"feature_dictionary.csv — feature dictionary is the source of truth"
+            )
+        fields.append(descriptor)
+    return tuple(fields)
+
+
+def _kaggle_type_from_arrow(dtype: pa.DataType) -> str:
+    """Map a pyarrow type to the Frictionless field-type token."""
+
+    if pa.types.is_boolean(dtype):
+        return "boolean"
+    if pa.types.is_integer(dtype):
+        return "integer"
+    if pa.types.is_floating(dtype) or pa.types.is_decimal(dtype):
+        return "number"
+    if pa.types.is_date(dtype) or pa.types.is_timestamp(dtype) or pa.types.is_time(dtype):
+        return "datetime"
+    return "string"
+
+
+def fields_from_parquet(path: Path) -> tuple[FieldDescriptor, ...]:
+    """Read parquet schema from ``path`` and return ``FieldDescriptor`` rows.
+
+    Kaggle accepts Frictionless schemas on parquet resources too; the
+    parquet file's own Arrow metadata is the ground truth for column
+    order and types, so we read directly rather than mirroring a CSV
+    header.  ``description`` is omitted for parquet fields — relational
+    tables don't have per-column docs in the bundle.
+    """
+
+    schema = pq.read_schema(path)
+    return tuple(FieldDescriptor(name=f.name, type=_kaggle_type_from_arrow(f.type)) for f in schema)
+
+
+def _load_manifest(path: Path) -> dict[str, Any]:
+    payload = json.loads(path.read_text(encoding="utf-8"))
+    if not isinstance(payload, dict):
+        raise ValueError(f"manifest.json at {path} is not a JSON object")
+    return payload
+
+
+def build_tier_resources(
+    release_dir: Path,
+    tier: str,
+    *,
+    task: str = DEFAULT_TASK,
+) -> tuple[Resource, ...]:
+    """Build the ``Resource`` list for one tier in canonical order.
+
+    Order: flat CSV (with full ``schema.fields``) → feature dictionary
+    → task splits (parquet, schema from Arrow) → relational tables
+    (parquet, schema from Arrow) → dataset card → manifest.  Kaggle
+    renders this list in declared order on the dataset page.
+    """
+
+    tier_dir = release_dir / tier
+    if not tier_dir.is_dir():
+        raise FileNotFoundError(f"tier directory missing: {tier_dir}")
+
+    feature_dict_path = tier_dir / "feature_dictionary.csv"
+    feature_dict = _load_feature_dictionary(feature_dict_path)
+    flat_csv_path = tier_dir / "lead_scoring.csv"
+    fields = _flat_csv_fields(flat_csv_path, feature_dict)
+
+    manifest = _load_manifest(tier_dir / "manifest.json")
+    table_inventory = manifest.get("tables", {})
+    snapshot_day = manifest.get("snapshot_day")
+
+    resources: list[Resource] = []
+
+    resources.append(
+        Resource(
+            path=f"{tier}/lead_scoring.csv",
+            description=(
+                f"{tier.capitalize()} tier flat CSV (all splits concatenated, label retained, "
+                f"snapshot_day={snapshot_day}). The `split` column distinguishes "
+                f"train/valid/test rows."
+            ),
+            schema=ResourceSchema(fields=fields),
+        )
+    )
+
+    resources.append(
+        Resource(
+            path=f"{tier}/feature_dictionary.csv",
+            description=f"{tier.capitalize()} tier feature dictionary (canonical column spec).",
+        )
+    )
+
+    for split in ("train", "valid", "test"):
+        split_path = tier_dir / "tasks" / task / f"{split}.parquet"
+        rows = manifest.get("tasks", {}).get(task, {}).get(f"{split}_rows")
+        rows_str = f"{rows:,} rows" if isinstance(rows, int) else "row count in manifest"
+        resources.append(
+            Resource(
+                path=f"{tier}/tasks/{task}/{split}.parquet",
+                description=(f"{tier.capitalize()} tier {split} split for `{task}` ({rows_str})."),
+                schema=ResourceSchema(fields=fields_from_parquet(split_path)),
+            )
+        )
+
+    for table in BUNDLE_TABLES:
+        if table not in table_inventory:
+            continue
+        table_path = tier_dir / "tables" / f"{table}.parquet"
+        row_count = table_inventory[table].get("row_count")
+        rows_str = f"{row_count:,} rows" if isinstance(row_count, int) else ""
+        suffix = f" ({rows_str})" if rows_str else ""
+        resources.append(
+            Resource(
+                path=f"{tier}/tables/{table}.parquet",
+                description=(
+                    f"{tier.capitalize()} tier `{table}` relational table{suffix} — snapshot-safe."
+                ),
+                schema=ResourceSchema(fields=fields_from_parquet(table_path)),
+            )
+        )
+
+    resources.append(
+        Resource(
+            path=f"{tier}/dataset_card.md",
+            description=f"{tier.capitalize()} tier auto-rendered dataset card.",
+        )
+    )
+    resources.append(
+        Resource(
+            path=f"{tier}/manifest.json",
+            description=(
+                f"{tier.capitalize()} tier provenance manifest (recipe, seed, package "
+                f"version, file hashes, snapshot_day, redaction contract)."
+            ),
+        )
+    )
+    return tuple(resources)
+
+
+def build_metadata(
+    release_dir: Path,
+    *,
+    tiers: Sequence[str] = DEFAULT_TIERS,
+    task: str = DEFAULT_TASK,
+    owner: str = DEFAULT_USER_SLUG,
+    dataset_slug: str = DEFAULT_DATASET_SLUG,
+    title: str = DEFAULT_TITLE,
+    subtitle: str = DEFAULT_SUBTITLE,
+    description: str | None = None,
+    keywords: Sequence[str] = DEFAULT_KEYWORDS,
+    license_name: str = DEFAULT_LICENSE_NAME,
+    update_frequency: str = DEFAULT_UPDATE_FREQUENCY,
+    user_sources: Sequence[UserSource] = DEFAULT_USER_SOURCES,
+    cover_image: Path = DEFAULT_COVER_IMAGE,
+) -> DatasetMetadata:
+    """Assemble a ``DatasetMetadata`` from the release tree.
+
+    When ``description`` is ``None`` (the default) we lift the
+    contents of ``release/README.md`` and apply the Kaggle-specific
+    rewrites — Kaggle renders the description above the file list, so
+    a full dataset card there is more useful than a curated blurb.
+    """
+
+    if description is None:
+        readme_path = release_dir / "README.md"
+        description = _kaggle_readme_text(readme_path.read_text(encoding="utf-8"))
+
+    resources: list[Resource] = []
+    for tier in tiers:
+        resources.extend(build_tier_resources(release_dir, tier, task=task))
+
+    return DatasetMetadata(
+        title=title,
+        id=f"{owner}/{dataset_slug}",
+        subtitle=subtitle,
+        description=description,
+        isPrivate=True,
+        licenses=(LicenseSpec(name=license_name),),
+        keywords=tuple(keywords),
+        collaborators=(),
+        expectedUpdateFrequency=update_frequency,
+        userSpecifiedSources=tuple(user_sources),
+        image=cover_image.name,
+        resources=tuple(resources),
+    )
+
+
+# ---------------------------------------------------------------------------
+# Rendering
+# ---------------------------------------------------------------------------
+
+
+def _field_to_dict(fd: FieldDescriptor) -> dict[str, Any]:
+    payload: dict[str, Any] = {"name": fd.name, "type": fd.type}
+    if fd.description is not None:
+        payload["description"] = fd.description
+    return payload
+
+
+def _resource_to_dict(resource: Resource) -> dict[str, Any]:
+    """Serialise a ``Resource`` to a JSON-primitive dict.
+
+    Drops ``schema`` when ``None``; drops ``description`` from
+    individual fields when ``None`` (parquet schemas don't carry
+    per-column documentation).
+    """
+
+    payload: dict[str, Any] = {
+        "path": resource.path,
+        "description": resource.description,
+    }
+    if resource.schema is not None:
+        payload["schema"] = {"fields": [_field_to_dict(fd) for fd in resource.schema.fields]}
+    return payload
+
+
+def metadata_to_dict(metadata: DatasetMetadata) -> dict[str, Any]:
+    """Convert ``DatasetMetadata`` to a JSON-primitive dict.
+
+    Built field-by-field rather than via ``asdict()`` so resource
+    serialisation goes through one path (``_resource_to_dict``) and
+    the keywords array is sorted at render time — making the
+    determinism contract explicit rather than relying on the
+    ``DEFAULT_KEYWORDS`` constant happening to be alphabetised.
+    """
+
+    return {
+        "title": metadata.title,
+        "id": metadata.id,
+        "subtitle": metadata.subtitle,
+        "description": metadata.description,
+        "isPrivate": metadata.isPrivate,
+        "licenses": [{"name": lic.name} for lic in metadata.licenses],
+        "keywords": sorted(metadata.keywords),
+        "collaborators": list(metadata.collaborators),
+        "expectedUpdateFrequency": metadata.expectedUpdateFrequency,
+        "userSpecifiedSources": [
+            {"title": s.title, "url": s.url} for s in metadata.userSpecifiedSources
+        ],
+        "image": metadata.image,
+        "resources": [_resource_to_dict(r) for r in metadata.resources],
+    }
+
+
+def render_metadata_json(metadata: DatasetMetadata) -> str:
+    """Render the metadata as a deterministic JSON string.
+
+    ``ensure_ascii=False`` keeps non-ASCII content (em-dashes, the ×
+    multiplication sign, smart quotes from the inlined README)
+    rendered literally rather than escaped to ``\\u2014`` etc., which
+    is essential for ``git diff`` readability when the README evolves.
+    """
+
+    return (
+        json.dumps(
+            metadata_to_dict(metadata),
+            indent=2,
+            sort_keys=True,
+            ensure_ascii=False,
+        )
+        + "\n"
+    )
+
+
+# ---------------------------------------------------------------------------
+# Upload-directory assembly
+# ---------------------------------------------------------------------------
+
+
+def _validate_kaggle_dir_safe(kaggle_dir: Path, release_dir: Path) -> None:
+    """Refuse to assemble into a path that aliases something dangerous.
+
+    The packager replaces children of ``kaggle_dir`` (rmtree + recopy),
+    so pointing it at ``cwd``, ``release_dir``, the parent of
+    ``release_dir``, or the filesystem anchor would clobber unrelated
+    content.  This guard fires before any disk write.
+    """
+
+    resolved = kaggle_dir.resolve()
+    blocked = {
+        Path(resolved.anchor),
+        Path.cwd().resolve(),
+        release_dir.resolve(),
+        release_dir.resolve().parent,
+    }
+    if resolved in blocked:
+        raise ValueError(f"refusing to assemble into unsafe --kaggle-dir: {kaggle_dir}")
+
+
+def _replace_file(src: Path, dst: Path) -> None:
+    """Copy ``src`` → ``dst``, replacing any existing entry at ``dst``."""
+
+    if dst.is_symlink() or dst.is_file():
+        dst.unlink()
+    elif dst.is_dir():
+        shutil.rmtree(dst)
+    dst.parent.mkdir(parents=True, exist_ok=True)
+    shutil.copy2(src, dst)
+
+
+def _replace_dir(src: Path, dst: Path) -> None:
+    """Copy directory ``src`` → ``dst``, replacing any existing entry."""
+
+    if dst.is_symlink() or dst.is_file():
+        dst.unlink()
+    elif dst.is_dir():
+        shutil.rmtree(dst)
+    dst.parent.mkdir(parents=True, exist_ok=True)
+    shutil.copytree(src, dst)
+
+
+def assemble_upload_dir(
+    release_dir: Path,
+    kaggle_dir: Path,
+    *,
+    tiers: Sequence[str] = DEFAULT_TIERS,
+    cover_image: Path = DEFAULT_COVER_IMAGE,
+) -> None:
+    """Assemble ``kaggle_dir`` for ``kaggle datasets create`` to consume.
+
+    The output tree is a self-contained directory of real files:
+    cover image, LICENSE, the rewritten README, and full copies of
+    each tier bundle.  Symlinks were considered (and tried in an
+    earlier draft) but Kaggle's CLI walks the upload directory with
+    ``followlinks=False`` in some versions, silently skipping symlinked
+    children — switching to copies removes that fragility at the cost
+    of ~15 MB of disk per assembly run, which is gitignored anyway.
+
+    Re-running the assembly is idempotent: ``_replace_file`` and
+    ``_replace_dir`` rmtree-then-copy any existing entry.  The README
+    is the one file rewritten on the way in (tree diagram + ``../``
+    links).  ``--dry-run`` skips this whole function.
+    """
+
+    _validate_kaggle_dir_safe(kaggle_dir, release_dir)
+    kaggle_dir.mkdir(parents=True, exist_ok=True)
+
+    # Cover image.
+    cover_src = release_dir / cover_image.name
+    if not cover_src.exists():
+        cover_src = cover_image
+    _replace_file(cover_src, kaggle_dir / cover_image.name)
+
+    # LICENSE — straight copy, no rewriting.
+    license_src = release_dir / "LICENSE"
+    if license_src.exists():
+        _replace_file(license_src, kaggle_dir / "LICENSE")
+
+    # README.md — real copy with link rewriting so ``../`` links and
+    # the directory diagram resolve correctly on the Kaggle dataset
+    # page.
+    kaggle_readme = kaggle_dir / "README.md"
+    if kaggle_readme.is_symlink() or kaggle_readme.is_file():
+        kaggle_readme.unlink()
+    readme_src = release_dir / "README.md"
+    if readme_src.exists():
+        kaggle_readme.parent.mkdir(parents=True, exist_ok=True)
+        kaggle_readme.write_text(
+            _kaggle_readme_text(readme_src.read_text(encoding="utf-8")),
+            encoding="utf-8",
+        )
+
+    # Per-tier bundles — full directory copies.
+    for tier in tiers:
+        tier_src = release_dir / tier
+        _replace_dir(tier_src, kaggle_dir / tier)
+
+
+# ---------------------------------------------------------------------------
+# Driver
+# ---------------------------------------------------------------------------
+
+
+@dataclass(frozen=True)
+class PackagerOutcome:
+    """Return value from :func:`run_packager` — used by tests + CLI."""
+
+    metadata: DatasetMetadata
+    metadata_path: Path
+    errors: tuple[ValidationError, ...]
+    assembled: bool
+
+
+def run_packager(
+    release_dir: Path,
+    *,
+    kaggle_dir: Path = DEFAULT_KAGGLE_DIR,
+    tiers: Sequence[str] = DEFAULT_TIERS,
+    task: str = DEFAULT_TASK,
+    owner: str = DEFAULT_USER_SLUG,
+    dataset_slug: str = DEFAULT_DATASET_SLUG,
+    title: str = DEFAULT_TITLE,
+    subtitle: str = DEFAULT_SUBTITLE,
+    description: str | None = None,
+    keywords: Sequence[str] = DEFAULT_KEYWORDS,
+    license_name: str = DEFAULT_LICENSE_NAME,
+    update_frequency: str = DEFAULT_UPDATE_FREQUENCY,
+    user_sources: Sequence[UserSource] = DEFAULT_USER_SOURCES,
+    cover_image: Path = DEFAULT_COVER_IMAGE,
+    dry_run: bool = False,
+) -> PackagerOutcome:
+    """Build, validate, and write the Kaggle metadata.
+
+    The metadata file is written even when validation reports errors;
+    callers must check ``PackagerOutcome.errors``.  With
+    ``dry_run=False`` (the default) the packager also assembles the
+    upload directory under ``kaggle_dir`` (real-file copies of the
+    tier bundles, cover image, LICENSE, and the rewritten README).
+    ``dry_run=True`` skips assembly, for fast metadata-only iteration.
+    """
+
+    if not release_dir.exists():
+        raise FileNotFoundError(f"release directory not found: {release_dir}")
+
+    metadata = build_metadata(
+        release_dir,
+        tiers=tiers,
+        task=task,
+        owner=owner,
+        dataset_slug=dataset_slug,
+        title=title,
+        subtitle=subtitle,
+        description=description,
+        keywords=keywords,
+        license_name=license_name,
+        update_frequency=update_frequency,
+        user_sources=user_sources,
+        cover_image=cover_image,
+    )
+
+    errors: list[ValidationError] = []
+    errors.extend(validate_metadata(metadata))
+    errors.extend(validate_cover_image(cover_image))
+    errors.extend(_validate_readme_substitution(release_dir))
+
+    metadata_path = kaggle_dir / "dataset-metadata.json"
+    metadata_path.parent.mkdir(parents=True, exist_ok=True)
+    metadata_path.write_text(render_metadata_json(metadata), encoding="utf-8")
+
+    if not dry_run:
+        assemble_upload_dir(release_dir, kaggle_dir, tiers=tiers, cover_image=cover_image)
+
+    return PackagerOutcome(
+        metadata=metadata,
+        metadata_path=metadata_path,
+        errors=tuple(errors),
+        assembled=not dry_run,
+    )
+
+
+def _validate_readme_substitution(release_dir: Path) -> list[ValidationError]:
+    """Guard against silent drift between the README's tree diagram
+    and ``KAGGLE_TREE_BLOCK``.
+
+    ``_kaggle_readme_text`` replaces the source-repo tree diagram
+    with the upload-tree diagram via plain string replace.  If the
+    README's tree changes by even one whitespace character, the
+    substitution silently no-ops and the published Kaggle dataset
+    card shows the source-repo tree (with ``intermediate_instructor/``,
+    ``notebooks/``, ``validation/``).  We catch that case here.
+    """
+
+    readme = release_dir / "README.md"
+    if not readme.exists():
+        return []  # No README is itself a release-day issue, but not this validator's concern.
+    if KAGGLE_TREE_BLOCK not in readme.read_text(encoding="utf-8"):
+        return [
+            ValidationError(
+                field="release/README.md",
+                message=(
+                    "KAGGLE_TREE_BLOCK not found verbatim in release/README.md; "
+                    "the source-repo tree diagram in the README has drifted from "
+                    "the constant in scripts/package_kaggle_release.py — the "
+                    "Kaggle description rewrite will silently no-op until the "
+                    "README and KAGGLE_TREE_BLOCK are reconciled."
+                ),
+            )
+        ]
+    return []
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+
+
+def _parse_args(argv: Sequence[str] | None) -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Generate and validate Kaggle dataset-metadata.json for "
+        "leadforge-lead-scoring-v1.",
+    )
+    parser.add_argument(
+        "--release-dir",
+        type=Path,
+        default=DEFAULT_RELEASE_DIR,
+        help="release bundle root containing one subdirectory per tier (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--kaggle-dir",
+        type=Path,
+        default=DEFAULT_KAGGLE_DIR,
+        help="metadata and upload-tree output directory (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--tier",
+        action="append",
+        dest="tiers",
+        default=None,
+        help="limit packaging to one tier (repeatable; default: intro/intermediate/advanced)",
+    )
+    parser.add_argument(
+        "--owner",
+        default=DEFAULT_USER_SLUG,
+        help="Kaggle owner (user or organisation) prefix on the dataset id (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--dataset-slug",
+        default=DEFAULT_DATASET_SLUG,
+        help="dataset slug (must satisfy G11.1; default: %(default)s)",
+    )
+    parser.add_argument(
+        "--cover-image",
+        type=Path,
+        default=DEFAULT_COVER_IMAGE,
+        help="path to the dataset cover image (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--dry-run",
+        action="store_true",
+        help="validate + write metadata only; skip assembling the upload directory",
+    )
+    return parser.parse_args(argv)
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    args = _parse_args(argv)
+    kaggle_dir: Path = args.kaggle_dir
+    tiers: tuple[str, ...] = tuple(args.tiers) if args.tiers else DEFAULT_TIERS
+
+    try:
+        outcome = run_packager(
+            args.release_dir,
+            kaggle_dir=kaggle_dir,
+            tiers=tiers,
+            owner=args.owner,
+            dataset_slug=args.dataset_slug,
+            cover_image=args.cover_image,
+            dry_run=args.dry_run,
+        )
+    except FileNotFoundError as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        return 2
+    except ValueError as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        return 2
+
+    if outcome.errors:
+        print("validation failed:", file=sys.stderr)
+        for err in outcome.errors:
+            print(f"  - {err.field}: {err.message}", file=sys.stderr)
+        return 1
+
+    print(f"wrote {outcome.metadata_path}", file=sys.stderr)
+    if outcome.assembled:
+        print(f"assembled upload tree under {kaggle_dir}", file=sys.stderr)
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
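Review note on `_validate_kaggle_dir_safe` (not part of the patch): the guard compares the resolved `--kaggle-dir` against four exact paths, so a higher ancestor such as the grandparent of `release_dir` would pass. The following is a minimal sketch of an ancestor-aware check, assuming a hypothetical helper name `is_unsafe_target`; it illustrates the idea under those assumptions and is not the packager's implementation.

```python
from pathlib import Path


def is_unsafe_target(target: Path, release_dir: Path, cwd: Path) -> bool:
    """Return True if ``target`` equals or contains ``release_dir`` or ``cwd``.

    Hypothetical helper for illustration; the patch itself only checks
    exact equality against a fixed set of paths.
    """
    resolved = target.resolve()
    for protected in (release_dir.resolve(), cwd.resolve()):
        # ``protected.parents`` holds every ancestor up to the filesystem
        # anchor, so the root is rejected without a separate entry.
        if resolved == protected or resolved in protected.parents:
            return True
    return False
```

A directory nested inside `release_dir`, such as `release/kaggle`, still passes this check, because it is neither a protected path nor an ancestor of one.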
diff --git a/tests/scripts/test_generate_cover_image.py b/tests/scripts/test_generate_cover_image.py
new file mode 100644
index 0000000..46f3af8
--- /dev/null
+++ b/tests/scripts/test_generate_cover_image.py
@@ -0,0 +1,109 @@
+"""Tests for ``scripts/generate_cover_image.py``.
+
+Locks the two acceptance properties for the Kaggle cover image:
+
+1. it satisfies G11.2 — at least 560 × 280 pixels in the right modes;
+2. the output is byte-deterministic across runs on one machine, and
+   the committed PNG opens as a valid image at the rendered size.
+
+If the simulator's headline metrics drift and leave the cover image's
+pinned literals out of date, the literals, the committed PNG, and the
+metrics in ``release/validation/validation_report.md`` need a
+coordinated update.
+"""
+
+from __future__ import annotations
+
+import importlib.util
+import sys
+from pathlib import Path
+
+import pytest
+from PIL import Image
+
+_SCRIPT_PATH = Path(__file__).resolve().parents[2] / "scripts" / "generate_cover_image.py"
+_REPO_ROOT = Path(__file__).resolve().parents[2]
+_spec = importlib.util.spec_from_file_location("generate_cover_image", _SCRIPT_PATH)
+assert _spec is not None
+assert _spec.loader is not None
+generator = importlib.util.module_from_spec(_spec)
+sys.modules["generate_cover_image"] = generator
+_spec.loader.exec_module(generator)
+
+
+_COMMITTED_COVER = _REPO_ROOT / "release" / "dataset-cover-image.png"
+_COMMITTED_PRESENT = _COMMITTED_COVER.exists()
+
+
+# ---------------------------------------------------------------------------
+# Dimension floor + mode (G11.2)
+# ---------------------------------------------------------------------------
+
+
+def test_render_cover_dimensions_above_kaggle_minimum() -> None:
+    """G11.2: cover image must be at least 560 × 280; we ship 1280 × 640."""
+
+    image = generator.render_cover()
+    assert image.size == (generator.CANVAS_WIDTH, generator.CANVAS_HEIGHT)
+    assert image.size[0] >= 560
+    assert image.size[1] >= 280
+    # Ratio check — we deliberately render at 2:1 so the Kaggle header
+    # crop matches the source aspect ratio.
+    assert image.size[0] == 2 * image.size[1]
+    assert image.mode == "RGB"
+
+
+def test_write_cover_writes_png_at_target_size(tmp_path: Path) -> None:
+    """``write_cover`` round-trips through Pillow at the declared dimensions."""
+
+    out = tmp_path / "cover.png"
+    generator.write_cover(out)
+
+    with Image.open(out) as img:
+        assert img.format == "PNG"
+        assert img.size == (generator.CANVAS_WIDTH, generator.CANVAS_HEIGHT)
+
+
+# ---------------------------------------------------------------------------
+# Determinism + sync with committed asset
+# ---------------------------------------------------------------------------
+
+
+def test_render_cover_is_byte_deterministic(tmp_path: Path) -> None:
+    """Two back-to-back ``write_cover`` calls on the same machine
+    produce byte-identical PNGs.
+
+    Pillow's PNG writer is deterministic given the same encoder
+    settings + the same FreeType-rasterised glyph bitmaps.  This
+    guard catches regressions in the rasterisation pipeline locally;
+    cross-platform byte equality is *not* guaranteed (FreeType
+    versions and font-hinting tables differ between macOS and Linux,
+    so the committed PNG may not match a fresh render produced on a
+    different OS — we deliberately do not assert that here).
+    """
+
+    a = tmp_path / "cover_a.png"
+    b = tmp_path / "cover_b.png"
+    generator.write_cover(a)
+    generator.write_cover(b)
+    assert a.read_bytes() == b.read_bytes()
+
+
+@pytest.mark.skipif(not _COMMITTED_PRESENT, reason="committed cover image not present")
+def test_committed_cover_meets_kaggle_dimensions() -> None:
+    """The committed ``release/dataset-cover-image.png`` opens cleanly
+    and meets Kaggle's dimension floor (G11.2).
+
+    The committed PNG is a *valid render*, not a hash-locked artefact —
+    it ships so a fresh clone has a usable cover image without first
+    running ``scripts/generate_cover_image.py``.  Cross-OS byte
+    equality is not asserted (see
+    :func:`test_render_cover_is_byte_deterministic`).
+    """
+
+    with Image.open(_COMMITTED_COVER) as img:
+        assert img.format == "PNG"
+        assert img.size[0] >= 560
+        assert img.size[1] >= 280
+        # Same shape as ``render_cover`` produces.
+        assert img.size == (generator.CANVAS_WIDTH, generator.CANVAS_HEIGHT)
diff --git a/tests/scripts/test_package_kaggle_release.py b/tests/scripts/test_package_kaggle_release.py
new file mode 100644
index 0000000..276d01e
--- /dev/null
+++ b/tests/scripts/test_package_kaggle_release.py
@@ -0,0 +1,536 @@
+"""Tests for ``scripts/package_kaggle_release.py``.
+
+Locks the Phase 5 Kaggle packaging contract:
+
+* every Kaggle field constraint surfaced in chatgpt v2 §19 (G11.1)
+* the cover-image dimension floor (G11.2)
+* the README link-rewriting that lets the published dataset card on
+  Kaggle keep working ``../`` links (rewritten to GitHub blob URLs)
+  and a directory diagram that reflects the upload layout, plus a
+  guard that the source ``KAGGLE_TREE_BLOCK`` is still present
+  verbatim in the README (silent-failure trap)
+* the assembled upload tree resolves every declared resource path
+  (so ``kaggle datasets create`` can find each file)
+* the safety net that refuses to assemble into ``cwd`` /
+  ``release_dir`` / its parent
+* byte-equality + content-shape between the committed
+  ``release/kaggle/dataset-metadata.json`` and a fresh regeneration
+  (audit-artifact-sync pattern from PR 4.1)
+"""
+
+from __future__ import annotations
+
+import importlib.util
+import json
+import sys
+from pathlib import Path
+
+import pytest
+from PIL import Image
+
+_SCRIPT_PATH = Path(__file__).resolve().parents[2] / "scripts" / "package_kaggle_release.py"
+_REPO_ROOT = Path(__file__).resolve().parents[2]
+_spec = importlib.util.spec_from_file_location("package_kaggle_release", _SCRIPT_PATH)
+assert _spec is not None
+assert _spec.loader is not None
+packager = importlib.util.module_from_spec(_spec)
+sys.modules["package_kaggle_release"] = packager
+_spec.loader.exec_module(packager)
+
+
+_RELEASE_DIR = _REPO_ROOT / "release"
+_RELEASE_BUNDLES_PRESENT = (_RELEASE_DIR / "intro" / "manifest.json").exists()
+_COMMITTED_METADATA = _REPO_ROOT / "release" / "kaggle" / "dataset-metadata.json"
+_COMMITTED_COVER = _REPO_ROOT / "release" / "dataset-cover-image.png"
+
+
+# ---------------------------------------------------------------------------
+# Fixtures
+# ---------------------------------------------------------------------------
+
+
+def _minimal_metadata() -> packager.DatasetMetadata:
+    """A minimum-viable ``DatasetMetadata`` that should validate cleanly."""
+
+    return packager.DatasetMetadata(
+        title=packager.DEFAULT_TITLE,
+        id=f"{packager.DEFAULT_USER_SLUG}/{packager.DEFAULT_DATASET_SLUG}",
+        subtitle=packager.DEFAULT_SUBTITLE,
+        description="Synthetic CRM lead-scoring dataset.",
+        isPrivate=True,
+        licenses=(packager.LicenseSpec(name=packager.DEFAULT_LICENSE_NAME),),
+        keywords=packager.DEFAULT_KEYWORDS,
+        collaborators=(),
+        expectedUpdateFrequency=packager.DEFAULT_UPDATE_FREQUENCY,
+        userSpecifiedSources=packager.DEFAULT_USER_SOURCES,
+        image="dataset-cover-image.png",
+        resources=(
+            packager.Resource(
+                path="intro/lead_scoring.csv",
+                description="Intro flat CSV.",
+                schema=packager.ResourceSchema(
+                    fields=(
+                        packager.FieldDescriptor(name="lead_id", type="string", description="ID."),
+                    )
+                ),
+            ),
+        ),
+    )
+
+
+def _make_valid_cover(path: Path) -> None:
+    """Write a minimum-Kaggle-acceptable cover image at ``path``."""
+
+    Image.new("RGB", (1280, 640), (0, 0, 0)).save(path)
+
+
+# ---------------------------------------------------------------------------
+# Field-constraint validation (G11.1)
+# ---------------------------------------------------------------------------
+
+
+def test_validate_metadata_accepts_canonical_v1_metadata() -> None:
+    assert packager.validate_metadata(_minimal_metadata()) == []
+
+
+def test_validate_metadata_reports_every_constraint_violation() -> None:
+    """One bad metadata payload triggers every field check at once."""
+
+    bad = packager.DatasetMetadata(
+        title="Tiny",  # < 6 chars
+        id="LeadForge Bad Slug!",  # missing '/' + invalid chars
+        subtitle="short",  # < 20 chars
+        description="x",
+        isPrivate=True,
+        licenses=(  # two entries, must be exactly one
+            packager.LicenseSpec(name="MIT"),
+            packager.LicenseSpec(name="Apache-2.0"),
+        ),
+        keywords=("synthetic-data",),
+        collaborators=(),
+        expectedUpdateFrequency="sometimes",  # not approved
+        userSpecifiedSources=(),
+        image="cover.bmp",  # disallowed extension
+        resources=(),  # empty resource list
+    )
+
+    errors = packager.validate_metadata(bad)
+    fields = {e.field for e in errors}
+    assert "title" in fields
+    assert "subtitle" in fields
+    assert "id" in fields
+    assert "licenses" in fields
+    assert "expectedUpdateFrequency" in fields
+    assert "image" in fields
+    assert "resources" in fields
+
+
+def test_validate_id_requires_user_slash_slug_format() -> None:
+    """Slug-only ids are rejected — Kaggle's schema is ``user/slug``.
+
+    Mirrors the design call recorded in the PR write-up: PR 7.2's
+    publish script should not have to splice in a username at upload
+    time.
+    """
+
+    slug_only = packager._validate_id("leadforge-lead-scoring-v1")
+    assert any(e.field == "id" and "missing 'user/'" in e.message for e in slug_only)
+
+    well_formed = packager._validate_id("leadforge/leadforge-lead-scoring-v1")
+    assert well_formed == []
+
+    invalid_slug = packager._validate_id("leadforge/Bad Slug!")
+    assert any(e.field == "id (slug)" for e in invalid_slug)
+
+
+def test_validate_metadata_flags_schema_fields_without_name_or_type() -> None:
+    """Schema fields must declare both name and type to satisfy G11.1."""
+
+    bad = _minimal_metadata()
+    broken = packager.Resource(
+        path="x.csv",
+        description="x",
+        schema=packager.ResourceSchema(
+            fields=(packager.FieldDescriptor(name="", type="string"),),
+        ),
+    )
+    bad = packager.DatasetMetadata(**{**bad.__dict__, "resources": (broken,)})
+    errors = packager.validate_metadata(bad)
+    assert any("name and type" in e.message for e in errors)
+
+
+# ---------------------------------------------------------------------------
+# Cover image (G11.2)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.skipif(not _COMMITTED_COVER.exists(), reason="committed cover image not present")
+def test_validate_cover_image_passes_for_committed_asset() -> None:
+    assert packager.validate_cover_image(_COMMITTED_COVER) == []
+
+
+def test_validate_cover_image_rejects_too_small_image(tmp_path: Path) -> None:
+    tiny = tmp_path / "tiny.png"
+    Image.new("RGB", (100, 50), (0, 0, 0)).save(tiny)
+    errors = packager.validate_cover_image(tiny)
+    assert errors
+    assert errors[0].field == "cover_image"
+    assert "below Kaggle minimum" in errors[0].message
+
+
+def test_validate_cover_image_reports_missing_file(tmp_path: Path) -> None:
+    errors = packager.validate_cover_image(tmp_path / "no-such.png")
+    assert errors
+    assert errors[0].field == "cover_image"
+
+
+# ---------------------------------------------------------------------------
+# Schema fields — derive-from-source contract
+#
+# The flat-CSV schema is built by iterating the CSV header, so column-
+# order parity with the CSV is a construction-time invariant.  The
+# parquet schema comes straight from ``pq.read_schema``, same story.
+# Re-checking either via a separate validator is tautological — the
+# real coverage is the audit-artifact-sync test below
+# (``test_committed_kaggle_metadata_matches_fresh_regeneration``),
+# which fails the moment any tier's CSV header or parquet schema
+# drifts without a matching metadata regeneration.
+# ---------------------------------------------------------------------------
+
+
+# ---------------------------------------------------------------------------
+# README rewriting + description content
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.skipif(not _RELEASE_BUNDLES_PRESENT, reason="release bundles not present")
+def test_kaggle_readme_text_rewrites_links_and_tree_diagram() -> None:
+    readme = (_RELEASE_DIR / "README.md").read_text(encoding="utf-8")
+    rewritten = packager._kaggle_readme_text(readme)
+
+    # Source-repo tree → upload tree.
+    assert "intermediate_instructor/" not in rewritten
+    assert "notebooks/01_baseline_lead_scoring.ipynb" not in rewritten
+    assert "dataset-metadata.json             # Kaggle" in rewritten
+
+    # Relative ../ links rewritten to GitHub blob URLs.
+    assert "](../" not in rewritten
+    assert packager.GITHUB_BLOB_BASE in rewritten
+
+    # The validation-report link (which lives under release/, not under
+    # the upload dir) must point at GitHub.
+    assert "](validation/validation_report.md)" not in rewritten
+    assert f"]({packager.GITHUB_BLOB_BASE}/release/validation/validation_report.md)" in rewritten
+
+
+@pytest.mark.skipif(not _RELEASE_BUNDLES_PRESENT, reason="release bundles not present")
+def test_kaggle_tree_block_is_present_in_release_readme() -> None:
+    """Silent-failure guard.
+
+    ``_kaggle_readme_text`` substitutes ``KAGGLE_TREE_BLOCK`` →
+    ``KAGGLE_UPLOAD_TREE_BLOCK`` via plain string replace.  If anyone
+    tweaks the README's tree diagram by even one whitespace
+    character, the substitution silently no-ops and the published
+    Kaggle dataset card carries the source-repo tree.  This guard
+    fires loudly the moment the constants drift apart.
+    """
+
+    readme = (_RELEASE_DIR / "README.md").read_text(encoding="utf-8")
+    assert packager.KAGGLE_TREE_BLOCK in readme, (
+        "scripts/package_kaggle_release.py KAGGLE_TREE_BLOCK no longer matches "
+        "the tree diagram in release/README.md — reconcile the two before "
+        "the next release-metadata regeneration."
+    )
+
+
+@pytest.mark.skipif(not _RELEASE_BUNDLES_PRESENT, reason="release bundles not present")
+def test_validate_readme_substitution_flags_drift(tmp_path: Path) -> None:
+    """``_validate_readme_substitution`` is wired into the run-time
+    validator, not just the static guard above."""
+
+    fake_release = tmp_path / "release"
+    fake_release.mkdir()
+    (fake_release / "README.md").write_text("# Some unrelated README\n", encoding="utf-8")
+    errors = packager._validate_readme_substitution(fake_release)
+    assert errors
+    assert errors[0].field == "release/README.md"
+    assert "KAGGLE_TREE_BLOCK" in errors[0].message
+
+    # Sanity: the real release README does NOT trigger the validator.
+    assert packager._validate_readme_substitution(_RELEASE_DIR) == []
+
+
+@pytest.mark.skipif(not _RELEASE_BUNDLES_PRESENT, reason="release bundles not present")
+def test_assembled_upload_dir_writes_rewritten_readme_copy(tmp_path: Path) -> None:
+    """The README inside the upload tree is a real file with the
+    rewrites — Kaggle reads this verbatim on the dataset page."""
+
+    kaggle_dir = tmp_path / "kaggle"
+    cover_image = tmp_path / "cover.png"
+    _make_valid_cover(cover_image)
+    packager.run_packager(_RELEASE_DIR, kaggle_dir=kaggle_dir, cover_image=cover_image)
+
+    kaggle_readme = kaggle_dir / "README.md"
+    assert kaggle_readme.is_file()
+    assert not kaggle_readme.is_symlink()
+    contents = kaggle_readme.read_text(encoding="utf-8")
+    assert "](../" not in contents
+    assert packager.GITHUB_BLOB_BASE in contents
+
+
+@pytest.mark.skipif(not _RELEASE_BUNDLES_PRESENT, reason="release bundles not present")
+def test_assembled_upload_dir_resolves_every_declared_resource(tmp_path: Path) -> None:
+    """Every ``resources[].path`` declared in the metadata must resolve
+    to a real file (not a symlink, not a missing path) under the
+    assembled upload directory.  Kaggle's CLI walks the directory at
+    upload time; a declared resource that doesn't materialise is a
+    silent upload-time failure.
+    """
+
+    kaggle_dir = tmp_path / "kaggle"
+    cover_image = tmp_path / "cover.png"
+    _make_valid_cover(cover_image)
+    outcome = packager.run_packager(_RELEASE_DIR, kaggle_dir=kaggle_dir, cover_image=cover_image)
+
+    # Every resource path resolves to a real file.
+    for resource in outcome.metadata.resources:
+        target = kaggle_dir / resource.path
+        assert target.is_file(), f"declared resource missing from upload tree: {resource.path}"
+        assert not target.is_symlink(), (
+            f"declared resource is a symlink, not a real file: {resource.path} — "
+            f"Kaggle's CLI may skip symlinked entries on upload"
+        )
+
+    # Top-level required artefacts.
+    assert (kaggle_dir / "dataset-metadata.json").is_file()
+    assert (kaggle_dir / "README.md").is_file()
+    assert (kaggle_dir / cover_image.name).is_file()
+    assert not (kaggle_dir / cover_image.name).is_symlink()
+
+
+# ---------------------------------------------------------------------------
+# Upload-dir assembly safety
+# ---------------------------------------------------------------------------
+
+
+def test_assemble_upload_dir_rejects_unsafe_kaggle_dir(tmp_path: Path) -> None:
+    """Refuse to assemble into the release dir or its parent."""
+
+    fake_release = tmp_path / "release"
+    fake_release.mkdir()
+    with pytest.raises(ValueError, match="unsafe"):
+        packager.assemble_upload_dir(fake_release, fake_release)
+    with pytest.raises(ValueError, match="unsafe"):
+        packager.assemble_upload_dir(fake_release, fake_release.parent)
+
+
+def test_assemble_upload_dir_rejects_kaggle_dir_equal_to_cwd(
+    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
+) -> None:
+    """Refuse to assemble into the current working directory.
+
+    A user passing ``--kaggle-dir .`` (or running from inside the
+    intended ``kaggle_dir``) would otherwise rmtree-then-recopy
+    arbitrary cwd contents.  This is the most-likely-to-trigger
+    safety case and was missing test coverage in the initial PR.
+    """
+
+    fake_release = tmp_path / "release"
+    fake_release.mkdir()
+    cwd = tmp_path / "workdir"
+    cwd.mkdir()
+    monkeypatch.chdir(cwd)
+    with pytest.raises(ValueError, match="unsafe"):
+        packager.assemble_upload_dir(fake_release, cwd)
+
+
+def test_assemble_upload_dir_idempotent_against_existing_tree(tmp_path: Path) -> None:
+    """Re-running the assembly over an already-populated upload tree
+    succeeds — the previous PR's symlink-vs-file confusion is no
+    longer possible because both passes call the same copy helpers."""
+
+    if not _RELEASE_BUNDLES_PRESENT:
+        pytest.skip("release bundles not present")
+
+    kaggle_dir = tmp_path / "kaggle"
+    cover_image = tmp_path / "cover.png"
+    _make_valid_cover(cover_image)
+    packager.run_packager(_RELEASE_DIR, kaggle_dir=kaggle_dir, cover_image=cover_image)
+    # Second pass against the same kaggle_dir.
+    outcome = packager.run_packager(_RELEASE_DIR, kaggle_dir=kaggle_dir, cover_image=cover_image)
+    assert outcome.errors == ()
+    for resource in outcome.metadata.resources:
+        assert (kaggle_dir / resource.path).is_file()
+
+
+# ---------------------------------------------------------------------------
+# CLI driver — error paths
+# ---------------------------------------------------------------------------
+
+
+def test_main_reports_missing_release_dir(
+    tmp_path: Path, capsys: pytest.CaptureFixture[str]
+) -> None:
+    rc = packager.main(
+        [
+            "--release-dir",
+            str(tmp_path / "missing"),
+            "--kaggle-dir",
+            str(tmp_path / "kaggle"),
+            "--cover-image",
+            str(tmp_path / "cover.png"),
+            "--dry-run",
+        ]
+    )
+    captured = capsys.readouterr()
+    assert rc == 2
+    assert "release directory not found" in captured.err
+
+
+# ---------------------------------------------------------------------------
+# Determinism + sync with committed artefact
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.skipif(not _RELEASE_BUNDLES_PRESENT, reason="release bundles not present")
+def test_run_packager_metadata_is_byte_deterministic(tmp_path: Path) -> None:
+    """Two back-to-back runs against the committed bundles must
+    produce byte-identical metadata files."""
+
+    cover = tmp_path / "cover.png"
+    _make_valid_cover(cover)
+
+    out_a = tmp_path / "a"
+    out_b = tmp_path / "b"
+    packager.run_packager(_RELEASE_DIR, kaggle_dir=out_a, cover_image=cover, dry_run=True)
+    packager.run_packager(_RELEASE_DIR, kaggle_dir=out_b, cover_image=cover, dry_run=True)
+    assert (out_a / "dataset-metadata.json").read_bytes() == (
+        out_b / "dataset-metadata.json"
+    ).read_bytes()
+
+
+def test_render_metadata_emits_literal_unicode_not_escapes() -> None:
+    """``ensure_ascii=False`` keeps em-dashes, ``×``, smart quotes etc.
+    rendered literally so the committed JSON stays diffable."""
+
+    metadata = _minimal_metadata()
+    rendered = packager.render_metadata_json(
+        packager.DatasetMetadata(**{**metadata.__dict__, "description": "a — b × c"})
+    )
+    assert "a — b × c" in rendered
+    assert "\\u2014" not in rendered
+    assert "\\u00d7" not in rendered
+
+
+def test_render_metadata_keywords_are_sorted_at_render_time() -> None:
+    """Keywords are sorted in the rendered JSON regardless of the
+    order they were declared on the metadata object — locks the
+    determinism contract independent of the ``DEFAULT_KEYWORDS``
+    constant ordering."""
+
+    base = _minimal_metadata()
+    shuffled = packager.DatasetMetadata(
+        **{**base.__dict__, "keywords": ("zebra", "alpha", "mango")},
+    )
+    parsed = json.loads(packager.render_metadata_json(shuffled))
+    assert parsed["keywords"] == ["alpha", "mango", "zebra"]
+
+
+# ---------------------------------------------------------------------------
+# Kaggle CLI shape validation (G11.3) — gated, opt-in
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.skipif(not _RELEASE_BUNDLES_PRESENT, reason="release bundles not present")
+def test_kaggle_cli_accepts_assembled_metadata(tmp_path: Path) -> None:
+    """G11.3 — feed the assembled tree to the actual Kaggle metadata
+    validator and assert it accepts the shape.
+
+    Skipped unless the optional ``kaggle`` package is installed
+    (``pip install -e '.[publish]'``); we deliberately don't make
+    that a hard dependency because the kaggle SDK pulls in a long
+    transitive tail.  Depending on the version, the Kaggle SDK may
+    expose a metadata validator on ``kaggle.api`` under one of
+    several names (e.g. ``validate_dataset_metadata``); we look it
+    up dynamically and skip if absent rather than hard-couple to
+    one SDK version.
+    """
+
+    kaggle = pytest.importorskip("kaggle", reason="kaggle SDK not installed")
+    kaggle_dir = tmp_path / "kaggle"
+    cover = tmp_path / "cover.png"
+    _make_valid_cover(cover)
+    packager.run_packager(_RELEASE_DIR, kaggle_dir=kaggle_dir, cover_image=cover)
+
+    # Search for a metadata-validator entry point on the kaggle API.
+    api = kaggle.api
+    candidates = [
+        getattr(api, name, None)
+        for name in (
+            "validate_dataset_metadata",
+            "_validate_dataset_metadata",
+            "process_resources",
+        )
+    ]
+    validator = next((c for c in candidates if callable(c)), None)
+    if validator is None:
+        pytest.skip("no Kaggle metadata-validator entry point found on the installed SDK")
+
+    # Different Kaggle SDK versions expose different signatures; try
+    # the most common shapes.  We're treating "no exception raised"
+    # as acceptance.
+    try:
+        validator(str(kaggle_dir))
+    except TypeError:
+        validator(str(kaggle_dir / "dataset-metadata.json"))
+
+
+@pytest.mark.skipif(
+    not (_RELEASE_BUNDLES_PRESENT and _COMMITTED_METADATA.exists()),
+    reason="release bundles or committed metadata missing",
+)
+def test_committed_kaggle_metadata_matches_fresh_regeneration(tmp_path: Path) -> None:
+    """A fresh metadata regeneration must match the committed
+    ``release/kaggle/dataset-metadata.json`` byte-for-byte AND have
+    a non-degenerate description / id / image.
+
+    If this fails, ``release/`` drifted without re-running
+    ``scripts/package_kaggle_release.py``.  Regenerate via that
+    script from the repo root and commit the new metadata alongside
+    the bundle change.
+    """
+
+    # Prefer the committed cover image; fall back to a synthetic one.
+    if _COMMITTED_COVER.exists():
+        cover = _COMMITTED_COVER
+    else:
+        cover = tmp_path / "cover.png"
+        _make_valid_cover(cover)
+
+    fresh_dir = tmp_path / "kaggle"
+    packager.run_packager(_RELEASE_DIR, kaggle_dir=fresh_dir, cover_image=cover, dry_run=True)
+    fresh_bytes = (fresh_dir / "dataset-metadata.json").read_bytes()
+    committed_bytes = _COMMITTED_METADATA.read_bytes()
+    assert fresh_bytes == committed_bytes
+
+    # Positive content assertions — guard against the failure mode
+    # where a code change accidentally produces empty / minimal
+    # content that we then re-commit, leaving the byte-equality
+    # check passing on broken output.
+    parsed = json.loads(fresh_bytes)
+    assert parsed["id"] == f"{packager.DEFAULT_USER_SLUG}/{packager.DEFAULT_DATASET_SLUG}"
+    assert parsed["image"] == "dataset-cover-image.png"
+    description = parsed["description"]
+    # The description should carry the rewritten dataset card, not be
+    # empty or stub content.
+    assert "What's inside" in description
+    assert "Why lead scoring matters" in description
+    assert "Known limitations" in description
+    # Rewrites fired (no source-tree leaks, no broken relative links).
+    assert "intermediate_instructor/" not in description
+    assert "](../" not in description
+    assert "github.com/leadforge-dev/leadforge/blob/main" in description
+    # Resources are non-trivial.
+    assert len(parsed["resources"]) >= 30
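+    # Every flat CSV has a schema bracketed like the canonical 33-column
+    # shape: ``split`` first, the target column last.  Only those two
+    # field names are checked here, not the full column count.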
+    flat_csvs = [r for r in parsed["resources"] if r["path"].endswith("/lead_scoring.csv")]
+    assert len(flat_csvs) == len(packager.DEFAULT_TIERS)
+    for r in flat_csvs:
+        assert r["schema"]["fields"][0]["name"] == "split"
+        assert r["schema"]["fields"][-1]["name"] == "converted_within_90_days"