feat(research): measurement-jobs write resource + retryable backfill drain (chat#1796)#668
Conversation
…drain Implements the chat#1796 api consolidation (write side) against the docs contract (recoupable/docs#242), TDD throughout. measurement-jobs (the ingest resource that replaces snapshots + the never-built backfill verb, and unblocks the catalog-value-estimator seed in skills#43): - POST /api/research/measurement-jobs {scope, source} - source=current -> reuses the snapshot pipeline (maps snapshot_id -> id) - source=historical -> resolves scope to ISRCs, enqueues Songstats backfill ranked by latest count, skips songs already carrying songstats history - GET /api/research/measurement-jobs/{id} -> poll a current job (uncharged) retryable backfill drain (fixes the starvation root cause's robustness gap): - backfillTrackStep: 404 -> done (terminal no-data); 429/5xx/timeout -> failed (transient, reclaimable) instead of permanently stranding tracks - reclaimStaleBackfillRows: resets failed + orphaned in_progress rows to pending - playcount-maintenance cron reclaims before each drain (auto-recovers the 2 stranded rows) 29 new/updated unit tests; full research+workflows suite green (291); tsc clean on touched files; lint clean.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Warning Review limit reached
More reviews will be available in 34 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (5)
📒 Files selected for processing (6)
📝 WalkthroughWalkthroughAdds a ChangesMeasurement Jobs API
Backfill Queue Maintenance
Sequence Diagram(s)sequenceDiagram
participant Client
participant POSTRoute as POST /measurement-jobs
participant createMeasurementJobHandler
participant validateCreateMeasurementJobRequest
participant createMeasurementJob
participant enqueueHistoricalBackfill
participant createSnapshot
Client->>POSTRoute: POST { scope, source, platforms }
POSTRoute->>createMeasurementJobHandler: request
createMeasurementJobHandler->>validateCreateMeasurementJobRequest: request
validateCreateMeasurementJobRequest-->>createMeasurementJobHandler: { accountId, body } or 400 NextResponse
createMeasurementJobHandler->>createMeasurementJob: ValidatedCreateMeasurementJobRequest
alt source = "historical"
createMeasurementJob->>enqueueHistoricalBackfill: scope
enqueueHistoricalBackfill-->>createMeasurementJob: { data: { enqueued, skipped } }
else source = "current"
createMeasurementJob->>createSnapshot: { scope, platforms, schedule: "once" }
createSnapshot-->>createMeasurementJob: { snapshot_id, state, estimated_cost_usd }
end
createMeasurementJob-->>createMeasurementJobHandler: CreateMeasurementJobResult
createMeasurementJobHandler-->>Client: 202 { id, source, state, ... }
sequenceDiagram
participant Client
participant GETRoute as GET /measurement-jobs/:id
participant getMeasurementJobHandler
participant getMeasurementJob
participant selectPlaycountSnapshots
Client->>GETRoute: GET /measurement-jobs/{id}
GETRoute->>getMeasurementJobHandler: request, id
getMeasurementJobHandler->>getMeasurementJob: { id }
getMeasurementJob->>selectPlaycountSnapshots: id
selectPlaycountSnapshots-->>getMeasurementJob: snapshot row or null
getMeasurementJob-->>getMeasurementJobHandler: { data } or { error, status: 404 }
getMeasurementJobHandler-->>Client: 200 { id, state, estimated_cost_usd } or 404
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (7)
lib/supabase/songstats_backfill_queue/reclaimStaleBackfillRows.ts (2)
1-1: ⚡ Quick winUse the canonical Supabase client import path.
This file should import
serverClientvia the repo alias to stay consistent with Supabase module conventions.Suggested fix
-import supabase from "../serverClient"; +import supabase from "`@/lib/supabase/serverClient`";As per coding guidelines, “Supabase database functions should import from
@/lib/supabase/serverClient.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/supabase/songstats_backfill_queue/reclaimStaleBackfillRows.ts` at line 1, The import statement at the top of reclaimStaleBackfillRows.ts uses a relative path to import serverClient instead of the canonical repository alias. Change the import statement from the relative path "../serverClient" to use the alias "`@/lib/supabase/serverClient`" to maintain consistency with Supabase module import conventions across the codebase.Source: Coding guidelines
15-15: ⚡ Quick winRename this Supabase operation to the required verb-style prefix.
reclaimStaleBackfillRowsdoes not follow the requiredselect*/insert*/update*/delete*/get*naming pattern forlib/supabase/**.As per coding guidelines, “For Supabase operations, ensure: Follow naming convention: select*, insert*, update*, delete*, get* (for complex queries).”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/supabase/songstats_backfill_queue/reclaimStaleBackfillRows.ts` at line 15, The function reclaimStaleBackfillRows does not follow the required Supabase operation naming convention. Rename this function to use one of the required verb-style prefixes: select*, insert*, update*, delete*, or get*. Choose the appropriate prefix based on the operation's actual behavior (for example, if it updates rows to mark them as reclaimed, use update*; if it retrieves rows for processing, use select* or get*). After renaming the exported function definition, update all imports and call sites throughout the codebase to use the new name.Source: Coding guidelines
lib/research/playcounts/playcountMaintenanceHandler.ts (1)
17-44: ⚡ Quick winRefactor this handler to satisfy the function-size guideline.
The handler is over the 20-line threshold and bundles multiple responsibilities (auth gate, reclaim, orchestration, response mapping). Extracting orchestration/response helpers will keep this endpoint easier to maintain.
As per coding guidelines, “Flag functions longer than 20 lines” and “Keep functions small and focused.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/research/playcounts/playcountMaintenanceHandler.ts` around lines 17 - 44, The playcountMaintenanceHandler function exceeds the 20-line guideline and combines multiple responsibilities: authorization validation, orchestration of three async operations, and response mapping. Extract the orchestration logic (the three await calls for reclaimStaleBackfillRows, start, and startDueMonthlySnapshots) and response mapping into a separate helper function. Keep playcountMaintenanceHandler focused on handling the request/response flow and delegating the core business logic to this new helper. This will reduce the main handler to well under 20 lines while maintaining clarity of intent.Source: Coding guidelines
app/workflows/backfillTrackStep.ts (1)
19-68: 🏗️ Heavy liftSplit this workflow step into smaller focused units.
backfillTrackStepexceeds the 20-line function cap and currently mixes fetch, status classification, transform, upsert, ledgering, and queue-state transitions in one method. Extracting these responsibilities will make failure modes easier to reason about and test.As per coding guidelines, “Flag functions longer than 20 lines” and “Keep functions small and focused.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/workflows/backfillTrackStep.ts` around lines 19 - 68, The backfillTrackStep function exceeds the recommended function length and combines multiple distinct responsibilities: HTTP fetching, status classification and error handling, response parsing, data transformation, upsert operations, quota ledgering, and queue-state management. Extract each responsibility into its own focused function - for example, a separate function to handle non-200 response cases (that classifies the error and records the ledger entry), another function to parse and transform the historic stats payload into measurement rows, and potentially another to perform the upsert and queue update. This will reduce backfillTrackStep to a simple orchestrator that calls these smaller focused units in sequence, making each piece independently testable and easier to reason about according to the coding guidelines for keeping functions small and focused.Source: Coding guidelines
lib/research/measurement_jobs/createMeasurementJob.ts (1)
21-47: ⚡ Quick winSplit dispatch/mapping to keep
createMeasurementJobfocused.This function currently exceeds the 20-line threshold and mixes branching with response-shaping logic; extracting the
currentmapping into a helper keeps the function tighter and easier to evolve.As per coding guidelines, "Flag functions longer than 20 lines or classes with >200 lines" and "Keep functions small and focused."
♻️ Suggested refactor
type SnapshotData = { snapshot_id: string; state: string; album_count: number; estimated_cost_usd: number; }; + +function mapSnapshotToMeasurementJob(data: SnapshotData) { + return { + status: "success", + source: "current" as const, + id: data.snapshot_id, + state: data.state, + album_count: data.album_count, + estimated_cost_usd: data.estimated_cost_usd, + }; +} @@ export async function createMeasurementJob( req: ValidatedCreateMeasurementJobRequest, ): Promise<CreateMeasurementJobResult> { const { accountId, body } = req; if (body.source === "historical") { return enqueueHistoricalBackfill(body.scope); } const result = await createSnapshot({ accountId, body: { ...body.scope, platforms: body.platforms, schedule: "once" }, }); if ("error" in result) return result; const d = result.data as SnapshotData; - return { - data: { - status: "success", - source: "current", - id: d.snapshot_id, - state: d.state, - album_count: d.album_count, - estimated_cost_usd: d.estimated_cost_usd, - }, - }; + return { data: mapSnapshotToMeasurementJob(d) }; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/research/measurement_jobs/createMeasurementJob.ts` around lines 21 - 47, The createMeasurementJob function exceeds 20 lines and mixes branching logic with response-shaping concerns. Extract the response mapping logic that transforms the SnapshotData (stored in variable d) into the CreateMeasurementJobResult format into a separate helper function. This helper should accept the SnapshotData and return the mapped data object with status, source, id, state, album_count, and estimated_cost_usd fields. Then call this helper function from createMeasurementJob to keep the main function focused and under 20 lines.Source: Coding guidelines
lib/research/measurement_jobs/enqueueHistoricalBackfill.ts (1)
20-58: ⚡ Quick winRefactor
enqueueHistoricalBackfillinto smaller stages.Line 20 to Line 58 currently mixes scope resolution, validation, measurement classification, queue mutation, and response shaping. Breaking this into focused helpers will improve readability/testability and reduce maintenance risk.
As per coding guidelines,
**/*.{js,ts,tsx,jsx,py,java,cs,go,rb,php}requires flagging functions longer than 20 lines and keeping functions small and focused.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/research/measurement_jobs/enqueueHistoricalBackfill.ts` around lines 20 - 58, The function enqueueHistoricalBackfill currently handles multiple distinct concerns (resolving scope, validating results, classifying measurements, mutating queue, and formatting response) in a single function that exceeds the 20-line guideline. Extract each logical stage into separate focused helper functions such as one for resolving and validating scope songs, one for classifying measurements into backfilled and pending categories, and one for enqueueing items into the queue, then have enqueueHistoricalBackfill orchestrate these helpers and return the final result. This will make each function small, testable, and easier to maintain.Source: Coding guidelines
lib/research/measurement_jobs/resolveScopeSongs.ts (1)
15-35: ⚡ Quick winSplit
resolveScopeSongsinto smaller scope-specific helpers.Line 15 to Line 35 combines three resolution paths in one function and exceeds the project’s max function-length rule. Extract the
isrcs,album_ids, andcatalog_idbranches into internal helpers to keep the main function as orchestration only.As per coding guidelines,
**/*.{js,ts,tsx,jsx,py,java,cs,go,rb,php}requires flagging functions longer than 20 lines and keeping functions small and focused.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/research/measurement_jobs/resolveScopeSongs.ts` around lines 15 - 35, The resolveScopeSongs function exceeds the 20-line limit by combining three distinct resolution paths. Extract each branch into separate internal helper functions: one for the isrcs resolution (the scope.isrcs condition), one for the album_ids resolution (the scope.album_ids condition), and one for the catalog_id resolution (the scope.catalog_id condition). Keep the main resolveScopeSongs function as a simple orchestrator that conditionally calls the appropriate helper based on which scope property is provided, reducing the main function to under 20 lines while maintaining the same functionality.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/workflows/backfillTrackStep.ts`:
- Around line 29-40: The current logic treats all non-404 statuses as retryable
failures and marks them as "failed", but permanent failures like 4xx errors
should not be recycled through the reclaim sweep. Replace the binary noData
logic with explicit checks to distinguish retryable statuses (429, 5xx, timeout)
from permanent failures. Mark only retryable statuses as "failed" for the
updateSongstatsBackfillQueue call, and mark permanent 4xx failures and other
terminal errors as "done" to prevent endless recycling through reclaim.
In `@lib/research/measurement_jobs/enqueueHistoricalBackfill.ts`:
- Around line 48-54: The loop iterates through ISRCs and awaits each
upsertSongstatsBackfillQueue call sequentially, creating N round trips that can
timeout for large catalogs. Refactor this to use bounded-concurrency pattern:
collect all the upsertSongstatsBackfillQueue promises (skipping already
backfilled ISRCs) and execute them concurrently using Promise.all or a
concurrency limiter like pLimit to batch requests. This reduces round trips and
prevents API duration limit timeouts while still maintaining safe concurrency
bounds.
In `@lib/research/measurement_jobs/getMeasurementJobHandler.ts`:
- Around line 17-25: The id route parameter in the getMeasurementJobHandler
function is passed directly to getMeasurementJob without validation. Create a
Zod validator (either a dedicated function like validateGetMeasurementJobQuery
or an inline schema) to validate the id parameter early in the function, before
the getMeasurementJob call. This validator should return a 400 NextResponse if
the id fails validation, ensuring invalid IDs are caught consistently before any
data access occurs.
- Around line 22-26: The getMeasurementJob function currently queries using only
the id parameter, which creates a security vulnerability allowing cross-account
job polling. Extract the accountId from the authResult object returned by
validateAuthContext, pass it as a parameter to the getMeasurementJob function
call, and then update the underlying getMeasurementJob implementation and its
data-access layer to enforce that the query filters by both id and account_id
equals the provided accountId to ensure account-scoped access.
In `@lib/supabase/songstats_backfill_queue/reclaimStaleBackfillRows.ts`:
- Around line 24-27: The error handling in reclaimStaleBackfillRows masks query
failures by returning 0 instead of propagating the error to the caller. When an
error occurs (in the if block checking the error variable), throw or rethrow the
error instead of returning 0, so callers can detect the failure and fail fast
rather than treating it as a successful zero reclaim operation.
---
Nitpick comments:
In `@app/workflows/backfillTrackStep.ts`:
- Around line 19-68: The backfillTrackStep function exceeds the recommended
function length and combines multiple distinct responsibilities: HTTP fetching,
status classification and error handling, response parsing, data transformation,
upsert operations, quota ledgering, and queue-state management. Extract each
responsibility into its own focused function - for example, a separate function
to handle non-200 response cases (that classifies the error and records the
ledger entry), another function to parse and transform the historic stats
payload into measurement rows, and potentially another to perform the upsert and
queue update. This will reduce backfillTrackStep to a simple orchestrator that
calls these smaller focused units in sequence, making each piece independently
testable and easier to reason about according to the coding guidelines for
keeping functions small and focused.
In `@lib/research/measurement_jobs/createMeasurementJob.ts`:
- Around line 21-47: The createMeasurementJob function exceeds 20 lines and
mixes branching logic with response-shaping concerns. Extract the response
mapping logic that transforms the SnapshotData (stored in variable d) into the
CreateMeasurementJobResult format into a separate helper function. This helper
should accept the SnapshotData and return the mapped data object with status,
source, id, state, album_count, and estimated_cost_usd fields. Then call this
helper function from createMeasurementJob to keep the main function focused and
under 20 lines.
In `@lib/research/measurement_jobs/enqueueHistoricalBackfill.ts`:
- Around line 20-58: The function enqueueHistoricalBackfill currently handles
multiple distinct concerns (resolving scope, validating results, classifying
measurements, mutating queue, and formatting response) in a single function that
exceeds the 20-line guideline. Extract each logical stage into separate focused
helper functions such as one for resolving and validating scope songs, one for
classifying measurements into backfilled and pending categories, and one for
enqueueing items into the queue, then have enqueueHistoricalBackfill orchestrate
these helpers and return the final result. This will make each function small,
testable, and easier to maintain.
In `@lib/research/measurement_jobs/resolveScopeSongs.ts`:
- Around line 15-35: The resolveScopeSongs function exceeds the 20-line limit by
combining three distinct resolution paths. Extract each branch into separate
internal helper functions: one for the isrcs resolution (the scope.isrcs
condition), one for the album_ids resolution (the scope.album_ids condition),
and one for the catalog_id resolution (the scope.catalog_id condition). Keep the
main resolveScopeSongs function as a simple orchestrator that conditionally
calls the appropriate helper based on which scope property is provided, reducing
the main function to under 20 lines while maintaining the same functionality.
In `@lib/research/playcounts/playcountMaintenanceHandler.ts`:
- Around line 17-44: The playcountMaintenanceHandler function exceeds the
20-line guideline and combines multiple responsibilities: authorization
validation, orchestration of three async operations, and response mapping.
Extract the orchestration logic (the three await calls for
reclaimStaleBackfillRows, start, and startDueMonthlySnapshots) and response
mapping into a separate helper function. Keep playcountMaintenanceHandler
focused on handling the request/response flow and delegating the core business
logic to this new helper. This will reduce the main handler to well under 20
lines while maintaining clarity of intent.
In `@lib/supabase/songstats_backfill_queue/reclaimStaleBackfillRows.ts`:
- Line 1: The import statement at the top of reclaimStaleBackfillRows.ts uses a
relative path to import serverClient instead of the canonical repository alias.
Change the import statement from the relative path "../serverClient" to use the
alias "`@/lib/supabase/serverClient`" to maintain consistency with Supabase module
import conventions across the codebase.
- Line 15: The function reclaimStaleBackfillRows does not follow the required
Supabase operation naming convention. Rename this function to use one of the
required verb-style prefixes: select*, insert*, update*, delete*, or get*.
Choose the appropriate prefix based on the operation's actual behavior (for
example, if it updates rows to mark them as reclaimed, use update*; if it
retrieves rows for processing, use select* or get*). After renaming the exported
function definition, update all imports and call sites throughout the codebase
to use the new name.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b3e5b38a-f404-4d03-a914-8ee8a9ef5841
⛔ Files ignored due to path filters (9)
app/workflows/__tests__/backfillTrackStep.test.tsis excluded by!**/*.test.*,!**/__tests__/**and included byapp/**lib/research/measurement_jobs/__tests__/createMeasurementJob.test.tsis excluded by!**/*.test.*,!**/__tests__/**and included bylib/**lib/research/measurement_jobs/__tests__/createMeasurementJobHandler.test.tsis excluded by!**/*.test.*,!**/__tests__/**and included bylib/**lib/research/measurement_jobs/__tests__/enqueueHistoricalBackfill.test.tsis excluded by!**/*.test.*,!**/__tests__/**and included bylib/**lib/research/measurement_jobs/__tests__/getMeasurementJob.test.tsis excluded by!**/*.test.*,!**/__tests__/**and included bylib/**lib/research/measurement_jobs/__tests__/resolveScopeSongs.test.tsis excluded by!**/*.test.*,!**/__tests__/**and included bylib/**lib/research/measurement_jobs/__tests__/validateCreateMeasurementJobRequest.test.tsis excluded by!**/*.test.*,!**/__tests__/**and included bylib/**lib/research/playcounts/__tests__/playcountMaintenanceHandler.test.tsis excluded by!**/*.test.*,!**/__tests__/**and included bylib/**lib/supabase/songstats_backfill_queue/__tests__/reclaimStaleBackfillRows.test.tsis excluded by!**/*.test.*,!**/__tests__/**and included bylib/**
📒 Files selected for processing (12)
app/api/research/measurement-jobs/[id]/route.tsapp/api/research/measurement-jobs/route.tsapp/workflows/backfillTrackStep.tslib/research/measurement_jobs/createMeasurementJob.tslib/research/measurement_jobs/createMeasurementJobHandler.tslib/research/measurement_jobs/enqueueHistoricalBackfill.tslib/research/measurement_jobs/getMeasurementJob.tslib/research/measurement_jobs/getMeasurementJobHandler.tslib/research/measurement_jobs/resolveScopeSongs.tslib/research/measurement_jobs/validateCreateMeasurementJobRequest.tslib/research/playcounts/playcountMaintenanceHandler.tslib/supabase/songstats_backfill_queue/reclaimStaleBackfillRows.ts
| const authResult = await validateAuthContext(request); | ||
| if (authResult instanceof NextResponse) return authResult; | ||
|
|
||
| const result = await getMeasurementJob({ id }); | ||
| if ("error" in result) return errorResponse(result.error, result.status); |
There was a problem hiding this comment.
Enforce account-scoped reads when polling a measurement job.
Auth is validated but the lookup is performed with id only. This can allow cross-account job polling if an ID is known. Pass accountId through to getMeasurementJob and enforce account scoping in the underlying query.
🔒 Direction for fix
-const result = await getMeasurementJob({ id });
+const result = await getMeasurementJob({ id, accountId: authResult.accountId });Then enforce account_id = accountId in getMeasurementJob/data-access.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/research/measurement_jobs/getMeasurementJobHandler.ts` around lines 22 -
26, The getMeasurementJob function currently queries using only the id
parameter, which creates a security vulnerability allowing cross-account job
polling. Extract the accountId from the authResult object returned by
validateAuthContext, pass it as a parameter to the getMeasurementJob function
call, and then update the underlying getMeasurementJob implementation and its
data-access layer to enforce that the query filters by both id and account_id
equals the provided accountId to ensure account-scoped access.
There was a problem hiding this comment.
7 issues found across 21 files
Confidence score: 2/5
lib/research/measurement_jobs/getMeasurementJob.tsdoes not scope job reads to the authenticated account, so a user who knows an id could read another account’s job status; this is the highest-risk issue because it crosses tenant boundaries — enforce account-scoped lookup (and authorization) before merging.lib/research/measurement_jobs/getMeasurementJob.ts,lib/research/measurement_jobs/enqueueHistoricalBackfill.ts, andlib/supabase/songstats_backfill_queue/reclaimStaleBackfillRows.tsblur real DB failures into “not found” or success-like responses, which can hide dropped work and operational outages — return/propagate explicit error states so clients and handlers can react correctly.app/workflows/backfillTrackStep.tscurrently marks most non-404 responses as reclaimable failures, so permanent client errors (400/401/403) can loop forever through reclaim and waste queue capacity — restrict retries to transient classes (timeouts, 429, 5xx) before merge.lib/supabase/songstats_backfill_queue/__tests__/reclaimStaleBackfillRows.test.tsdoes not strongly verify the criticaland()grouping, so regressions could reclaimdone/pendingrows unnoticed, andlib/research/measurement_jobs/enqueueHistoricalBackfill.ts’s sequential upserts increase timeout risk on large jobs — tighten the query-structure assertion and consider concurrent upserts to de-risk rollout.
Architecture diagram
sequenceDiagram
participant Client as External Client
participant API as API Route (Next.js)
participant Validate as validateCreateMeasurementJobRequest
participant Auth as Auth Service
participant Snapshot as createSnapshot (lib)
participant Historical as enqueueHistoricalBackfill (lib)
participant DB as Database (Supabase)
participant Workflow as Songstats Backfill Workflow (Temporal)
participant Songstats as Songstats API
Note over Client,Songstats: NEW: Measurement-Jobs Ingest Resource (write side)
Client->>API: POST /api/research/measurement-jobs {scope, source}
API->>Validate: validate (auth + schema)
Validate->>Auth: validateAuthContext(request)
alt Auth fails (401)
Auth-->>Validate: NextResponse 401
Validate-->>API: Short-circuit 401
API-->>Client: 401 Unauthorized
else Auth succeeds
Auth-->>Validate: {accountId}
Validate->>Validate: Parse body with Zod (scope, source, platforms)
alt Validation fails (bad scope, unknown source)
Validate-->>API: NextResponse 400
API-->>Client: 400 Bad Request
else Valid
Validate-->>API: {accountId, body}
API->>API: Dispatch based on source
alt source="current"
API->>Snapshot: createSnapshot({accountId, scope, platforms, schedule:"once"})
Snapshot->>DB: Insert/queue snapshot job
DB-->>Snapshot: snapshot_id
Snapshot-->>API: {data: {snapshot_id, state, ...}}
API->>API: Map snapshot_id → id
API-->>Client: 202 {status:"success", source:"current", id, state, ...}
else source="historical"
API->>Historical: enqueueHistoricalBackfill(scope)
Historical->>Historical: resolveScopeSongs(scope)
Historical->>DB: Query song identifiers / catalog songs
DB-->>Historical: ISRC list
alt No ISRCs resolved
Historical-->>API: {error:"No recordings resolvable...", status:400}
API-->>Client: 400 Bad Request
else ISRCs exist
Historical->>DB: selectSongMeasurements(songs, spotify, play_count)
DB-->>Historical: Newest-first measurement rows
Historical->>Historical: Build latestValue map & alreadyBackfilled set
loop Each ISRC not already backfilled
Historical->>DB: upsertSongstatsBackfillQueue({song:isrc, rank_score})
end
Historical-->>API: {data: {status:"success", source:"historical", enqueued, skipped}}
API-->>Client: 202 {enqueued, skipped, ...}
end
end
end
end
Note over Client,Songstats: GET endpoint to poll current job status
Client->>API: GET /api/research/measurement-jobs/{id}
API->>Auth: validateAuthContext(request)
alt Auth fails
Auth-->>API: 401
API-->>Client: 401
else Auth succeeds
API->>API: getMeasurementJob({id})
API->>DB: selectPlaycountSnapshots({id})
alt Job found
DB-->>API: Row (state, album_count, ...)
API-->>Client: 200 {status:"success", id, source:"current", state, ...}
else Not found
DB-->>API: Empty rows
API-->>Client: 404 Unknown measurement-job id
end
end
Note over Client,Songstats: CHANGED: Retryable backfill drain (cron)
participant Cron as playcount-maintenance Cron (Next.js)
Cron->>Cron: validateCronRequest()
Cron->>Cron: reclaimStaleBackfillRows()
Cron->>DB: UPDATE songstats_backfill_queue SET status='pending' WHERE status='failed' OR (status='in_progress' AND updated_at < 1h ago)
DB-->>Cron: Count of rows reclaimed
Cron->>Workflow: start(songstatsBackfillWorkflow)
Workflow->>Songstats: backfillTrackStep (per row)
alt Songstats returns 200 (success)
Songstats-->>Workflow: measurements
Workflow->>DB: upsertSongMeasurements + mark row 'done'
else Songstats returns 404 (no history)
Songstats-->>Workflow: 404
Workflow->>DB: Mark row 'done' (terminal, never retried)
Workflow->>DB: Insert quota ledger (no data 404)
else Songstats returns 429/5xx/timeout (transient)
Songstats-->>Workflow: error
Workflow->>DB: Mark row 'failed' (reclaimable on next cron)
Workflow->>DB: Insert quota ledger (failed X)
end
Workflow-->>Cron: Run ID
Cron->>DB: startDueMonthlySnapshots()
DB-->>Cron: count started
Cron-->>Client: 202 {status:"success", reclaimed, backfill_run_id, monthly_snapshots_started}
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
…s#242 review)
Per review on docs#242 (DRY): GET /research/measurement-jobs/{id} duplicated the
generic GET /api/tasks/runs, and the old snapshot flow had no status endpoint.
Remove the GET status route/handler/data fn + test. POST still returns the job id.
|
Updated per docs#242 review: dropped |
There was a problem hiding this comment.
0 issues found across 1 file (changes from recent commits).
Requires human review: Auto-approval blocked by 5 unresolved issues from previous reviews.
Re-trigger cubic
There was a problem hiding this comment.
Songstats has a heavy enforced cap. We want to really focus on enforcing 2 things
- DRY / idempotent - Do not duplicate a query to Songstats if we already have the same data in supabase.
- Card on file - The authenticated account must have a payment method in stripe in order to use the Songstats endpoints. If they don't, we should return a stripe checkout link for a checkout page for the free tier.
There was a problem hiding this comment.
Done — both enforced (a283fb16):
1. DRY / idempotent — Songstats is never re-queried for data we already have, at 3 layers:
- enqueue skips songs that already carry
songstatsmeasurements (alreadyBackfilledset); - the queue upsert is
onConflict: song, ignoreDuplicates(one row per song, ever — verified live: 2 identical jobs → 1 row); - the drain marks
done(incl. 404 no-data) and never re-fetches.
2. Card on file — new ensureSongstatsPaymentMethod: historical jobs (the only Songstats-spending path) now require a default payment method. If none, it returns 402 + a Stripe Checkout link for the free tier (the subscription+30-day-trial createStripeSession). current (Apify) is exempt. Wired into the validator before any enqueue.
| * | ||
| * @returns Number of rows returned to `pending`. | ||
| */ | ||
| export async function reclaimStaleBackfillRows(): Promise<number> { |
There was a problem hiding this comment.
KISS
- actual: reclaimStaleBackfillRows
- required: updateSongstatsBackfillQueue.ts
There was a problem hiding this comment.
Done (a283fb16) — moved the reclaim into updateSongstatsBackfillQueue.ts as reclaimStaleSongstatsBackfillRows and deleted the standalone reclaimStaleBackfillRows.ts. It now also throws on DB error (consistent with updateSongstatsBackfillQueue) instead of masking it as 0, and the test asserts the exact and() grouping. Heads-up: this puts two exports in one supabase file (vs the usual one-per-file) — kept it as you asked since both are queue-status mutations; say the word if you'd rather split it back out.
Preview verification ✅ — Done-when criteria metSynced with
|
| Check | Result |
|---|---|
POST no auth |
401 "Exactly one of x-api-key or Authorization…" |
| historical — both tracks already backfilled (The Spins, Nikes) | 202 {enqueued: 0, skipped: 2} — no track fetched twice ✅ |
historical — un-backfilled track (USA2P1122877, 357M streams) |
202 {enqueued: 1, skipped: 0} → queue row created pending, rank_score = 357,340,447 (ranked by all-time count) ✅ |
| idempotency — same historical job ×2 | exactly 1 queue row (upsert onConflict: song dedup) ✅ |
| current — single album | 202 {source: "current", id: "98a6…46ec", state: "queued", album_count: 1, estimated_cost_usd: 0.003} — snapshot_id→id mapped; behaves like snapshots ✅ |
validation — missing source |
400 "source must be current or historical" |
validation — bad source |
400 same |
validation — zero / two scope keys |
400 "Provide exactly one of scope.catalog_id / album_ids / isrcs" |
Verified the queue side effects directly in songstats_backfill_queue (rank, single-row dedup), then removed the test enqueue row — queue restored to baseline.
Retryable backfill drain + reclaim
| Check | Result |
|---|---|
2 stranded failed rows present + reclaimable |
✅ (the sweep's predicate matches both; verified read-only) |
/api/internal/playcount-maintenance route deployed + gated |
401 without CRON_SECRET ✅ |
retry classification (404→done terminal, 429/5xx→failed reclaimable) + reclaim sweep |
covered by 8 unit tests (RED→GREEN) |
Honest limitation: the live reclaim+drain runs via the daily cron (CRON_SECRET-gated) and isn't externally triggerable, so I couldn't exercise "the 2 stranded rows drain on the next run" end-to-end here — it's verified by the unit tests + the deployed/gated cron route + the reclaimable queue state. Happy to trigger it live if you share CRON_SECRET, or it self-verifies on the next daily run.
Notes
- The
currenttest queued one real snapshot (98a6925d, ~$0.003, 1 album) — a benign re-capture (same as the daily snapshot). - Auth via a preview-environment agent key (not echoed).
Verified: @sweetmantech session.
…nal/retryable backfill, KISS reclaim Review feedback on chat#1796 (sweetmantech + bots): - **Card on file (Songstats gate):** `historical` measurement-jobs now require a payment method (they spend the capped Songstats quota). New ensureSongstatsPaymentMethod: 402 + free-tier Stripe Checkout link when no card. `current` (Apify) is exempt. - **backfillTrackStep:** only 408/429/5xx are retryable (`failed`); 404 + other permanent 4xx are terminal (`done`) — prevents reclaim from recycling perma-fails. - **KISS:** moved the reclaim into updateSongstatsBackfillQueue.ts as reclaimStaleSongstatsBackfillRows; it now throws on DB error instead of masking it as 0; stronger test asserts the and() grouping. - **enqueueHistoricalBackfill:** bounded-concurrency batches (25) instead of N serial upserts. DRY/idempotency unchanged (already enforced: skip songstats-having songs + upsert dedup + drain skips done). 39 unit tests; research+workflows suite 313 green; tsc/lint/format clean.
Review feedback addressed —
|
… vars (KISS, api#668 review)
Card-on-file gate — re-verified on fresh preview (head
|
Implements the api consolidation for chat#1796 (write side), against the docs contract in recoupable/docs#242. TDD throughout — tests written first, RED confirmed, then implemented.
measurement-jobs— the ingest resourceReplaces
POST /research/snapshotsand the never-builtbackfillverb, and is what skills#43'scatalog-value-estimatorseed calls.POST /api/research/measurement-jobs{ scope, source }source:"current"→ reuses the existing snapshot pipeline (createSnapshot), mapssnapshot_id→id.source:"historical"→ resolves the scope to ISRCs, enqueues each for Songstats backfill ranked by latest count, and skips songs already carryingsongstatshistory (idempotent — no track fetched twice). Free. Returns{ enqueued, skipped }.— removed per docs#242 review (DRY): run status is covered by the genericGET /api/research/measurement-jobs/{id}GET /api/tasks/runs; the old snapshot flow had no status endpoint either. POST still returns the jobid.Layered per house style:
route → handler → validate → data, one exported fn per file, reusingcreateSnapshot/selectSongMeasurements/upsertSongstatsBackfillQueue/resolveScopehelpers.Retryable backfill drain (robustness gap from #1796)
backfillTrackStep: a definitive404(no history) →done(terminal, never retried); transient429/5xx/timeout →failed(reclaimable) instead of permanently stranding the track.reclaimStaleBackfillRows: resetsfailed+ orphanedin_progressrows topending.playcount-maintenancecron reclaims before each drain — so the 2 currently-strandedfailedrows auto-recover on the next run.Testing
lib/research+app/workflowssuite: 291 passed, no regressions.tsc --noEmit: clean on all touched files (pre-existing repo errors unrelated). Lint clean.Scope / follow-up
This PR is the write consolidation + the drain fix. The read collection (
GET /research/tracks|albums/{id}/measurements— repackaging the still-serving, now-deprecatedhistoric-stats/playcount-deltas/playcountsreads) is the remaining api slice from #1796 and lands in a fast-follow. No endpoint is removed here — legacy paths keep serving.🤖 Generated with Claude Code
Summary by cubic
Consolidates research ingest behind a new
measurement-jobswrite API for chat#1796, adds a Songstats card-on-file gate for historical jobs, and makes the backfill drain reliably retryable. Drops the bespoke status route; useGET /api/tasks/runsfor status.New Features
/api/research/measurement-jobs{ scope, source, platforms? }(defaultsplatforms: ["spotify"])source: "current"→ reuses the snapshot pipeline and returns{ id, state, album_count, estimated_cost_usd }(idmapped fromsnapshot_id).source: "historical"→ requires a card on file (402 + Stripe checkout URL when missing), resolvesscopeto ISRCs, enqueues Songstats backfill ranked by latest count in bounded batches (25), and skips songs that already havesongstatshistory. Returns{ enqueued, skipped }. Free.Bug Fixes
backfillTrackStep: 404 and other permanent 4xx → markdone(terminal); 408/429/5xx → markfailed(reclaimable).playcount-maintenance: callsreclaimStaleSongstatsBackfillRowsbefore each drain to resetfailedand stalein_progressrows (throws on DB error).Written for commit a283fb1. Summary will update on new commits.
Summary by CodeRabbit
New Features
Bug Fixes
Chores