
feat: route new task runs to a parallel task_run_v2 table#4000

Draft
d-cs wants to merge 76 commits into main from runstore-table-redirect
d-cs commented Jun 19, 2026

Collaborator

Summary

New task runs can be routed to a parallel task_run_v2 Postgres table instead of the main TaskRun table, decided per-org by a feature flag and keyed purely by the run id's format. Existing runs stay in TaskRun, with no backfill. The flag ships off, so behavior is unchanged until an org is opted in.

This builds on the RunStore adapter that already funnels all Postgres TaskRun access through one place (writes in #3981, reads in #3990). RunStore now routes each run to its physical table by id format: a KSUID id means task_run_v2, anything else (including legacy cuids) means TaskRun.

Design

  • The discriminator is the id format. New runs mint a KSUID when their org has the runTableV2 flag on; everyone else keeps minting legacy ids. The flag is read in memory at the single mint site in the trigger path, so the hot path adds no query. RunStore never sees the flag: it routes purely by isKsuidId(id), and a malformed id falls back to legacy.
  • By-id reads and writes stay single-table, because the id format names the table. Only predicate reads that cannot name a table touch both. findRuns performs a bounded two-way keyset merge (ordered reads standardize on a (createdAt, id) keyset, since cuid and KSUID do not share a sort range), and a non-id findRun (idempotency-key dedup, or "are there any runs in this environment") queries both tables. Both apply identical scoping to each table, so a merge cannot leak a run across an auth boundary.
  • Idempotency is three-source while an org has runs in both tables: legacy TaskRun, task_run_v2, and the mollifier buffer, so a reused key is always found and never produces a duplicate run.
  • The ClickHouse mirror is always ready. The replication service co-publishes task_run_v2 from the start (empty until orgs cut over), streaming its WAL rows through the same transform into the same ClickHouse table.
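
The id-format routing described in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's code: the body of `isKsuidId`, the `RunTable` type, and `tableForRunId` are assumptions, and it assumes bare ids with no prefix. The only external fact used is that a KSUID is 27 base62 characters.

```typescript
// Hypothetical sketch of id-format routing; names and the exact check are assumptions.
type RunTable = "TaskRun" | "task_run_v2";

// A KSUID is 27 base62 characters. Legacy cuids are shorter, so they never match.
const KSUID_PATTERN = /^[0-9A-Za-z]{27}$/;

function isKsuidId(id: string): boolean {
  return KSUID_PATTERN.test(id);
}

function tableForRunId(id: string): RunTable {
  // Anything that is not a well-formed KSUID (legacy cuid, malformed id)
  // falls back to the legacy table.
  return isKsuidId(id) ? "task_run_v2" : "TaskRun";
}
```

Because the decision depends only on the id string, the store needs no flag lookup and no query to pick a table.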

task_run_v2 carries the same relations as TaskRun, and the incoming foreign keys pointing at TaskRun are dropped so the two tables are not coupled by constraints.

Stacked on #3990 (its base), so this PR shows only the routing commits on top of the read adapter.

Before enabling the flag for any org, task_run_v2 needs REPLICA IDENTITY FULL applied the same out-of-band way as TaskRun, so its update and delete events stream to ClickHouse with the old row.

d-cs added 30 commits June 17, 2026 13:35
Replaces the seven throwing stubs on PostgresRunStore with verbatim
relocations of the Prisma statements from runAttemptSystem: startAttempt,
completeAttemptSuccess, recordRetryOutcome, requeueRun,
recordBulkActionMembership, cancelRun, and failRunPermanently. Each method
splices the caller-supplied select/include into the Prisma call. Tests
use real Postgres containers and cover each method including edge cases
(append semantics, conditional fields in cancelRun).
…y-clear, and array-append methods

Replaces the seven throwing stubs in PostgresRunStore with verbatim-relocated
Prisma statements sourced from delayedRunSystem, debounceSystem, updateMetadata,
idempotencyKeys, resetIdempotencyKey, batchTriggerV3, and the realtime-stream
route handlers.

- rescheduleRun: writes delayUntil always; queueTimestamp when provided; nested
  DELAYED executionSnapshot when snapshot arg provided
- enqueueDelayedRun: sets status PENDING + queuedAt
- rewriteDebouncedRun: pass-through update with associatedWaitpoint include
- updateMetadata: optimistic-lock path (updateMany with version predicate) or
  direct path (update without predicate); both return { count }
- clearIdempotencyKey: three discriminated-union branches — byId clears both
  columns, byPredicate clears both, byFriendlyIds clears only idempotencyKey
- pushTags: push-append to runTags array; returns { updatedAt }
- pushRealtimeStream: push-append to realtimeStreams array; returns void
…bapp BaseService

Add RunStore field to SystemResources, instantiate PostgresRunStore in
RunEngine constructor (after prisma/readOnlyPrisma are set), and expose
it on the resources object passed to all systems. Create a webapp
singleton (runStore.server.ts) and thread it as a default parameter
into BaseService so subclasses can access it without changes.
The service statically imported the db.server-backed runStore singleton,
which dragged the Prisma client into otherwise-light test module graphs and
opened an eager connection to DATABASE_URL on import. The metadata service
test then threw an unhandled connection error whenever no database was
reachable at the configured address.

Make runStore a required constructor option, pass the singleton at the
production construction site, and inject a testcontainer-backed store in the
tests.
Add findRun, findRunOrThrow and findRuns to RunStore, mirroring the
existing write methods. They pass where/select/include through the same
Prisma generics and default to the read replica, while letting the caller
pass the writer or a transaction client when needed. This lets Postgres
reads of TaskRun be routed through the store the same way writes already
are. Additive only; no call sites change yet.
Add a no-args overload to findRun, findRunOrThrow and findRuns that
returns the whole TaskRun row, for callers that read a run without a
select or include.
Relocate the direct TaskRun reads in the engine and its systems to the
RunStore read methods, preserving the exact client (writer, replica, or
transaction) at each site. Behavior-preserving; the engine test suite is
unchanged.
…tore

Relocate the direct TaskRun reads in webapp services, run-engine concerns,
realtime, mollifier and metadata to the RunStore read methods, preserving
the exact client (writer, replica, or transaction) at each site. The run
hydrator now receives the store by injection. Behavior-preserving.
Relocate the dashboard presenter TaskRun reads to the RunStore read
methods, preserving the exact client per site. Behavior-preserving.
…store

Relocate the route and loader TaskRun reads to the RunStore read methods,
preserving the exact client per site, including the replica-resolve then
writer-recheck realtime paths. Behavior-preserving.
…store

Decompose the three reads that pulled TaskRun in through a parent model's
relation include (alert, batch results, attempt dependencies): query the
parent without the include, hydrate the run(s) via RunStore in a single
batched read, and stitch them back. Preserves field selection, ordering,
null handling and the query client. Adds container-backed tests for the
batch-results and cancel-dependencies paths.
…tover

The recovery script joins TaskRunExecutionSnapshot to TaskRun in raw SQL, so
it is the one TaskRun read not routed through the run store. Add a note to
revisit it at table cutover.
d-cs added 7 commits June 22, 2026 10:22
…on id-list reads

findRuns now throws when given skip: offset pagination cannot span the two run tables, where each would independently skip N rows from its own result rather than N from the merged result. For an id-list predicate (id in [...]), it now queries only the table whose id format can contain those ids, avoiding a wasted query against an empty task_run_v2 while it is unpopulated during rollout.
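
A rough sketch of the id-list narrowing, under the assumption that the table set is derived by classifying each id in the list. `tablesForIdList` and the inline `isKsuidId` are hypothetical names for illustration, not the PR's helpers.

```typescript
// Hypothetical sketch: pick only the tables an `id in [...]` predicate can match.
type RunTable = "TaskRun" | "task_run_v2";

// Assumed check: a KSUID is 27 base62 characters.
const isKsuidId = (id: string): boolean => /^[0-9A-Za-z]{27}$/.test(id);

function tablesForIdList(ids: string[]): RunTable[] {
  const needed = new Set<RunTable>();
  for (const id of ids) {
    needed.add(isKsuidId(id) ? "task_run_v2" : "TaskRun");
  }
  // Keep a stable order (legacy first) so callers behave deterministically.
  const order: RunTable[] = ["TaskRun", "task_run_v2"];
  return order.filter((table) => needed.has(table));
}
```

With a list of legacy ids only, the result is `["TaskRun"]`, so the empty task_run_v2 table is never queried during rollout.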
…e merge collation

A single-format id-list narrows findRuns to one physical table, but the ordered+limited path still built the cross-table comparator and threw the time-key guard; it now delegates natively to the one table (Postgres orders within a single table fine). Separately, the in-memory merge comparator ordered strings by code unit while the Postgres keyset continuation orders by the database collation (en_US); switching the comparator to localeCompare makes them agree, so a tied-createdAt boundary spanning both tables no longer skips or duplicates a row.
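
The merge this commit fixes can be illustrated with a small two-way merge over pages that each table has already sorted. This is a sketch under assumptions: the row shape, the newest-first direction, and the names `compareRuns` and `mergePages` are hypothetical, and `localeCompare` with `"en-US"` is only assumed to agree with the database collation, as the commit message states.

```typescript
// Hypothetical sketch of the cross-table merge on a (createdAt, id) keyset.
interface RunRow {
  id: string;
  createdAt: Date;
}

// Newest first (assumed direction); ties on createdAt are broken by id using
// locale-aware comparison rather than code-unit order.
function compareRuns(a: RunRow, b: RunRow): number {
  const byTime = b.createdAt.getTime() - a.createdAt.getTime();
  if (byTime !== 0) return byTime;
  return b.id.localeCompare(a.id, "en-US");
}

// Merge two pages that are each already sorted by compareRuns, keeping at most `take` rows.
function mergePages(legacy: RunRow[], v2: RunRow[], take: number): RunRow[] {
  const merged: RunRow[] = [];
  let i = 0;
  let j = 0;
  while (merged.length < take && (i < legacy.length || j < v2.length)) {
    const takeLegacy =
      j >= v2.length || (i < legacy.length && compareRuns(legacy[i], v2[j]) <= 0);
    merged.push(takeLegacy ? legacy[i++] : v2[j++]);
  }
  return merged;
}
```

The important property is that the in-memory comparator and the per-table `ORDER BY` agree on every tie; if they disagree, the continuation key can skip or repeat a row at a page boundary.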
The pre-gate idempotency claim was eligible only when the org was on the mollifier. Concurrent same-key triggers that straddle a runTableV2 flip can mint into different physical tables, whose per-table unique constraints can't see each other, so two runs could share one key. The claim is now also eligible when the org is cut over to the v2 run table, serialising those triggers through Redis.
…warn when missing

A v2 run DELETE needs the full old row so its ClickHouse soft-delete tombstone carries organization and environment ids; under the default replica identity those are dropped and the tombstone is lost. A migration sets REPLICA IDENTITY FULL on task_run_v2 rather than relying on an out-of-band step, and the replication client now warns when any co-published table that publishes UPDATE/DELETE lacks FULL. Adds a replication test for the v2 DELETE tombstone.
A v2 run can reference a legacy parent/root, or have legacy children, when a hierarchy straddles a runTableV2 flip. Prisma relation selects are bound to one table, so the run, span, and API-retrieve presenters returned null parent/root and dropped cross-table children. They now resolve parent/root by id (RunStore routes by id format) and children by a both-table predicate, via a shared hydrateParentAndRoot/hydrateChildRuns helper.
When a non-id predicate matches a row in both physical tables, findFirstAcrossTables now returns the v2 copy instead of the legacy one. Under this PR a run is in exactly one table (createRun routes by id format), so this is a no-op today; it forward-aligns with the later slow legacy-to-v2 migration, which copies a run into task_run_v2 (the canonical, operated-on copy) before operating. A comment in findRuns marks the matching dedup-by-id work for that migration PR.
coderabbitai[bot]

This comment was marked as resolved.

d-cs added 5 commits June 22, 2026 14:20
TaskRunV2 declared implicit many-to-many relations (tags, connectedWaitpoints) whose join tables were never created by any migration and are absent from the database. Nothing reads them (v2 run tags use the scalar runTags array), so they were pure schema-vs-migration drift. Removing them makes the schema match the database with no migration.
findRuns rejects a Prisma cursor or a negative take on a both-tables read (neither can span two tables) instead of silently returning a wrong or empty result, and tablesForWhere now routes a plain id or friendlyId equality to the single matching table by id format, not just id:{in} lists. Also documents that the cross-table merge comparator assumes the en_US database collation and the COLLATE C fix needed for other collations.
… off

Concurrent same-key triggers that straddle a runTableV2 flag flip can mint into different physical tables (cuid to TaskRun, ksuid to task_run_v2), whose per-table unique constraints cannot see each other, so neither insert conflicts and two runs share one key. The pre-gate claim now resolves its backend through a claim-only Redis buffer when the mollifier buffer is absent, so it serialises these triggers instead of falling open. v2-cutover orgs are claim-eligible for every idempotency-keyed trigger, including triggerAndWait, debounce, and one-time-use tokens, and the claim-resolved path blocks the parent on the winner's waitpoint.
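
The serialising claim can be pictured as a set-if-absent with a TTL. The sketch below is an in-memory stand-in, not the PR's Redis-backed buffer: `ClaimBuffer`, `tryClaim`, and `release` are hypothetical names, and a real implementation would rely on an atomic Redis operation such as `SET key value NX PX ttl`.

```typescript
// Hypothetical in-memory stand-in for a claim-only buffer.
class ClaimBuffer {
  private claims = new Map<string, { owner: string; expiresAt: number }>();

  // Returns true only for the first caller to claim a live key.
  tryClaim(key: string, owner: string, ttlMs: number, now: number = Date.now()): boolean {
    const existing = this.claims.get(key);
    if (existing && existing.expiresAt > now) return false;
    this.claims.set(key, { owner, expiresAt: now + ttlMs });
    return true;
  }

  // Only the current owner may release, so a loser cannot free the winner's claim.
  release(key: string, owner: string): void {
    if (this.claims.get(key)?.owner === owner) this.claims.delete(key);
  }
}
```

Two triggers that would mint into different tables contend on the same key here, so only one proceeds to insert and the other resolves against the winner.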
A run routed to task_run_v2 was invisible to the Electric realtime feed, whose shapes were bound to the TaskRun table, so subscribeToRun, useRealtimeRun, and run polling returned nothing for those runs. Single-run subscriptions now route the shape to the correct table by id format, and the tag and batch feeds run two upstream shapes (TaskRun and task_run_v2) merged under one composite cursor the client round-trips opaquely, so no SDK change is needed.
runTableV2 is resolved per organization only, so a global toggle on the admin flags page did nothing. Mark it read-only there to remove the misleading control; per-org control stays on the org dialog.
coderabbitai[bot]

This comment was marked as resolved.

d-cs added 2 commits June 22, 2026 14:38
…ment

The parent/root/child hydration that resolves a run's hierarchy across both run tables looked runs up by id alone. Those pointers are now plain scalars with no foreign-key enforcement, so a stale or malformed pointer could resolve to a run in another environment and leak its metadata through the run and span presenters. Scope every lookup to the run's runtimeEnvironmentId, restoring the same-environment guarantee the table-bound relation select used to provide.
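
A minimal sketch of the environment-scoped lookup, using an in-memory map in place of RunStore. The record shape, the `parentTaskRunId` field name, and `hydrateParent` are assumptions for illustration only.

```typescript
// Hypothetical sketch: resolve a parent pointer only within the child's environment.
interface RunRecord {
  id: string;
  friendlyId: string;
  runtimeEnvironmentId: string;
}

function hydrateParent(
  runsById: Map<string, RunRecord>, // stand-in for a by-id store lookup
  child: { parentTaskRunId: string | null; runtimeEnvironmentId: string }
): RunRecord | null {
  if (child.parentTaskRunId === null) return null;
  const parent = runsById.get(child.parentTaskRunId);
  // A stale or malformed pointer into another environment resolves to null
  // instead of leaking that run's metadata.
  if (!parent || parent.runtimeEnvironmentId !== child.runtimeEnvironmentId) return null;
  return parent;
}
```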
The two-table realtime shape merge returns as soon as one upstream shape yields and aborts the other fetch. That aborted promise was left without a rejection handler, so the abort could surface as an unhandled rejection on the server. Attach a no-op catch to the aborted fetch.
devin-ai-integration[bot]

This comment was marked as resolved.

d-cs added 9 commits June 22, 2026 15:08
The two-table shape merge could leave one upstream fetch pending without a rejection handler when it aborts the race loser or rethrows from the catch block. Attach a detached no-op catch to both fetches up front so an abandoned fetch can never surface as an unhandled rejection on any path. Also document that a tag/batch subscription opens two upstream Electric connections while an org spans both run tables.
…ectric shapes

Electric realtime shapes are bound to a single table, so a task_run_v2 run was invisible to realtime subscriptions. The previous approach merged two Electric shapes per tag/batch feed under a composite cursor, which doubled Electric long-poll connections for those feeds. Electric is being retired in favor of the native realtime backend, which is table-agnostic and already observes both run tables, so that merge is throwaway.

Drop the Electric dual-shape merge (revert realtimeClient to its single-table form, remove the merge module) and instead gate runTableV2 on the native backend: a run only routes to task_run_v2 when the deployment has native realtime enabled and the org's realtimeBackend flag is native. This keeps v2 runs realtime-observable without touching Electric, and the gate auto-satisfies once Electric is removed and native is the default. The idempotency pre-gate claim inherits the same gate.
Completes the Electric-merge removal: a run only routes to task_run_v2 when the deployment has native realtime enabled and the org's realtimeBackend flag is native. Electric shapes are single-table and can't observe a v2 run, so without this gate a v2 run would be realtime-invisible. shouldUseV2RunTable takes the native-realtime master switch as a parameter (kept env-free for unit tests); the trigger mint site and the idempotency pre-gate claim both pass it.
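
The gate described here might look like the following pure function. Only the name `shouldUseV2RunTable` and the idea of passing the master switch as a parameter come from the commit message; the `OrgFlags` shape and the `"electric"` value are assumptions.

```typescript
// Hypothetical sketch of the v2 routing gate; flag shapes are assumed.
type RealtimeBackend = "electric" | "native";

interface OrgFlags {
  runTableV2?: boolean;
  realtimeBackend?: RealtimeBackend;
}

// The master switch is a parameter, so the function reads no environment
// variables and can be unit-tested directly.
function shouldUseV2RunTable(flags: OrgFlags, nativeRealtimeEnabled: boolean): boolean {
  return (
    flags.runTableV2 === true &&
    nativeRealtimeEnabled &&
    flags.realtimeBackend === "native"
  );
}
```

Both the trigger mint site and the idempotency pre-gate claim would call the same function, which keeps the two decisions from drifting apart.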
Restore the both-table Electric shape merge so tag-list and batch realtime
feeds observe runs in TaskRun and task_run_v2 together, and gate the v2 run
table on the runTableV2 flag alone (drop the native-realtime coupling). New
runs route to task_run_v2 whenever an org has the flag on and stay visible in
realtime on the existing Electric backend.

Single-run feeds route to one table by id format; only tag and batch feeds fan
out to both shapes under one composite continuation.
…ed window

Routes that walk the run hierarchy through a Prisma relation only see one
physical table, so during a runTableV2 flag flip (a parent and child on
opposite tables) they silently miss the cross-table run. This closes the
reachable cases:

- cancelRun resolves child runs across both tables, so cancelling a parent
  cascades to a child in the other table instead of leaving it executing
  and holding concurrency.
- updateMetadata routes metadata.parent/root operations to the scalar
  parent/root id, so they reach a parent in the other table instead of
  falling back to the child run.
- a one-time-use token with no idempotency key now takes a cross-table
  claim for v2 orgs, so two presentations straddling a flip cannot each
  mint a run in a different table.
- the Electric shape merge reports up-to-date only when both tables are
  caught up, so a multi-chunk initial snapshot no longer drops the rows
  that arrive after the first chunk.
… mixed window

A cuid parent (TaskRun) with a ksuid child (task_run_v2): cancelling the
parent must cascade to the child in the other table. Fails against the old
table-bound childRuns relation, passes with the cross-table findRuns lookup.
…tables

An unordered take capped each run table independently and concatenated the
two results, so a both-table read could silently drop one table's rows once
the other filled the cap. Reject it like the existing skip and cursor guards;
callers that need a bounded cross-table read pass an orderBy for the keyset
merge.
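
Taken together, the both-table guards (skip, cursor, negative take, and unordered take) could be expressed as one check. This is a sketch: `FindRunsArgs`, `assertCrossTableSafe`, and the error messages are hypothetical, and whether the real skip guard also fires on single-table reads is not assumed here.

```typescript
// Hypothetical sketch of the findRuns guards for reads that span both tables.
interface FindRunsArgs {
  skip?: number;
  cursor?: unknown;
  take?: number;
  orderBy?: unknown;
}

function assertCrossTableSafe(args: FindRunsArgs, tableCount: number): void {
  if (tableCount < 2) return; // a single-table read can use these options natively
  if (args.skip !== undefined) {
    throw new Error("findRuns: skip cannot span two run tables");
  }
  if (args.cursor !== undefined) {
    throw new Error("findRuns: a Prisma cursor cannot span two run tables");
  }
  if (args.take !== undefined && args.take < 0) {
    throw new Error("findRuns: negative take cannot span two run tables");
  }
  if (args.take !== undefined && args.orderBy === undefined) {
    throw new Error("findRuns: take across two run tables requires an orderBy");
  }
}
```

Failing loudly is the point of the design: each rejected option would otherwise return a plausible but wrong page.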
The guard added in the previous commit makes that call throw rather than
return a non-deterministic cap; this test asserted the removed cap behavior.
The throw is covered by the guard test alongside the skip/cursor guards.

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 new potential issue.

Comment on lines +125 to +137
export function decodeBackfillCursor(cursor: string): { createdAt: Date; id: string } {
  const separatorIndex = cursor.indexOf(BACKFILL_CURSOR_SEPARATOR);
  const createdAt = separatorIndex === -1 ? new Date(NaN) : new Date(cursor.slice(0, separatorIndex));
  const id = separatorIndex === -1 ? "" : cursor.slice(separatorIndex + 1);

  if (Number.isNaN(createdAt.getTime()) || id.length === 0) {
    throw new Error(
      `RunsBackfillerService: malformed cursor "${cursor}" (expected "<createdAt>_<id>")`
    );
  }

  return { createdAt, id };
}

🚩 Backfill cursor format change is backward-incompatible with in-flight batches

The backfill cursor changed from a plain run id (lastRun.id) to a composite <createdAt>_<id> string (runsBackfiller.server.ts:110). decodeBackfillCursor at line 125 throws on a malformed cursor. If a backfill job is in progress when this code deploys, the admin worker will pass the old-format cursor (a bare id) to the new decodeBackfillCursor, which will throw because the separator _ isn't found in a cuid (cuids are [a-z0-9]{25} with no underscore). The error message is clear and the backfill can be restarted from scratch, but an in-flight backfill will fail on the first batch after deploy.
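
To make the incompatibility concrete, here is a small round trip. It is a sketch under assumptions: the separator value `"_"` is inferred from the error message, `encodeBackfillCursor` and its ISO-8601 encoding are hypothetical, and the decoder is a condensed copy of the function under review.

```typescript
// Assumed from the error message format "<createdAt>_<id>".
const BACKFILL_CURSOR_SEPARATOR = "_";

// Hypothetical encoder; the real cursor construction may differ.
function encodeBackfillCursor(run: { createdAt: Date; id: string }): string {
  return `${run.createdAt.toISOString()}${BACKFILL_CURSOR_SEPARATOR}${run.id}`;
}

// Condensed copy of the decoder under review.
function decodeBackfillCursor(cursor: string): { createdAt: Date; id: string } {
  const separatorIndex = cursor.indexOf(BACKFILL_CURSOR_SEPARATOR);
  const createdAt =
    separatorIndex === -1 ? new Date(NaN) : new Date(cursor.slice(0, separatorIndex));
  const id = separatorIndex === -1 ? "" : cursor.slice(separatorIndex + 1);
  if (Number.isNaN(createdAt.getTime()) || id.length === 0) {
    throw new Error(`malformed cursor "${cursor}" (expected "<createdAt>_<id>")`);
  }
  return { createdAt, id };
}

// A composite cursor round-trips.
const cursor = encodeBackfillCursor({
  createdAt: new Date("2026-06-22T18:54:00.000Z"),
  id: "cjld2cjxh0000qzrmn831i7rn",
});
const decoded = decodeBackfillCursor(cursor);

// An old-format cursor (a bare cuid) has no separator, so decoding throws.
let oldFormatRejected = false;
try {
  decodeBackfillCursor("cjld2cjxh0000qzrmn831i7rn");
} catch {
  oldFormatRejected = true;
}
```

An ISO-8601 timestamp contains no underscore, so splitting on the first separator is unambiguous under this assumed encoding.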


d-cs added 3 commits June 22, 2026 18:54
…-token claim)

- The strengthened findRuns guard threw on GET /api/v1/runs/:runId/spans/:spanId,
  which pages child runs with take and no orderBy across both tables. Add a
  createdAt order so it takes the bounded cross-table merge (and the 50-row cap
  is now deterministic, most recent first) instead of throwing for every org.
- Key the one-time-use-token cross-table claim on the token alone (a reserved
  task slot), matching the task-independent oneTimeUseToken unique constraint,
  so a multi-task token cannot mint twice across the flip. Stop excluding
  triggerAndWait from the token claim. Always resolve a held claim on the
  success path (publish, else release) so it cannot leak until its TTL.
… shape merge

The Electric dual-shape merge was a bridge to let the Electric backend observe
v2 runs during the cutover, but Electric is short-lived and the merge taxed
every tag/batch realtime feed with a second long-poll the moment it deployed.
Gate the v2 run table on the native realtime backend instead (the native client
is table-agnostic and observes v2 runs directly), so a run only routes to
task_run_v2 once its org is on native. Remove the merge module and restore the
single-table Electric proxy.

The cross-table correctness work stays: a v2 run can still have a cross-table
parent or child once an org flips, so the cancelRun cascade, metadata
parent/root routing, the one-time-token claim, and the findRuns guard all still
apply regardless of realtime backend.
…v2 orgs, add cross-table tests

The idempotency-key dedup is a non-id predicate, so RunStore read BOTH run
tables in parallel on every idempotency-keyed trigger, including orgs not cut
over to v2 (whose runs only live in TaskRun, so the task_run_v2 query is always
empty; while native realtime is off that is every org). Add an optional
`tables: "legacy" | "both"` scope to findRun and pass "legacy" from the
idempotency concern when the org is not on v2, keeping the trigger hot path
single-table.

Backfills the cross-table tests the audit flagged as missing: findRun legacy-scope
skips task_run_v2, and clearIdempotencyKey fans out across both tables
(byPredicate hits v2; a mixed byFriendlyIds array clears both).
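
The `tables` scope from this commit can be sketched as a small mapping from scope to physical tables. The `"legacy" | "both"` union comes from the commit message; `tablesForScope`, `idempotencyLookupScope`, and the default of `"both"` are assumptions.

```typescript
// Hypothetical sketch of the optional table scope on non-id reads.
type TableScope = "legacy" | "both";
type RunTable = "TaskRun" | "task_run_v2";

// Assumed default: read both tables unless the caller narrows the scope.
function tablesForScope(scope: TableScope = "both"): RunTable[] {
  return scope === "legacy" ? ["TaskRun"] : ["TaskRun", "task_run_v2"];
}

// The idempotency concern narrows to legacy when the org has no v2 runs.
function idempotencyLookupScope(orgIsOnV2: boolean): TableScope {
  return orgIsOnV2 ? "both" : "legacy";
}
```

For an org that is not cut over, `tablesForScope(idempotencyLookupScope(false))` yields a single table, so the trigger hot path issues one query instead of two.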