feat(webapp): mollifier trigger-time decisions — mollify, claim, read fallback#3753
feat(webapp): mollifier trigger-time decisions — mollify, claim, read fallback#3753d-cs wants to merge 15 commits into
Conversation
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
5a7bc19 to
baa6f17
Compare
01f3958 to
449a0bc
Compare
e85e771 to
9298706
Compare
449a0bc to
ffe51b8
Compare
9298706 to
f4c5b21
Compare
cae33fa to
16bfff0
Compare
f126737 to
36cc024
Compare
Addresses Devin's review on PR #3753 (idempotencyKeys.server.ts comment 2026-05-28): when TRIGGER_MOLLIFIER_ENABLED=1 globally during staged rollout, claimOrAwait was issuing a Redis SETNX for every idempotency-keyed trigger — including orgs that had not opted in. Those orgs gain nothing from the claim (the gate always returns pass_through for them, so the buffer is never written to) and PG unique constraint already deduplicates same-key races on the pass-through path. Wrap the claim block in an additional per-org check using the same in-memory Organization.featureFlags predicate that evaluateGate uses (makeResolveMollifierFlag) — no DB query. Non-opted-in orgs now skip claimOrAwait entirely; opted-in orgs still get the cross-store race coordination. Regression test added in mollifierClaimResolution.test.ts: with orgFlag=false the concern returns isCached:false with no claim, without calling claimIdempotency at all. The two prior tests now opt the fake org in (organization.featureFlags.mollifierEnabled=true) so they still exercise the resolution branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses two Devin findings on PR #3753 (readFallback.server.ts:206 comments 2026-05-28). 1) batchId field-name mismatch: #buildEngineTriggerInput in triggerTask.server.ts writes batch info as a nested object 'batch: { id, index }', not as a flat 'batchId'. readFallback was reading the non-existent snapshot.batchId so SyntheticRun.batchId was always undefined for buffered runs. Now reads snapshot.batch.id (the internal cuid, matching what PG stores in TaskRun.batchId). 2) parentTaskRunFriendlyId / rootTaskRunFriendlyId structurally unfillable: the snapshot carries the INTERNAL parent/root ids (parentTaskRunId / rootTaskRunId, what engine.trigger consumes), not friendlyIds. Convert internal to friendly via RunId.toFriendlyId so consumers do not have to special-case the buffered path. Four regression tests in mollifierReadFallback.test.ts: batchId extracted from nested snapshot.batch.id; a flat snapshot.batchId key is ignored (belt-and-braces); parent/root friendly conversion round-trips through RunId.generate(); parent/root undefined when the snapshot has no parent context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed 2 follow-up fixes:
Verified end-to-end against a buffered run; webapp typecheck clean. |
870bf6e to
e9d3e62
Compare
… fallback The trigger hot path's mollifier integration: - `mollifyTrigger`: when the gate trips, write the engine.trigger snapshot to the buffer and return a synthesised QUEUED response. - Pre-gate idempotency-key claim: same-key triggers serialise through Redis so a burst lands in PG / buffer exactly once. - Read-fallback extensions: `findRunByIdWithMollifierFallback` for the trigger-time idempotency lookup that must see buffered runs. - Gate bypasses: debounce, oneTimeUseToken, parentTaskRun (triggerAndWait) skip the mollify path entirely. - triggerTask + IdempotencyKeyConcern wired to the above. Stacked on buffer extensions PR. All behaviour gated by the master `TRIGGER_MOLLIFIER_ENABLED` switch; off-state hot path is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`new Date("not-a-date")` returns a truthy Invalid Date object, which would
mis-classify the run as CANCELED in the read-fallback synthesised shape.
Add an `asDate` helper that rejects NaN-valued parses and use it for
`cancelledAt` and `delayUntil`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The buffer's claim API now requires a caller-supplied ownership token so compare-and-act protects the slot against a stale predecessor. Wires the token end-to-end: - `claimOrAwait` generates the token (UUID) up front and reuses it across the retry path; returns it on the `claimed` outcome. - `publishClaim` and `releaseClaim` wrappers accept and forward the token to the buffer. - `ClaimedIdempotency` carries the token so the trigger pipeline can publish or release with the same token it claimed under. - `triggerTask.server.ts` threads the token into the publish call. Tests pin token round-trip: claimOrAwait → claimIdempotency, plus the publish and release pass-through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t leak fix)
Devin review flagged a security + correctness bug: the mollify path
returned \`MollifySyntheticResult\` with run shape
\`{ friendlyId, spanId }\` and the call site cast it to
\`TriggerTaskServiceResult\` (which expects \`run: TaskRun\` with an
\`id: string\`). Downstream, the trigger route calls
\`saveRequestIdempotency(requestKey, "trigger", result.run.id)\` — so
the cache stored \`undefined\` as the entity id. On SDK retry the
request-idempotency flow then ran
\`prisma.taskRun.findFirst({ where: { id: undefined } })\`. Prisma
strips \`undefined\` from where clauses, so the query degenerates to an
unfiltered \`findFirst\` and returns an arbitrary TaskRun row —
potentially from another env / user.
Fix:
- Add \`id: string\` to \`MollifySyntheticResult.run\`.
- Compute it via \`RunId.fromFriendlyId(...)\` on both branches:
the happy-accept path (id from \`args.runFriendlyId\`) and the
\`duplicate_idempotency\` race-loser path (id from
\`result.existingRunId\` so the response carries the WINNER's id).
- Add regression tests pinning both branches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uard - readFallback: read snapshot.taskVersion (the key buildEngineTriggerInput writes) instead of the nonexistent snapshot.lockToVersion, so buffered version-locked runs report their locked version; test now uses the real key as a regression guard. - env: TRIGGER_MOLLIFIER_TRIP_THRESHOLD back to positive() (matching sibling mollifier numerics) to forbid threshold=0 silently mollifying every trigger. - idempotencyKeys: document why the resolved-but-unfindable fall-through is safe (PG-unique + accept SETNX dedup + ~30s claim TTL self-heal); add regression test pinning the fall-through and the resolved-and-findable cached-hit path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove plan-tracking shorthand (Q#, C#, F#, Phase N, _plans/) from trigger-layer mollifier comments and test names; reword to plain English. Comment/test-name only; no behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…f-state token alloc triggerTask.server.ts: mollifier.buffered logs at debug, not info — it fires per mollified trigger during a burst, exactly when info-level would flood aggregation. idempotencyClaim.server.ts: move the buffer-null fall-open check above token generation so the mollifier-OFF path skips the default randomUUID() for idempotency-keyed triggers (triggerTask is the highest-throughput path). Injected test generators still honoured. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nt, docs URL
idempotencyKeys.server.ts: reword 'bypasses the gate via F4' to 'bypasses the gate entirely' — last residual internal planning-label reference I missed in the earlier cleanup pass.
triggerTask.server.ts: reindent the startSpan arrow-function body to conventional depth (+4 sp on lines 143-619). The body was under-indented relative to the 'async (span) => {' opener after the try-wrapper for claim resolution was added around it; closes (arrow at 8sp, startSpan at 6sp) were already conventional, so only the body needed shifting. Pure whitespace — diff stat is balanced 413/413.
mollifierMollify.server.ts: docs URL → https://trigger.dev/docs/management/tasks/batch-trigger. The previous burst-handling anchor doesn't exist on the docs site.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses Devin's review on PR #3753 (idempotencyKeys.server.ts comment 2026-05-28): when TRIGGER_MOLLIFIER_ENABLED=1 globally during staged rollout, claimOrAwait was issuing a Redis SETNX for every idempotency-keyed trigger — including orgs that had not opted in. Those orgs gain nothing from the claim (the gate always returns pass_through for them, so the buffer is never written to) and PG unique constraint already deduplicates same-key races on the pass-through path. Wrap the claim block in an additional per-org check using the same in-memory Organization.featureFlags predicate that evaluateGate uses (makeResolveMollifierFlag) — no DB query. Non-opted-in orgs now skip claimOrAwait entirely; opted-in orgs still get the cross-store race coordination. Regression test added in mollifierClaimResolution.test.ts: with orgFlag=false the concern returns isCached:false with no claim, without calling claimIdempotency at all. The two prior tests now opt the fake org in (organization.featureFlags.mollifierEnabled=true) so they still exercise the resolution branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses two Devin findings on PR #3753 (readFallback.server.ts:206 comments 2026-05-28). 1) batchId field-name mismatch: #buildEngineTriggerInput in triggerTask.server.ts writes batch info as a nested object 'batch: { id, index }', not as a flat 'batchId'. readFallback was reading the non-existent snapshot.batchId so SyntheticRun.batchId was always undefined for buffered runs. Now reads snapshot.batch.id (the internal cuid, matching what PG stores in TaskRun.batchId). 2) parentTaskRunFriendlyId / rootTaskRunFriendlyId structurally unfillable: the snapshot carries the INTERNAL parent/root ids (parentTaskRunId / rootTaskRunId, what engine.trigger consumes), not friendlyIds. Convert internal to friendly via RunId.toFriendlyId so consumers do not have to special-case the buffered path. Four regression tests in mollifierReadFallback.test.ts: batchId extracted from nested snapshot.batch.id; a flat snapshot.batchId key is ignored (belt-and-braces); parent/root friendly conversion round-trips through RunId.generate(); parent/root undefined when the snapshot has no parent context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tency expiry PG-resident path enforces `idempotencyKeyExpiresAt`: when an existing PG row is found, the lookup compares its `expiresAt` against now, clears the key on expiry, and lets a new run go through. The buffered path was missing this — `findBufferedRunWithIdempotency` returned any buffered run whose snapshot carried the key, regardless of how long ago the customer's TTL had elapsed. `idempotencyKeyTTL: "2s"` plus a second trigger 4s later would return the original buffered runId. Two layers needed the fix: 1. **Read-side (this concern).** Surface `idempotencyKeyExpiresAt` on `SyntheticRun` (the snapshot already stores it; `findRunByIdWithMollifierFallback` just wasn't exposing it). In `findBufferedRunWithIdempotency`, apply the same `expiresAt < new Date()` check as the PG path and return null on expiry. 2. **Write-side (the buffer's accept dedupe).** Returning null from step 1 isn't enough: the trigger pipeline then proceeds to `mollifyTrigger`, whose `buffer.accept` Lua dedupes by `(envId, taskIdentifier, idempotencyKey)` via SETNX on the same `mollifier:idempotency:*` key and would still echo the stale runId as `duplicate_idempotency`. On expiry, clear the buffer-side idempotency binding via `buffer.resetIdempotency` — the same primitive `ResetIdempotencyKeyService` uses for the explicit reset-via-API path. The next accept then goes through as a fresh trigger. Verified end-to-end: with mollifier active and `idempotencyKeyTTL: "2s"`, a same-key retrigger after 4s returns a new runId; same-key retriggers within the TTL still dedupe to the original. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Batch triggers crash with HTTP 500 "Foreign key constraint violated on
the constraint BatchTaskRunItem_taskRunId_fkey" whenever the mollifier
gate is active. The crash chain:
1. batchTriggerV3 generates a run id per item, calls
`triggerTaskService.call` with `options.batchId` set.
2. The mollifier gate trips, the item is buffered, the mollify response
returns a stripped run shape `{ id, friendlyId, spanId }` with no
PG row.
3. batchTriggerV3 then tries `prisma.batchTaskRunItem.create({
taskRunId: result.run.id, ... })` — the FK to TaskRun.id fails
because no row was written.
The proper fix requires both ends: skip the join-row create at
trigger-time AND create it on the drainer-side materialise. The
drainer fix is non-trivial (the drainer / replay PR territory, not
the dashboard PR) and the snapshot already carries
`batch: { id, index }` so it has the info — but until that's wired
through to a `BatchTaskRunItem.create` call on materialise, skipping
trigger-time would silently lose the batch <-> run link forever and
break batch progress reporting + `batchTriggerAndWait` parent
resumption.
Short-circuit the gate when `options.batchId` is set so batch items
always go straight to PG. Batch triggers lose the burst-protection
benefit of the mollifier; single triggers and triggerAndWait are
unaffected. Removing this bypass is the natural follow-up once the
drainer-side BatchTaskRunItem path lands in the appropriate earlier PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
68db676 to
0102c05
Compare
| const result = await mollifyTrigger({ | ||
| runFriendlyId, | ||
| environmentId: environment.id, | ||
| organizationId: environment.organizationId, | ||
| engineTriggerInput, | ||
| decision: mollifierOutcome.decision, | ||
| buffer: mollifierBuffer, | ||
| // Idempotency-key triple wires the buffer's SETNX into | ||
| // the trigger-time dedup symmetric with PG. | ||
| idempotencyKey, | ||
| taskIdentifier: taskId, | ||
| }); | ||
|
|
||
| logger.debug("mollifier.buffered", { | ||
| runId: runFriendlyId, | ||
| envId: environment.id, | ||
| orgId: environment.organizationId, | ||
| taskId, | ||
| reason: mollifierOutcome.decision.reason, | ||
| }); | ||
|
|
||
| // Synthetic result is structurally narrower than the full | ||
| // TaskRun; the route handler only reads | ||
| // `result.run.friendlyId`. traceRun flushes the PARTIAL | ||
| // run-span event to ClickHouse on callback return. | ||
| return result as unknown as TriggerTaskServiceResult; |
There was a problem hiding this comment.
🚩 Request-level idempotency (x-trigger-request-idempotency-key) is degraded for mollified runs
When a trigger is mollified, result.run.id is a synthetic ID from RunId.fromFriendlyId() that doesn't correspond to a PG row. The route handler (api.v1.tasks.$taskId.trigger.ts:137) calls saveRequestIdempotency(requestIdempotencyKey, 'trigger', result.run.id). On an SDK retry (lost HTTP response), handleRequestIdempotency finds this cached ID but prisma.taskRun.findFirst({ where: { id: cachedRequestId } }) returns null (no PG row), so it falls through to a fresh trigger attempt.
For triggers WITH a task-level idempotency key, the retry is still correctly deduplicated via the buffer's lookupIdempotency SETNX in findBufferedRunWithIdempotency (idempotencyKeys.server.ts:171-181). For triggers WITHOUT a task-level idempotency key, a lost-response retry during the buffer window could produce a duplicate buffer entry. This is a narrow edge case (requires lost response + no idempotency key + buffer window) and is bounded by the drainer's eventual materialization.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Fixed in 44286e0 on the trigger branch. Surfaced the divert as a typed isMollified?: boolean on TriggerTaskServiceResult (with a comment block explaining the contract), and the route now gates saveRequestIdempotency on !result.isMollified. Lost-response retries during the buffer window are accepted as fresh triggers; task-level idempotency keys still dedupe via the buffer SETNX, and the behaviour is bounded by drainer materialisation as you noted. Regression locked by two new assertions in triggerTask.test.ts — mollify path asserts isMollified === true, pass-through asserts it's falsy.
debounce and oneTimeUseToken triggers always return pass_through from the mollifier gate (`mollifierGate.server.ts:158-175`), so the pre-gate claim was issuing an unnecessary Redis SETNX on the trigger hot path for these trigger types. Mirror the gate's bypass list — exclude debounce + oneTimeUseToken alongside the existing resumeParentOnCompletion check — so the claim's RTT only fires when the gate could actually mollify the request. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`saveRequestIdempotency` caches the runId returned by the trigger service against the x-trigger-request-idempotency-key header. When the gate mollifies the trigger, that runId is a synthesised cuid with no PG row; a lost-response SDK retry would lookup the cached id in `handleRequestIdempotency`, miss in PG, fall through to a fresh trigger attempt, and — for triggers without a task-level idempotency key — produce a duplicate buffer entry. Surface the divert as a typed `isMollified` flag on `TriggerTaskServiceResult` and gate the route's `saveRequestIdempotency` call on it. Retries during the buffer window are now accepted as fresh triggers; task-level idempotency keys still dedupe via the buffer's SETNX in `findBufferedRunWithIdempotency`. The behaviour is bounded by the drainer's eventual materialisation — once the PG row lands, normal request-idempotency from that point forward works as usual. Regression covered by two assertions in the existing mollifier tests: the mollify path returns `isMollified: true`, and the pass-through path leaves the flag falsy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
The trigger hot path's mollifier integration:
mollifyTrigger: when the gate trips, write the engine.trigger snapshot to the buffer and return a synthesised QUEUED response. Postgres write is deferred to drainer-replay (next PR in the stack).findRunByIdWithMollifierFallbackfor the trigger-time idempotency lookup that must see buffered runs.debounce,oneTimeUseToken,parentTaskRunId/triggerAndWaitskip the mollify path entirely.triggerTask+IdempotencyKeyConcernwired to the above.All behaviour gated by the master
TRIGGER_MOLLIFIER_ENABLEDswitch; off-state hot path is unchanged (the gate is not even consulted).Stacked on the buffer extensions PR.
Test plan
Ship-gate follow-up fixes
BatchTaskRunItem_taskRunId_fkeyFK violation on batch triggers when the gate trips. End-state is a drainer-sideBatchTaskRunItemcreate-on-materialise; batch traffic passes through the gate until that lands.mollifier:idempotency:*SETNX binding (write-side) so a re-trigger past the customer's TTL lands as a fresh run instead of echoing the stale buffered runId.