feat(data-retention): granular PII redaction stages (input + block outputs)#5272
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryHigh Risk Overview Runtime: When enabled, workflow input is masked before execution; block outputs are masked before compaction, downstream state, and agent memory; streaming can drain without forwarding raw chunks until masked. Resume/run-from-block re-masks restored Presidio: New Config/API: Docs/Helm describe Presidio as a standalone service reached via Reviewed by Cursor Bugbot for commit 2175408. Bugbot is set up for automated code reviews on this repo. Configure here. |
Greptile SummaryThis PR adds granular PII redaction for workflow execution and logs. The main changes are:
Confidence Score: 5/5This looks safe to merge.
Important Files Changed
Reviews (15): Last reviewed commit: "fix(data-retention): re-mask offloaded l..." | Re-trigger Greptile |
|
@greptile review |
|
@greptile review |
…nconditionally (no fail-open)
|
@greptile review |
|
@greptile review |
…redaction # Conflicts: # apps/sim/ee/data-retention/components/data-retention-settings.tsx
| abortSignal: ctx.abortSignal, | ||
| // Propagate in-flight block-output redaction into child workflows so | ||
| // nested blocks mask outputs too (recurses: each child forwards it). | ||
| piiBlockOutputRedaction: ctx.piiBlockOutputRedaction, |
There was a problem hiding this comment.
Child workflows skip input redaction
Medium Severity
The new workflow-input PII stage runs only in executeWorkflowCore on top-level processedInput. Nested child runs are started with a direct Executor and pass childWorkflowInput unchanged. Only the block-output policy is forwarded on the context, so when the input stage is on and block outputs are off, mapped or explicit child input can execute and produce downstream state without in-flight input masking.
Reviewed by Cursor Bugbot for commit 8f86d77. Configure here.
…logs (preserve redacted content)
|
@greptile review |
| snapshot.state.blockStates = await redactObjectStrings( | ||
| snapshot.state.blockStates, | ||
| blockOutputOpts | ||
| ) |
There was a problem hiding this comment.
Refs stay unmasked When a paused run or run-from-block snapshot contains a large-value ref that was created before block-output redaction was enabled, this call only masks inline strings. Large-value refs are treated as opaque by
redactObjectStrings, so the ref still points at the original offloaded bytes. The later warm-up step can materialize that raw value for downstream blocks, letting them read or send unredacted PII even though the block-output stage is enabled.
…n logs stage is on
|
@greptile review |
| snapshot.state.blockStates = await redactObjectStrings( | ||
| snapshot.state.blockStates, | ||
| blockOutputOpts | ||
| ) |
There was a problem hiding this comment.
This restore path still only masks inline strings. When a paused run or run-from-block snapshot contains a large-value ref created before block-output redaction was enabled, redactObjectStrings leaves the ref untouched. The later warm-up can materialize that original offloaded value for downstream blocks, so the resumed workflow can read raw PII even though block-output redaction is now enabled. This path needs to hydrate, mask, and re-store restored refs before downstream state can use them.
| snapshot.state.blockStates = await redactObjectStrings( | ||
| snapshot.state.blockStates, | ||
| blockOutputOpts | ||
| ) | ||
| } |
There was a problem hiding this comment.
This restore path still leaves old offloaded block outputs unmasked. It only runs redactObjectStrings over restored blockStates, and that redactor treats large-value refs as opaque, so a paused run or run-from-block snapshot created before block-output redaction was enabled can still point at raw stored bytes. When the restored state is warmed and downstream blocks read that ref, they can receive the original PII even though the block-output stage is enabled. The restore path needs to hydrate, mask, and re-store those refs before exposing the state to execution.
… (env-tunable), remove request timeouts, sync large-value walk
…daction flag - New pii-granular-redaction feature flag (fallback PII_GRANULAR_REDACTION), layered on pii-redaction, gating the execution-altering input + block-output stages - Route returns piiGranularRedactionEnabled and rejects enabling granular stages when off - UI shows only the Logs stage tab unless the flag is on; clamps active stage - Drop the per-search Select all toggle; add a Deselect all action to the PII section header
|
@greptile review |
Presidio now runs as its own ECS service (and, in Helm, its own Deployment + Service) reached over the network via PII_URL — not a sidecar in the app task. Update README, code comments, env docs, Dockerfiles, and the Helm chart docs to match, and note the deploy requirement that PII_URL must be reachable.
…on't lock out granular saves - Resume/run-from-block restore now hydrates → masks → re-stores large-value refs in restored blockStates (not just inline strings), so a value offloaded before the block-output stage was enabled can't warm raw PII into downstream blocks. Fails fast. - pii-large-values: add onFailure mode (throw on the execution path, scrub for logs) and redactLargeValueRefsInValue for arbitrary (non-RedactablePayload) values - Granular flag gate now rejects only NEW off→on granular enablement, so orgs that already configured granular stages can still save retention settings when the flag is off
|
@greptile review |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 2175408. Configure here.


Summary
Type of Change
Testing
Tested manually. Unit tests for resolver back-compat,
redactObjectStrings+ failure modes, and the contract schema.bun run lint,check:api-validation:strict, andcheck:migrations origin/stagingall pass.Checklist