feat(data-retention): granular PII redaction stages (input + block outputs) #5272

Merged
TheodoreSpeaks merged 21 commits into staging from feat/pii-granular-redaction on Jul 2, 2026

Conversation

@TheodoreSpeaks

Collaborator

Summary

  • Add two execution-altering PII redaction stages alongside the existing log redaction: redact the workflow input before execution, and mask every block output in-flight before the next block reads it
  • Per-stage policy (entity types + language) for each of Logs / Workflow input / Block outputs; resolved most-specific-wins per workspace, with full back-compat for existing logs-only rules
  • In-flight stages fail fast (abort the run) on a Presidio error rather than scrubbing or leaking data; the logs stage keeps its scrub-to-marker behavior
  • Reuse the shared HTTP → Presidio path; block-output redaction runs before payload compaction so offloaded large values are still masked
  • Settings UI: chip-tabs across the three stages, language-first picker with the entity grid filtered to that language's recognizers, and a confirmation before removing a workspace override
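
The per-stage resolution described above can be sketched roughly as follows. This is a minimal illustration, not the resolver from this PR: the `Rule` shapes, `resolveStagePolicy`, and the `"en"` default are all hypothetical, and it assumes a workspace override replaces the org rule wholesale.

```typescript
// Hypothetical types: the thread does not show the real rule shape.
type Stage = "logs" | "input" | "blockOutputs";

interface StagePolicy {
  enabled: boolean;
  entityTypes: string[];
  language: string;
}

// New-style rule: one optional policy per stage.
interface GranularRule {
  stages: Partial<Record<Stage, StagePolicy>>;
}

// Legacy flat rule: a single entity list that only ever applied to logs.
interface LegacyRule {
  entityTypes: string[];
  language?: string;
}

type Rule = GranularRule | LegacyRule;

const DISABLED: StagePolicy = { enabled: false, entityTypes: [], language: "en" };

function isGranular(rule: Rule): rule is GranularRule {
  return "stages" in rule;
}

// Normalize either rule shape into the policy for a single stage.
function policyFromRule(rule: Rule, stage: Stage): StagePolicy | undefined {
  if (isGranular(rule)) return rule.stages[stage];
  // Back-compat: a legacy rule resolves as logs-only.
  if (stage !== "logs") return undefined;
  return {
    enabled: rule.entityTypes.length > 0,
    entityTypes: rule.entityTypes,
    language: rule.language ?? "en",
  };
}

// Most-specific-wins (assumed): a workspace rule, when present, overrides the org rule.
function resolveStagePolicy(stage: Stage, orgRule?: Rule, workspaceRule?: Rule): StagePolicy {
  const winner = workspaceRule ?? orgRule;
  if (!winner) return DISABLED;
  const policy = policyFromRule(winner, stage);
  // An enabled stage with no selected entity types stays disabled.
  if (!policy || !policy.enabled || policy.entityTypes.length === 0) return DISABLED;
  return policy;
}
```

With a legacy org rule such as `{ entityTypes: ["EMAIL_ADDRESS"] }`, this sketch enables only the logs stage; the input and block-output stages stay off until a granular rule turns them on.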

Type of Change

  • New feature

Testing

Tested manually. Unit tests cover resolver back-compat, redactObjectStrings (including its failure modes), and the contract schema. bun run lint, check:api-validation:strict, and check:migrations origin/staging all pass.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented Jun 29, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Skipped | Skipped | Jul 1, 2026 11:56pm |

@cursor

cursor Bot commented Jun 29, 2026


PR Summary

High Risk
Changes how workflows execute (input/output masking, streaming, resume) and central PII persistence behavior; Presidio outages abort runs when in-flight stages are on.

Overview
Adds per-stage PII policy (logs, workflow input, block outputs) with entity types and language per stage, while legacy flat rules still resolve as logs-only.

Runtime: When enabled, workflow input is masked before execution; block outputs are masked before compaction, downstream state, and agent memory; streaming can drain without forwarding raw chunks until masked. Resume/run-from-block re-masks restored blockStates, including large-value refs (hydrate → mask → re-store). In-flight stages use onFailure: 'throw'; log persist keeps scrub-to-marker and applies the logs stage from stored rules without re-checking the feature flag.
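
The two failure modes mentioned above (throw for in-flight stages, scrub-to-marker for logs) can be illustrated with a small recursive walker. This is a sketch under stated assumptions: `redactStringsSketch`, `MaskFn`, and the marker text are hypothetical stand-ins, not the actual `redactObjectStrings` signature, and the masker is injected rather than calling Presidio.

```typescript
type OnFailure = "throw" | "scrub";
// Hypothetical masker; in the PR this role is played by the HTTP call to Presidio.
type MaskFn = (text: string) => Promise<string>;

// Hypothetical marker text written in place of a value that could not be masked.
const SCRUB_MARKER = "[REDACTION_FAILED]";

async function redactStringsSketch(
  value: unknown,
  mask: MaskFn,
  onFailure: OnFailure
): Promise<unknown> {
  if (typeof value === "string") {
    try {
      return await mask(value);
    } catch (err) {
      // In-flight stages: rethrow so the caller aborts the run.
      if (onFailure === "throw") throw err;
      // Logs stage: never persist the raw value, store a marker instead.
      return SCRUB_MARKER;
    }
  }
  if (Array.isArray(value)) {
    return Promise.all(value.map((item) => redactStringsSketch(item, mask, onFailure)));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = await redactStringsSketch(source[key], mask, onFailure);
    }
    return out;
  }
  // Numbers, booleans, null, and undefined pass through unchanged.
  return value;
}
```

The design point is that neither mode can leak: `"throw"` stops execution before a downstream block reads the value, and `"scrub"` replaces the value before it is persisted.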

Presidio: New /analyze_batch and /anonymize_batch; the TS client batches via shared byte/count chunking and bounded concurrency (optional PII_MASK_CHUNK_CONCURRENCY).
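
A rough sketch of byte/count chunking with bounded concurrency follows. The limits, `chunkTexts`, `mapWithConcurrency`, and `maskBatch` are hypothetical names chosen for illustration; the batch call is injected as `maskChunk`, and the concurrency argument stands in for whatever value PII_MASK_CHUNK_CONCURRENCY would supply.

```typescript
// Hypothetical limits; the real values are not shown in this thread.
interface ChunkLimits {
  maxItems: number;
  maxBytes: number;
}

// Split texts into chunks that respect both an item count and a UTF-8 byte budget.
// A single text larger than maxBytes still gets a chunk of its own.
function chunkTexts(texts: string[], limits: ChunkLimits): string[][] {
  const encoder = new TextEncoder();
  const chunks: string[][] = [];
  let current: string[] = [];
  let currentBytes = 0;
  for (const text of texts) {
    const size = encoder.encode(text).length;
    const overflow =
      current.length >= limits.maxItems || currentBytes + size > limits.maxBytes;
    if (current.length > 0 && overflow) {
      chunks.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(text);
    currentBytes += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Run fn over items with at most `limit` calls in flight, preserving result order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index]);
    }
  }
  const workers: Promise<void>[] = [];
  const count = Math.min(Math.max(1, limit), items.length);
  for (let i = 0; i < count; i++) workers.push(worker());
  await Promise.all(workers);
  return results;
}

// Mask all texts by sending chunks through an injected batch call.
async function maskBatch(
  texts: string[],
  maskChunk: (chunk: string[]) => Promise<string[]>,
  limits: ChunkLimits,
  concurrency: number
): Promise<string[]> {
  const masked = await mapWithConcurrency(chunkTexts(texts, limits), concurrency, maskChunk);
  const out: string[] = [];
  for (const chunk of masked) out.push(...chunk);
  return out;
}
```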

Config/API: pii-granular-redaction gates new input/blockOutputs enablement on save; Data Retention UI uses stage tabs and language-filtered entity pickers.

Docs/Helm describe Presidio as a standalone service reached via PII_URL, not an app sidecar.

Reviewed by Cursor Bugbot for commit 2175408.

Comment thread apps/sim/executor/execution/block-executor.ts
@greptile-apps

greptile-apps Bot commented Jun 29, 2026

Contributor

Greptile Summary

This PR adds granular PII redaction for workflow execution and logs. The main changes are:

  • New input, block-output, and log redaction stages.
  • Per-stage policy resolution for org and workspace rules.
  • Fail-fast execution-time masking for workflow input and block outputs.
  • Large-value and streaming-output handling before downstream use.
  • Settings UI updates for stage tabs, language filtering, and override removal.

Confidence Score: 5/5

This looks safe to merge.

  • No blocking issues found in the changed code.
  • Stored PII rules are resolved without the execution-time feature flag.
  • Empty enabled stage policies are rejected or normalized consistently.
  • Restored large-value refs are masked before downstream state can use them.
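
The "rejected or normalized" rule for empty enabled stages could look something like the check below. This is only a sketch: `StagePolicyInput` and `validateStagePolicy` are hypothetical, the `"en"` default is an assumption, and the real contract schema in primitives.ts is not shown in this thread.

```typescript
// Hypothetical request shape for one stage.
interface StagePolicyInput {
  enabled: boolean;
  entityTypes: string[];
  language?: string;
}

type Validation =
  | { ok: true; value: Required<StagePolicyInput> }
  | { ok: false; error: string };

function validateStagePolicy(stage: string, input: StagePolicyInput): Validation {
  // Normalize: trim, drop empty entries, and de-duplicate while keeping order.
  const entityTypes = input.entityTypes
    .map((t) => t.trim())
    .filter((t) => t.length > 0)
    .filter((t, i, all) => all.indexOf(t) === i);

  // Reject: an enabled stage must select at least one entity type.
  if (input.enabled && entityTypes.length === 0) {
    return { ok: false, error: `${stage}: an enabled stage needs at least one entity type` };
  }
  return {
    ok: true,
    value: { enabled: input.enabled, entityTypes, language: input.language ?? "en" },
  };
}
```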

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/workflows/executor/execution-core.ts | Resolves stored PII policy during execution, masks workflow input, and re-masks restored block state before hydration. |
| apps/sim/lib/billing/retention.ts | Resolves PII redaction per stage and keeps empty entity selections disabled. |
| apps/sim/lib/logs/execution/pii-large-values.ts | Adds hydration, masking, and re-storage for offloaded large values. |
| apps/sim/executor/execution/block-executor.ts | Masks block outputs before compaction and buffers streaming output when block-output redaction is enabled. |
| apps/sim/lib/api/contracts/primitives.ts | Validates per-stage PII policy shape so enabled stages include selected entity types. |
| apps/sim/ee/data-retention/components/data-retention-settings.tsx | Updates the settings UI for per-stage PII redaction and synced save behavior. |

Reviews (15): Last reviewed commit: "fix(data-retention): re-mask offloaded l..." | Re-trigger Greptile

Comment thread apps/sim/lib/workflows/executor/execution-core.ts Outdated
Comment thread apps/sim/executor/execution/block-executor.ts
Comment thread apps/sim/lib/workflows/executor/execution-core.ts
@TheodoreSpeaks

Collaborator Author

@greptile review

Comment thread apps/sim/lib/workflows/executor/execution-core.ts Outdated
@TheodoreSpeaks

Collaborator Author

@greptile review

Comment thread apps/sim/executor/execution/block-executor.ts Outdated
Comment thread apps/sim/lib/workflows/executor/execution-core.ts Outdated
@TheodoreSpeaks

Collaborator Author

@greptile review

@TheodoreSpeaks

Collaborator Author

@greptile review

…redaction

# Conflicts:
#	apps/sim/ee/data-retention/components/data-retention-settings.tsx
Comment thread apps/sim/app/api/organizations/[id]/data-retention/route.ts
abortSignal: ctx.abortSignal,
// Propagate in-flight block-output redaction into child workflows so
// nested blocks mask outputs too (recurses: each child forwards it).
piiBlockOutputRedaction: ctx.piiBlockOutputRedaction,

Child workflows skip input redaction

Medium Severity

The new workflow-input PII stage runs only in executeWorkflowCore on top-level processedInput. Nested child runs are started with a direct Executor and pass childWorkflowInput unchanged. Only the block-output policy is forwarded on the context, so when the input stage is on and block outputs are off, mapped or explicit child input can execute and produce downstream state without in-flight input masking.

Reviewed by Cursor Bugbot for commit 8f86d77.

@TheodoreSpeaks

Collaborator Author

@greptile review

Comment thread apps/sim/lib/logs/execution/logger.ts Outdated
Comment on lines +689 to +692
snapshot.state.blockStates = await redactObjectStrings(
  snapshot.state.blockStates,
  blockOutputOpts
)

Contributor

P1 security Refs stay unmasked

When a paused run or run-from-block snapshot contains a large-value ref that was created before block-output redaction was enabled, this call only masks inline strings. Large-value refs are treated as opaque by redactObjectStrings, so the ref still points at the original offloaded bytes. The later warm-up step can materialize that raw value for downstream blocks, letting them read or send unredacted PII even though the block-output stage is enabled.

@TheodoreSpeaks

Collaborator Author

@greptile review

Comment on lines +689 to +692
snapshot.state.blockStates = await redactObjectStrings(
  snapshot.state.blockStates,
  blockOutputOpts
)

Contributor

P1 security Refs stay raw

This restore path still only masks inline strings. When a paused run or run-from-block snapshot contains a large-value ref created before block-output redaction was enabled, redactObjectStrings leaves the ref untouched. The later warm-up can materialize that original offloaded value for downstream blocks, so the resumed workflow can read raw PII even though block-output redaction is now enabled. This path needs to hydrate, mask, and re-store restored refs before downstream state can use them.

waleedlatif1 deleted the branch staging on July 1, 2026 05:43
waleedlatif1 reopened this on Jul 1, 2026
Comment on lines +689 to +693
  snapshot.state.blockStates = await redactObjectStrings(
    snapshot.state.blockStates,
    blockOutputOpts
  )
}

Contributor

P1 security Large refs remain raw

This restore path still leaves old offloaded block outputs unmasked. It only runs redactObjectStrings over restored blockStates, and that redactor treats large-value refs as opaque, so a paused run or run-from-block snapshot created before block-output redaction was enabled can still point at raw stored bytes. When the restored state is warmed and downstream blocks read that ref, they can receive the original PII even though the block-output stage is enabled. The restore path needs to hydrate, mask, and re-store those refs before exposing the state to execution.
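
The hydrate, mask, and re-store sequence this comment asks for might be sketched as below. Everything here is an assumption made for illustration: the `__largeValueRef` ref shape, the `LargeValueStore` interface, and `remaskLargeValueRefs` are hypothetical and do not reflect the actual pii-large-values.ts API. Inline strings are deliberately left to a separate string-masking pass.

```typescript
// Hypothetical ref shape; the real large-value ref format is not shown in this thread.
interface LargeValueRef {
  __largeValueRef: true;
  key: string;
}

// Hypothetical storage interface for offloaded values.
interface LargeValueStore {
  get(key: string): Promise<string>;
  put(value: string): Promise<string>; // returns the key of the stored value
}

type MaskFn = (text: string) => Promise<string>;

function isLargeValueRef(value: unknown): value is LargeValueRef {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return candidate.__largeValueRef === true && typeof candidate.key === "string";
}

// Walk restored state and replace every ref with one that points at masked bytes.
// Any store or mask error propagates, so the caller fails fast instead of leaking.
async function remaskLargeValueRefs(
  value: unknown,
  store: LargeValueStore,
  mask: MaskFn
): Promise<unknown> {
  if (isLargeValueRef(value)) {
    const raw = await store.get(value.key); // hydrate
    const masked = await mask(raw); // mask
    const key = await store.put(masked); // re-store
    return { __largeValueRef: true, key };
  }
  if (Array.isArray(value)) {
    return Promise.all(value.map((item) => remaskLargeValueRefs(item, store, mask)));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = await remaskLargeValueRefs(source[key], store, mask);
    }
    return out;
  }
  return value; // inline strings and primitives are untouched by this pass
}
```

Returning a new ref rather than overwriting the old key keeps the sketch simple; whether the original raw bytes should also be deleted is left out of scope here.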

… (env-tunable), remove request timeouts, sync large-value walk
…daction flag

- New pii-granular-redaction feature flag (fallback PII_GRANULAR_REDACTION),
  layered on pii-redaction, gating the execution-altering input + block-output stages
- Route returns piiGranularRedactionEnabled and rejects enabling granular stages when off
- UI shows only the Logs stage tab unless the flag is on; clamps active stage
- Drop the per-search Select all toggle; add a Deselect all action to the PII section header
@TheodoreSpeaks

Collaborator Author

@greptile review

Comment thread apps/sim/lib/workflows/executor/execution-core.ts
Presidio now runs as its own ECS service (and, in Helm, its own Deployment +
Service) reached over the network via PII_URL — not a sidecar in the app task.
Update README, code comments, env docs, Dockerfiles, and the Helm chart docs to
match, and note the deploy requirement that PII_URL must be reachable.
Comment thread apps/sim/app/api/organizations/[id]/data-retention/route.ts
…on't lock out granular saves

- Resume/run-from-block restore now hydrates → masks → re-stores large-value refs
  in restored blockStates (not just inline strings), so a value offloaded before the
  block-output stage was enabled can't warm raw PII into downstream blocks. Fails fast.
- pii-large-values: add onFailure mode (throw on the execution path, scrub for logs)
  and redactLargeValueRefsInValue for arbitrary (non-RedactablePayload) values
- Granular flag gate now rejects only NEW off→on granular enablement, so orgs that
  already configured granular stages can still save retention settings when the flag is off
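
The "reject only new off→on enablement" gate from the last bullet can be sketched as a pure comparison of the previous and requested state. `StageEnablement`, `newlyEnabled`, and `checkGranularSave` are hypothetical names for illustration; the route's actual implementation is not shown here.

```typescript
type GranularStage = "input" | "blockOutputs";
type StageEnablement = Record<GranularStage, boolean>;

const GRANULAR_STAGES: GranularStage[] = ["input", "blockOutputs"];

// Stages that are on in the requested settings but were off before.
function newlyEnabled(previous: StageEnablement, next: StageEnablement): GranularStage[] {
  return GRANULAR_STAGES.filter((stage) => next[stage] && !previous[stage]);
}

type SaveDecision = { allowed: true } | { allowed: false; reason: string };

// With the flag off, only an off -> on transition is blocked; stages that were
// already enabled may be saved again, edited, or switched off.
function checkGranularSave(
  flagEnabled: boolean,
  previous: StageEnablement,
  next: StageEnablement
): SaveDecision {
  if (flagEnabled) return { allowed: true };
  const blocked = newlyEnabled(previous, next);
  if (blocked.length === 0) return { allowed: true };
  return {
    allowed: false,
    reason: `granular redaction is not enabled for: ${blocked.join(", ")}`,
  };
}
```

Comparing against the stored state, rather than rejecting any payload that contains an enabled granular stage, is what keeps already-configured orgs from being locked out of unrelated retention changes.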
@TheodoreSpeaks

Collaborator Author

@greptile review

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 2175408.

Comment thread apps/sim/lib/guardrails/validate_pii.ts
TheodoreSpeaks merged commit 69b81a6 into staging on Jul 2, 2026
17 checks passed
TheodoreSpeaks deleted the feat/pii-granular-redaction branch on July 2, 2026 01:47