swdotcom/sanitize-lambda

Bedrock invocation body filter (Lambda + SAM)

Customers enable Amazon Bedrock model invocation logging to an S3 bucket. This stack deploys a Lambda that copies the logs into a separate S3 bucket for third-party access while omitting the sensitive Bedrock body fields: inputBodyJson and outputBodyJson.

Architecture

flowchart LR
  Bedrock[Bedrock logging] --> Raw[(Raw S3 bucket)]
  Raw -->|ObjectCreated| Lambda[Filter Lambda]
  Lambda --> Shareable[(Shareable S3 bucket)]
  Shareable --> Vendor[Third party]
| Bucket | Who reads it | Contents |
|---|---|---|
| Raw | Customer only | Full Bedrock JSONL logs |
| Shareable | Third party | Bedrock JSONL logs with inputBodyJson and outputBodyJson removed |

Prerequisites

  • AWS SAM CLI
  • AWS CLI configured with deploy permissions
  • Bedrock model invocation logging enabled (or planned) in the same Region as this stack

Quick deploy (existing raw bucket)

Most customers already have a Bedrock logging bucket.

cd sanitize-lambda
cp samconfig.toml.example samconfig.toml
# Edit samconfig.toml: set RawLogBucketName and region

sam build
sam deploy --guided
# Or: sam deploy  (uses samconfig.toml)

./scripts/post-deploy.sh bedrock-log-filter

post-deploy.sh configures S3 event notifications on the raw bucket so new log objects trigger the Lambda.

Note: the script uses aws s3api put-bucket-notification-configuration, which replaces all existing notifications on that bucket. If the raw bucket already has other triggers, merge them into the new configuration first instead of running the script as-is.
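The merge mentioned above can be sketched as a pure function over the notification configuration document. This is a minimal sketch, not what post-deploy.sh does: the function name `merge_lambda_notification` and the rule Id are hypothetical, and the dictionaries follow the shape that the S3 API returns for bucket notification configurations.

```python
from copy import deepcopy


def merge_lambda_notification(existing: dict, new_rule: dict) -> dict:
    """Return a configuration that keeps every existing rule and adds new_rule.

    `existing` is assumed to be the document returned when reading the bucket's
    current notification configuration; `new_rule` is one Lambda rule.
    """
    merged = deepcopy(existing)
    # A read response carries ResponseMetadata, which is not part of the
    # configuration and is not accepted when writing it back.
    merged.pop("ResponseMetadata", None)
    rules = merged.setdefault("LambdaFunctionConfigurations", [])
    # Drop any earlier rule with the same Id so repeated runs do not duplicate it.
    rules[:] = [rule for rule in rules if rule.get("Id") != new_rule["Id"]]
    rules.append(new_rule)
    return merged


if __name__ == "__main__":
    current = {
        "TopicConfigurations": [
            {"Id": "existing-topic", "TopicArn": "arn:aws:sns:us-west-2:123456789012:example",
             "Events": ["s3:ObjectRemoved:*"]}
        ]
    }
    rule = {
        "Id": "bedrock-log-filter",  # hypothetical rule Id
        "LambdaFunctionArn": "arn:aws:lambda:us-west-2:123456789012:function:example",
        "Events": ["s3:ObjectCreated:*"],
    }
    print(merge_lambda_notification(current, rule))
```

With boto3, the input would come from `get_bucket_notification_configuration` and the result would be passed as `NotificationConfiguration` to `put_bucket_notification_configuration`. Review the merged document before applying it.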

Guided deploy parameters

| Parameter | Typical value |
|---|---|
| CreateRawLogBucket | false |
| RawLogBucketName | Your existing Bedrock log bucket |
| ShareableBucketName | (empty = auto-generated name) |
| SourcePrefix | Optional filter, e.g. AWSLogs/ |
| DestPrefix | Optional, e.g. shareable |
| LambdaMemorySize | 1024 (large JSONL lines) |
| LambdaTimeoutSeconds | 900 |

Greenfield deploy (stack creates raw bucket)

sam deploy --guided \
  --parameter-overrides CreateRawLogBucket=true

./scripts/post-deploy.sh <stack-name>

After deploy, point Bedrock → Model invocation logging → S3 at the RawLogBucketName output. The post-deploy step is still required to wire S3 notifications to the Lambda. The shareable bucket name is in ShareableLogBucketName.

Backfill existing logs

Use the backfill script after deploying the stack when the raw bucket already contains Bedrock log objects:

# Uses the stack's RawLogBucketName and FilterFunctionName outputs.
./scripts/backfill-existing-logs.sh bedrock-log-filter AWSLogs/

The second argument is optional. If omitted, the script uses the stack's SourcePrefix parameter. It lists matching objects in the raw bucket and invokes the filter Lambda with an S3 event payload for each .json, .json.gz, or .gz object.

Useful environment variables:

# Preview what would be invoked without calling Lambda.
DRY_RUN=true ./scripts/backfill-existing-logs.sh bedrock-log-filter AWSLogs/

# Process only the first 25 matching objects.
LIMIT=25 ./scripts/backfill-existing-logs.sh bedrock-log-filter AWSLogs/

# Add a small pause between invokes if you want to be gentle with API limits.
SLEEP_SECONDS=0.2 ./scripts/backfill-existing-logs.sh bedrock-log-filter AWSLogs/

After the backfill, new objects continue to be processed as long as the S3 notification configured by post-deploy.sh remains in place.
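The per-object invocation described in this section can be sketched as follows. This is an illustrative sketch, not the script's implementation: the helper names are hypothetical, the payload contains only the fields a typical S3-triggered handler reads, and it assumes the handler accepts the key as given (real S3 notifications URL-encode keys).

```python
import json
from typing import Iterable, List, Optional

# Suffixes the backfill is described as matching.
LOG_SUFFIXES = (".json", ".json.gz", ".gz")


def is_log_object(key: str) -> bool:
    """True when the key ends in one of the log suffixes."""
    return key.endswith(LOG_SUFFIXES)


def build_s3_event(bucket: str, key: str) -> dict:
    """Build a minimal S3 ObjectCreated-style event for one object."""
    return {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {"bucket": {"name": bucket}, "object": {"key": key}},
            }
        ]
    }


def plan_backfill(bucket: str, keys: Iterable[str], limit: Optional[int] = None) -> List[dict]:
    """Return one event payload per matching key, capped like LIMIT."""
    matching = [key for key in keys if is_log_object(key)]
    if limit is not None:
        matching = matching[:limit]
    return [build_s3_event(bucket, key) for key in matching]


if __name__ == "__main__":
    listed = ["AWSLogs/a.json.gz", "AWSLogs/readme.txt", "AWSLogs/b.json"]
    # Printing instead of invoking corresponds to a dry run.
    for event in plan_backfill("your-raw-bucket", listed, limit=25):
        print(json.dumps(event))
```

Each payload could then be serialized and sent to the function, for example with the boto3 Lambda client's `invoke` call.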

Stack outputs

| Output | Use |
|---|---|
| ShareableLogBucketName | Grant third-party s3:GetObject / ListBucket here |
| RawLogBucketName | Bedrock logging destination |
| FilterFunctionArn | Debugging / manual invoke |
| ThirdPartyIamPolicySnippet | Example read-only IAM policy |
| ConfigureS3NotificationScript | Manual notification setup command |

Third-party access

Use the ThirdPartyIamPolicySnippet output (or equivalent) so the vendor can only read the shareable bucket. Do not grant access to the raw bucket.
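For orientation, a read-only policy of the kind described above might look like the document generated below. This is a sketch under assumptions: the actual ThirdPartyIamPolicySnippet output may differ, and `read_only_policy` and the Sid values are hypothetical names used only for illustration.

```python
import json


def read_only_policy(bucket_name: str) -> dict:
    """Build an IAM policy document that allows listing and reading one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListShareableBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                # ListBucket applies to the bucket itself.
                "Resource": bucket_arn,
            },
            {
                "Sid": "ReadShareableObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # GetObject applies to the objects inside the bucket.
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_policy("your-shareable-bucket"), indent=2))
```

The policy names only the shareable bucket, so it grants nothing on the raw bucket.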

What is included vs omitted in the shareable bucket

| Included | Omitted |
|---|---|
| Original top-level fields including timestamp, accountId, identity, and modelId | inputBodyJson |
| Original wrapper fields such as token counts and content types | outputBodyJson |
| Full ARNs and request metadata | Prompt/response payloads contained in those body fields |
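The included/omitted split above can be illustrated with a small record-level filter. This is a sketch, not the contents of sanitizer.py: the function name is hypothetical, the sample record is a trimmed, made-up example, and the recursive walk is an assumption chosen so that the two body fields are dropped wherever they are nested.

```python
import json

# The two fields the README lists as omitted.
SENSITIVE_KEYS = frozenset({"inputBodyJson", "outputBodyJson"})


def strip_body_fields(value):
    """Return a copy of value with the sensitive keys removed at any depth."""
    if isinstance(value, dict):
        return {
            key: strip_body_fields(item)
            for key, item in value.items()
            if key not in SENSITIVE_KEYS
        }
    if isinstance(value, list):
        return [strip_body_fields(item) for item in value]
    return value


if __name__ == "__main__":
    # Illustrative record only; real log records carry more fields.
    record = {
        "accountId": "123456789012",
        "modelId": "example-model-id",
        "input": {
            "inputContentType": "application/json",
            "inputBodyJson": {"prompt": "sensitive text"},
            "inputTokenCount": 12,
        },
        "output": {
            "outputContentType": "application/json",
            "outputBodyJson": {"completion": "sensitive text"},
            "outputTokenCount": 34,
        },
    }
    print(json.dumps(strip_body_fields(record), indent=2))
```

Wrapper fields such as the token counts and content types survive, while the payload-bearing fields do not.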

Input format: Bedrock writes gzip JSONL (*.json.gz). The Lambda decompresses on read and writes uncompressed *.json to the shareable bucket (.gz removed from the key).
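The decompress-and-rename step can be sketched as below. This is a sketch under assumptions, not the Lambda's code: the function names are hypothetical, the way the destination prefix is joined to the key is a guess, and the per-record transform is passed in so the example stays independent of the sanitizer.

```python
import gzip
import json
from typing import Callable


def shareable_key(raw_key: str, dest_prefix: str = "") -> str:
    """Drop a trailing .gz and prepend an optional destination prefix."""
    key = raw_key[:-3] if raw_key.endswith(".gz") else raw_key
    prefix = dest_prefix.strip("/")
    return f"{prefix}/{key}" if prefix else key


def rewrite_gzip_jsonl(blob: bytes, transform: Callable[[dict], dict]) -> bytes:
    """Decompress gzip JSONL, apply transform per record, return plain JSONL."""
    text = gzip.decompress(blob).decode("utf-8")
    lines = [
        json.dumps(transform(json.loads(line)), separators=(",", ":"))
        # Split on "\n" only, so unusual characters inside strings are left alone.
        for line in text.split("\n")
        if line.strip()
    ]
    return ("\n".join(lines) + "\n").encode("utf-8") if lines else b""


if __name__ == "__main__":
    source = gzip.compress(b'{"modelId":"m","inputBodyJson":{"prompt":"x"}}\n')
    drop_input = lambda rec: {k: v for k, v in rec.items() if k != "inputBodyJson"}
    print(shareable_key("AWSLogs/example.json.gz", "shareable"))
    print(rewrite_gzip_jsonl(source, drop_input).decode("utf-8"), end="")
```

Processing line by line keeps each record independent, which matters when individual JSONL lines are large.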

Local development

# Unit tests
make test

# Validate template
sam validate --lint
sam build

# Invoke with sample S3 event (no AWS deploy)
sam local invoke BedrockLogFilterFunction \
  --event events/s3_put_event.json \
  --env-vars env.local.json

Create env.local.json for local invoke:

{
  "BedrockLogFilterFunction": {
    "DEST_BUCKET": "your-shareable-bucket",
    "DEST_PREFIX": "",
    "SOURCE_PREFIX": ""
  }
}

Project layout

sanitize-lambda/
  template.yaml          # SAM / CloudFormation
  samconfig.toml.example
  Makefile
  src/
    lambda_function.py   # S3 trigger handler
    sanitizer.py         # Removes sensitive body fields
  scripts/
    backfill-existing-logs.sh
    post-deploy.sh       # Wire S3 → Lambda after deploy
    configure-raw-bucket-notification.sh
  tests/
  events/

Manual packaging (without SAM)

pip install -r src/requirements.txt -t package/
cp src/*.py package/
cd package && zip -r ../function.zip .

Lambda console test event

Use an S3 Put event whose key is a real object in the raw bucket. Bedrock paths look like:

AWSLogs/123456789012/BedrockModelInvocationLogs/us-west-2/2026/04/15/23/20260415T232908706Z_e3f68b60d953662e.json.gz

See events/s3_put_event.json. After code changes, run sam build and redeploy (or upload src/ to the function).

Troubleshooting

  • NoSuchKey: The test event key must match an existing object (Bedrock uses .json.gz, not plain .jsonl).
  • Lambda never runs: Run ./scripts/post-deploy.sh. S3 notifications are configured in this post-deploy step, which is required for all deployments.
  • Existing S3 notifications on the raw bucket: configure-raw-bucket-notification.sh replaces the bucket’s full notification configuration. Merge with existing rules if needed.
  • Empty shareable objects: Confirm that the DEST_BUCKET environment variable on the function matches the ShareableLogBucketName output.
  • Timeouts: Increase LambdaTimeoutSeconds / LambdaMemorySize for very large JSONL files.
  • Bedrock bucket policy: The raw bucket must allow Bedrock service delivery in the same Region (configure in Bedrock console).

Security notes

  • Sample paths, events, and test fixtures use the placeholder account ID 123456789012, not a real AWS account.
  • Both buckets use SSE-S3 and block public access by default.
  • Review a sample of shareable output before granting vendor access; only inputBodyJson and outputBodyJson are removed.
  • Optional: add a CMK via bucket policy changes (not included in the base template).

About

This stack deploys a Lambda that copies Bedrock invocation logs into a separate S3 bucket for third-party access. The inputBodyJson and outputBodyJson fields, which carry raw prompts and responses, are omitted from the shareable bucket; all other fields, including accountId and identity, are kept.
