feat(webapp): add RUNTIME_API_ORIGIN to decouple runner traffic from external origin #3686
ThullyoCunha wants to merge 1 commit into
Devin Review found 1 new potential issue.
🐛 1 issue in files not directly in the diff
🐛 `RUNTIME_API_ORIGIN` not passed through in `docker-compose.yml`, making the feature inoperative for Docker self-hosting (`hosting/docker/webapp/docker-compose.yml:45`)
The `env.server.ts` comment at lines 136-137 explicitly references `${RUNTIME_API_ORIGIN:-}` passthroughs in `docker-compose.yml`, and `.env.example` documents `RUNTIME_API_ORIGIN` as a user-configurable option (`hosting/docker/.env.example:57`). However, `hosting/docker/webapp/docker-compose.yml` does not include `RUNTIME_API_ORIGIN` in its `environment:` section (lines 42-85). Docker Compose only passes env vars to a container when they are explicitly listed in its `environment:` block; values from `.env` are used for variable substitution in the compose file, not automatically forwarded as container env vars. As a result, Docker self-hosting users who set `RUNTIME_API_ORIGIN` in their `.env` file will find it has no effect: the webapp container never receives the value, and runners keep using `API_ORIGIN`/`APP_ORIGIN`. The Helm chart (`hosting/k8s/helm/templates/webapp.yaml:189-192`) handles this correctly, but the Docker path is broken.
Force-pushed from `ebca943` to `eb32a7f`.
…external origin
The webapp publishes `API_ORIGIN` to runner pods as `TRIGGER_API_URL`, so
runner-to-webapp traffic flows back through whatever URL is configured for
external clients. Self-hosting behind a tracing-enabled gateway (Envoy,
Istio, kgateway, ...) breaks the parent->child run link in trigger.dev's
run-detail tree because the gateway's W3C `traceparent` rewrite on egress
overwrites the SDK's `triggerAndWait()` span id. The webapp then writes
that gateway-generated span id as the child run's `parentSpanId`. No
event with that span id ever reaches the trigger event store, so the
child renders as an orphan in the UI.
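A minimal TypeScript sketch of the failure chain, assuming the webapp takes the parent-id field of the incoming W3C `traceparent` header as the child's `parentSpanId`, and that the run tree can only nest a child under an event recorded with that span id. `parseTraceparent`, `resolveChildPlacement`, and the trace id in the usage lines are hypothetical; the two span ids are the "Before" values from the PR description's Testing section.

```typescript
// Hypothetical types for illustration; not trigger.dev's actual models.
interface TraceParent {
  version: string;
  traceId: string; // 32 lowercase hex chars
  parentId: string; // 16 lowercase hex chars: the caller's span id
  flags: string;
}

// Strict version-00 layout from W3C Trace Context: version-traceid-parentid-traceflags
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | undefined {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (match === null) return undefined;
  const [version, traceId, parentId, flags] = match.slice(1) as [string, string, string, string];
  // The spec treats version "ff" and all-zero trace/parent ids as invalid.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return undefined;
  return { version, traceId, parentId, flags };
}

// Assumed model: parentSpanId is whatever parent-id arrived on the wire, and the
// child is an orphan unless an event with that span id was recorded.
function resolveChildPlacement(
  incomingTraceparent: string,
  recordedSpanIds: ReadonlySet<string>,
): { parentSpanId: string | undefined; orphan: boolean } {
  const parentSpanId = parseTraceparent(incomingTraceparent)?.parentId;
  const orphan = parentSpanId === undefined || !recordedSpanIds.has(parentSpanId);
  return { parentSpanId, orphan };
}

// Usage with the "Before" span ids (the trace id is made up).
const traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
const recorded = new Set(["070bcfdd63b42d2a"]); // the SDK's triggerAndWait() span
const direct = resolveChildPlacement(`00-${traceId}-070bcfdd63b42d2a-01`, recorded);
const viaGateway = resolveChildPlacement(`00-${traceId}-b8298ebb884ade7e-01`, recorded);
console.log(direct.orphan, viaGateway.orphan); // false true
```

The trace id survives the gateway hop; only the parent-id field changes, which is why the child still belongs to the right trace but cannot be attached to the right span.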
Operators can split the two concerns without sacrificing external auth/
callbacks/UI flows that rely on the public `API_ORIGIN`:
- Set `RUNTIME_API_ORIGIN=http://<service>.<namespace>:<port>` (k8s) or
`http://webapp:3000` (docker) to keep runner->webapp traffic on a
cluster-internal hop that bypasses the gateway.
- Leave `API_ORIGIN` on the public URL so the dashboard, magic-link
emails, waitpoint callbacks, and API `apiUrl` responses keep working
for external clients.
Scope is intentionally limited to MANAGED (deployed) runs. Dev CLI runs
keep the original `API_ORIGIN`/`APP_ORIGIN` chain so a developer running
`trigger.dev dev` from outside the cluster does not lose connectivity.
`STREAM_ORIGIN` is still honored as a dedicated stream endpoint when set;
`RUNTIME_API_ORIGIN` takes precedence over it for `TRIGGER_STREAM_URL`
so the bypass keeps streams on the same internal hop by default.
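The precedence described above can be summarized as one small function. This is a sketch, not the code in `environmentVariablesRepository.server.ts`: `resolveRunnerUrls`, the `OriginEnv` shape, and the `"MANAGED" | "DEV"` discriminator are hypothetical, and it assumes the pre-existing stream chain is `STREAM_ORIGIN`, then `API_ORIGIN`, then `APP_ORIGIN`.

```typescript
// Hypothetical env shape; field names mirror the env vars named in this PR.
interface OriginEnv {
  APP_ORIGIN: string;
  API_ORIGIN?: string;
  STREAM_ORIGIN?: string;
  RUNTIME_API_ORIGIN?: string;
}

type RunKind = "MANAGED" | "DEV";

interface RunnerUrls {
  TRIGGER_API_URL: string;
  TRIGGER_STREAM_URL: string;
}

function resolveRunnerUrls(env: OriginEnv, kind: RunKind): RunnerUrls {
  // Public chain: what external clients (and dev CLI runs) reach.
  const publicApi = env.API_ORIGIN ?? env.APP_ORIGIN;
  const publicStream = env.STREAM_ORIGIN ?? publicApi;

  if (kind === "DEV") {
    // `trigger.dev dev` may run outside the cluster, so the internal origin is ignored.
    return { TRIGGER_API_URL: publicApi, TRIGGER_STREAM_URL: publicStream };
  }

  // Managed runs: the runtime origin wins for both URLs, so streams
  // stay on the same internal hop even when STREAM_ORIGIN is set.
  return {
    TRIGGER_API_URL: env.RUNTIME_API_ORIGIN ?? publicApi,
    TRIGGER_STREAM_URL: env.RUNTIME_API_ORIGIN ?? publicStream,
  };
}

// Hypothetical hostnames for illustration.
const env: OriginEnv = {
  APP_ORIGIN: "https://trigger.example.com",
  API_ORIGIN: "https://api.trigger.example.com",
  RUNTIME_API_ORIGIN: "http://webapp:3000",
};
console.log(resolveRunnerUrls(env, "MANAGED").TRIGGER_API_URL); // http://webapp:3000
console.log(resolveRunnerUrls(env, "DEV").TRIGGER_API_URL); // https://api.trigger.example.com
```

Leaving `RUNTIME_API_ORIGIN` unset makes the managed branch collapse to the public chain, which is the backward-compatible default.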
The new env is optional and falls back to `API_ORIGIN`/`APP_ORIGIN`, so
existing deployments are unaffected. An empty string is normalized to
`undefined` in the zod schema so blank `${RUNTIME_API_ORIGIN:-}`
passthroughs from caller environments do not short-circuit the fallback
chain. Helm chart and Docker Compose are wired to forward the value to
the webapp container.
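Why the empty-string normalization matters: `??` only falls through on `null`/`undefined`, so an empty `RUNTIME_API_ORIGIN` would win over `API_ORIGIN` and leave runners with no usable origin. A dependency-free sketch of the rule, with hypothetical function names; the PR applies it inside its zod schema instead.

```typescript
// Plain-function version of the rule so the sketch has no dependencies.
function normalizeOptionalOrigin(raw: string | undefined): string | undefined {
  // `${RUNTIME_API_ORIGIN:-}` expands to "" when the caller never set the variable.
  return raw === "" ? undefined : raw;
}

// `??` skips only null/undefined, so an empty string is returned as-is.
function pickRunnerOrigin(runtimeOrigin: string | undefined, apiOrigin: string): string {
  return runtimeOrigin ?? apiOrigin;
}

const apiOrigin = "https://api.trigger.example.com"; // hypothetical public origin
console.log(JSON.stringify(pickRunnerOrigin("", apiOrigin))); // "" (fallback short-circuited)
console.log(pickRunnerOrigin(normalizeOptionalOrigin(""), apiOrigin)); // https://api.trigger.example.com
```

In a zod schema the same rule can be attached with `z.preprocess`, e.g. `z.preprocess((v) => (v === "" ? undefined : v), z.string().optional())`; which zod helper the PR actually uses is not shown here.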
Refs: triggerdotdev#2821
Force-pushed from `eb32a7f` to `da151f6`.
Closes #2821
Summary
The webapp publishes `API_ORIGIN` to runner pods as `TRIGGER_API_URL`, so runner-to-webapp traffic flows back through whatever URL is configured for external clients. Self-hosting behind a tracing-enabled gateway (Envoy, Istio, kgateway, ...) breaks the parent->child run link in trigger.dev's run-detail tree because the gateway's W3C `traceparent` rewrite on egress overwrites the SDK's `triggerAndWait()` span id. The webapp then writes that gateway-generated span id as the child run's `parentSpanId`. No event with that span id ever reaches the trigger event store, so the child renders as an orphan in the UI.

We hit this on our self-hosted v4 cluster running kgateway with tracing enabled (`spawnUpstreamSpan: true`). We reproduced it with three rounds of SDK debug instrumentation (capturing the active span at `propagation.inject`, the wire `undici:request:headers` payload, and the value the webapp receives at the route level), plus a direct curl that bypassed each hop in turn until we isolated kgateway as the rewriter. Details on the investigation are in #2821; several self-hosted users report the same symptom.

This PR splits the two concerns without sacrificing the external auth, callback, and UI flows that rely on the public `API_ORIGIN`:

- Set `RUNTIME_API_ORIGIN=http://<service>.<namespace>:<port>` (k8s) or `http://webapp:3000` (docker) to keep runner->webapp traffic on a cluster-internal hop that bypasses the gateway.
- Leave `API_ORIGIN` on the public URL so the dashboard, magic-link emails, waitpoint callbacks, and API `apiUrl` responses keep working for external clients.

The new env is optional and falls back to `API_ORIGIN`/`APP_ORIGIN`, so existing deployments are unaffected.

Changes
- `apps/webapp/app/env.server.ts`: new optional `RUNTIME_API_ORIGIN` env.
- `apps/webapp/app/v3/environmentVariables/environmentVariablesRepository.server.ts`: prefer `RUNTIME_API_ORIGIN` when resolving `TRIGGER_API_URL`/`TRIGGER_STREAM_URL` for both dev and prod runner pods, falling back to the existing chain.
- `hosting/k8s/helm/values.yaml` + `templates/webapp.yaml`: expose `webapp.runtimeApiOrigin` (defaults to empty -> existing behavior).
- `hosting/docker/webapp/docker-compose.yml` + `hosting/docker/.env.example`: same opt-in for docker self-host.

Testing
Validated on our self-hosted v4 staging cluster (kgateway + Envoy tracing).

Before (runners going through the public URL, gateway rewrites `traceparent`):

- `triggerAndWait()` wrapper spanId: `070bcfdd63b42d2a` (confirmed via wire-level `undici:request:headers` debug)
- `traceparent` with spanId: `b8298ebb884ade7e` (gateway-rewritten)
- `TaskRun.parentSpanId = b8298ebb...` (orphan: never in event store)

After (with `apiOrigin` pointed at the in-cluster service):

- `triggerAndWait()` wrapper spanId: `66fa71fda94ccdb9`
- `traceparent` with spanId: `66fa71fda94ccdb9` (unchanged)
- `TaskRun.parentSpanId = 66fa71fda94ccdb9`, matching the parent's `triggerAndWait()` event in the `TaskEvent` table

End-to-end test task: https://github.com/meistrari/trigger-self-tests/blob/main/src/tasks/link-test.ts (the parent calls `linkTestChild.triggerAndWait(...)`).

Changelog
- webapp: add optional `RUNTIME_API_ORIGIN` env to advertise a runner-only API origin separate from `API_ORIGIN`. Lets self-hosted operators route runner-to-webapp traffic cluster-internally, bypassing tracing-enabled gateways that rewrite the W3C `traceparent` header on egress and break parent-to-child run linkage in the trace tree. Optional and backward-compatible (falls back to the existing `API_ORIGIN`/`APP_ORIGIN`).
- `hosting/k8s/helm` + `hosting/docker`: expose the new env via `webapp.runtimeApiOrigin` (Helm) and `RUNTIME_API_ORIGIN` (docker-compose).

Screenshots
N/A: no UI change. The visible effect is the run-detail tree correctly showing child task runs nested under their parent's `triggerAndWait()` span (matching trigger.dev SaaS behavior).