feat(secrets): ingest env secrets at container runtime instead of fanning into ECS taskdef#5189

Open
TheodoreSpeaks wants to merge 2 commits into staging from feat/runtime-secrets-ingestion

Conversation

@TheodoreSpeaks

Collaborator

Summary

  • ECS app/socket taskdefs were ~42KB, ~93% of which (~39KB) was the secrets[] array — 268 pointer entries, each restating the full ~78-char secret ARN. That was marching toward the 64KB taskdef limit and growing ~150 bytes per hosted key added. (Confirmed live: the secret blob itself is only 18.3KB / 268 keys — the taskdef is bigger than the data it points to, purely from ARN repetition.)
  • New @sim/runtime-secrets package: loadRuntimeSecrets() reads SIM_ENV_SECRET_ID, fetches the combined /{env}/sim/env-vars secret once via the task role, JSON.parses it, and hydrates process.env. It is no-clobber (an explicit taskdef env entry wins), a no-op when SIM_ENV_SECRET_ID is unset (local/self-hosted unchanged), and fail-fast otherwise, with one bounded retry.
  • Bootstrap entrypoints apps/sim/bootstrap.ts + apps/realtime/src/bootstrap.ts await loadRuntimeSecrets() then dynamic-import() the real server. Ordering matters because env-flags.ts reads env at module load.
  • The app bootstrap is bun build-bundled in the Dockerfile builder stage (it runs outside the Next standalone bundle, so its deps can't resolve from the pruned node_modules); realtime keeps full node_modules and runs the TS entry directly.
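
The hydration rules above (no-clobber, object-only validation, fail-fast on bad JSON) can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: hydrateEnv, Env, and HydrateResult are hypothetical names, the env object is passed in rather than using process.env directly, and coercing non-string values with String() is an assumption.

```typescript
// Hypothetical helper: the real loadRuntimeSecrets() internals are not shown on this page.
type Env = Record<string, string | undefined>;

interface HydrateResult {
  loaded: number;
  skipped: number;
}

function hydrateEnv(env: Env, secretString: string): HydrateResult {
  // JSON.parse throws a SyntaxError on invalid JSON, which gives fail-fast behavior.
  const parsed: unknown = JSON.parse(secretString);
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("runtime secret must be a JSON object");
  }

  let loaded = 0;
  let skipped = 0;
  for (const [key, value] of Object.entries(parsed as Record<string, unknown>)) {
    if (env[key] !== undefined) {
      skipped++; // no-clobber: a value already present in the environment wins
      continue;
    }
    env[key] = String(value); // assumption: non-string values are coerced to strings
    loaded++;
  }
  return { loaded, skipped };
}

// Usage with a stand-in env object instead of process.env:
const env: Env = { LOG_LEVEL: "debug" };
const result = hydrateEnv(env, JSON.stringify({ LOG_LEVEL: "info", API_KEY: "abc" }));
console.log(result, env);
```

In this sketch LOG_LEVEL keeps its pre-existing value and only API_KEY is added, which mirrors the "explicit taskdef env wins" rule.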

Deploy ordering (important)

  • This Sim image must go live BEFORE the matching infra change (separate infra PR that empties secrets[] and adds the SIM_ENV_SECRET_ID plaintext env).
  • new image + old fan-out taskdef → safe (loader no-ops, injected env vars still present)
  • old image + new taskdef → broken
  • The image is bidirectional/backward-compatible; the infra flip is the one-way switch. Roll the image, verify healthy, then flip infra.
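
One way to read the ordering rules above is as a small compatibility matrix. The sketch below encodes it under stated assumptions: the labels "old"/"new" and "fan-out"/"runtime" are hypothetical shorthand for the image and taskdef variants, and the function only models whether the app ends up with its secrets, not any other failure mode.

```typescript
// Hypothetical labels for the two images and the two taskdef shapes.
type ImageVersion = "old" | "new";         // "new" = image that includes the runtime-secrets bootstrap
type TaskdefShape = "fan-out" | "runtime"; // "runtime" = secrets[] emptied, SIM_ENV_SECRET_ID set

function bootsWithSecrets(image: ImageVersion, taskdef: TaskdefShape): boolean {
  // With the fan-out taskdef, ECS injects every secret as an env var,
  // so both images work (the new loader simply no-ops).
  if (taskdef === "fan-out") return true;
  // With the runtime taskdef, only the new image knows to fetch via SIM_ENV_SECRET_ID.
  return image === "new";
}

// The recommended rollout never passes through the broken combination.
const rollout: Array<[ImageVersion, TaskdefShape]> = [
  ["old", "fan-out"], // today
  ["new", "fan-out"], // roll the image, verify healthy
  ["new", "runtime"], // then flip infra
];
const rolloutIsSafe = rollout.every(([image, taskdef]) => bootsWithSecrets(image, taskdef));
console.log(rolloutIsSafe);
```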

Type of Change

  • New feature / infrastructure change (no behavior change on the current taskdef)

Testing

  • @sim/runtime-secrets unit tests (6) passing — hydrate, no-clobber, no-op-when-unset, invalid-JSON, non-object, retry-then-throw
  • Type-checks clean: @sim/runtime-secrets, apps/sim, apps/realtime (and the infra repo for the companion change)
  • bun run lint, check:api-validation:strict, check:boundaries, check:realtime-prune all clean
  • Local bun build smoke test of the app bootstrap (0.95MB, AWS SDK inlined, dynamic server import preserved) confirmed the boot ordering (hydrate → import server)
  • Remaining (deploy-time): full docker build + run against the real secret; cdk diff to confirm secrets: []

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…ning into ECS taskdef

The app/socket ECS taskdefs were ~42KB, ~93% of which was the secrets[] array:
268 pointer entries each restating the full ~78-char secret ARN, marching toward
the 64KB taskdef limit and growing ~150 bytes per hosted key added. The secret
blob itself is only ~18KB/268 keys.
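
As a rough check of those figures, the arithmetic can be written out. This is a back-of-the-envelope sketch that uses the approximate numbers quoted above and assumes "KB" means 1024 bytes.

```typescript
const BYTES_PER_KB = 1024;   // assumption: the quoted "KB" figures are treated as KiB
const secretEntries = 268;   // pointer entries in secrets[]
const bytesPerEntry = 150;   // approximate growth per hosted key
const taskdefKb = 42;        // approximate current taskdef size
const limitKb = 64;          // taskdef size limit

const secretsArrayKb = (secretEntries * bytesPerEntry) / BYTES_PER_KB;
const shareOfTaskdef = secretsArrayKb / taskdefKb;
const keysUntilLimit = Math.floor(((limitKb - taskdefKb) * BYTES_PER_KB) / bytesPerEntry);

console.log(secretsArrayKb.toFixed(1));         // "39.3"
console.log((shareOfTaskdef * 100).toFixed(0)); // "93"
console.log(keysUntilLimit);                    // 150
```

Under these assumptions the fan-out layout would have had room for roughly 150 more hosted keys before reaching the limit.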

Move secret delivery to container boot: new @sim/runtime-secrets loadRuntimeSecrets()
reads SIM_ENV_SECRET_ID, fetches the combined secret once, and hydrates process.env
(no-clobber, no-op when unset, fail-fast). Bootstrap entrypoints for app + realtime
await it before importing the real server (env-flags reads env at module load). The
app bootstrap is bun-bundled in the Dockerfile builder stage since it runs outside
the Next standalone bundle; realtime keeps full node_modules and runs the TS entry.

Backward-compatible: with the current fan-out taskdef the loader no-ops and the app
reads the injected env vars unchanged. The matching infra change (empty secrets[] +
SIM_ENV_SECRET_ID) ships separately, after this image is live.
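
The boot ordering described in this commit message can be illustrated with stand-in functions. This is a sketch only: bootstrap, Step, and the two callbacks are hypothetical, and in the real entrypoints the first step is loadRuntimeSecrets() and the second is a dynamic import() of the server module.

```typescript
// Stand-ins for illustration; not the actual entrypoint code.
type Step = () => Promise<void>;

async function bootstrap(loadSecrets: Step, startServer: Step): Promise<void> {
  await loadSecrets(); // must finish first: modules such as env-flags.ts read env at load time
  await startServer(); // in the real entrypoint this is a dynamic import() of the server
}

// Usage: record the order in which the two steps run.
const bootLog: string[] = [];
const booted = bootstrap(
  async () => { bootLog.push("hydrate"); },
  async () => { bootLog.push("import server"); },
);
booted.then(() => console.log(bootLog.join(" -> ")));
```

The dynamic import matters because a static import is evaluated before the importing module's body runs, so the server module would otherwise load before the await completes.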
@TheodoreSpeaks TheodoreSpeaks requested a review from a team as a code owner June 23, 2026 22:11

@TheodoreSpeaks

Collaborator Author

@greptile review

@vercel

vercel Bot commented Jun 23, 2026

Project Deployment Actions Updated (UTC)
docs Ready Ready Preview, Comment Jun 24, 2026 1:19am

@cursor

cursor Bot commented Jun 23, 2026

PR Summary

High Risk
Changes container boot and how production secrets reach the app; mis-ordering deploy (infra before image) or fetch failures can prevent healthy startup, though no-clobber and no-op-when-unset preserve backward compatibility on the current taskdef.

Overview
Adds @sim/runtime-secrets with loadRuntimeSecrets(), which when SIM_ENV_SECRET_ID is set fetches one combined JSON secret from AWS Secrets Manager at boot, merges keys into process.env (no overwrite of existing vars), and fails fast on fetch/parse errors; it no-ops when the id is unset so local/self-hosted behavior stays the same.

apps/sim/bootstrap.ts and apps/realtime/src/bootstrap.ts become the container entrypoints: they await loadRuntimeSecrets() and then dynamically import the real server, so modules that read env at import time see hydrated config. The Docker app image runs bun build on the sim bootstrap (AWS SDK inlined, outside the Next standalone tree) and runs bootstrap.js; the realtime CMD switches to bootstrap.ts. Both apps add a workspace dependency on @sim/runtime-secrets; unit tests cover hydrate, no-clobber, validation, and retries.

Reviewed by Cursor Bugbot for commit 77a4298.

@greptile-apps

greptile-apps Bot commented Jun 23, 2026

Contributor

Greptile Summary

This PR adds a new @sim/runtime-secrets package that fetches the combined /{env}/sim/env-vars Secrets Manager secret at container boot, hydrating process.env before the application server starts — eliminating the 268-entry ARN fan-out array that was consuming ~39 KB of the 64 KB ECS task-definition limit.

  • New @sim/runtime-secrets package with loadRuntimeSecrets(): fetches the secret via sendWithRetry (up to 3 attempts, AbortSignal.timeout(5000) per request), guards binary secrets (no retry on missing SecretString), and hydrates process.env with no-clobber semantics.
  • Two new bootstrap entrypoints (apps/sim/bootstrap.ts, apps/realtime/src/bootstrap.ts) that await loadRuntimeSecrets() before dynamically importing the application server, preserving module-load-time env reads.
  • Dockerfile changes bundle the app bootstrap into a self-contained artifact (AWS SDK inlined) to avoid dependency on the pruned standalone node_modules, while the realtime Dockerfile runs TypeScript directly with Bun.
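
The retry shape described above (bounded attempts, a per-attempt timeout, and the binary-secret guard outside the retry loop) might look roughly like this. It is a sketch under assumptions: attempt stands in for the Secrets Manager send call, the function signatures are hypothetical, and the fixed 200 ms step between attempts is a placeholder for the backoff with jitter mentioned in the review.

```typescript
// Hypothetical shape: `attempt` stands in for the SDK call that receives an abort signal.
type Attempt<T> = (signal: AbortSignal) => Promise<T>;
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function sendWithRetry<T>(
  attempt: Attempt<T>,
  maxAttempts = 3,
  timeoutMs = 5000,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let i = 1; i <= maxAttempts; i++) {
    try {
      // A fresh signal per attempt gives each request its own deadline.
      return await attempt(AbortSignal.timeout(timeoutMs));
    } catch (err) {
      lastError = err;
      if (i < maxAttempts) await sleep(200 * i); // placeholder delay, not the real backoff
    }
  }
  throw lastError;
}

async function fetchSecretString(
  attempt: Attempt<{ SecretString?: string }>,
  sleep?: Sleep,
): Promise<string> {
  const response = await sendWithRetry(attempt, 3, 5000, sleep);
  // The guard sits outside the retry loop: retrying cannot turn a binary secret into a string.
  if (!response.SecretString) {
    throw new Error("binary secrets are not supported");
  }
  return response.SecretString;
}
```

Keeping the guard after the successful send means a misconfigured binary secret fails on the first call instead of after three attempts plus backoff.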

Confidence Score: 5/5

Safe to merge — the change is backward-compatible, the new image no-ops with the current fan-out task definition, and the one-way infra flip is in a separate PR.

All four issues from the prior review round have been cleanly resolved: the binary-secret guard is now outside the retry loop, each request is bounded by an AbortSignal timeout, the redundant AWS SDK pin is gone, and the binary-secret test asserts exactly one send call. The retry logic, no-clobber hydration, and Dockerfile bundling strategy are all correct.

No files require special attention.

Important Files Changed

Filename | Overview
packages/runtime-secrets/src/index.ts | Core implementation: binary-secret guard correctly sits outside the retry loop in fetchSecretString; each send uses AbortSignal.timeout(5000) for a per-attempt deadline; no-clobber hydration and fail-fast semantics are sound.
packages/runtime-secrets/src/index.test.ts | Seven test cases covering no-op, hydration, no-clobber, invalid JSON, non-object JSON, binary secret (asserts exactly 1 send call), and retry-then-throw (asserts 3 send calls); sleep is mocked to keep tests fast.
docker/app.Dockerfile | Adds a bun-bundle step for bootstrap.ts in the builder stage (AWS SDK inlined, dynamic server import preserved), copies the artifact to the runner stage, and switches CMD to bootstrap.js.
docker/realtime.Dockerfile | Single-line CMD change from index.ts to bootstrap.ts; realtime keeps full node_modules so no bundling needed.
apps/sim/bootstrap.ts | Minimal entrypoint: awaits loadRuntimeSecrets(), then dynamically imports server.js via a variable to avoid static bundler resolution of the Next standalone artifact.
apps/realtime/src/bootstrap.ts | Minimal entrypoint: awaits loadRuntimeSecrets(), then dynamically imports @/index so Socket.IO server modules read env after hydration.
packages/runtime-secrets/package.json | New private package with pinned @aws-sdk/client-secrets-manager@3.1032.0; no redundant AWS SDK pin in the consuming apps.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant D as Docker CMD
    participant B as bootstrap.ts/js
    participant L as loadRuntimeSecrets()
    participant SM as AWS Secrets Manager
    participant E as process.env
    participant S as Server (server.js / index.ts)

    D->>B: bun bootstrap.js
    B->>L: await loadRuntimeSecrets()
    L->>E: read SIM_ENV_SECRET_ID
    alt SIM_ENV_SECRET_ID not set
        L-->>B: return (no-op)
    else SIM_ENV_SECRET_ID set
        loop sendWithRetry (up to 3 attempts, 5s timeout each)
            L->>SM: GetSecretValueCommand(secretId) + AbortSignal.timeout(5000)
            alt Network/timeout error
                SM-->>L: throw Error
                L->>L: backoffWithJitter + sleep (200-2000ms)
            else Successful response
                SM-->>L: "{ SecretString? }"
            end
        end
        alt No SecretString (binary secret)
            L-->>B: throw immediately (non-retriable)
        else SecretString present
            L->>L: JSON.parse(SecretString)
            L->>L: validate: object, not array/null
            loop for each [key, value] in entries
                L->>E: "process.env[key] = value (no-clobber)"
            end
            L-->>B: return (loaded N, skipped M)
        end
    end
    B->>S: "await import('./server.js' or '@/index')"
    S-->>D: server running
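
The diagram's backoffWithJitter step only states a 200-2000 ms range, so the formula below is an assumption rather than the package's actual implementation: exponential growth from a 200 ms base, capped at 2000 ms, with the delay drawn uniformly between the base and the current ceiling. The random source is injectable so the behavior can be checked deterministically.

```typescript
// Hypothetical formula; only the 200-2000 ms range comes from the review.
function backoffWithJitter(
  attempt: number, // 1-based number of the attempt that just failed
  baseMs = 200,
  capMs = 2000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.round(baseMs + random() * (ceiling - baseMs));
}

// Usage: delays stay within [200, 2000] regardless of the attempt number.
console.log(backoffWithJitter(1), backoffWithJitter(2), backoffWithJitter(8));
```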

Reviews (3): Last reviewed commit: "fix(runtime-secrets): address review fee..." | Re-trigger Greptile

Comment thread packages/runtime-secrets/src/index.ts
Comment thread packages/runtime-secrets/src/index.test.ts
@greptile-apps

greptile-apps Bot commented Jun 23, 2026

Contributor

Greptile Summary

This PR introduces a new @sim/runtime-secrets workspace package that fetches the combined /{env}/sim/env-vars AWS Secrets Manager secret at container boot and hydrates process.env, replacing the previous approach of fanning ~268 individual secret ARN pointers into the ECS task definition (which was consuming ~93% of the 64 KB rendered-document limit).

  • New package (packages/runtime-secrets): loadRuntimeSecrets() reads SIM_ENV_SECRET_ID, fetches the secret once, JSON-parses it, and writes keys into process.env with no-clobber semantics so explicit task-definition environment entries always win. Three attempts with exponential backoff are made before failing fast.
  • Bootstrap entrypoints added for both apps/sim (bundled via bun build into a self-contained bootstrap.js alongside the Next.js standalone server) and apps/realtime (run directly as TypeScript with full node_modules); both await loadRuntimeSecrets() before dynamic-importing the real server so modules that read env at import time see the full configuration.
  • Deploy ordering is critical: the new image must be rolled before the companion infra PR that empties secrets[] and adds the SIM_ENV_SECRET_ID plaintext env var; the reverse order will break.

Confidence Score: 4/5

The change is backward-compatible with the current taskdef — when SIM_ENV_SECRET_ID is absent the loader is a no-op, so rolling the image before the infra flip is safe. The main caution is that a binary-secret misconfiguration burns unnecessary retry delays before failing, and there is no per-request timeout on the Secrets Manager client.

The bootstrap ordering, no-clobber hydration, and bundling strategy are all sound. The findings are limited to the retry loop catching its own internal guard error (causing avoidable backoff on certain misconfigurations) and a missing SDK request timeout that could slow crash-detection in network-degraded environments. Neither causes incorrect behavior in the normal path.

packages/runtime-secrets/src/index.ts — the fetchSecretString retry loop and SecretsManagerClient instantiation.

Important Files Changed

Filename | Overview
packages/runtime-secrets/src/index.ts | Core secret-loading logic; no-clobber hydration and retry are correct, but the binary-secret branch is inadvertently retried and there is no request timeout on the SDK client.
packages/runtime-secrets/src/index.test.ts | Six focused unit tests covering the main happy/sad paths; missing coverage for the binary-secret (no SecretString) case and its interaction with the retry loop.
docker/app.Dockerfile | Adds bun build step to produce a self-contained bootstrap.js; copies it alongside server.js in the runner stage; CMD updated correctly.
docker/realtime.Dockerfile | Minimal one-line CMD change to bootstrap.ts; no bundling required since full node_modules is available at runtime.
apps/sim/bootstrap.ts | Clean entrypoint; variable-held specifier correctly prevents bun from statically bundling server.js while preserving the runtime dynamic import.
apps/realtime/src/bootstrap.ts | Straightforward two-line entrypoint; ordering is correct: secrets are hydrated before the Socket.IO server module loads.
apps/realtime/package.json | Adds @sim/runtime-secrets dependency (correct), but also redundantly pins @aws-sdk/client-secrets-manager directly when it is already a transitive dep through the new package.
packages/runtime-secrets/package.json | New workspace package declaration; exports, engines, and scripts are correct; test and type-check tooling properly configured.
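
The "variable-held specifier" noted for apps/sim/bootstrap.ts can be sketched as below. This is an illustration, not the file's contents: importAtRuntime is a hypothetical helper, and "node:path" is only a stand-in for the standalone server module.

```typescript
// A literal specifier such as import("./server.js") is something a bundler can resolve
// and inline at build time. Passing the specifier as a value keeps it a runtime lookup.
async function importAtRuntime(specifier: string): Promise<Record<string, unknown>> {
  return import(specifier);
}

// Usage: "node:path" stands in for the real server module here.
const runtimeSpecifier = "node:path";
const loaded = importAtRuntime(runtimeSpecifier);
loaded.then((mod) => console.log(Object.keys(mod).length > 0));
```

As the review describes it, the point is that the bundled bootstrap.js should still import the Next standalone server.js from disk at runtime rather than pulling it into the bundle.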

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Docker as Container Runtime
    participant Bootstrap as bootstrap.js / bootstrap.ts
    participant RSP as loadRuntimeSecrets()
    participant SM as AWS Secrets Manager
    participant Server as server.js / index.ts

    Docker->>Bootstrap: "CMD bun bootstrap.*"
    Bootstrap->>RSP: await loadRuntimeSecrets()
    RSP->>RSP: read SIM_ENV_SECRET_ID from process.env
    alt SIM_ENV_SECRET_ID not set (local/self-hosted)
        RSP-->>Bootstrap: no-op return
    else SIM_ENV_SECRET_ID is set (ECS)
        RSP->>SM: "GetSecretValue({ SecretId })"
        SM-->>RSP: SecretString (JSON blob)
        RSP->>RSP: JSON.parse → Object.entries
        RSP->>RSP: hydrate process.env (no-clobber)
        RSP-->>Bootstrap: done (loaded N, skipped M)
    end
    Bootstrap->>Server: await import(standaloneServer / index)
    Server-->>Docker: listening on PORT

Reviews (2): Last reviewed commit: "feat(secrets): ingest env secrets at con..." | Re-trigger Greptile

Comment thread packages/runtime-secrets/src/index.ts
Comment thread apps/realtime/package.json
Comment thread packages/runtime-secrets/src/index.ts
- Move the binary-secret guard outside the retry loop (sendWithRetry) so a
  missing SecretString throws immediately instead of burning 3 attempts + backoff.
- Bound each Secrets Manager request with AbortSignal.timeout(5s) so a stalled
  response can't hang boot indefinitely.
- Drop the redundant @aws-sdk/client-secrets-manager pin from apps/realtime; it
  resolves transitively via @sim/runtime-secrets.
- Add a test for the non-retriable binary-secret path.
@TheodoreSpeaks

Collaborator Author

Addressed all four in 77a4298:

  • Binary-secret retried — split out sendWithRetry(); the !SecretString guard now runs after a successful send, so a binary secret throws immediately (no wasted retries/backoff, keeps the 'binary secrets are not supported' message).
  • Request timeout — each send now passes abortSignal: AbortSignal.timeout(5000), so a stalled response can't hang boot. (Used AbortSignal.timeout rather than NodeHttpHandler to avoid pulling in @smithy/node-http-handler — both apps run on Bun, which supports it.)
  • Redundant @aws-sdk pin in apps/realtime — removed; verified it still resolves transitively via @sim/runtime-secrets (runtime import + type-check both pass).
  • Missing binary-secret test — added throws immediately on a binary secret (no SecretString), without retrying (asserts a single send call).

@greptile review
