Skip to content

design-proposal: external database exposure via TLS-passthrough (SNI) and end-to-end TLS#20

Open
Aleksei Sviridkin (lexfrei) wants to merge 2 commits into
mainfrom
design/external-database-exposure
Open

design-proposal: external database exposure via TLS-passthrough (SNI) and end-to-end TLS#20
Aleksei Sviridkin (lexfrei) wants to merge 2 commits into
mainfrom
design/external-database-exposure

Conversation

@lexfrei

@lexfrei Aleksei Sviridkin (lexfrei) commented Jun 24, 2026

Copy link
Copy Markdown

Adds the external-exposure design proposal for managed databases — the WS4/WS5 half of epic cozystack/cozystack#2811. The certificate/PKI half is the sibling proposal design-proposals/unified-tls-pki.

Databases are exposed through the Gateway API TLS-passthrough listeners Cozystack already operates, routed by SNI on each engine's native port. Two outcomes:

  • IPv4 consolidation: N external databases per tenant collapse onto one LoadBalancer IP instead of one IP each.
  • End-to-end TLS for free: a passthrough listener does not terminate, so the client validates the same operator-issued certificate served inside the cluster — no second cert, no re-encryption, no edge-held key. This is the recorded decision End-to-end / transparent TLS for databases cozystack#2816 asks for (application-level TLS over edge termination).

Includes a per-engine matrix (Redis fits cleanest; Postgres needs direct-TLS; sharded Mongo and MariaDB opt-in; Kafka and non-sharded Mongo deferred for per-broker/per-member topology) and a minimal API extension (TLSPassthroughBackends on TenantGateway).

Status: Draft.

Summary by CodeRabbit

  • Documentation
    • Added a design proposal for exposing databases externally through a shared Gateway endpoint with end-to-end TLS passthrough.
    • Included connection guidance, engine-specific compatibility notes, upgrade/rollback behavior, and security/failure handling details.
    • Documented rollout considerations, test plans, and open questions for the new exposure pattern.

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a new design proposal document (design-proposals/external-database-exposure/README.md) describing consolidation of multiple externally exposed managed database instances behind a single per-tenant Gateway LoadBalancer IP using Gateway API TLSRoute SNI-based passthrough, covering engine fit, API/controller extensions, security semantics, and rollout plan.

Changes

External Database Exposure via Gateway API TLS-passthrough Design Proposal

Layer / File(s) Summary
Proposal metadata, scope, and background
design-proposals/external-database-exposure/README.md
Introduces document metadata, core overview of SNI/passthrough exposure, dependency references to sibling proposals, background constraints (existing TLS-passthrough, one-LB-per-database behavior, certificate/SAN hooks, IP sharing limits), problem statement, goals, and non-goals.
Detailed design: listener model, SNI routing, and engine fit matrix
design-proposals/external-database-exposure/README.md
Specifies the consolidated per-tenant Gateway with one passthrough listener per engine type, per-release TLSRoute mapping on native ports, native-port rationale, load-bearing Postgres sslnegotiation=direct requirement, and an engine fit matrix with explicit deferrals for Kafka and non-sharded MongoDB.
API extension, trust/SAN flow, security semantics, and test coverage
design-proposals/external-database-exposure/README.md
Defines the structured sharedPassthroughListener API field alongside existing TLSPassthroughServices for backward compatibility, rendering rules for shared listeners and per-release TLSRoutes, trust-anchor/SAN flow, user connection recipes, upgrade/rollback compatibility, fail-closed semantics (missing/mis-emitted SNI), admission rejection for tls.enabled: false with external: true, and planned test coverage.
Rollout phases, open questions, and alternatives
design-proposals/external-database-exposure/README.md
Documents phased rollout with validation gates, open questions (connector conformance, Postgres direct-TLS client floor, non-443 passthrough risk, listener/route ownership, 64-listener budgeting, ListenerSet readiness), and alternatives considered.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Possibly related issues

Poem

🐇 A proposal hops in, with a Gateway in hand,
SNI routes the data across the land.
One IP to rule them, each engine in line,
Postgres speaks direct-TLS — how divine!
Fail-closed and secure, the rabbit declares,
No private keys at the edge — nobody dares! 🔐

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the new design proposal for external database exposure via TLS passthrough, SNI routing, and end-to-end TLS.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch design/external-database-exposure

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a design proposal for exposing external databases via Gateway API TLS-passthrough (SNI) and end-to-end TLS to address IPv4 scarcity and provide managed TLS. The feedback highlights several key improvement opportunities: optimizing listener usage by defining one listener per engine type rather than per database instance to prevent hitting the 64-listener limit, replacing the vague API placeholder with a concrete *BackendObjectReference type, and using a validating admission webhook to handle conflicting TLS configurations instead of silently overriding them.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

CA -.->|"verifies"| C2
```

A Gateway listener is keyed by the tuple (port, protocol, hostname/SNI). SNI-based routing therefore works on any TCP port, not only 443 — a `TLSRoute` selects its backend by hostname regardless of the listener's port. "Share one IP via SNI" and "use the native port" are orthogonal, so the design takes both: one passthrough listener per engine on its native port — `tls-<release>` on 5432 for postgres, 6379 for redis, 27017 for sharded mongo, 3306 for mariadb — each `mode: Passthrough`, hostname `<release>.<apex>`, SNI-routed via a per-release `TLSRoute`. All of a tenant's database listeners live on the one tenant Gateway and therefore share its single IP.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Creating one passthrough listener per database instance (e.g., tls-<release> on port 5432) will quickly consume the 64-listener budget on the Gateway if a tenant deploys multiple instances of the same database type.\n\nInstead, you can define one listener per engine type/port (e.g., a single listener tls-postgres on port 5432 with a wildcard hostname like *.tenant.apex or no hostname restriction). Multiple TLSRoute resources (one per database release) can then attach to this single shared listener, using their respective spec.hostnames to handle SNI-based routing. This reduces the listener consumption from O(N) (where N is the number of database instances) to O(T) (where T is the number of unique database engine types, which is at most 4 or 5), effectively eliminating the 64-listener cap bottleneck for database scaling.

Name string `json:"name"` // listener suffix -> "tls-<name>"
Port int32 `json:"port"` // native port (5432/6379/27017/3306)
Hostname string `json:"hostname,omitempty"` // default "<name>.<apex>"
BackendRef ... `json:"backendRef"` // Service + port (cross-namespace via ReferenceGrant)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using ... as a placeholder in the API definition is vague. Specifying a concrete type like *BackendObjectReference (referencing the standard Gateway API type) makes the design proposal much clearer and directly implementable.

Suggested change
BackendRef ... `json:"backendRef"` // Service + port (cross-namespace via ReferenceGrant)
BackendRef *BackendObjectReference `json:"backendRef"` // Service + port (cross-namespace via ReferenceGrant)


- A pre-PG17 or non-direct-TLS Postgres client sends no SNI → no route → connection reset/timeout (document the symptom).
- The 64-listener budget is exceeded → the listener is rejected; mitigation is to split the high-fanout subtree onto its own Gateway.
- `tls.enabled=false` while exposing externally → certificate SAN mismatch; the tri-state should auto-enable TLS with `external`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Silently overriding tls.enabled=false by auto-enabling TLS when external is true can lead to unexpected behaviors and security misunderstandings for users who explicitly disabled TLS.\n\nA more robust and predictable approach is to reject this invalid configuration using a validating admission webhook (e.g., enforcing that if external is true, tls.enabled must not be false).

Record the external-exposure design for managed databases: Gateway API
TLS-passthrough routed by SNI on native ports, collapsing N per-database
LoadBalancer IPs onto one tenant Gateway IP, with end-to-end TLS by
reusing the operator-issued certificate (passthrough does not terminate).

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei Aleksei Sviridkin (lexfrei) force-pushed the design/external-database-exposure branch from 1f2a675 to d237477 Compare June 24, 2026 11:22
@lexfrei Aleksei Sviridkin (lexfrei) marked this pull request as ready for review June 24, 2026 11:45
Replace the one-listener-per-database model with one shared passthrough
listener per engine type, with per-release routing on standard TLSRoute
objects keyed by SNI hostname. Listener consumption drops from O(database
instances) to O(engine types), so the 64-listener Gateway cap stops being
a per-database ceiling. Rename the API field to TLSPassthroughListeners,
declare the route backend as a standard Gateway API BackendObjectReference
instead of a placeholder, and rework the diagram, budget, failure,
testing, and open-questions sections to match.

Reject an explicit tls.enabled=false combined with external=true at
admission via a ValidatingAdmissionPolicy rather than silently overriding
it, so a tenant cannot stand up a plaintext external endpoint; an unset
tri-state still auto-enables TLS together with external.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
design-proposals/external-database-exposure/README.md (1)

157-174: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Define validation invariants for TLSPassthroughListeners in the proposal.

The new API shape is clear, but it doesn’t yet specify required validation rules (for example: name format, port range, uniqueness of name/port, and collision behavior with existing listeners). Adding these constraints in the design will avoid divergent controller/admission implementations.

Proposed doc patch
 type TLSPassthroughListener struct {
     Name     string `json:"name"`               // listener suffix -> "tls-<name>" (e.g. "postgres")
     Port     int32  `json:"port"`               // native port (5432/6379/27017/3306)
     Hostname string `json:"hostname,omitempty"` // wildcard SNI match, default "*.<apex>"
 }

 // TLSPassthroughListeners renders one Passthrough listener "tls-<name>" on
 // .Port per entry, AllowedRoutes restricted to TLSRoute. Independent of the
 // layer-7 TLSPassthroughServices field.
+// Validation contract (proposal-level):
+// - Name: DNS-1123 label; unique within the list.
+// - Port: 1..65535; unique within the list unless explicitly documented otherwise.
+// - Hostname: either omitted (defaults to "*.<apex>") or a valid DNS wildcard/exact hostname.
+// - Reject entries that would collide with existing synthesized listener names.
 // +optional
 TLSPassthroughListeners []TLSPassthroughListener `json:"tlsPassthroughListeners,omitempty"`
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@design-proposals/external-database-exposure/README.md` around lines 157 -
174, Add explicit validation invariants for TLSPassthroughListeners in the
proposal so controller and admission logic stay aligned. Define required rules
for TLSPassthroughListener.Name format, Port range and nonzero requirement,
Hostname shape/defaulting, and uniqueness/collision handling for name and port
against other listeners and existing TLS Passthrough services. Reference the new
TLSPassthroughListeners and TLSPassthroughListener types, and note how conflicts
should be rejected rather than inferred.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@design-proposals/external-database-exposure/README.md`:
- Around line 157-174: Add explicit validation invariants for
TLSPassthroughListeners in the proposal so controller and admission logic stay
aligned. Define required rules for TLSPassthroughListener.Name format, Port
range and nonzero requirement, Hostname shape/defaulting, and
uniqueness/collision handling for name and port against other listeners and
existing TLS Passthrough services. Reference the new TLSPassthroughListeners and
TLSPassthroughListener types, and note how conflicts should be rejected rather
than inferred.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4f613e51-0225-439e-9c2a-4a1b78e358f9

📥 Commits

Reviewing files that changed from the base of the PR and between fbfc6ba and 55aeda2.

📒 Files selected for processing (1)
  • design-proposals/external-database-exposure/README.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant