design-proposal: external database exposure via TLS-passthrough (SNI) and end-to-end TLS#20
design-proposal: external database exposure via TLS-passthrough (SNI) and end-to-end TLS#20Aleksei Sviridkin (lexfrei) wants to merge 2 commits into
Conversation
📝 WalkthroughWalkthroughAdds a new design proposal document ( ChangesExternal Database Exposure via Gateway API TLS-passthrough Design Proposal
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a design proposal for exposing external databases via Gateway API TLS-passthrough (SNI) and end-to-end TLS to address IPv4 scarcity and provide managed TLS. The feedback highlights several key improvement opportunities: optimizing listener usage by defining one listener per engine type rather than per database instance to prevent hitting the 64-listener limit, replacing the vague API placeholder with a concrete *BackendObjectReference type, and using a validating admission webhook to handle conflicting TLS configurations instead of silently overriding them.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| CA -.->|"verifies"| C2 | ||
| ``` | ||
|
|
||
| A Gateway listener is keyed by the tuple (port, protocol, hostname/SNI). SNI-based routing therefore works on any TCP port, not only 443 — a `TLSRoute` selects its backend by hostname regardless of the listener's port. "Share one IP via SNI" and "use the native port" are orthogonal, so the design takes both: one passthrough listener per engine on its native port — `tls-<release>` on 5432 for postgres, 6379 for redis, 27017 for sharded mongo, 3306 for mariadb — each `mode: Passthrough`, hostname `<release>.<apex>`, SNI-routed via a per-release `TLSRoute`. All of a tenant's database listeners live on the one tenant Gateway and therefore share its single IP. |
There was a problem hiding this comment.
Creating one passthrough listener per database instance (e.g., tls-<release> on port 5432) will quickly consume the 64-listener budget on the Gateway if a tenant deploys multiple instances of the same database type.\n\nInstead, you can define one listener per engine type/port (e.g., a single listener tls-postgres on port 5432 with a wildcard hostname like *.tenant.apex or no hostname restriction). Multiple TLSRoute resources (one per database release) can then attach to this single shared listener, using their respective spec.hostnames to handle SNI-based routing. This reduces the listener consumption from O(N) (where N is the number of database instances) to O(T) (where T is the number of unique database engine types, which is at most 4 or 5), effectively eliminating the 64-listener cap bottleneck for database scaling.
| Name string `json:"name"` // listener suffix -> "tls-<name>" | ||
| Port int32 `json:"port"` // native port (5432/6379/27017/3306) | ||
| Hostname string `json:"hostname,omitempty"` // default "<name>.<apex>" | ||
| BackendRef ... `json:"backendRef"` // Service + port (cross-namespace via ReferenceGrant) |
There was a problem hiding this comment.
Using ... as a placeholder in the API definition is vague. Specifying a concrete type like *BackendObjectReference (referencing the standard Gateway API type) makes the design proposal much clearer and directly implementable.
| BackendRef ... `json:"backendRef"` // Service + port (cross-namespace via ReferenceGrant) | |
| BackendRef *BackendObjectReference `json:"backendRef"` // Service + port (cross-namespace via ReferenceGrant) |
|
|
||
| - A pre-PG17 or non-direct-TLS Postgres client sends no SNI → no route → connection reset/timeout (document the symptom). | ||
| - The 64-listener budget is exceeded → the listener is rejected; mitigation is to split the high-fanout subtree onto its own Gateway. | ||
| - `tls.enabled=false` while exposing externally → certificate SAN mismatch; the tri-state should auto-enable TLS with `external`. |
There was a problem hiding this comment.
Silently overriding tls.enabled=false by auto-enabling TLS when external is true can lead to unexpected behaviors and security misunderstandings for users who explicitly disabled TLS.\n\nA more robust and predictable approach is to reject this invalid configuration using a validating admission webhook (e.g., enforcing that if external is true, tls.enabled must not be false).
Record the external-exposure design for managed databases: Gateway API TLS-passthrough routed by SNI on native ports, collapsing N per-database LoadBalancer IPs onto one tenant Gateway IP, with end-to-end TLS by reusing the operator-issued certificate (passthrough does not terminate). Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
1f2a675 to
d237477
Compare
Replace the one-listener-per-database model with one shared passthrough listener per engine type, with per-release routing on standard TLSRoute objects keyed by SNI hostname. Listener consumption drops from O(database instances) to O(engine types), so the 64-listener Gateway cap stops being a per-database ceiling. Rename the API field to TLSPassthroughListeners, declare the route backend as a standard Gateway API BackendObjectReference instead of a placeholder, and rework the diagram, budget, failure, testing, and open-questions sections to match. Reject an explicit tls.enabled=false combined with external=true at admission via a ValidatingAdmissionPolicy rather than silently overriding it, so a tenant cannot stand up a plaintext external endpoint; an unset tri-state still auto-enables TLS together with external. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
design-proposals/external-database-exposure/README.md (1)
157-174: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winDefine validation invariants for
TLSPassthroughListenersin the proposal.The new API shape is clear, but it doesn’t yet specify required validation rules (for example:
nameformat,portrange, uniqueness ofname/port, and collision behavior with existing listeners). Adding these constraints in the design will avoid divergent controller/admission implementations.Proposed doc patch
type TLSPassthroughListener struct { Name string `json:"name"` // listener suffix -> "tls-<name>" (e.g. "postgres") Port int32 `json:"port"` // native port (5432/6379/27017/3306) Hostname string `json:"hostname,omitempty"` // wildcard SNI match, default "*.<apex>" } // TLSPassthroughListeners renders one Passthrough listener "tls-<name>" on // .Port per entry, AllowedRoutes restricted to TLSRoute. Independent of the // layer-7 TLSPassthroughServices field. +// Validation contract (proposal-level): +// - Name: DNS-1123 label; unique within the list. +// - Port: 1..65535; unique within the list unless explicitly documented otherwise. +// - Hostname: either omitted (defaults to "*.<apex>") or a valid DNS wildcard/exact hostname. +// - Reject entries that would collide with existing synthesized listener names. // +optional TLSPassthroughListeners []TLSPassthroughListener `json:"tlsPassthroughListeners,omitempty"`🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@design-proposals/external-database-exposure/README.md` around lines 157 - 174, Add explicit validation invariants for TLSPassthroughListeners in the proposal so controller and admission logic stay aligned. Define required rules for TLSPassthroughListener.Name format, Port range and nonzero requirement, Hostname shape/defaulting, and uniqueness/collision handling for name and port against other listeners and existing TLS Passthrough services. Reference the new TLSPassthroughListeners and TLSPassthroughListener types, and note how conflicts should be rejected rather than inferred.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@design-proposals/external-database-exposure/README.md`:
- Around line 157-174: Add explicit validation invariants for
TLSPassthroughListeners in the proposal so controller and admission logic stay
aligned. Define required rules for TLSPassthroughListener.Name format, Port
range and nonzero requirement, Hostname shape/defaulting, and
uniqueness/collision handling for name and port against other listeners and
existing TLS Passthrough services. Reference the new TLSPassthroughListeners and
TLSPassthroughListener types, and note how conflicts should be rejected rather
than inferred.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4f613e51-0225-439e-9c2a-4a1b78e358f9
📒 Files selected for processing (1)
design-proposals/external-database-exposure/README.md
Adds the external-exposure design proposal for managed databases — the WS4/WS5 half of epic cozystack/cozystack#2811. The certificate/PKI half is the sibling proposal design-proposals/unified-tls-pki.
Databases are exposed through the Gateway API TLS-passthrough listeners Cozystack already operates, routed by SNI on each engine's native port. Two outcomes:
Includes a per-engine matrix (Redis fits cleanest; Postgres needs direct-TLS; sharded Mongo and MariaDB opt-in; Kafka and non-sharded Mongo deferred for per-broker/per-member topology) and a minimal API extension (TLSPassthroughBackends on TenantGateway).
Status: Draft.
Summary by CodeRabbit