Skip to content

design-proposal: unified TLS and PKI model#19

Open
Aleksei Sviridkin (lexfrei) wants to merge 2 commits into
mainfrom
design/unified-tls-pki
Open

design-proposal: unified TLS and PKI model#19
Aleksei Sviridkin (lexfrei) wants to merge 2 commits into
mainfrom
design/unified-tls-pki

Conversation

@lexfrei

@lexfrei Aleksei Sviridkin (lexfrei) commented Jun 24, 2026

Copy link
Copy Markdown

Adds a design proposal recording the target architecture for the TLS/PKI convergence tracked by epic cozystack/cozystack#2811.

The edge of that convergence has landed; the interior — who owns the PKI per managed engine and how a tenant receives the trust anchor — has not. This proposal pins it down before the per-app TLS pull requests land, and decides the two forks at its center on paper:

  • Issuance abstraction: a single operator-facing interface to choose the certificate source plus a uniform ca.crt consume contract, rather than "cert-manager as the single issuer".
  • Mint vs consume: an explicit two-tier model — cert-manager / wildcard at the edge, operator-owned PKI inside the engines — that unifies the consume contract, not the certificate authority. Pure-consume is rejected because it breaks CNPG/Strimzi rotation.

It also documents the verified delivery gap (pinned valuesFrom, removed lookup, async per-release CA) and proposes a watch-driven CA-distribution controller modelled on the existing wildcard-secret reconciler.

Status: Draft.

Summary by CodeRabbit

  • Documentation
    • Added a design proposal for a unified TLS/PKI approach for managed applications.
    • Described a consistent certificate consumption format for application tenants, including safer trust-anchor handling.
    • Documented expected behavior during certificate rotation, compatibility considerations, rollout guidance, and security boundaries.

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@lexfrei, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 1 hour. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b22ef105-bb42-4833-8a9b-3c5856e325be

📥 Commits

Reviewing files that changed from the base of the PR and between f1966d9 and 20b7604.

📒 Files selected for processing (1)
  • design-proposals/unified-tls-pki/README.md
📝 Walkthrough

Walkthrough

A new design proposal document is added at design-proposals/unified-tls-pki/README.md. It defines a two-tier TLS/PKI model for Cozystack managed applications, specifying an edge tier for operator-provided ingress certificates and an interior tier with a uniform key-free <release>-ca-cert Secret contract, plus a CA-distribution controller for cert-manager-minting engines.

Changes

Unified TLS/PKI Design Proposal

Layer / File(s) Summary
Metadata, problem context, and goals
design-proposals/unified-tls-pki/README.md
Adds proposal header metadata and the motivation section describing the current edge/interior PKI split, missing interior ownership codification, and the explicit goals and non-goals.
Target architecture, contracts, and CA-distribution controller
design-proposals/unified-tls-pki/README.md
Defines the two-tier model, per-engine PKI-ownership contract, the key-free <release>-ca-cert Secret consume contract stamped with internal.cozystack.io/tenantresource: "true", the CA-distribution controller for cert-manager-minting engines, and per-engine convergence order.
User-facing outcomes, security boundary, rollout, open questions, and alternatives
design-proposals/unified-tls-pki/README.md
Documents user-facing effects, rollback expectations, security trust boundary, failure/edge cases, test strategy, rollout sequence, open questions on naming/namespace/controller scope, and alternatives considered.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Poem

🐇 Hop, hop, through the PKI garden I go,
Two tiers of trust, edge and interior aglow,
A ca-cert Secret, key-free and bright,
No private keys leaking into the night.
The controller watches, projects with care—
One fluffy design doc, beyond compare! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the new design proposal and its unified TLS/PKI model.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch design/unified-tls-pki

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a design proposal for a unified TLS and PKI model for managed applications in Cozystack, outlining a two-tier architecture that separates edge-tier wildcard certificates from interior operator-owned PKI. Feedback on the proposal suggests programmatically sanitizing destination Secrets in the CA-distribution controller to explicitly omit private keys, and emitting Kubernetes Warning Events during naming collisions to prevent silent TLS verification failures.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.


For **operator-owned engines** (postgres, kafka, mongodb), the operator already publishes a CA-bearing Secret synchronously enough to consume, and for kafka it is already key-free. The work is to converge each on the canonical `<release>-ca-cert` shape and label — no new controller is required.

For **cert-manager-minting engines** (nats, qdrant, and the four open per-app pull requests), none of the chart-time paths work: `valuesFrom` is pinned, `lookup` is gone, and the CA is asynchronous (see "The problem"). The proposed mechanism is a small **watch-driven CA-distribution controller**, modelled directly on the existing wildcard-secret reconciler (`internal/controller/wildcardsecret/reconciler.go`, from `cozystack/cozystack#2990`). That reconciler already demonstrates the exact pattern: it watches a source Secret, projects copies into the right namespaces, marks every copy it owns with a management label (`cozystack.io/wildcard-secret-copy: "true"`), garbage-collects stale copies, and refuses to touch a foreign Secret that happens to share the name. The CA-distribution controller does the analogous thing one level down: it watches the per-release `<release>-ca` Secret that cert-manager creates asynchronously, and projects a `<release>-ca-cert` object carrying only `ca.crt` with the tenant-resource label. Because it is watch-driven, it sees the CA appear after render; because it writes a Secret the engine and the projection already understand, it needs neither `lookup` nor a custom `valuesFrom`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

To ensure robust security, the CA-distribution controller must programmatically sanitize the destination Secret. It should explicitly copy only the ca.crt field and guarantee that any private keys (such as tls.key or ca.key) are omitted, rather than relying on external validation. This prevents accidental exposure of private keys through the label-filtered tenantsecrets projection.


- The helper's input PEM contains a private-key header → render fails closed; the chart does not deploy a key-bearing Secret.
- The per-release CA Secret does not exist yet → the controller waits for the watch event; it does not error or busy-loop.
- A foreign Secret already occupies the target name → the controller leaves it untouched (management-label guard).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Leaving a colliding foreign Secret untouched prevents overwriting, but it will cause silent TLS verification failures for the tenant because the correct CA certificate is never projected. The controller should emit a Kubernetes Warning Event on the corresponding application or namespace to alert operators of the naming collision.

Record a written target design for the TLS/PKI convergence: two-tier
(edge issuance + interior operator-owned PKI), a per-engine PKI-ownership
contract, and a delivery mechanism for the cert-manager-minting engines
where charts cannot wire the CA-only helper directly.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei Aleksei Sviridkin (lexfrei) marked this pull request as ready for review June 24, 2026 11:45

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
design-proposals/unified-tls-pki/README.md (1)

177-179: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Decide naming/scope before controller implementation starts.

The object naming/namespace and controller scope are still open questions, but rollout proceeds directly to implementation. Consider adding an explicit decision gate before Line 173 so follow-up PRs don’t diverge on contract shape.

Also applies to: 172-174

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@design-proposals/unified-tls-pki/README.md` around lines 177 - 179, Add an
explicit decision gate before implementation begins by clarifying the canonical
`<release>-ca-cert` naming/namespace contract and the controller scope in the
unified TLS PKI README, since these are still open. Update the nearby section
around the existing open questions to state that the CA-distribution controller
shape must be decided first, and tie it to the cert-manager-minting engines
versus a generalized projector, plus the per-tenant wildcard propagation
mechanism so future PRs follow one agreed contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@design-proposals/unified-tls-pki/README.md`:
- Line 20: The README has conflicting status for cozystack/cozystack#2990
between the “Edge” note and the later rollout summary, so update the wording to
use one consistent status or date-qualified phrasing. Fix the entries around the
Edge status and the later section that mentions `#2990` so they agree, using the
existing “Edge, open” / “done” labels and the issue reference to make the
rollout state unambiguous.

---

Nitpick comments:
In `@design-proposals/unified-tls-pki/README.md`:
- Around line 177-179: Add an explicit decision gate before implementation
begins by clarifying the canonical `<release>-ca-cert` naming/namespace contract
and the controller scope in the unified TLS PKI README, since these are still
open. Update the nearby section around the existing open questions to state that
the CA-distribution controller shape must be decided first, and tie it to the
cert-manager-minting engines versus a generalized projector, plus the per-tenant
wildcard propagation mechanism so future PRs follow one agreed contract.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d9ffcbdd-488c-485b-8826-c7cbb0c4a005

📥 Commits

Reviewing files that changed from the base of the PR and between fbfc6ba and f1966d9.

📒 Files selected for processing (1)
  • design-proposals/unified-tls-pki/README.md

Comment thread design-proposals/unified-tls-pki/README.md
@lllamnyp

Copy link
Copy Markdown
Member

Strong proposal — pinning the two-tier model and the per-engine PKI-ownership table on paper is exactly what was missing, and §"The problem" nails the real delivery gap (pinned valuesFrom, lookup invisible to the Flux digest, async per-release CA).

My one substantive concern is §5 (Delivery). The CA-distribution controller is justified by a <release>-ca<release>-ca-cert naming convention ("Because <release>-ca has a deterministic name, the controller can watch the source directly…"). That convention is the brittle part: it's an implicit contract the next application author silently violates by naming their CA secret anything else, and it re-implements machinery the platform already has (the §5 controller is specced to carry a management label, a foreign-collision guard, and garbage collection, all modelled on the wildcard reconciler). I think we can delete most of that controller by leaning on the existing lineage/projection path instead.

What's already in place (all in cozystack/cozystack)

The "which secrets are tenant-facing" contract is already declarative and is not name-based:

  • ApplicationDefinition.spec.secrets is an Include/Exclude list of ApplicationDefinitionResourceSelector (api/v1alpha1/applicationdefinitions_types.go), which matches by templated resourceNames and/or a label selector (matchLabels/matchExpressions).
  • The lineage webhook reconciles the trust-anchor label from that: computeLabelsmatchResourceToExcludeInclude sets TenantResourceLabelKey = "true"|"false" on every admission (internal/lineagecontrollerwebhook/webhook.go). It's authoritative — it stamps false when a secret stops matching, not just true when it does.
  • Crucially, it resolves a secret's owning application by walking ownerReferencesgetOwnerlineage.WalkOwnershipGraph, which reads GetOwnerReferences() (pkg/lineage/lineage.go). It does not key off identity labels on the secret.
  • pkg/registry/core/tenantsecret/rest.go then surfaces every tenantresource=true secret. The spec.secrets.include blocks in the cozyrds (e.g. packages/system/postgres-rd/cozyrds/postgres.yaml) are this contract in action today.

The consequence: if the projected ca.crt-only secret carries an ownerReference into the app's ownership graph, the owner reference does double duty — the webhook marks it tenant-facing automatically, and Kubernetes garbage-collects it on app deletion. That removes the controller's GC logic and its app-identity-labelling entirely.

The contract that's actually missing

Only the production of a key-free object is genuinely new (nothing in the platform strips a key today; the projection is whole-Data, so you can't just mark <release>-ca — it carries tls.key). I'd reframe §5 around three pieces, only the middle of which is new code:

  1. A source-selection label the chart stamps on its CA-bearing secret to opt into extraction — e.g. cozystack.io/publish-ca-cert: "true", plus an optional annotation for the key (default ca.crt). For the cert-manager-minting charts this rides Certificate.spec.secretTemplate.labels, so even the async-born <release>-ca carries the marker from creation. This replaces the name convention: the source opts in explicitly instead of the controller guessing by name.
  2. The controller: watch label-selected source secrets → upsert a type: Opaque, ca.crt-only secret (reuse the cozy-lib.tls.caCertSecret fail-closed guard) → set ownerReference and a tenant-CA marker label → re-copy on source change (rotation). No name convention, no prune logic, and collision-safety reduces to "only touch secrets I own."
  3. Marking stays in spec.secrets.include. Because the selector accepts a label selector, one generic entry — matchLabels: {cozystack.io/tenant-ca: "true"} — covers every engine with no per-release resourceName templating for the CA.

One caveat on the ownerRef target

Don't owner-ref the source <release>-ca secret: cert-manager does not own its output secrets by default, so that secret typically has no ownerRefs and the graph walk from it dead-ends before reaching the app. Point the projected secret's ownerRef at the application instance CR (one hop to the root; GC tied to the app), which the controller resolves from the app.kubernetes.io/instance label already on the source secret.

Net

The irreducible new work is the extraction step itself — read the CA secret, write the key-free copy, re-copy on rotation — which is the same job whether an operator does it natively (kafka's <release>-clients-ca-cert, the redis fork) or the controller does it for the cert-manager charts. Everything the proposal lists around it (naming, marking, GC, collision-safety) is already provided by lineage ownerRefs + spec.secrets + GC. Worth folding into §5 as the contract, so we ship one small extraction controller against an explicit source label rather than a replicator against a name convention.

(Separately, a more principled root fix worth a sentence in "Alternatives": give the projection/selector a field filter so a single ca.crt key can be projected out of a key-bearing secret without pre-materialising a stripped copy at all. Bigger change — the projection is whole-Data today — but it attacks the CA/key coupling of cozystack/cozystack#2814 directly and generalises to the operator-owned engines.)

One factual check for the table: it lists CNPG <release>-credentials as key-free, but cozystack/cozystack#2814 describes that same secret as carrying the server tls.key. The whole consume story for postgres turns on which is true — worth verifying the actual CNPG secret contents and pinning it in the table.

The PKI-ownership table claimed postgres already delivers a key-free
ca.crt. In fact CNPG's <release>-credentials Secret carries the server
tls.key and is the object surfaced to tenants, so postgres is the
canonical CA / private-key coupling case, not a converged engine. Fix
the table, the takeaway, and the interior-state text, and state that
convergence for postgres means a dedicated key-free <release>-ca-cert
plus removing <release>-credentials from the tenant-facing surface.

Replace the name-convention CA-distribution controller with an explicit
source-selection label plus a small extraction controller, reusing the
lineage webhook's spec.secrets label selector, owner-reference walk, and
authoritative tenantresource stamping for marking and garbage collection.
The controller sanitises to ca.crt at write time, owner-refs the
projected Secret to the application instance, and emits a Warning Event
on a name collision instead of failing silently.

Record the field-filter projection as the principled root fix, align the
per-tenant-propagation PR status across the document, and note the
clustersecret-operator overlap with cross-namespace wildcard propagation.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei

Copy link
Copy Markdown
Author

Adopted the whole §5 restructuring — this is the right call. The proposal now drops the name convention for an explicit internal.cozystack.io/publish-ca-cert source label (under the internal.cozystack.io/ prefix, to match the existing tenantresource/managed-by markers), and the controller shrinks to just the extraction step: it owner-refs the projected <release>-ca-cert to the application instance CR — not the source <release>-ca, precisely for the cert-manager-doesn't-own-its-output dead-end you flagged — and marking plus GC fall out of spec.secrets + lineage ownerRefs instead of being re-implemented. The generic matchLabels: {internal.cozystack.io/tenant-ca: "true"} entry replaces per-release resourceName templating. I also recorded the field-filter-on-the-projection idea in Alternatives as the principled root fix it is — it attacks the CA/key coupling at the source and generalises to the operator-owned engines, at a larger blast radius.

On the CNPG <release>-credentials check: you were right to pull that thread, and it's worse than a table typo. The Secret carries the server tls.key, and it's the object the postgres dashboard resource map already surfaces to tenants — so postgres isn't an already-clean consume engine, it's the live instance of the coupling. I reclassified it: only kafka's <release>-clients-ca-cert is the clean reference, and convergence for postgres now explicitly means a dedicated key-free <release>-ca-cert plus removing <release>-credentials from the tenant surface.

Two security details I tightened alongside: the extraction controller sanitises to ca.crt at write time (it runs post-render, so it can't lean on the helper's render-time guard), and a target-name collision emits a Warning Event rather than skipping silently — a silent skip only surfaces as an unexplained TLS-verify failure for the tenant.

@kvaps Andrei Kvapil (kvaps) left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direction is right and the doc is well-grounded — these are non-blocking comments, no objection to the approach.

One factual correction — postgres is not the coupling case.

The doc makes postgres the canonical CA / private-key coupling instance ("carries the server tls.key", "the live instance of this risk") and sequences it first. On main that doesn't hold:

  • packages/apps/postgres/templates/init-script.yaml:18-27 renders <release>-credentials as an Opaque Secret whose stringData is only user: password pairs — no tls.key, no ca.crt.
  • packages/apps/postgres/templates/dashboard-resourcemap.yaml:16-22 surfaces exactly that passwords Secret to the tenant.
  • The "CNPG-created, carries ca.crt" claim traces to the comment at packages/apps/postgres/templates/db.yaml:20-22, which contradicts init-script.yaml: <release>-credentials is chart-owned, and CNPG's ca.crt lives in <release>-ca / -server / -app, none of which are in the dashboard Role.

So there's no private-key-to-tenant leak in any merged engine today. postgres is the same trust-anchor delivery gap as nats/qdrant (the tenant gets passwords, never ca.crt), not a coupling. Please fix the table row, the Security section, and the Upgrade step — removing <release>-credentials would drop the tenant's connection passwords, so convergence should add a key-free CA object, not remove that Secret. The db.yaml comment is wrong on its own; I'll fix it separately.

Kamaji belongs in the issuer enumeration, not just the exclusion row. It's the fourth independent PKI owner (control-plane CA for managed kubernetes), and it's the one that's fundamentally non-swappable — the kubeconfig pins that cluster CA, so a public edge cert is meaningless there. That's the strongest argument for "unify the interface, not the CA". Listing it only as "excluded" hides the point — I'd present it as a first-class example of the fork.

I'd prefer the field-filter over the extraction controller. The doc lists "a field filter on the projection itself" under Alternatives and defers it as touching the tenant-facing API. That's exactly where it belongs — the projection (pkg/registry/core/tenantsecret/rest.go, secretToTenant copies the whole Data) is the actual trust boundary. Extending ApplicationDefinition.spec.secrets.include with a per-key allow-list (e.g. keys: [ca.crt]) and applying it in the projection would:

  • remove the need for a new controller, a second <release>-ca-cert object, and the runtime no-private-key re-check;
  • work uniformly for operator-owned engines too (project ca.crt straight out of the CNPG secret — no async / pinned-valuesFrom / lookup problem);
  • keep the security boundary in one authoritative place instead of relying on every chart plus a runtime guard.

It's backward-compatible (no keys → today's whole-Data behavior). Cross-namespace wildcard distribution (#2820) stays separate — clustersecret-operator already covers that class.

The framing flip, the two-tier model, the whole-Data projection as the security pivot, and the kafka reference shape are all on point.

@kvaps Andrei Kvapil (kvaps) left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direction is right and the doc is well-grounded — these are non-blocking comments, no objection to the approach.

One factual correction — postgres is not the coupling case.

The doc makes postgres the canonical CA / private-key coupling instance ("carries the server tls.key", "the live instance of this risk") and sequences it first. On main that doesn't hold:

  • packages/apps/postgres/templates/init-script.yaml:18-27 renders <release>-credentials as an Opaque Secret whose stringData is only user: password pairs — no tls.key, no ca.crt.
  • packages/apps/postgres/templates/dashboard-resourcemap.yaml:16-22 surfaces exactly that passwords Secret to the tenant.
  • The "CNPG-created, carries ca.crt" claim traces to the comment at packages/apps/postgres/templates/db.yaml:20-22, which contradicts init-script.yaml: <release>-credentials is chart-owned, and CNPG's ca.crt lives in <release>-ca / -server / -app, none of which are in the dashboard Role.

So there's no private-key-to-tenant leak in any merged engine today. postgres is the same trust-anchor delivery gap as nats/qdrant (the tenant gets passwords, never ca.crt), not a coupling. Please fix the table row, the Security section, and the Upgrade step — removing <release>-credentials would drop the tenant's connection passwords, so convergence should add a key-free CA object, not remove that Secret. The db.yaml comment is wrong on its own; I'll fix it separately.

Kamaji belongs in the issuer enumeration, not just the exclusion row. It's the fourth independent PKI owner (control-plane CA for managed kubernetes), and it's the one that's fundamentally non-swappable — the kubeconfig pins that cluster CA, so a public edge cert is meaningless there. That's the strongest argument for "unify the interface, not the CA". Listing it only as "excluded" hides the point — I'd present it as a first-class example of the fork.

I'd prefer the field-filter over the extraction controller. The doc lists "a field filter on the projection itself" under Alternatives and defers it as touching the tenant-facing API. That's exactly where it belongs — the projection (pkg/registry/core/tenantsecret/rest.go, secretToTenant copies the whole Data) is the actual trust boundary. Extending ApplicationDefinition.spec.secrets.include with a per-key allow-list (e.g. keys: [ca.crt]) and applying it in the projection would:

  • remove the need for a new controller, a second <release>-ca-cert object, and the runtime no-private-key re-check;
  • work uniformly for operator-owned engines too (project ca.crt straight out of the CNPG secret — no async / pinned-valuesFrom / lookup problem);
  • keep the security boundary in one authoritative place instead of relying on every chart plus a runtime guard.

It's backward-compatible (no keys → today's whole-Data behavior). Cross-namespace wildcard distribution (#2820) stays separate — clustersecret-operator already covers that class.

The framing flip, the two-tier model, the whole-Data projection as the security pivot, and the kafka reference shape are all on point.

@lllamnyp

Copy link
Copy Markdown
Member

On the field-filter: I think it's a good idea, and I'd rather not fold it into this proposal — not because it's wrong, but because it's a bigger and largely separate thing that happens to also solve the CA case.

The field-filter is a change to the tenant-secret projection contract itself, not to the TLS model. Today secretToTenant copies the whole Data of any labelled Secret. The projection is also writable at the registry level — it implements Create/Update/Patch/Delete, and the write path (tenantToSecret) replaces Data wholesale; tenant roles happen to grant only get/list/watch today, but that read-only posture lives in a different package (cozystack-basics clusterroles), not in the projection. So a per-key allow-list isn't a local change: it alters the read contract for every tenant secret, and its safety is entangled with a write path and an RBAC posture defined elsewhere — a write through a key-filtered view would replace Data with only the visible keys, silently dropping the ones the caller couldn't see, the moment any principal with write access uses it. Where the allow-list lives, who is authoritative for it, and how it composes with those write verbs are real questions, but they're a general decision about the tenant-facing API with nothing PKI-specific about them — and they deserve their own design with its own blast-radius analysis, not a rider on the TLS rollout.

The reason I don't want to couple them is concrete: the §5 mechanism already adopted here — the internal.cozystack.io/publish-ca-cert source label, an extraction step that writes a key-free object, and marking + GC falling out of spec.secrets + lineage ownerRefs — closes the TLS delivery gap without touching the projection contract at all. If we make this proposal depend on field-level projection, the per-app TLS series now blocks on a redesign of the tenant-secret API, which is a much larger and slower decision than the gap it's trying to close.

So my suggestion: spin field-level projection out as its own design proposal. It stands on its own merits — it generalises to any multi-key secret and would let operator-owned engines project ca.crt straight out of their existing credentials Secret — and it'll get a better treatment as a first-class proposal than as an implementation note here. We can reference it from this doc's Alternatives as a future simplification that could later retire the extraction step, explicitly not a dependency. That keeps the TLS proposal shippable on the mechanism we've already converged on, and gives the projection idea the room it actually needs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants