27 changes: 15 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@

Original Contributors: Hang Yin, Kevin Wang, Andrew Miller

[Documentation](https://docs.phala.com/dstack) · [Examples](https://github.com/Dstack-TEE/dstack-examples) · [Community](https://t.me/+UO4bS4jflr45YmUx)
[Documentation](https://docs.phala.com/dstack) · [Security](./SECURITY.md) · [Examples](https://github.com/Dstack-TEE/dstack-examples) · [Community](https://t.me/+UO4bS4jflr45YmUx)

</div>

Expand Down Expand Up @@ -89,6 +89,19 @@ Your container runs inside a Confidential VM, such as Intel TDX or AMD SEV-SNP,

[Full security model →](./docs/security/security-model.md)

## Security and Trust

Security docs are linked here so deployers and reviewers can quickly find the trust model, production guidance, audit, and the status of already-answered public findings.

- [Security Overview](./docs/security/) - entry point for users, operators, researchers, and AI agents
- [Security Model](./docs/security/security-model.md) - threat model, trust boundaries, and verification checklist
- [Security Issue Triage](./docs/security/security-issue-triage.md) - public status for answered, fixed, accepted, and roadmap security reports
- [Security Best Practices](./docs/security/security-best-practices.md) - production settings and hardening guidance
- [Security Audit](./docs/security/dstack-audit.pdf) - third-party audit by zkSecurity
- [Report a Vulnerability](./SECURITY.md) - use GitHub's private security reporting path

Please do not disclose exploitable vulnerabilities in public GitHub issues. Use the private reporting path in [SECURITY.md](./SECURITY.md).

## SDKs

Apps communicate with the guest agent via HTTP over `/var/run/dstack.sock`. Use the [HTTP API](./sdk/curl/api.md) directly with curl, or use a language SDK:
Expand Down Expand Up @@ -121,14 +134,6 @@ Apps communicate with the guest agent via HTTP over `/var/run/dstack.sock`. Use
- [Design Decisions](./docs/design-and-hardening-decisions.md) - Architecture rationale
- [FAQ](./docs/faq.md) - Frequently asked questions

## Security

- [Security Overview](./docs/security/) - Security documentation and responsible disclosure
- [Security Model](./docs/security/security-model.md) - Threat model and trust boundaries
- [Security Best Practices](./docs/security/security-best-practices.md) - Production hardening
- [Security Audit](./docs/security/dstack-audit.pdf) - Third-party audit by zkSecurity
- [CVM Boundaries](./docs/security/cvm-boundaries.md) - Information exchange and isolation

## FAQ

<details>
Expand Down Expand Up @@ -180,7 +185,7 @@ Yes. dstack runs on supported TEE-capable servers, including Intel TDX-capable h

- **GCP**: Intel TDX (Confidential VMs)
- **AWS**: Nitro Enclaves (NSM attestation)
- **Bare metal**: Intel TDX (4th/5th Gen Xeon) and AMD SEV-SNP on supported dstack OS images
- **Bare metal**: Intel TDX (4th/5th Gen Xeon) and AMD SEV-SNP on supported dstack OS images. Intel TDX is the production path; AMD SEV-SNP is new and experimental.
- **GPUs**: NVIDIA Confidential Computing (H100, Blackwell)

</details>
Expand Down Expand Up @@ -227,5 +232,3 @@ Logo and branding assets: [dstack-logo-kit](./docs/assets/dstack-logo-kit/)
## License

Apache 2.0
</content>
</invoke>
21 changes: 21 additions & 0 deletions SECURITY.md
@@ -0,0 +1,21 @@
# Security

Use this file for vulnerability reports. For the security model, production guidance, audit, and already-answered public findings, start with [Security Documentation](./docs/security/).

## Report a vulnerability

If you believe you found a vulnerability, please use GitHub's private security reporting features for this repository. If GitHub private reporting is unavailable, contact security@phala.network.

Do not open public GitHub issues for exploitable vulnerabilities or details that could help exploit production deployments.

Use private reporting for issues that could expose secrets, bypass attestation or authorization, compromise KMS keys, weaken workload isolation, or enable unauthorized code or configuration changes in production deployments.

## Public security questions

Use public issues only for questions about documented behavior, documentation gaps, already-public findings, or hardening ideas that do not include an exploit path.

Before opening a public security question, check [Security Issue Triage](./docs/security/security-issue-triage.md). It records public findings that were fixed, accepted by design, documented, or moved to roadmap work.

## Production trust boundary

Development settings are not production-safe merely because they are present in the codebase. Production deployments must rely on measured configuration, expected TEE measurements, authorization policy, and attestation verification. The documented security model is the source of truth for what dstack treats as a production guarantee.
18 changes: 16 additions & 2 deletions docs/security/README.md
Expand Up @@ -2,6 +2,13 @@

dstack security resources for auditors, researchers, and operators.

## Start Here

- **Users and verifiers:** read the [Security Model](./security-model.md) to understand what dstack guarantees and what you must verify.
- **Operators:** read [Security Best Practices](./security-best-practices.md) before deploying production KMS, gateway, or VMM services.
- **Security researchers and AI agents:** report exploitable vulnerabilities through the private path in [SECURITY.md](../../SECURITY.md). For already-public findings or docs questions, check [Security Issue Triage](./security-issue-triage.md) before opening a public issue.
- **Maintainers:** use [Security Issue Triage](./security-issue-triage.md) to classify public reports and close issues once the maintainer position is clear.

## Audit

dstack has been audited by zkSecurity. See the [full audit report](./dstack-audit.pdf).
Expand All @@ -10,8 +17,15 @@ dstack has been audited by zkSecurity. See the [full audit report](./dstack-audi

- [Security Model](./security-model.md) - Threat model, trust boundaries, and verification checklist
- [Security Best Practices](./security-best-practices.md) - Production hardening guide
- [Security Issue Triage](./security-issue-triage.md) - Public status for answered, fixed, accepted, and roadmap reports
- [CVM Boundaries](./cvm-boundaries.md) - Information exchange and isolation details

## Responsible Disclosure
## Already Answered Reports

Some public security reports describe real hardening work. Some describe behavior that is intentional for development or compatibility, and some are false positives under production configuration. The canonical list is [Security Issue Triage](./security-issue-triage.md). Search that page by issue number, component, or exact setting name before treating an old report as unresolved.

## Report Vulnerabilities

If you believe you found an exploitable vulnerability, use GitHub's private security reporting features as described in [SECURITY.md](../../SECURITY.md). If GitHub private reporting is unavailable, contact security@phala.network.

To report a security vulnerability, email security@phala.network. We will respond within 48 hours.
Do not open GitHub issues for exploitable vulnerabilities.
15 changes: 15 additions & 0 deletions docs/security/security-best-practices.md
Expand Up @@ -68,6 +68,21 @@ Example app-compose.json:

**Keep in mind: even if you disable exposing app-compose.json, it is only hidden from the public API. Whoever controls the physical machine can still read it from the file system.**

## Do not use development trust settings in production

Development settings are intentionally easy to audit, but they are not production-safe. A production deployment should satisfy all of the following:

- KMS quote verification remains enabled. Do not deploy production KMS with `quote_enabled = false`.
- KMS authorization uses webhook/on-chain policy. Do not use `auth_api.type = "dev"` with real key material.
- The KMS contract pins a concrete gateway app id. Do not use `gateway_app_id = "any"` for production traffic.
- TEE quotes are evaluated by deployment policy, including TCB status and expected OS/application measurements.

The KMS TLS listener may keep `rpc.tls.mutual.mandatory = false` because bootstrap endpoints need to be reachable before a client has an RA-TLS certificate. Sensitive KMS routes still require the client certificate and attestation evidence in application code before releasing keys or signing certificates.

## Keep private material owner-only

Secret-bearing files should be owner-only (`0600`) wherever possible, including app keys, decrypted env files, KMS root keys, gateway WireGuard/TLS keys, and ACME credentials. Preserve restrictive permissions when copying volumes, backing up `/etc/kms/certs`, or moving gateway and certbot state between hosts. Public issue [#606](https://github.com/Dstack-TEE/dstack/issues/606) tracks the remaining low-cost hardening work in dstack-managed file writes.
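
As an illustration of the owner-only rule, the sketch below writes a file with mode `0600` and audits existing files for group or other permission bits. It is a generic POSIX example, not dstack's implementation; the helper names `write_secret` and `overly_permissive` and the file name `app.key` are hypothetical.

```python
import os
import stat
import tempfile


def write_secret(path: str, data: bytes) -> None:
    """Write data to path so that only the owner can read or write it (POSIX only)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # The mode passed to os.open applies only when the file is created,
    # so tighten the mode explicitly in case the file already existed.
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)


def overly_permissive(paths: list[str]) -> list[tuple[str, str]]:
    """Return (path, octal mode) for files that grant any group or other access."""
    findings = []
    for path in paths:
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode & 0o077:
            findings.append((path, oct(mode)))
    return findings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        key_path = os.path.join(workdir, "app.key")  # hypothetical file name
        write_secret(key_path, b"example-secret")
        print(overly_permissive([key_path]))
```

Running an audit like `overly_permissive` after restoring a backup or copying a volume is one way to confirm that the copy preserved restrictive permissions.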

## docker logs are publicly available by default

Similarly, to facilitate app observability, docker logs are public by default. You can stop exposing them by setting `public_logs=false`.
53 changes: 53 additions & 0 deletions docs/security/security-issue-triage.md
@@ -0,0 +1,53 @@
# Security Issue Triage

Security issues should not remain open after the maintainer position is clear. An open issue means one of two things: a fix is still required, or a concrete design/roadmap item is intentionally being tracked. Everything else should be closed with a final maintainer comment and a link to the code or documentation that records the decision.

This page is not a vulnerability reporting channel. Report exploitable vulnerabilities privately through [SECURITY.md](../../SECURITY.md). Use public issues only for questions, documentation gaps, duplicate-prone prior findings, or hardening ideas that do not disclose an exploit path.

## Triage labels

Use these categories when evaluating public security questions and already-public reports:

| Category | Meaning | Expected issue state |
| --- | --- | --- |
| Real blocker | Confirmed vulnerability that can compromise production security under supported configuration | Keep open until fixed; close as completed when the fix lands |
| Needs hardening | Not a broken trust boundary, but a defense-in-depth improvement with no compatibility cost | Keep open only while the patch is pending; close as completed when merged |
| Fixed | The reported behavior has already been fixed or is fixed by the linked change | Close as completed |
| Docs-only | The behavior is intentional or lower severity, but the repo must say so clearly | Close after documentation is merged |
| Accepted by design | The report conflicts with the documented threat model or with an intentional compatibility constraint | Close as not planned, with the design rationale linked |

When a report mixes several claims, split the actionable work into separate issues before closing the original. Do not leave a broad "security" issue open just to remember future work.

## March 2026 security cluster

The March cluster contained a mix of real hardening, compatibility decisions, and false positives. The current repo position is:

| Issue | Classification | Maintainer action |
| --- | --- | --- |
| [#606](https://github.com/Dstack-TEE/dstack/issues/606) App keys and decrypted env files world-readable | Needs hardening | Tightening secret-bearing file writes to owner-only permissions (`0600`) is a valid defense-in-depth improvement with no expected compatibility cost |
| [#605](https://github.com/Dstack-TEE/dstack/issues/605) Identical raw key material across `ed25519` and `secp256k1` for the same path | Accepted compatibility decision, docs-only | Existing derived key bytes are preserved; docs now state that `path` is the domain separator and callers must use algorithm-specific paths when they require independent keys |
| [#607](https://github.com/Dstack-TEE/dstack/issues/607) `gateway_app_id = "any"` disables gateway identity pinning | Accepted by design for dev/test deployments | `gateway_app_id` is KMS contract configuration and is publicly auditable; production deployments must not use `"any"` |
| [#608](https://github.com/Dstack-TEE/dstack/issues/608) `auth_api.type = "dev"` allows all authorization | Accepted by design for local/integration testing | Dev auth is measured runtime configuration, not a production mode; production must use webhook/on-chain authorization |
| [#609](https://github.com/Dstack-TEE/dstack/issues/609) `quote_enabled = false` bypasses attestation | Accepted by design for local development | The flag is measured in runtime configuration and should fail production attestation policy |
| [#561](https://github.com/Dstack-TEE/dstack/issues/561) KMS TLS client certificates are non-mandatory in Rocket config | Docs-only for current architecture | The TLS listener allows unauthenticated bootstrap endpoints, while sensitive KMS handlers enforce client certificate and attestation checks in application code |
| [#552](https://github.com/Dstack-TEE/dstack/issues/552) Static HKDF salt and no key versioning | Design roadmap, not a near-term vulnerability | Static salt is acceptable with high-entropy KMS root material and explicit context; key versioning/rotation requires a broader compatibility design |

Recommended GitHub cleanup for this cluster:

- Keep #606 open until the `0600` hardening change lands, then close it as completed.
- Close #605, #561, #607, #608, and #609 with links to the relevant security docs and maintainer rationale.
- Keep a separate roadmap issue for KMS key versioning/rotation if it has an owner and migration plan; otherwise close #552 as not planned for the current KDF version.

## Search terms for duplicate-prone findings

Researchers and AI agents should search this page and linked issues before treating these as new vulnerabilities:

- `quote_enabled = false`
- `auth_api.type = "dev"`
- `gateway_app_id = "any"`
- `rpc.tls.mutual.mandatory = false`
- `get_temp_ca_cert`
- `ed25519` and `secp256k1` with the same derivation path
- `RATLS` HKDF salt
- KMS key versioning and rotation
- app keys and decrypted env file permissions
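
A pre-filing check against the exact setting names above might look like the following. This is a rough sketch under stated assumptions: the function `matching_terms` is hypothetical, it covers only the literal identifiers from the list (not the free-text topics), and a match means "read the triage table first", not "this report is invalid".

```python
# Literal setting names and identifiers taken from the list above.
DUPLICATE_PRONE_TERMS = [
    "quote_enabled = false",
    'auth_api.type = "dev"',
    'gateway_app_id = "any"',
    "rpc.tls.mutual.mandatory = false",
    "get_temp_ca_cert",
]


def matching_terms(report_text: str) -> list[str]:
    """Return the duplicate-prone terms that appear in a draft report."""
    # Lowercase and collapse whitespace so spacing differences do not hide a match.
    normalized = " ".join(report_text.lower().split())
    return [term for term in DUPLICATE_PRONE_TERMS if term.lower() in normalized]


if __name__ == "__main__":
    draft = "KMS skips attestation when quote_enabled  =  false is set in the config."
    for term in matching_terms(draft):
        print("already triaged, check the table first:", term)
```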
16 changes: 15 additions & 1 deletion docs/security/security-model.md
Expand Up @@ -8,7 +8,7 @@ This document helps you evaluate whether dstack's security model fits your needs

dstack removes the need to trust infrastructure operators. The cloud provider cannot read your memory, modify your code, or access your secrets. Network attackers cannot intercept your traffic because TLS terminates inside the TEE with keys fully controlled by the TEE (Zero Trust HTTPS). Docker registries cannot serve malicious images because the TEE verifies SHA256 digests before pulling.

The only thing you must trust is **TEE hardware** (currently Intel TDX, with AMD SEV support planned). You trust that the TEE provides genuine memory encryption and that quotes are signed by real hardware. For GPU workloads, you also trust **NVIDIA GPU hardware** and NVIDIA's Remote Attestation Service (NRAS). These are hardware-level trust assumptions.
The only thing you must trust is **TEE hardware**. Intel TDX is the production path. AMD SEV-SNP is available where the selected dstack OS image and host support it, but it is new and experimental. You trust that the TEE provides genuine memory encryption and that quotes are signed by real hardware. For GPU workloads, you also trust **NVIDIA GPU hardware** and NVIDIA's Remote Attestation Service (NRAS). These are hardware-level trust assumptions.

Everything else is verifiable.

Expand Down Expand Up @@ -134,6 +134,20 @@ The one case dstack does not leave to downstream is a genuinely invalid TCB: `dc

> **Future work:** this will be refactored toward a grace-period model, where an out-of-date TCB is accepted for a bounded window after a new TCB level is published rather than being a binary downstream decision.

### Development modes are auditable, not production-safe

dstack keeps several development switches as runtime or on-chain configuration rather than Cargo feature flags. Examples include KMS `quote_enabled = false`, `auth_api.type = "dev"`, and KMS contract `gateway_app_id = "any"`. These settings exist for local development and integration tests, not for production deployments.

This is intentional. Runtime configuration that affects the trust boundary is visible in attestation measurements or public contract state. Cargo feature gates are not automatically more auditable because feature unification can enable a feature through a dependency graph, and the resulting runtime behavior is not represented as a measured deployment setting.

Production verifiers should reject deployments that use these development settings. Operators should treat them the same way they treat debug-mode TEE quotes: useful for testing, invalid for production trust.

### KMS mTLS is route-enforced for sensitive operations

The KMS Rocket TLS listener permits connections without a client certificate because some bootstrap and public metadata endpoints must be reachable before a client has an RA-TLS certificate. That listener setting is not the authorization boundary for key material.

Sensitive KMS handlers enforce their own boundary: callers must present the expected client certificate and attestation evidence before key derivation, KMS key replication, or certificate signing succeeds. Public endpoints are limited to bootstrap, metadata, health, and metrics behavior documented for operators.
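
The difference between a listener-level and a route-level boundary can be shown with a conceptual sketch. This is not dstack's code (the KMS is a Rust service built on Rocket); every name here (`Request`, `requires_attested_client`, `derive_key`, `health`) is hypothetical, and attestation verification is reduced to a boolean for brevity.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Optional


@dataclass
class Request:
    path: str
    client_cert: Optional[bytes] = None  # None models a connection without a client certificate
    attestation_ok: bool = False         # stands in for real attestation verification


class Unauthorized(Exception):
    """Raised when a sensitive route is called without the required evidence."""


def requires_attested_client(handler):
    """Enforce the authorization boundary inside the handler, not at the listener."""
    @wraps(handler)
    def wrapper(request: Request):
        if request.client_cert is None:
            raise Unauthorized("client certificate required")
        if not request.attestation_ok:
            raise Unauthorized("attestation evidence required")
        return handler(request)
    return wrapper


def health(request: Request) -> str:
    # Public route: reachable before the caller has any certificate.
    return "ok"


@requires_attested_client
def derive_key(request: Request) -> str:
    # Sensitive route: only reached after both checks pass.
    return "derived-key-placeholder"


if __name__ == "__main__":
    print(health(Request("/health")))
    try:
        derive_key(Request("/derive"))
    except Unauthorized as err:
        print("rejected:", err)
```

In this model the listener accepts every connection, yet an unauthenticated caller can reach only `health`. That mirrors why a non-mandatory TLS client certificate at the listener does not by itself expose key material.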

## Limitations

### Attestation proves identity, not correctness
13 changes: 10 additions & 3 deletions docs/tutorials/kms-build-configuration.md
Expand Up @@ -186,6 +186,9 @@ certs = "/etc/kms/certs/rpc.crt"
# Mutual TLS (mTLS) Configuration
[rpc.tls.mutual]
ca_certs = "/etc/kms/certs/tmp-ca.crt"
# Keep the TLS listener optional because bootstrap/public endpoints must be
# reachable before a client has an RA-TLS certificate. Sensitive KMS RPCs still
# enforce client certificate and attestation checks in their handlers.
mandatory = false

# Core KMS Configuration
Expand Down Expand Up @@ -221,7 +224,7 @@ EOF
| `[rpc]` | `address` | RPC server bind address |
| `[rpc]` | `port` | RPC server port (9100) |
| `[core]` | `cert_dir` | Directory for certificates |
| `[core]` | `pccs_url` | Local PCCS via host bridge (`10.0.2.2`) for quote verification |
| `[core]` | `pccs_url` | PCCS endpoint for quote verification |
| `[core.auth_api]` | `url` | Auth-eth webhook service URL |
| `[core.onboard]` | `enabled` | Enable bootstrap/onboard mode |

Expand Down Expand Up @@ -321,7 +324,8 @@ chmod 600 /etc/kms/auth-eth.env
### Verify configuration

```bash
cat /etc/kms/auth-eth.env
grep -E '^(HOST|PORT|KMS_CONTRACT_ADDR)=' /etc/kms/auth-eth.env
grep -q '^ETH_RPC_URL=.' /etc/kms/auth-eth.env && echo "ETH_RPC_URL is set"
```

## Step 7: Create Docker Image for CVM Deployment
Expand Down Expand Up @@ -465,6 +469,9 @@ certs = "/etc/kms/certs/rpc.crt"
# Mutual TLS (mTLS) Configuration
[rpc.tls.mutual]
ca_certs = "/etc/kms/certs/tmp-ca.crt"
# Keep the TLS listener optional because bootstrap/public endpoints must be
# reachable before a client has an RA-TLS certificate. Sensitive KMS RPCs still
# enforce client certificate and attestation checks in their handlers.
mandatory = false

# Core KMS Configuration
Expand Down Expand Up @@ -621,7 +628,7 @@ cat /etc/kms/kms.toml | python3 -c "import sys, tomllib; tomllib.load(sys.stdin.
```bash
# Source and verify environment
source /etc/kms/auth-eth.env
echo "ETH_RPC_URL: ${ETH_RPC_URL:0:30}..."
test -n "$ETH_RPC_URL" && echo "ETH_RPC_URL is set"
echo "KMS_CONTRACT_ADDR: $KMS_CONTRACT_ADDR"
```
