
Harden AMD SEV-SNP KDS collateral fetch (async client, timeouts, caching) #746

Description

@h4x3rotab

Follow-up from #713. Not a trust-model gap — verification is fail-closed (KDS throttling denies release, never forges one) — but it's an availability foot-gun, especially now that --platform auto can land on SNP on AMD hosts.

What's wrong

sev-snp-qvl/src/lib.rs fetches AMD KDS collateral (cert chain + VCEK) with:

  • reqwest::blocking::Client::new() per request (lib.rs:374, lib.rs:395) — a fresh client every call, from inside an async verification path.
  • no request timeout — a hung or throttling KDS (HTTP 429 is documented on lab hosts) stalls verification with no bound.
  • no caching — every verification re-fetches the same per-product cert chain and per-(chip_id, TCB) VCEK.

What to do

  • use an async HTTP client (or run the blocking fetch on a dedicated pool), reusing one client.
  • set explicit connect + request timeouts.
  • cache collateral by (product, chip_id, reported_tcb); cert chains are per-product and long-lived, VCEKs are stable per (chip, TCB).
  • keep collateral validation fail-closed; the pinned ARK (builtin_ark()) stays the trust root regardless of what KDS returns.
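The caching bullet could look roughly like the sketch below. It uses only the standard library, and every name in it (`CollateralKey`, `CollateralCache`, `get_or_fetch`, the TTL, the `String` error type) is hypothetical rather than taken from sev-snp-qvl. The cache holds raw KDS bytes, not a verification verdict, so the caller would still validate whatever comes back against the pinned ARK on every use. Fetch errors are returned and never stored, which keeps the path fail-closed.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical cache key mirroring the (product, chip_id, reported_tcb)
/// tuple proposed above. Field types are assumptions for illustration.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct CollateralKey {
    pub product: String,
    pub chip_id: Vec<u8>,
    pub reported_tcb: u64,
}

/// In-memory cache of raw collateral bytes with a single time-to-live.
pub struct CollateralCache {
    ttl: Duration,
    entries: Mutex<HashMap<CollateralKey, (Instant, Vec<u8>)>>,
}

impl CollateralCache {
    pub fn new(ttl: Duration) -> Self {
        Self {
            ttl,
            entries: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the cached bytes if they are younger than the TTL; otherwise
    /// calls `fetch`. A failed fetch is propagated and nothing is cached.
    pub fn get_or_fetch<F>(&self, key: &CollateralKey, fetch: F) -> Result<Vec<u8>, String>
    where
        F: FnOnce() -> Result<Vec<u8>, String>,
    {
        {
            let entries = self.entries.lock().unwrap();
            if let Some((stored_at, bytes)) = entries.get(key) {
                if stored_at.elapsed() < self.ttl {
                    return Ok(bytes.clone());
                }
            }
        } // lock is released here, before the potentially slow fetch

        let bytes = fetch()?;
        self.entries
            .lock()
            .unwrap()
            .insert(key.clone(), (Instant::now(), bytes.clone()));
        Ok(bytes)
    }
}
```

In this sketch two concurrent misses on the same key would both hit KDS; a real implementation might add single-flight de-duplication, and would probably use a long TTL for per-product cert chains and a separate policy for VCEKs.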

The DSTACK_AMD_KDS_PROXY_URL / core.sev_snp.amd_kds_proxy_url mirror path already exists for throttled labs; this issue is about making the default path robust, not about the proxy.
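For the default path, one way to put an upper bound on the stall, even while the fetch itself is still blocking, is to run it on its own thread and stop waiting after a deadline. The sketch below is std-only and illustrative: `fetch_with_deadline` and `FetchError` are made-up names, and the closure stands in for whatever actually talks to KDS. A timeout is surfaced as an error, so verification would still deny rather than proceed.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical error type for a bounded collateral fetch.
#[derive(Debug, PartialEq)]
pub enum FetchError {
    /// No result arrived within the deadline.
    TimedOut,
    /// The fetch completed with an error, or its thread died.
    Failed(String),
}

/// Runs a blocking `fetch` on a dedicated thread and waits at most `limit`
/// for its result.
pub fn fetch_with_deadline<F>(limit: Duration, fetch: F) -> Result<Vec<u8>, FetchError>
where
    F: FnOnce() -> Result<Vec<u8>, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller has already given up, the receiver is gone, the send
        // fails, and the late result is simply dropped.
        let _ = tx.send(fetch());
    });

    match rx.recv_timeout(limit) {
        Ok(Ok(bytes)) => Ok(bytes),
        Ok(Err(msg)) => Err(FetchError::Failed(msg)),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(FetchError::TimedOut),
        // The sender was dropped without sending, e.g. the fetch panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(FetchError::Failed(
            "fetch thread exited without a result".to_string(),
        )),
    }
}
```

This only bounds how long the caller waits. The worker thread is not cancelled and keeps running until the underlying call returns, so explicit connect and request timeouts on the HTTP client are still needed to keep hung requests from piling up.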


Labels: enhancement
