Skip to content

harden: bound remaining catalog reads + lock in no-unbounded-read invariant#3145

Open
PascalThuet wants to merge 3 commits into
github:mainfrom
PascalThuet:split/catalog-bounded-reads
Open

harden: bound remaining catalog reads + lock in no-unbounded-read invariant#3145
PascalThuet wants to merge 3 commits into
github:mainfrom
PascalThuet:split/catalog-bounded-reads

Conversation

@PascalThuet

Copy link
Copy Markdown
Contributor

Part of splitting #2442 into smaller, dedicated PRs. Re-derived against current main. Final PR of the runtime-hardening stack.

Stacked on #3141 (which is itself stacked on #3140). Its diff includes those commits — review/merge #3140#3141 → this, in order.

What

  • Bound the remaining catalog JSON reads via read_response_limited(max_bytes=MAX_JSON_CATALOG_BYTES) and pass strict_redirects=True:
    • workflows/catalog.py, integrations/catalog.py, bundler/services/adapters.py, commands/bundle/__init__.py.
  • __init__.py workflow_add / workflow_step_add: bounded reads + strict_redirects=True + shared is_https_or_localhost_http URL validation.
  • extensions/_commands.py: bound the extension add --from <url> ZIP download — the last unbounded network read (surfaced by the gate below).

The invariant

Adds tests/test_download_security.py::test_remote_downloads_do_not_use_unbounded_response_reads: an AST scan over the whole src/specify_cli tree that fails CI if any response.read() is unbounded, with a 3-entry allowlist for the local-file get_hash reads (extensions/presets/integrations). This only goes green once #3140 + #3141 + this have converted every remote read — it's the lock that keeps the bound from silently regressing.

Validation

  • test_download_security (incl. the AST gate), test_workflows, test_integration_catalog, test_bundler_adapters, test_upgrade, test_extensions792 passed.
  • Bundle integration tests — 10 passed. ruff check clean.
  • Existing catalog/workflow tests pass with stream mocks made cursor-based (so they exercise the bounded reads) and strict_redirects threaded through the fake openers.

This completes the runtime hardening. The Bandit + detect-secrets CI gates land in a separate PR on the #2/#3 track.

Add a shared _download_security module (read_response_limited,
is_https_or_localhost_http, size constants) and route the GitHub release
and Azure DevOps token network reads through bounded reads so an oversized
response can't exhaust memory.

Add a strict_redirects mode to authentication.open_url: the redirect handler
now rejects any redirect whose target isn't HTTPS (or HTTP to localhost),
composing with the existing per-hop redirect_validator and auth-stripping.
The Azure DevOps token POST is routed through that handler so a 307/308
cannot forward the client_secret body to a non-HTTPS host.
…on/preset downloads

Extend _download_security with safe_extract_zip (path + symlink + per-member
and total size + entry-count bounds), read_zip_member_limited, and
verify_sha256, and adopt them across the extension and preset install paths:

- install_from_zip extracts via safe_extract_zip (zip-slip + symlink + zip-bomb
  defense) instead of the previous path-only containment check.
- catalog JSON fetches and package downloads use bounded reads; the inline
  manifest pre-read in extension update uses read_zip_member_limited.
- downloaded packages are verified against their catalog sha256 when present.
- CatalogStackBase and the preset catalog validator reject hostless URLs
  before the scheme check.

The command-registration path-traversal guard the original change carried is
already on main (github#3088) and is intentionally omitted.
…ad invariant

Route the remaining network JSON reads through read_response_limited and pass
strict_redirects=True on the workflow/integration/bundler catalog readers and
the workflow add / workflow step add download paths:

- workflows/catalog.py, integrations/catalog.py, bundler/services/adapters.py,
  commands/bundle/__init__.py: bounded catalog reads.
- __init__.py workflow_add / workflow_step_add: bounded reads + strict redirects
  + shared is_https_or_localhost_http validation.
- extensions/_commands.py: bound the `extension add --from <url>` ZIP download
  (the last unbounded network read).

Add test_remote_downloads_do_not_use_unbounded_response_reads: an AST gate that
scans the whole src/specify_cli tree and fails if any response.read() is
unbounded, except the three local-file get_hash reads. With this, every remote
read in the CLI is bounded.
@PascalThuet PascalThuet requested a review from mnriem as a code owner June 24, 2026 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant