Skip to content

fix(groomer): scale default timeoutMs with maxContextBytes to accommodate repo-context prompts#473

Merged
joryirving merged 1 commit into
mainfrom
fix/groomer-timeout-scale-with-context
Jun 28, 2026
Merged

fix(groomer): scale default timeoutMs with maxContextBytes to accommodate repo-context prompts#473
joryirving merged 1 commit into
mainfrom
fix/groomer-timeout-scale-with-context

Conversation

@itsmiso-ai

Copy link
Copy Markdown
Contributor

Bug

Flat 60s groomer timeout aborts repo-context-enabled runs because nvidia/local models take >60s on 4-8KB prompts. With DISPATCH_GROOMER_REPO_CONTEXT_ENABLED=true, live runs in v0.5.8 fail with "This operation was aborted" at ~62s mark.

Evidence

  • Live groomer runs with repo context enabled consistently hit the 60s wall
  • Nvidia model latency on 4-8KB prompts (issue + repo context) exceeds 60s
  • src/lib/groomer/config.ts:39 had hardcoded timeoutMs: 60000

Fix

Scaled default formula: `max(60000, min(300000, 60000 + ceil(maxContextBytes/1024)*5000))`

  • 60s floor — preserves current behavior for small prompts
  • 5s/KB — gives ~100s at 8KB (default), ~140s at 16KB; matches observed nvidia latency
  • 300s ceiling — prevents runaway timeouts on huge context budgets

`DISPATCH_GROOMER_TIMEOUT_MS` env var override preserved (explicit override always wins).

Verification

  • ✅ All 1801 tests pass (5 new timeout scaling tests added)
  • ✅ Lint clean
  • ✅ Typecheck clean

New test coverage

maxContextBytes Expected timeoutMs
8192 (default) 100,000
16384 140,000
2048 70,000
65536 300,000 (ceiling)

Env override test confirms explicit `DISPATCH_GROOMER_TIMEOUT_MS` wins over scaled default.

…date repo-context prompts

Flat 60s timeout aborts repo-context-enabled runs because nvidia/local
models take >60s on 4-8KB prompts.

Scaled default: max(60000, min(300000, 60000 + ceil(maxContextBytes/1024)*5000))
- 60s floor preserves current behavior for small prompts
- 5s/KB gives ~100s at 8KB, ~140s at 16KB
- 300s ceiling prevents runaway timeouts

DISPATCH_GROOMER_TIMEOUT_MS env override preserved.

@its-saffron its-saffron Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Automated Review

Full PR review.

Analysis engine: MiniMax-M2.7@https://litellm.jory.dev/v1 (anthropic) — escalated (fast_low_confidence)

Recommendation

Approve. This PR correctly implements a scaled default timeout for the hosted groomer that dynamically adjusts based on maxContextBytes. The implementation is mathematically sound, the test coverage is thorough, and documentation is consistent across README and the hosted-groomer doc. All CI checks pass.


Change-by-Change Findings

src/lib/groomer/config.ts

  • Added computeDefaultTimeoutMs(maxContextBytes) function with the formula max(60000, min(300000, 60000 + ceil(maxContextBytes / 1024) * 5000)). Implementation is correct.
  • Refactored getHostedGroomerConfig into a two-phase construction: build the config object with timeoutMs: 0 as a placeholder, then resolve timeoutMs after maxContextBytes is known. This is a reasonable pattern given the dependency ordering.
  • Env override DISPATCH_GROOMER_TIMEOUT_MS still wins over the scaled default — behavior is preserved for operators who set an explicit timeout.
  • The disabled-path return still hardcodes timeoutMs: 60000, which is fine since the value is unused when the groomer is off.

src/lib/groomer/config.test.ts

  • Replaced the old "defaults to 60000" test with five targeted tests. Manual arithmetic verification confirms correctness:
    • 8192ceil(8)*5000 + 60000 = 100_000
    • 2048ceil(2)*5000 + 60000 = 70_000
    • 16384ceil(16)*5000 + 60000 = 140_000
    • 65536ceil(64)*5000 + 60000 = 380_000 → clamped to 300_000
    • Env override (45000) wins over scaled default (140000) ✓
  • Good boundary coverage including the floor, intermediate values, and ceiling.

README.md & docs/hosted-groomer.md

  • Both updated to reflect the new scaled default with the formula and clamp range. The README shows the full formula in the description column; hosted-groomer.md shows "Scaled" as the default value and spells out the formula in the description — consistent and clear.

Standards Compliance

  • Code standards (§4 Error handling, §5 Validation): Uses existing parseInt pattern consistently with the rest of the file. No new validation regressions introduced.
  • No agent-specific names in generic docs: Documentation uses generic descriptions with no specific agent names. ✓
  • No commit of secrets: No .env or credential files touched. ✓
  • Test parity: New behavior is covered by unit tests before merge; CI confirms all tests pass. ✓

Linked Issue Fit

No linked issue was provided in the corpus, so no acceptance criteria to cross-check. The PR's commit message and body clearly articulate the problem (flat 60s timeout aborting repo-context-enabled runs with nvidia/local models on 4–8KB prompts) and the solution (scaled formula with 60s floor, 5s/KB, 300s ceiling). The diff is tightly scoped to that goal.


Evidence Provider Findings

None provided in corpus.

Tool Harness Findings

None provided in corpus.


Unknowns / Needs Verification

  • No integration or e2e tests exercise the actual LLM call path with the new scaled timeout. Given the change is purely in config resolution (the timeout value flows through to existing HTTP callers unchanged), this is low risk. A follow-up smoke test against a real LLM endpoint with large DISPATCH_GROOMER_MAX_CONTEXT_BYTES would provide additional confidence but is not required for merge.

CI Verification

All CI checks passed: Docker Build, Build, Typecheck, Lint, and Tests — confirmed at commit a91fb6c.

@joryirving joryirving merged commit f727d0e into main Jun 28, 2026
6 checks passed
@joryirving joryirving deleted the fix/groomer-timeout-scale-with-context branch June 28, 2026 03:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants