fix(groomer): scale default timeoutMs with maxContextBytes to accommodate repo-context prompts#473
Conversation
…date repo-context prompts Flat 60s timeout aborts repo-context-enabled runs because nvidia/local models take >60s on 4-8KB prompts. Scaled default: max(60000, min(300000, 60000 + ceil(maxContextBytes/1024)*5000)) - 60s floor preserves current behavior for small prompts - 5s/KB gives ~100s at 8KB, ~140s at 16KB - 300s ceiling prevents runaway timeouts DISPATCH_GROOMER_TIMEOUT_MS env override preserved.
There was a problem hiding this comment.
AI Automated Review
Full PR review.
Analysis engine: MiniMax-M2.7@https://litellm.jory.dev/v1 (anthropic) — escalated (fast_low_confidence)
Recommendation
Approve. This PR correctly implements a scaled default timeout for the hosted groomer that dynamically adjusts based on maxContextBytes. The implementation is mathematically sound, the test coverage is thorough, and documentation is consistent across README and the hosted-groomer doc. All CI checks pass.
Change-by-Change Findings
src/lib/groomer/config.ts
- Added
computeDefaultTimeoutMs(maxContextBytes)function with the formulamax(60000, min(300000, 60000 + ceil(maxContextBytes / 1024) * 5000)). Implementation is correct. - Refactored
getHostedGroomerConfiginto a two-phase construction: build the config object withtimeoutMs: 0as a placeholder, then resolvetimeoutMsaftermaxContextBytesis known. This is a reasonable pattern given the dependency ordering. - Env override
DISPATCH_GROOMER_TIMEOUT_MSstill wins over the scaled default — behavior is preserved for operators who set an explicit timeout. - The disabled-path return still hardcodes
timeoutMs: 60000, which is fine since the value is unused when the groomer is off.
src/lib/groomer/config.test.ts
- Replaced the old "defaults to 60000" test with five targeted tests. Manual arithmetic verification confirms correctness:
8192→ceil(8)*5000 + 60000 = 100_000✓2048→ceil(2)*5000 + 60000 = 70_000✓16384→ceil(16)*5000 + 60000 = 140_000✓65536→ceil(64)*5000 + 60000 = 380_000→ clamped to300_000✓- Env override (
45000) wins over scaled default (140000) ✓
- Good boundary coverage including the floor, intermediate values, and ceiling.
README.md & docs/hosted-groomer.md
- Both updated to reflect the new scaled default with the formula and clamp range. The README shows the full formula in the description column;
hosted-groomer.mdshows "Scaled" as the default value and spells out the formula in the description — consistent and clear.
Standards Compliance
- Code standards (§4 Error handling, §5 Validation): Uses existing
parseIntpattern consistently with the rest of the file. No new validation regressions introduced. - No agent-specific names in generic docs: Documentation uses generic descriptions with no specific agent names. ✓
- No commit of secrets: No
.envor credential files touched. ✓ - Test parity: New behavior is covered by unit tests before merge; CI confirms all tests pass. ✓
Linked Issue Fit
No linked issue was provided in the corpus, so no acceptance criteria to cross-check. The PR's commit message and body clearly articulate the problem (flat 60s timeout aborting repo-context-enabled runs with nvidia/local models on 4–8KB prompts) and the solution (scaled formula with 60s floor, 5s/KB, 300s ceiling). The diff is tightly scoped to that goal.
Evidence Provider Findings
None provided in corpus.
Tool Harness Findings
None provided in corpus.
Unknowns / Needs Verification
- No integration or e2e tests exercise the actual LLM call path with the new scaled timeout. Given the change is purely in config resolution (the timeout value flows through to existing HTTP callers unchanged), this is low risk. A follow-up smoke test against a real LLM endpoint with large
DISPATCH_GROOMER_MAX_CONTEXT_BYTESwould provide additional confidence but is not required for merge.
CI Verification
All CI checks passed: Docker Build, Build, Typecheck, Lint, and Tests — confirmed at commit a91fb6c.
Bug
Flat 60s groomer timeout aborts repo-context-enabled runs because nvidia/local models take >60s on 4-8KB prompts. With
DISPATCH_GROOMER_REPO_CONTEXT_ENABLED=true, live runs in v0.5.8 fail with "This operation was aborted" at ~62s mark.Evidence
src/lib/groomer/config.ts:39had hardcodedtimeoutMs: 60000Fix
Scaled default formula: `max(60000, min(300000, 60000 + ceil(maxContextBytes/1024)*5000))`
`DISPATCH_GROOMER_TIMEOUT_MS` env var override preserved (explicit override always wins).
Verification
New test coverage
Env override test confirms explicit `DISPATCH_GROOMER_TIMEOUT_MS` wins over scaled default.