test: reproduce large→small model-switch overflow (GLM-5.2 512k → GPT-5.5 272k) #187
snoproblem wants to merge 1 commit
…-5.5 272k)

When switching from a large-context model to a smaller one mid-session, the model-change branch in transform.ts clears lastContextPercentage / lastInputTokens to 0. This suppresses every reduction path on the same pass: the historian trigger (0% < proactive floor), the 95% emergency block, and the overflow-recovery bump (needsEmergencyRecovery was just cleared).

The oversized prompt, sized for the old model's window, is sent to the new smaller model and rejected with 'Input exceeds context window'. Recovery only arms on the SECOND pass, after the real overflow error fires the event handler.

The test asserts the expected post-fix behavior: the model-change branch should detect oldInputTokens > newContextLimit and arm emergency recovery (recordOverflowDetected) so the existing 95% bump path runs the historian + emergency drops BEFORE the prompt leaves. Currently fails; the fix lands in transform.ts:502-547.
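To make the suppression described above concrete, here is a minimal sketch of the three gates. It is not the code in transform.ts: the names PressureState and evaluateReductionPaths, and the 65% proactive floor, are assumptions chosen for illustration. Only the 95% emergency threshold and the needsEmergencyRecovery flag come from the commit message.

```typescript
// Hypothetical model of the three reduction gates; these names are not from transform.ts.
interface PressureState {
  contextPercentage: number; // last measured usage, 0..100
  needsEmergencyRecovery: boolean;
}

const PROACTIVE_FLOOR = 65; // assumed value, for illustration only
const EMERGENCY_THRESHOLD = 95; // from the commit message: the 95% emergency block

interface ReductionDecision {
  historianFires: boolean;
  emergencyBlocks: boolean;
  recoveryBump: boolean;
}

function evaluateReductionPaths(state: PressureState): ReductionDecision {
  return {
    // Historian trigger: only fires at or above the proactive floor.
    historianFires: state.contextPercentage >= PROACTIVE_FLOOR,
    // Emergency block: only fires at or above 95%.
    emergencyBlocks: state.contextPercentage >= EMERGENCY_THRESHOLD,
    // Overflow-recovery bump: only fires when recovery is armed.
    recoveryBump: state.needsEmergencyRecovery,
  };
}

// State right after the model-change branch has cleared usage and recovery.
const afterModelSwitch: PressureState = {
  contextPercentage: 0,
  needsEmergencyRecovery: false,
};

const decision = evaluateReductionPaths(afterModelSwitch);
console.log(decision);
```

With usage zeroed and recovery cleared, all three gates in this sketch report false, which matches the failure mode the commit message describes. Arming recovery alone is enough to flip the third gate.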
const liveModelBySession = new Map<string, { providerID: string; modelID: string }>([
    [sessionId, { providerID: "openrouter", modelID: "openai/gpt-5.5" }],
]);
Missing models-dev-cache mock for GPT-5.5 — test will remain red after the fix
The proposed fix (per the PR description) detects overflow by calling resolveTrustedContextLimit(newModel.providerID, newModel.modelID, ...) and comparing the result to sessionMeta.lastInputTokens. resolveTrustedContextLimit reads from getSdkContextLimit, which is backed by the models-dev-cache populated from the OpenCode SDK or a persisted models.json file. In this test, neither is present: useTempDataHome points XDG_CACHE_HOME at an empty temp directory, clearModelsDevCache() runs in afterEach, and no synthetic models.json is written with GPT-5.5's 272k limit. getSdkContextLimit("openrouter", "openai/gpt-5.5") will therefore return undefined, the comparison 300_000 > undefined evaluates to false, recordOverflowDetected is never called, and needsEmergencyRecovery stays false — exactly the same outcome as the unfixed code. The useTempDataHome comment (lines 103–108) explicitly documents the pattern: tests that need model-capability lookup should "write a synthetic models.json into <temp>/opencode/models.json". A fixture entry for openrouter/openai/gpt-5.5 with the correct contextWindow value would let getSdkContextLimit return the 272k limit and allow the test to turn green once the fix lands.
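The comparison problem described above can be sketched in isolation. This is an illustrative helper, not the fix itself: decideModelSwitchRecovery and its return shape are hypothetical, and newContextLimit stands in for whatever resolveTrustedContextLimit returns, assumed here to be number | undefined.

```typescript
// Hypothetical helper; not part of the repository.
// `newContextLimit` models the result of the context-limit lookup, which is
// assumed to be `undefined` when no model metadata is available.
type ModelSwitchDecision =
  | { arm: true; newContextLimit: number }
  | { arm: false; reason: "limit-unknown" | "fits" };

function decideModelSwitchRecovery(
  oldInputTokens: number,
  newContextLimit: number | undefined,
): ModelSwitchDecision {
  // An unknown limit can never prove an overflow, so recovery stays unarmed.
  if (newContextLimit === undefined) {
    return { arm: false, reason: "limit-unknown" };
  }
  // The old prompt is larger than the new window: arm recovery.
  if (oldInputTokens > newContextLimit) {
    return { arm: true, newContextLimit };
  }
  return { arm: false, reason: "fits" };
}

// Without a models.json fixture the lookup yields undefined.
console.log(decideModelSwitchRecovery(300_000, undefined));
// With a fixture that reports a 272k window, the overflow is detected.
console.log(decideModelSwitchRecovery(300_000, 272_000));
```

Making the unknown-limit case explicit keeps the "limit not found" outcome distinguishable from "prompt fits", which a bare numeric comparison against undefined would not do.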
createOpenCodeDbForTransform(sessionId, [
    { id: "m-raw-1", role: "user", text: "earlier work" },
    {
        id: "m-raw-assistant-old",
        role: "assistant",
        text: "old model response",
        providerID: "openrouter",
        modelID: "z-ai/glm-5.2",
    },
    { id: "m-raw-2", role: "user", text: "continue" },
OpenCode DB message ID mismatches the in-flight messages array
The OpenCode DB is seeded with id: "m-raw-assistant-old" for the assistant row, but the in-flight messages array uses id: "m-assistant-old". Every other test in the file uses the same IDs in both the DB and the live messages so that message-index reconciliation can cross-reference them. Here the reconciler will look for m-assistant-old in the DB, find only m-raw-assistant-old, and treat the in-flight assistant message as unreconciled. Depending on how the historian eligibility / head-tail boundary logic uses the reconciled index, this can cause the tool-output tag to be excluded from eligible history even after the 95% bump fires, so toolTag?.status may stay "active" and the second assertion fails for a reason unrelated to the bug under test.
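A mismatch like this can be caught mechanically before the transform runs. The following is a small sketch of such a check. findUnseededIds is a hypothetical helper, not something in the repository, and the ID lists are copied from the excerpts shown in this review, which may not show every seeded row.

```typescript
// Hypothetical test helper; not part of the repository.
// Returns the live message IDs that have no matching row in the seeded DB.
function findUnseededIds(
  dbIds: readonly string[],
  liveIds: readonly string[],
): string[] {
  const seeded = new Set(dbIds);
  return liveIds.filter((id) => !seeded.has(id));
}

// IDs as they appear in the diff excerpts of this review.
const dbIds = ["m-raw-1", "m-raw-assistant-old", "m-raw-2"];
const liveIds = ["m-user", "m-assistant-old"];

const missing = findUnseededIds(dbIds, liveIds);
console.log(missing);
```

With only the IDs visible in the excerpts, this check reports m-assistant-old as unseeded, and m-user as well, so aligning the assistant ID alone may not be enough if the reconciler also cross-references the user message.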
const messages: TestMessage[] = [
    {
        info: { id: "m-user", role: "user", sessionID: sessionId },
        parts: [{ type: "text", text: "continue" }],
    },
    {
        info: {
            id: "m-assistant-old",
            role: "assistant",
            providerID: "openrouter",
            modelID: "z-ai/glm-5.2",
        },
        parts: [
            { type: "text", text: "ok" },
            { type: "tool", callID: "call-1", state: { output: bigOutput } },
        ],
    },
Assistant message in the live messages array is missing sessionID
The user message correctly carries sessionID: sessionId in its info, but the assistant message (m-assistant-old) does not. Several other tests in the file omit sessionID on non-first messages, so this may be harmless if the transform derives the session from the first message. However, the tagger may scope tags by the sessionID found on the message's info, and a missing sessionID on the tool-bearing message could cause the resulting tag to be written without session scope or not be returned by getTagsBySession(db, sessionId) — making the toolTag lookup undefined and the second assertion vacuously false.
Bug
When switching from a large-context model to a smaller one mid-session (e.g. GLM-5.2 512k → GPT-5.5 272k), the historian does not reduce the context to fit the new model before sending the prompt, resulting in Input exceeds context window errors.
Root cause
The model-change branch in transform.ts:502-547 detects the switch and clears lastContextPercentage / lastInputTokens to 0. This suppresses every reduction path on the same pass:
- Historian trigger (checkCompartmentTrigger): 0% < proactive floor → shouldFire=false
- 95% emergency block (transform-compartment-phase.ts:328): 0% < 95% → no block
- Overflow-recovery bump (transform.ts:616-628): needsEmergencyRecovery was just cleared → no bump
The oversized prompt, sized for the old model's window, is sent to the new smaller model and rejected. Recovery only arms on the second pass, after the real overflow error fires the event handler's recordOverflowDetected. The user sees the error on the first request.
Fix direction
In the model-change branch (transform.ts:502), before clearing, compare sessionMeta.lastInputTokens (the old model's last measured input tokens) against the new model's context limit (resolveTrustedContextLimit). When oldInputTokens > newContextLimit, call recordOverflowDetected(db, sessionId, newContextLimit, newModelKey) to arm recovery. That makes the existing :616-628 bump-to-95% path fire on the same pass, so historian + emergency drops run before the prompt is sent.
Reproduction
This PR adds a failing test (transform.test.ts) reproducing the scenario. It expects needsEmergencyRecovery === true and the oversized tool output to be dropped.
The test currently fails: needsEmergencyRecovery is false and the tool output stays active.
Summary by cubic
Adds a failing test in packages/plugin/src/hooks/magic-context/transform.test.ts that reproduces an overflow when switching mid-session from a large-context model (GLM-5.2 512k) to a smaller one (GPT-5.5 272k).
The test expects emergency recovery to arm immediately and drop oversized tool output before send; it currently fails, confirming the model-change branch clears usage and allows an oversized prompt through.
Written for commit 7105f3f. Summary will update on new commits.
Greptile Summary
This PR adds a single failing test that documents the large→small model-switch overflow bug: when a session built up 300k tokens on GLM-5.2 (512k) is switched to GPT-5.5 (272k), the model-change branch in transform.ts clears all pressure state to zero, which suppresses every reduction path so the oversized prompt is sent to the new smaller model unchanged.
- liveModelBySession holds the new model while the last assistant message still carries the old model, triggering the model-change branch, and the session meta carries lastInputTokens: 300_000 against a 272k new window.
- The test does not provide a models.json with GPT-5.5's context limit (so resolveTrustedContextLimit returns undefined and the fix's overflow comparison never fires), and the OpenCode DB is seeded with id: "m-raw-assistant-old" while the live messages array uses id: "m-assistant-old" (a cross-reference mismatch that can affect message-index reconciliation and tool-tag eligibility).
Confidence Score: 2/5
The test is safe to merge in isolation — no production code is touched — but as written it will not turn green when the corresponding fix lands, leaving CI in a permanently failing state for this case.
The test correctly identifies the bug and the right observable outcomes, but cannot fulfill its role as a reproducer: without a models.json fixture carrying GPT-5.5's context limit, the fix mechanism gets undefined for the new model and skips arming recovery entirely. A separate OpenCode DB / live-messages ID mismatch could cause the tool-tag assertion to fail for an unrelated reason even if recovery were somehow armed. Both issues need to be resolved before the test can serve as a reliable regression guard.
packages/plugin/src/hooks/magic-context/transform.test.ts — specifically the new describe block starting at line 2551.
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant TF as transform.ts
    participant MC as model-change branch (502-547)
    participant OVF as overflow state (DB)
    participant LCU as loadContextUsage
    participant BUMP as 95% bump path (616-628)
    participant ED as emergency-drop
    TF->>MC: "liveModel=GPT-5.5, lastAssistant=GLM-5.2 → mismatch"
    MC->>OVF: clearEmergencyRecovery()
    MC->>MC: contextUsageMap.delete(sessionId)
    Note over MC: (BUG) no recordOverflowDetected called
    Note over MC: (FIX) resolveTrustedContextLimit(GPT-5.5)<br/>needs models.json mock, returns undefined without it
    MC->>LCU: loadContextUsage → 0%
    LCU-->>TF: "percentage=0%"
    TF->>OVF: "getOverflowState → needsEmergencyRecovery=false"
    Note over BUMP: 0% < 95% but needsEmergencyRecovery=false → no bump
    TF->>ED: skipped, no 95% bump
    Note over ED: tool output stays active → prompt sent oversized → overflow error
Reviews (1): Last reviewed commit: "test: reproduce large→small model-switch..."