test: reproduce large→small model-switch overflow (GLM-5.2 512k → GPT-5.5 272k) by snoproblem · Pull Request #187 · cortexkit/magic-context

snoproblem · 2026-06-25T05:54:48Z

Bug

When switching from a large-context model to a smaller one mid-session (e.g. GLM-5.2 512k → GPT-5.5 272k), the historian does not reduce the context to fit the new model before sending the prompt, so the request fails with an "Input exceeds context window" error.

Root cause

The model-change branch in transform.ts:502-547 detects the switch and clears lastContextPercentage / lastInputTokens to 0. This suppresses every reduction path on the same pass:

Historian trigger (checkCompartmentTrigger): 0% < proactive floor → shouldFire=false
95% emergency block (transform-compartment-phase.ts:328): 0% < 95% → no block
Overflow recovery bump (transform.ts:616-628): needsEmergencyRecovery was just cleared → no bump

The oversized prompt — sized for the old model's window — is sent to the new smaller model and rejected. Recovery only arms on the second pass, after the real overflow error fires the event handler's recordOverflowDetected. The user sees the error on the first request.
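To make the suppression concrete, here is a minimal sketch of the three gates evaluated against the just-cleared state. The `PressureState` and `GateResult` shapes, the `evaluateGates` helper, and the `proactiveFloor` value of 60 are all hypothetical; only the 95% threshold and the `needsEmergencyRecovery` flag come from the description above.

```typescript
// Hypothetical shapes for illustration; not the plugin's real types.
interface PressureState {
    contextPercentage: number; // derived from lastInputTokens
    needsEmergencyRecovery: boolean;
}

interface GateResult {
    historianFires: boolean;
    emergencyBlock: boolean;
    recoveryBump: boolean;
}

// proactiveFloor is an assumed parameter; the PR does not state its value.
function evaluateGates(state: PressureState, proactiveFloor: number): GateResult {
    return {
        historianFires: state.contextPercentage >= proactiveFloor,
        emergencyBlock: state.contextPercentage >= 95,
        recoveryBump: state.needsEmergencyRecovery,
    };
}

// State immediately after the model-change branch clears usage and recovery.
const cleared: PressureState = { contextPercentage: 0, needsEmergencyRecovery: false };
const gates = evaluateGates(cleared, 60);
console.log(gates);
```

With the percentage at 0 and the recovery flag cleared, none of the three gates can fire on that pass, whatever the real prompt size is.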

Fix direction

In the model-change branch (transform.ts:502), before clearing, compare sessionMeta.lastInputTokens (the old model's last measured input tokens) against the new model's context limit (resolveTrustedContextLimit). When oldInputTokens > newContextLimit, call recordOverflowDetected(db, sessionId, newContextLimit, newModelKey) to arm recovery — which makes the existing :616-628 bump-to-95% path fire on the same pass, so historian + emergency drops run before the prompt is sent.
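The fix direction above could look roughly like the sketch below. It is a self-contained illustration rather than the plugin's code: `SessionMetaSketch`, `ModelRef`, `LimitResolver`, `onModelChange`, and the `armRecovery` flag are hypothetical stand-ins, and the real branch would call `recordOverflowDetected(db, sessionId, newContextLimit, newModelKey)` where the sketch only sets a flag.

```typescript
// Hypothetical shapes for illustration; not the plugin's real types.
interface SessionMetaSketch {
    lastInputTokens: number;
    lastContextPercentage: number;
}

interface ModelRef {
    providerID: string;
    modelID: string;
}

// Stand-in for resolveTrustedContextLimit; undefined means the limit is unknown.
type LimitResolver = (model: ModelRef) => number | undefined;

interface ModelChangeOutcome {
    meta: SessionMetaSketch;
    armRecovery: boolean;
}

function onModelChange(
    meta: SessionMetaSketch,
    newModel: ModelRef,
    resolveLimit: LimitResolver,
): ModelChangeOutcome {
    // Read the old model's measurement BEFORE clearing it.
    const oldInputTokens = meta.lastInputTokens;
    const newContextLimit = resolveLimit(newModel);
    // Arm recovery only when the limit is known and the old prompt exceeds it.
    const armRecovery = newContextLimit !== undefined && oldInputTokens > newContextLimit;
    return {
        meta: { lastInputTokens: 0, lastContextPercentage: 0 },
        armRecovery,
    };
}

const outcome = onModelChange(
    { lastInputTokens: 300_000, lastContextPercentage: 58 },
    { providerID: "openrouter", modelID: "openai/gpt-5.5" },
    () => 272_000,
);
console.log(outcome.armRecovery);
```

The ordering is the point of the sketch: the comparison has to run before the usage fields are zeroed, otherwise there is nothing left to compare against the new limit.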

Reproduction

This PR adds a failing test (transform.test.ts) reproducing the scenario:

Session at 300k input tokens on GLM-5.2 (58% of 512k — no trigger fired)
Switch to GPT-5.5 (272k) — 300k is ~110% of the new window
Asserts needsEmergencyRecovery === true and the oversized tool output is dropped

The test currently fails: needsEmergencyRecovery is false and the tool output stays active.
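The percentages in the scenario can be checked with a few lines of arithmetic. The sketch assumes "512k" and "272k" mean 512,000 and 272,000 tokens; `usagePercent` is a hypothetical helper, not a function from the plugin.

```typescript
// Assumes 512k = 512,000 and 272k = 272,000 tokens.
function usagePercent(inputTokens: number, contextLimit: number): number {
    return (inputTokens / contextLimit) * 100;
}

const onOldModel = usagePercent(300_000, 512_000); // roughly 58.6
const onNewModel = usagePercent(300_000, 272_000); // roughly 110.3

console.log(onOldModel.toFixed(1), onNewModel.toFixed(1));
```

The same 300k tokens sit comfortably below the 95% threshold on the old window and above 100% on the new one, which is why no reduction had run before the switch.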


Summary by cubic

Adds a failing test in packages/plugin/src/hooks/magic-context/transform.test.ts that reproduces an overflow when switching mid-session from a large-context model (GLM-5.2 512k) to a smaller one (GPT-5.5 272k).
The test expects emergency recovery to arm immediately and drop oversized tool output before send; it currently fails, confirming the model-change branch clears usage and allows an oversized prompt through.

Written for commit 7105f3f.

Greptile Summary

This PR adds a single failing test that documents the large→small model-switch overflow bug: when a session built up 300k tokens on GLM-5.2 (512k) is switched to GPT-5.5 (272k), the model-change branch in transform.ts clears all pressure state to zero, which suppresses every reduction path so the oversized prompt is sent to the new smaller model unchanged.

The test correctly reproduces the bug mechanism — liveModelBySession holds the new model while the last assistant message still carries the old model, triggering the model-change branch, and the session meta carries lastInputTokens: 300_000 against a 272k new window.
Two structural problems prevent the test from turning green once the fix lands: the test does not write a synthetic models.json with GPT-5.5's context limit (so resolveTrustedContextLimit returns undefined and the fix's overflow comparison never fires), and the OpenCode DB is seeded with id: "m-raw-assistant-old" while the live messages array uses id: "m-assistant-old" (a cross-reference mismatch that can affect message-index reconciliation and tool-tag eligibility).

Confidence Score: 2/5

The test is safe to merge in isolation — no production code is touched — but as written it will not turn green when the corresponding fix lands, leaving CI in a permanently failing state for this case.

The test correctly identifies the bug and the right observable outcomes, but cannot fulfill its role as a reproducer: without a models.json fixture carrying GPT-5.5's context limit, the fix mechanism gets undefined for the new model and skips arming recovery entirely. A separate OpenCode DB / live-messages ID mismatch could cause the tool-tag assertion to fail for an unrelated reason even if recovery were somehow armed. Both issues need to be resolved before the test can serve as a reliable regression guard.

packages/plugin/src/hooks/magic-context/transform.test.ts — specifically the new describe block starting at line 2551.

Important Files Changed

Filename	Overview
packages/plugin/src/hooks/magic-context/transform.test.ts	Adds a failing regression test for the large→small model-switch overflow bug; has two blocking structural issues: no synthetic models.json mock for GPT-5.5 (so resolveTrustedContextLimit returns undefined and the proposed fix never arms recovery), and an OpenCode DB message ID mismatch (m-raw-assistant-old vs m-assistant-old in the live messages array) that could break tool-tag eligibility checks.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant TF as transform.ts
    participant MC as model-change branch (502-547)
    participant OVF as overflow state (DB)
    participant LCU as loadContextUsage
    participant BUMP as 95% bump path (616-628)
    participant ED as emergency-drop

    TF->>MC: "liveModel=GPT-5.5, lastAssistant=GLM-5.2 → mismatch"
    MC->>OVF: clearEmergencyRecovery()
    MC->>MC: contextUsageMap.delete(sessionId)
    Note over MC: (BUG) no recordOverflowDetected called
    Note over MC: (FIX) resolveTrustedContextLimit(GPT-5.5)<br/>needs models.json mock — returns undefined without it
    MC->>LCU: loadContextUsage → 0%
    LCU-->>TF: "percentage=0%"
    TF->>OVF: "getOverflowState → needsEmergencyRecovery=false"
    Note over BUMP: 0% < 95% but needsEmergencyRecovery=false → no bump
    TF->>ED: skipped — no 95% bump
    Note over ED: tool output stays active → prompt sent oversized → overflow error

Reviews (1): Last reviewed commit: "test: reproduce large→small model-switch..."

Greptile also left 3 inline comments on this PR.

…-5.5 272k) When switching from a large-context model to a smaller one mid-session, the model-change branch in transform.ts clears lastContextPercentage / lastInputTokens to 0. This suppresses every reduction path on the same pass: the historian trigger (0% < proactive floor), the 95% emergency block, and the overflow-recovery bump (needsEmergencyRecovery was just cleared). The oversized prompt — sized for the old model's window — is sent to the new smaller model and rejected with 'Input exceeds context window'. Recovery only arms on the SECOND pass, after the real overflow error fires the event handler. The test asserts the expected post-fix behavior: the model-change branch should detect oldInputTokens > newContextLimit and arm emergency recovery (recordOverflowDetected) so the existing 95% bump path runs the historian + emergency drops BEFORE the prompt leaves. Currently fails — the fix lands in transform.ts:502-547.

cubic-dev-ai

No issues found across 1 file


greptile-apps · 2026-06-25T06:06:32Z

+        const liveModelBySession = new Map<string, { providerID: string; modelID: string }>([
+            [sessionId, { providerID: "openrouter", modelID: "openai/gpt-5.5" }],
+        ]);


Missing models-dev-cache mock for GPT-5.5 — test will remain red after the fix

The proposed fix (per the PR description) detects overflow by calling resolveTrustedContextLimit(newModel.providerID, newModel.modelID, ...) and comparing the result to sessionMeta.lastInputTokens. resolveTrustedContextLimit reads from getSdkContextLimit, which is backed by the models-dev-cache populated from the OpenCode SDK or a persisted models.json file. In this test, neither is present: useTempDataHome points XDG_CACHE_HOME at an empty temp directory, clearModelsDevCache() runs in afterEach, and no synthetic models.json is written with GPT-5.5's 272k limit. getSdkContextLimit("openrouter", "openai/gpt-5.5") will therefore return undefined, the comparison 300_000 > undefined evaluates to false, recordOverflowDetected is never called, and needsEmergencyRecovery stays false — exactly the same outcome as the unfixed code. The useTempDataHome comment (lines 103–108) explicitly documents the pattern: tests that need model-capability lookup should "write a synthetic models.json into <temp>/opencode/models.json". A fixture entry for openrouter/openai/gpt-5.5 with the correct contextWindow value would let getSdkContextLimit return the 272k limit and allow the test to turn green once the fix lands.
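The failure mode described here can be illustrated without the real cache. In the sketch below, `ModelsCache`, `getLimit`, and `shouldArmRecovery` are hypothetical stand-ins for the models-dev-cache lookup and the proposed comparison; the key format and the 272,000 value are assumptions for illustration, not the real models.json schema.

```typescript
// Hypothetical cache: "provider/model" -> context limit in tokens.
type ModelsCache = Map<string, number>;

// Stand-in for getSdkContextLimit: undefined when the model is not cached.
function getLimit(cache: ModelsCache, providerID: string, modelID: string): number | undefined {
    return cache.get(`${providerID}/${modelID}`);
}

// Stand-in for the proposed overflow comparison.
function shouldArmRecovery(oldInputTokens: number, limit: number | undefined): boolean {
    return limit !== undefined && oldInputTokens > limit;
}

// Empty cache mirrors the test as written; seeded cache mirrors a fixture.
const emptyCache: ModelsCache = new Map();
const seededCache: ModelsCache = new Map([["openrouter/openai/gpt-5.5", 272_000]]);

const withoutFixture = shouldArmRecovery(
    300_000,
    getLimit(emptyCache, "openrouter", "openai/gpt-5.5"),
);
const withFixture = shouldArmRecovery(
    300_000,
    getLimit(seededCache, "openrouter", "openai/gpt-5.5"),
);

console.log({ withoutFixture, withFixture });
```

Even without the explicit `undefined` guard the result would be the same at runtime, because a relational comparison against `undefined` converts it to `NaN` and evaluates to false. Either way, recovery can only arm when the lookup returns a number.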


greptile-apps · 2026-06-25T06:06:33Z

+        createOpenCodeDbForTransform(sessionId, [
+            { id: "m-raw-1", role: "user", text: "earlier work" },
+            {
+                id: "m-raw-assistant-old",
+                role: "assistant",
+                text: "old model response",
+                providerID: "openrouter",
+                modelID: "z-ai/glm-5.2",
+            },
+            { id: "m-raw-2", role: "user", text: "continue" },


OpenCode DB message ID mismatches the in-flight messages array

The OpenCode DB is seeded with id: "m-raw-assistant-old" for the assistant row, but the in-flight messages array uses id: "m-assistant-old". Every other test in the file uses the same IDs in both the DB and the live messages so that message-index reconciliation can cross-reference them. Here the reconciler will look for m-assistant-old in the DB, find only m-raw-assistant-old, and treat the in-flight assistant message as unreconciled. Depending on how the historian eligibility / head-tail boundary logic uses the reconciled index, this can cause the tool-output tag to be excluded from eligible history even after the 95% bump fires, so toolTag?.status may stay "active" and the second assertion fails for a reason unrelated to the bug under test.
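A small consistency check makes this kind of mismatch visible before the transform runs. The sketch below is a hypothetical test helper (`findUnseededIds` does not exist in the repository); it takes the seeded DB IDs shown in the snippet above and reports any live message ID that has no matching row.

```typescript
// Hypothetical helper: returns live message IDs with no matching seeded DB row.
function findUnseededIds(liveIds: readonly string[], seededIds: readonly string[]): string[] {
    const seeded = new Set(seededIds);
    return liveIds.filter((id) => !seeded.has(id));
}

// IDs taken from the seeded rows and the live assistant message in this test.
const seededIds = ["m-raw-1", "m-raw-assistant-old", "m-raw-2"];
const missing = findUnseededIds(["m-assistant-old"], seededIds);

console.log(missing);
```

An empty result would mean every checked live message can be cross-referenced; here the assistant message is reported as missing.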

greptile-apps · 2026-06-25T06:06:34Z

+        const messages: TestMessage[] = [
+            {
+                info: { id: "m-user", role: "user", sessionID: sessionId },
+                parts: [{ type: "text", text: "continue" }],
+            },
+            {
+                info: {
+                    id: "m-assistant-old",
+                    role: "assistant",
+                    providerID: "openrouter",
+                    modelID: "z-ai/glm-5.2",
+                },
+                parts: [
+                    { type: "text", text: "ok" },
+                    { type: "tool", callID: "call-1", state: { output: bigOutput } },
+                ],
+            },


Assistant message in the live messages array is missing sessionID

The user message correctly carries sessionID: sessionId in its info, but the assistant message (m-assistant-old) does not. Several other tests in the file omit sessionID on non-first messages, so this may be harmless if the transform derives the session from the first message. However, the tagger may scope tags by the sessionID found on the message's info, and a missing sessionID on the tool-bearing message could cause the resulting tag to be written without session scope or not be returned by getTagsBySession(db, sessionId). In that case the toolTag lookup would return undefined and the second assertion would fail for a reason unrelated to the bug under test.

snoproblem mentioned this pull request Jun 25, 2026

Historian does not reduce context to fit new model on large→small mid-session model switch (Input exceeds context window) #188

Open

cubic-dev-ai Bot reviewed Jun 25, 2026


greptile-apps Bot reviewed Jun 25, 2026

snoproblem wants to merge 1 commit into cortexkit:master from snoproblem:reproduce/model-switch-overflow
