
Modernization continuation: feature activations (tree-sitter AST, hybrid RAG) + tested extractions + reliability fixes #65

Merged
Pterjudin merged 46 commits into main from modernize-agentic-editor-foundation
Jun 15, 2026

Conversation

@Pterjudin


Continues the modernization drive (reliability before features; no fake safety; no fake tests; every claim maps to tested code). Branch health: tsgo 0, node common/ suite 888 passing / 0 failing, CDP smoke 11/11.

Highlights

Two inert features ACTIVATED (live-verified via the CDP harness)

  • Tree-sitter AST symbol extraction — was dead code calling a non-existent API (createParser) and never ran Parser.init/Language.load, so extractSymbols always returned [] (RAG was BM25-only on this axis). Rewired to delegate to the editor's ITreeSitterLibraryService (the same service the terminal command parser uses). Live: extracted real symbols from .ts/.py/.go; languages with no shipped grammar (c/swift/kotlin) fall back with no throw.
  • Hybrid RAG via a local Ollama embedding provider — no embedding provider was ever registered with IAiEmbeddingVectorService, so the whole vector pipeline was inert. Added a local Ollama embedding provider routed through electron-main IPC (the renderer can't reach Ollama), opt-in via the new cortexide.rag.embeddingModel setting, registered by a WorkbenchContribution after a startup probe. Privacy-safe (Ollama is loopback = on-machine; allowed under local-only). Live: isEnabled() true and getEmbeddingVector returned a real 768-dim vector.

Real bug fixes

  • Dev-build module-load crash — two export { Type } re-exports the esbuild dev-transpile couldn't elide broke workbench loading; fixed to export type { / inline type (cdp-smoke went from failing to 11/11).
  • Escalation budget bug — tool-error mid-task escalations reset only the tool-error counter, not the iteration counter, so an escalated model inherited a spent budget. Centralized + fixed via a pure computePostEscalationCounters.
  • Plus a fuzz-discovered partialSort k=0 crash and other reliability fixes from earlier in the drive.

Behavior-preserving extractions to pure, node-tested common/ modules

computeModelScore (50k differential fuzz vs a generated oracle + 11 goldens + a 4-lens adversarial-verify workflow), the streaming xmlToolCallScanner (4-lens workflow + differential fuzzes over random chunkings), editStreamRevertDecision, audit parseJsonl + a "Show Audit Log" command, computeMaxTokensForLocalProvider, isLoopbackEndpoint, the treeSitter language map, and more. Each proven byte-identical (textual diff vs HEAD + fuzz vs an oracle) and committed atomically with the full suite measured.

Notes

  • docs/ is removed to match main (PR #64, "chore: remove the docs/ folder (internal tracking notes, not product docs)", removed it); modernization docs are kept out of the repo per convention.
  • Deliberately deferred (with technical reasons): the global RedactingLogService ILogService wrap (constructed before IConfigurationService; secrets already redacted at source), the SSRF DNS-rebind preflight (renderer can't dns.lookup; needs an electron-main resolve-IPC; building blocks landed), and the AgentLoopController class refactor (risky; the pure decisions are already extracted).

🤖 Generated with Claude Code

Tajudeen and others added 30 commits June 12, 2026 04:05
RepoIndexerService.query()/queryWithMetrics() only ever returned formatted
citation STRINGS ("File: <path>:<lines>\nSymbols...\nContent preview..."), so
the two consumers that need the matched file -- codebaseQueryCommands (Query
Codebase quick-pick) and the agent's search_for_files indexer path -- were
doing URI.file(<whole formatted blob>), producing a bogus URI that opened/paged
nothing.

Add queryStructured(text,k): Promise<RetrievalResult[]> returning the discrete
hits (each with a real file URI). It shares one private _queryWithStructured()
core with queryWithMetrics(): a `structured` array is pushed in lockstep with
the string `results` at both assembly sites (common-query-cache + main path),
threaded through the query/common caches, and returned. query()/queryWithMetrics()
stay byte-identical (string assembly untouched; structured is purely additive).
The context-cache fallback has no per-result URI, so it returns no structured hits.

Point both consumers at r.uri (deduped by file -- a file can match via multiple
chunks/symbols). Pins a structured<->formatted consistency test (the citation
embeds the discrete uri verbatim). tsgo 0; 611 node tests pass.
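
The consumer-side dedupe can be sketched as follows. This is a minimal illustration, not the code in this commit: the real RetrievalResult shape is not shown in this message, so `StructuredHit`, its `score` field, and the helper name `dedupeHitsByFile` are all hypothetical.

```typescript
// Hypothetical, simplified hit shape; the real RetrievalResult is not shown here.
interface StructuredHit {
  uri: string;   // discrete file URI (a plain string in this sketch)
  score: number; // illustrative only
}

// Keep the first hit per file: one file can match via several chunks/symbols.
function dedupeHitsByFile(hits: readonly StructuredHit[]): StructuredHit[] {
  const seen = new Set<string>();
  const out: StructuredHit[] = [];
  for (const hit of hits) {
    if (seen.has(hit.uri)) {
      continue;
    }
    seen.add(hit.uri);
    out.push(hit);
  }
  return out;
}

const exampleHits: StructuredHit[] = [
  { uri: 'file:///repo/src/a.ts', score: 9 },
  { uri: 'file:///repo/src/a.ts', score: 7 },
  { uri: 'file:///repo/src/b.ts', score: 5 },
];
const uniqueFiles = dedupeHitsByFile(exampleHits).map(h => h.uri);
```

Because each hit carries its own URI, a consumer never has to parse the formatted citation string to find the file.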

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ransaction

multi_edit's replace_all was DEAD for the multi-occurrence case it exists for.
The tool expanded a replace_all edit into N IDENTICAL Search/Replace blocks and
fed them to the span engine (computeSearchReplaceResult), whose findTextInCode
rejects a non-unique ORIGINAL as 'Not unique' -- so block 0 failed immediately
(and even past that, all N identical blocks resolve to the same indexOf span ->
'Has overlap'). So replace_all=true on any string occurring 2+ times applied
nothing. The model-facing prompt was also wrong: it claimed replace_all=false
"replaces only the first", but the engine actually requires the match to be unique.

Fix: a pure, node-tested common/multiEdit.ts computeMultiEditResult(content, edits)
that applies the edits as the standard multi-edit transaction:
  - SEQUENTIAL: each edit operates on the text the prior edits produced.
  - replace_all=false: old_string must be UNIQUE (same indexOf===lastIndexOf test
    as findTextInCode, so a single edit matches edit_file's behavior); else
    'Not unique' / 'Not found'.
  - replace_all=true: replace every left-to-right non-overlapping occurrence (>=1).
  - ALL-OR-NOTHING: validates+rewrites a local copy; the first failure returns
    ok:false with no newContent, so the caller throws before any write.
toolsService.multi_edit now computes the final content via this and applies it
through the existing instantlyRewriteFile diff path (the same one rewrite_file
uses); computeSearchReplaceResult / edit_file are untouched. Prompt corrected to
document the sequential + unique-or-replace_all semantics.
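
The transaction semantics above can be sketched roughly as below. This is an illustration under assumptions, not the shipped common/multiEdit.ts: the result shape, the `editIndex` field, and the choice to treat an empty old_string as 'Not found' are hypothetical.

```typescript
interface MultiEdit {
  old_string: string;
  new_string: string;
  replace_all?: boolean;
}

// Hypothetical result shape: on failure there is no newContent at all.
type MultiEditResult =
  | { ok: true; newContent: string }
  | { ok: false; editIndex: number; reason: 'Not found' | 'Not unique' };

function computeMultiEditSketch(content: string, edits: readonly MultiEdit[]): MultiEditResult {
  let text = content; // local copy: nothing is written until every edit validates
  for (let i = 0; i < edits.length; i++) {
    const edit = edits[i];
    // Assumption: an empty old_string is rejected as 'Not found'.
    const first = edit.old_string === '' ? -1 : text.indexOf(edit.old_string);
    if (first === -1) {
      return { ok: false, editIndex: i, reason: 'Not found' };
    }
    if (edit.replace_all) {
      // split/join replaces every left-to-right non-overlapping occurrence.
      text = text.split(edit.old_string).join(edit.new_string);
    } else {
      // Same uniqueness test the message describes: indexOf === lastIndexOf.
      if (first !== text.lastIndexOf(edit.old_string)) {
        return { ok: false, editIndex: i, reason: 'Not unique' };
      }
      text = text.slice(0, first) + edit.new_string + text.slice(first + edit.old_string.length);
    }
  }
  return { ok: true, newContent: text };
}
```

Each edit sees the text produced by the previous one, so an edit may target a string that only exists after an earlier replacement.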

Tests: 15 unit cases + a differential fuzz (20k single verbatim-substring edits +
5k independent full-line multi-edits) proving byte-identical results vs
computeSearchReplaceResult on the inputs the old engine accepted (0 mismatch), so
the swap is non-regressing while additionally fixing replace_all. tsgo 0; 628 node
tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pic tool-name concat

Four bug-hunt backlog items (re-verified in code):

1. chatLatencyAudit context leak on llmError/abort. onFinalMessage released the
   context (stopping the 60Hz render-monitor interval), but onError and onAbort
   never did -- so every errored or aborted request leaked its context AND kept
   the interval running forever. Release in both.

2. chatLatencyAudit failover ORPHAN. finalRequestId is `const`, declared before
   the retry loop, so the retry reuses it -- but the model-failover path started a
   context under a throwaway newRequestId the retry never used, orphaning a context
   (+ interval) that was never released. Re-arm under finalRequestId instead, which
   also restores latency tracking for the failover attempt (onError released the
   prior attempt's context just before).

3. startBackgroundAgent hidden-thread leak. The hidden thread was added to
   allThreads but never removed, so every background-agent run left a thread object
   behind forever. Confirmed it is NOT user-inspectable (in-memory only, never in
   openTabs; the Running-agents panel renders the record's resultSummary, never the
   thread), and the summary is captured before cleanup -> drop the thread in a
   .finally() once the run settles (completed / errored / cancelled).

4. Anthropic streamed tool-name concat. content_block_start did
   `fullToolName += name` per tool_use block, so parallel tool calls concatenated
   into a garbage streamed name like "read_filelist_dir"; finalMessage already uses
   tools[0]. Keep only the first block's name.

All four are in the browser agent loop / electron-main provider (excluded from the
node runner), so they are verified by code reasoning, not unit tests; chatLatencyAudit
releaseContext is idempotent and the lifecycle is now release-on-every-terminal-exit.
tsgo 0; 628 node tests still pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…patch

Add user-invokable Skills, mirroring Claude Code's Agent Skills and the sibling
custom-agents feature. Each `.cortexide/skills/<name>/SKILL.md` (Markdown +
optional name/description frontmatter) becomes a `/<skill-name> [args]` slash
command in chat: invoking it expands the skill's instruction body (plus the
user's args) into a normal chat turn.

Pure testable core (common/cortexideSkillsService.ts, mirrors cortexideAgentsService):
  - parseSkillFile(dirName, text, uri): frontmatter + instruction body, no YAML dep.
  - parseSkillInvocation(input): "/name args" -> { name (lower-cased), args } | null.
  - buildSkillInvocationMessage(skill, args): the expanded turn text.
  - CortexideSkillsService: discovers each sub-dir of .cortexide/skills that holds a
    SKILL.md (recursive FS watch, 64 skills / 64 KB caps), getSkill() by name.

Wiring: chatThreadService imports the service (registering the singleton) and
exposes getSkillExpansion(input) + listSkillNames(); the chat input's slash handler
(SidebarChat) dispatches in its default case AFTER the built-in commands (so a
built-in like /help is never shadowed by a same-named skill), and /help now lists
the available skills.

Tests: 16 pure unit cases (parse / invocation / message-build, incl. CRLF, quoted
frontmatter, multi-line args, lone-slash, dir-name default). tsgo 0; buildreact
clean; 644 node tests pass. The discovery service (FS watch) and the React slash
handler are browser-layer (not node-testable); verified by tsgo + buildreact +
the pure-core tests -- a live CDP drive needs a workspace skill file (follow-up).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "what can leave my machine" report covered model dispatch, catalog refresh,
web tools, embeddings, vector store, MCP, and update-check -- but NOT product
telemetry, even though it's a real off-machine channel. Add it as a 7th channel.

Its status is computed via the telemetryConsent SSOT (isTelemetryEnabled), using
the SAME local-only resolution the electron-main metrics gate uses
(routingPolicy==='local-only' OR localFirstAI), so the report can't disagree with
what actually ships:
  - opt-IN by default: OPT_OUT_KEY absent/'true' -> 'not-configured' (nothing sent)
  - explicitly opted in ('false') and not local-only -> 'open'
  - opted in but local-only -> 'blocked' (forced off, with a reason)
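
The three-way status above can be sketched as a pure function. This is an illustration only: the input shape and the name `telemetryChannelStatusSketch` are hypothetical, and treating any stored value other than 'false' as "not opted in" is an assumption.

```typescript
type ChannelStatus = 'not-configured' | 'open' | 'blocked';

// Hypothetical input shape for this sketch.
interface TelemetryInputs {
  optOutValue: string | undefined; // stored OPT_OUT_KEY value, if any
  routingPolicy?: string;
  localFirstAI?: boolean;
}

function telemetryChannelStatusSketch(inputs: TelemetryInputs): ChannelStatus {
  // Opt-IN by default: only an explicit 'false' counts as opted in.
  const optedIn = inputs.optOutValue === 'false';
  if (!optedIn) {
    return 'not-configured';
  }
  // Same local-only resolution the message describes: either switch forces telemetry off.
  const localOnly = inputs.routingPolicy === 'local-only' || inputs.localFirstAI === true;
  return localOnly ? 'blocked' : 'open';
}
```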

The privacy-report command reads OPT_OUT_KEY from IStorageService at APPLICATION
scope (shared with main) and threads it into the report config. Added 'telemetry'
to EgressModality (+ a canEgress case to keep the switch exhaustive/total).

Tests: a dedicated telemetry-channel test (default off / opt-in open / local-only
forced off via both routingPolicy and localFirstAI) + updated the minimal-report
count. tsgo 0; 645 node tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mcpChannel._addUniquePrefix prepended `Math.random().toString(36)` to every MCP
tool name -- so each tool got a DIFFERENT prefix and all of them changed on every
reconnect/reload, meaning the model never saw a stable tool name across sessions
(and tools from one server shared no common prefix).

Replace it with a pure, deterministic, server-keyed prefix (new common/
mcpServiceTypes.mcpToolNamePrefix, FNV-1a -> 6 chars of [0-9a-z]). All tools from
one server now share one stable prefix. Safe because routing uses the separate
serverName (not the prefix) and the prefix is '_'-free, so the call-time strip
removeMCPToolNamePrefix still recovers the original tool name. Threaded serverName
through the 5 call sites.
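
A deterministic server-keyed prefix along these lines could look like the sketch below. The details are assumptions: it hashes UTF-16 code units with 32-bit FNV-1a, reduces the hash modulo 36^6 to get exactly 6 base36 characters, and joins prefix and tool name with '_'. The real mcpToolNamePrefix and removeMCPToolNamePrefix may differ on each of these points.

```typescript
// 32-bit FNV-1a over UTF-16 code units (a simplification; a byte-wise hash is also plausible).
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

// Assumption: fold the hash into the 36^6 space so the prefix is always 6 chars of [0-9a-z].
const PREFIX_SPACE = Math.pow(36, 6);

function mcpToolNamePrefixSketch(serverName: string): string {
  const base36 = (fnv1a32(serverName) % PREFIX_SPACE).toString(36);
  return ('000000' + base36).slice(-6);
}

// Hypothetical add/strip pair: the prefix contains no '_', so the first '_' is the separator.
function addMcpPrefixSketch(serverName: string, toolName: string): string {
  return mcpToolNamePrefixSketch(serverName) + '_' + toolName;
}

function removeMcpPrefixSketch(prefixedName: string): string {
  const separator = prefixedName.indexOf('_');
  return separator === -1 ? prefixedName : prefixedName.slice(separator + 1);
}
```

Since the prefix depends only on the server name, every tool from one server shares it, and it survives reconnects and reloads.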

Tests: determinism, distinctness across server names, 6-char base36 / no '_',
and round-trip through removeMCPToolNamePrefix for tool names containing
underscores. tsgo 0; 649 node tests pass. (mcpChannel is electron-main / not in
the node runner; the prefix logic is the pure tested fn.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…o tested code

CortexIDE's positioning makes concrete promises. This test ties each marketing-level
claim to the pure module that actually enforces it, so a regression that would make a
claim FALSE fails a test named after the claim (traceability layer over the detailed
module tests):

  - "local-first / private"        -> buildEgressReport: local-only opens 0 off-machine channels
  - "telemetry is opt-IN"          -> isTelemetryEnabled: off by default, local-only forces off
  - "never leaks a secret"         -> detectSecrets: flags + redacts an API key
  - "no SSRF to internal/metadata" -> canEgress+classifyDestination: loopback ok, metadata/private/
                                      hex-IPv4-mapped-IPv6 blocked
  - "dangerous actions are gated"  -> classifyCommandRisk: `rm -rf /` requiresApproval
  - "resistant to prompt injection"-> wrapUntrustedContent: fences + neutralizes a forged end-marker
  - "model-agnostic failover"      -> buildFailoverCandidates: failed local model -> configured cloud model

tsgo 0; 656 node tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e does NOT govern

The privacy report listed CortexIDE's own channels but said nothing about egress
inherited from the VS Code platform -- which risks implying local-only blocks
literally everything. Add a PLATFORM_INHERITED_EGRESS notes section (the Phase 8
egress-leak audit's secondary findings) rendered under a clear heading:
  - webview UI assets from the VS Code CDN (vscode-cdn.net)
  - the built-in GitHub Copilot chat agent's own endpoints (if enabled)
  - the on-demand "curl | sh" local-model installer (only on explicit click)

These are out of CortexIDE's routing control (or fire only on user action); naming
them keeps the report honest rather than over-claiming. Added platformNotes to the
report + a test. tsgo 0; node tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mmon module

The decision that gates whether a tool (edit/terminal/MCP) runs WITHOUT user
confirmation -- the core "never silently does something destructive" boundary --
was an inline mutated `shouldAutoApprove` boolean in chatThreadService with ZERO
tests. Extract it to pure common/autoApprovePolicy.ts (computeAutoApproveBaseline +
decideAutoApprove) preserving the EXACT override order: catastrophic-command
hard-block -> setting baseline ('edits' defaults true) -> dangerous-command force
approval -> cwd-escape force approval -> YOLO NL-safe approve -> HIGH-risk-edit
force approval -> YOLO edit approve.

The caller computes the scalar inputs (classifyCommandRisk, the NL heuristic,
scoreEdit) then makes ONE decision; telemetry / the hard-block message / the
auto-apply notification stay inline, gated on the decision's per-rule flags so they
fire under identical conditions with identical payloads. The NL+edit telemetry keeps
its original try/catch swallow; the terminal telemetry stays un-wrapped (parity).

Tested: golden table over every override + a 30k differential fuzz that re-implements
the OLD inline mutation order independently and asserts the extracted decision matches
on every input combination (0 mismatch). A 3-lens adversarial verification workflow
(telemetry / control-flow / computation-order) confirmed full side-effect + control-flow
parity; the one exception-propagation delta it found is now fixed. tsgo 0; 672 tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tested core

The edit risk/confidence score gates auto-apply (HIGH can never be silently
auto-approved; YOLO keys its silent-apply threshold off the 0.2/0.7 boundaries), but
EditRiskScoringService.scoreEdit was untested. Extract everything except factor #6
(the count of pre-existing Error markers, which needs the live model/markers) into
pure common/editRiskScore.ts scoreEditFromContext(context, existingErrorCount). The
service now computes only that count -- passing 0 when the model is unavailable or
there's no newContent, mirroring the old `if (model && newContent)` guard -- and
delegates. EditContext/EditRiskScore types moved to the pure module and re-exported
from the service so existing import paths are unchanged. Byte-identical logic.

Tested: factor-by-factor golden cases (deletion=HIGH+1.0, critical +0.5, >50% rewrite
crosses HIGH, test-file +0.2, multi-file cap, >5-errors +0.2, create floor 0.05, tiny
edit very-low) + the LOW/MEDIUM/HIGH classifier boundaries incl. the silent-auto-apply
boundary (riskScore<0.2 AND confidence>0.7). tsgo 0; 685 tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… fuzz)

findDiffs is the line-diff engine every apply/accept/reject hunk derives from, and
it had no test. It is node-importable as-is (only the pure diffLines bundle + a type).
Add golden cases pinning each hunk type + the internal trailing-\n bookkeeping
(insertion / deletion / edit / identical / "E vs E\n is an insertion" / empty
old+new), plus a 15k reconstruction property fuzz: re-applying every returned hunk to
the old text must rebuild the new text EXACTLY. The reconstruction oracle is
sanity-checked against the goldens before being used. Test-only; tsgo 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…re, tested)

editCodeService.acceptDiff / rejectDiff carried the off-by-one-prone line/range math
inline + untested. Extract it to pure common/perHunkAccept.ts:
  - computeAcceptedOriginalCode(originalCode, diff): folds an accepted hunk into the
    diff-area baseline (deletion/insertion/edit splices), byte-identical to acceptDiff.
  - computeRejectWrite(diff, diffAreaEndLine) -> {writeText, toRange}: the write+range
    that undoes a hunk, incl. the two end-of-zone rounding cases (deletion past the zone
    end, insertion of the final newline). Range is a plain IRange-shaped object (no editor
    import); the service passes it straight to _writeURIText.

Tested: golden splices + golden range math for every variant and both end-of-zone
boundaries, PLUS a 12k accept-convergence property fuzz -- repeatedly accepting the
first hunk and re-diffing (exactly as the service does) must fold originalCode all the
way to the new code. That sequential-accept path is where the boundary off-by-ones
live. tsgo 0; 704 tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ead corpus

The streaming SEARCH/REPLACE parser (extractSearchReplaceBlocks) already had golden
+ a single fixed streaming-monotonicity test. Add a randomized streaming fuzz: 2k
random multi-block SR strings fed prefix-by-prefix must never regress (block count
non-decreasing, per-block state never goes done->writingFinal->writingOriginal), and
the complete stream parses to exactly N done blocks.

The fuzz surfaced a real parser CONTRACT: a block with an EMPTY ORIGINAL is mis-parsed
(the `=======` on the line right after `<<<<<<< ORIGINAL` is swallowed as content),
while an empty UPDATED parses fine -- pinned as an explicit test so callers know ORIGINAL
must be non-empty (realistic: you never search for nothing). Also deleted the ~200-line
commented-out test corpus from the source (it lives in the runnable suite now). tsgo 0;
706 tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hes logs" path)

RedactingLogService held the redact-message/redact-args logic inline, reachable only
through its injected ISecretDetectionService (not node-constructible), so the secret-
in-logs guarantee was effectively untested at the log layer. Extract it to pure
common/logRedaction.ts (redactLogMessage + redactLogArgs over a SecretDetectionConfig,
built on the already-pure detectSecrets/redactSecretsInObject). The service now
delegates to these -- byte-identical, since its injected service's detectSecrets/
redactSecretsInObject are thin wrappers over the same free functions + getConfig().

Tested: redacts API keys out of message + string args + deeply-nested object args;
clean lines pass through; non-string/object args untouched; disabled config is a
pass-through. (The production ILogService DI swap remains the deferred cdp-only item.)
tsgo 0; 713 tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…edToolProperties

toAnthropicTool and toOpenAICompatibleTool built their `paramsWithType` map with the
identical inline loop, and toGeminiFunctionDecl built an equivalent one by hand -- the
"every property gets a JSON-Schema type" contract was duplicated 3x (and a type-less
variant once shipped, fixed in ff1718a). Extract the one pure
buildTypedToolProperties(params) into common/providerToolFormat.ts; OpenAI + Anthropic
use it directly, Gemini maps its typed properties to the SDK's Type.STRING at the
electron-main boundary. Byte-identical output; tsgo confirms the `satisfies
Anthropic.Messages.Tool / FunctionDeclaration` assertions still hold.

Tested: empty/single/multi params, every property typed (the regression guard), input
not mutated, and the OpenAI tool embeds the typed properties. tsgo 0; 719 tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ol-capture reducers

The OpenAI non-streaming response handler and the Gemini streaming chunk loop captured
the model's text + tool call with inline logic in electron-main (untestable in the node
runner). Extract the pure cores to common/providerToolFormat.ts:
  - extractToolCallFromNonStreamingChoice(choice): {empty, hasToolCall, text, name,
    args, id}. The caller keeps the original `if (toolCalls.length>0)` guard via
    hasToolCall, so a prior streaming attempt's tool vars aren't clobbered.
  - reduceGeminiChunk(state, chunk): text appends, a functionCall REPLACES (last wins,
    unlike OpenAI's concatenation) -- byte-identical to the inline loop.
  - finalizeGeminiToolId(toolId, uuidGen): the empty-id -> generated-id fallback.

Tested: OpenAI missing-choice/empty, content-no-tools, one tool_call, nullish-coerce,
first-only; Gemini text-append, functionCall-capture, last-wins replacement, undefined
args -> "{}", no-op chunk; and the id fallback. tsgo 0; 729 tests.
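
The Gemini reducer contract (text appends, a functionCall replaces, an empty id falls back to a generated one) can be sketched as below. The chunk and state shapes here are hypothetical simplifications, not the SDK's types or the extracted module's real signatures.

```typescript
// Hypothetical, simplified chunk and state shapes for this sketch.
interface GeminiChunkLike {
  text?: string;
  functionCall?: { name: string; args?: unknown; id?: string };
}

interface GeminiCaptureState {
  text: string;
  toolName: string;
  toolArgs: string; // JSON-encoded arguments
  toolId: string;
}

function reduceGeminiChunkSketch(state: GeminiCaptureState, chunk: GeminiChunkLike): GeminiCaptureState {
  const next = { ...state }; // never mutate the incoming state
  if (chunk.text) {
    next.text += chunk.text; // text accumulates across chunks
  }
  if (chunk.functionCall) {
    // A later functionCall REPLACES the earlier one (last wins).
    next.toolName = chunk.functionCall.name;
    next.toolArgs = JSON.stringify(chunk.functionCall.args ?? {});
    next.toolId = chunk.functionCall.id ?? '';
  }
  return next;
}

function finalizeGeminiToolIdSketch(toolId: string, uuidGen: () => string): string {
  return toolId === '' ? uuidGen() : toolId;
}

const initialState: GeminiCaptureState = { text: '', toolName: '', toolArgs: '', toolId: '' };
const chunks: GeminiChunkLike[] = [
  { text: 'Hel' },
  { text: 'lo' },
  { functionCall: { name: 'read_file', args: { path: 'a.ts' } } },
  { functionCall: { name: 'list_dir' } },
];
const finalState = chunks.reduce(reduceGeminiChunkSketch, initialState);
```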

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
….ok(true))

rollbackSnapshotService.test.ts was 4 vacuous assert.ok(true) placeholders. Extract
the snapshot byte budget into pure common/snapshotBudget.ts (snapshotFileBytes +
planSnapshot greedy include-until-overage) and have the service read-then-plan. The
included set + skipped flag are identical to the old streaming loop (the only difference:
a past-budget file may now be read before exclusion -- harmless for a pre-edit snapshot).
Replace the placeholders with real golden tests: all-fit, over-budget truncation at the
boundary, exactly-at-budget (strict >), empty, single-oversized, and greedy "skip stops scanning".
The model/file reads stay in the service (not node-testable). tsgo 0; 732 tests.
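
A greedy include-until-overage plan with a strict `>` boundary might look like the following sketch. The candidate and plan shapes are hypothetical; only the behavior named in the test list above is modeled.

```typescript
// Hypothetical shapes for this sketch.
interface SnapshotCandidate {
  path: string;
  bytes: number;
}

interface SnapshotPlan {
  included: SnapshotCandidate[];
  skipped: boolean; // true when at least one file was left out
}

function planSnapshotSketch(files: readonly SnapshotCandidate[], budgetBytes: number): SnapshotPlan {
  const included: SnapshotCandidate[] = [];
  let total = 0;
  for (const file of files) {
    // Strict '>': a file that lands exactly on the budget still fits.
    if (total + file.bytes > budgetBytes) {
      // Greedy: the first overage stops the scan, even if a later file would fit.
      return { included, skipped: true };
    }
    total += file.bytes;
    included.push(file);
  }
  return { included, skipped: false };
}
```

Stopping at the first overage keeps the included set a prefix of the input, which is what makes the plan easy to compare against a streaming loop.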

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(true))

autostash.flow.test.ts was 4 vacuous placeholders. restoreStash + dropStash both
parsed `stash@{N}` inline identically; extract that to pure common/gitStashRef.ts
parseStashIndex (defaults to 0 / latest on a missing or malformed ref, byte-identical),
have both sites delegate, and replace the placeholders with real tests: well-formed
indices, malformed/empty -> 0, embedded ref. The stash create/restore/drop flows need
the live git command service (not node-testable). The placeholder's "dirty-only mode
skips stash" case is dropped -- createStash has no such mode (would have asserted dead
behavior). The inferred isBenignStashFailure helper does not exist in the code, so it
was not invented. tsgo 0; 735 tests.
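
The stash-ref parsing described above is small enough to sketch directly. This is an illustration, not the extracted common/gitStashRef.ts; in particular, accepting `undefined` as input is an assumption.

```typescript
// Returns N from a `stash@{N}` reference, or 0 (the latest stash) when the ref is
// missing or malformed. The pattern is unanchored, so an embedded ref also matches.
function parseStashIndexSketch(ref: string | undefined): number {
  const match = /stash@\{(\d+)\}/.exec(ref ?? '');
  return match ? parseInt(match[1], 10) : 0;
}
```

For example, `parseStashIndexSketch('stash@{3}')` yields 3, while an empty or garbled ref yields 0.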

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…der test

applyAll.rollback.flow.test.ts was 3 assert.ok(true) placeholders. Its meaningful
scenarios are ALREADY covered by REAL tests in applyEngineV2.test.ts with a concrete
MockRollbackSnapshotService: "on apply failure, snapshot restore is called" by the
atomicity test (:237, asserts createdCount/restoredIds/no-discard) and "success path
discards snapshot" by the snapshot-lifecycle test (:295). The third placeholder
("snapshot skipped -> git restore invoked") asserted nothing, so deleting loses no
coverage. Removing fake tests (false confidence) serves the no-fake-safety goal.
Follow-up: a REAL skipped-snapshot -> git-stash-restore test in applyEngineV2.test.ts
(needs the mock's createSnapshot to return skipped=true) is a genuine remaining gap.
Measured full suite: 728 passing, 0 failing; tsgo 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ert.ok(true))

auditLog.append.p0.test.ts was 3 vacuous placeholders. Extract the audit log's on-disk
format + rotation policy into pure common/auditLogFormat.ts and have AuditLogService
delegate (byte-identical):
  - serializeEvents(events): JSONL -- one compact JSON object per line + trailing \n
  - shouldRotate(currentSize, addBytes, rotationSizeMB): strict `>` against the MB threshold
  - rotatedLogPath(jsonlPath, n, compressed): audit.jsonl -> audit.<n>.jsonl[.gz]
The append/flush/file I/O stays in the service (not node-testable). Replace the
placeholders with real tests: JSONL shape + round-trip, the strict `>` rotation boundary,
and the rotated-name format (only trailing .jsonl rewritten). Completes the
placeholder-test cleanup (snapshot/gitStash/auditLog now real; applyAll.rollback deleted).
Measured full suite: 732 passing, 0 failing; tsgo 0.
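
The three format helpers can be sketched as below. These are illustrations under assumptions: 1 MB is taken as 1024 * 1024 bytes, every serialized line (including the last) ends in `\n`, and the function names carry a `Sketch` suffix to mark them as hypothetical.

```typescript
// JSONL: one compact JSON object per line, each line terminated by '\n'.
function serializeEventsSketch(events: readonly object[]): string {
  return events.map(event => JSON.stringify(event) + '\n').join('');
}

// Rotate only when the write would push the file strictly past the threshold.
// Assumption: 1 MB = 1024 * 1024 bytes.
function shouldRotateSketch(currentSize: number, addBytes: number, rotationSizeMB: number): boolean {
  return currentSize + addBytes > rotationSizeMB * 1024 * 1024;
}

// audit.jsonl -> audit.<n>.jsonl[.gz]; only the trailing '.jsonl' is rewritten.
function rotatedLogPathSketch(jsonlPath: string, n: number, compressed: boolean): string {
  const base = jsonlPath.replace(/\.jsonl$/, '');
  return base + '.' + n + '.jsonl' + (compressed ? '.gz' : '');
}
```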

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…alation deciders

checkEarlyTokenQuality + shouldUseSpeculativeEscalation were already pure but untested;
they gate abandoning a streaming response mid-flight for a stronger model. Add a test
file (no source change): <20 tokens -> neutral 0.5/no-escalate, repetition penalty,
generic-refusal+error -> score < 0.5 AND >=50 tokens -> escalate (escalation requires
BOTH conditions), incomplete code fence penalty, a clean balanced-fence response stays
1.0; and the speculative truth table (confidence 0.59 vs 0.6 boundary, qualityTier
'escalate' forces it). tsgo 0; measured full suite 741 passing, 0 failing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extract RepoIndexerService._partialSort (the O(n log k) top-k-by-score min-heap behind
the BM25 rerank) verbatim to pure common/partialSort.ts; the service delegates. Add a
differential fuzz: the SET of scores it returns must equal a full sort's top-k (it
tolerates 0.1 ties, so compare the score multiset, not order) over 20k random arrays
with k spanning 0..n+3.

The fuzz surfaced a REAL latent crash: with k === 0 and a non-empty input the heap
stayed empty and `heap[0].score` threw. k === 0 is reachable (the rerank pool size
Math.min(k*3, ...) is 0 when a caller passes k=0). Fixed: k <= 0 -> [] (byte-identical
for the k > 0 values used in practice). Measured full suite: 745 passing, 0 failing; tsgo 0.
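
The shape of the crash and its guard can be seen in a generic top-k min-heap sketch. This is not the extracted _partialSort (it does not model the 0.1 tie tolerance, and the names are hypothetical); it only shows why k === 0 reaches the root of an empty heap without the early return.

```typescript
interface Scored {
  score: number;
}

// Top-k by score using a size-k min-heap: O(n log k).
function topKByScoreSketch<T extends Scored>(items: readonly T[], k: number): T[] {
  if (k <= 0) {
    return []; // the guard: otherwise heap[0].score below is read on an empty heap
  }
  const heap: T[] = []; // heap[0] is the weakest item currently kept

  const siftUp = (start: number): void => {
    let i = start;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (heap[parent].score <= heap[i].score) { break; }
      [heap[parent], heap[i]] = [heap[i], heap[parent]];
      i = parent;
    }
  };

  const siftDown = (start: number): void => {
    let i = start;
    for (;;) {
      const left = 2 * i + 1;
      const right = left + 1;
      let smallest = i;
      if (left < heap.length && heap[left].score < heap[smallest].score) { smallest = left; }
      if (right < heap.length && heap[right].score < heap[smallest].score) { smallest = right; }
      if (smallest === i) { break; }
      [heap[smallest], heap[i]] = [heap[i], heap[smallest]];
      i = smallest;
    }
  };

  for (const item of items) {
    if (heap.length < k) {
      heap.push(item);
      siftUp(heap.length - 1);
    } else if (item.score > heap[0].score) {
      heap[0] = item; // evict the weakest kept item
      siftDown(0);
    }
  }
  return heap.slice().sort((a, b) => b.score - a.score);
}
```

A differential check against a full sort, as the fuzz does, compares the multiset of returned scores rather than their order.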

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e RAG ranking)

BM25/keyword retrieval is the only live RAG path (embeddings are gated/optional), so
its scoring is what ranks the code context the model sees -- and it was untested.
Extract tokenize / scoreEntry / naiveScore verbatim from RepoIndexerService
_tokenize / _scoreEntryFast / _score into pure common/bm25Score.ts; the service keeps
its tokenization LRU cache and delegates the math. ScorableEntry is the structural
subset of IndexEntry the scorer reads.

Tested: tokenize (lowercases, splits on non-[a-z0-9_], underscores stay in-token),
the relevance ranking exact-symbol(10) > partial(4) > token-only(2) > no-match(0),
case-insensitive symbol match, URI binary +3, snippet-overlap cap at 5, snippet phrase.
The tests pinned two real (byte-identical) quirks: tokenize does NOT split underscores,
and naiveScore does not lowercase (uppercase chars act as separators). Measured full
suite: 755 passing, 0 failing; tsgo 0.
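
The two pinned quirks are easy to see in a sketch. The tokenizer below follows the rule stated above (lowercase, then split on non-[a-z0-9_]); the second helper is an assumption about how a non-lowercasing split would behave, not a copy of naiveScore.

```typescript
// Lowercase first, then split on anything outside [a-z0-9_]; underscores stay in-token.
function tokenizeSketch(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9_]+/).filter(token => token.length > 0);
}

// Assumed illustration of the naiveScore quirk: with the same character class but no
// lowercasing, uppercase letters fall outside [a-z0-9_] and act as separators.
function splitWithoutLowercasingSketch(text: string): string[] {
  return text.split(/[^a-z0-9_]+/).filter(token => token.length > 0);
}

const lowered = tokenizeSketch('getUser_byId(foo-Bar)');
const notLowered = splitWithoutLowercasingSketch('getUser');
```

Here `lowered` keeps `getuser_byid` as a single token, while `notLowered` breaks `getUser` at the capital letter.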

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The ls_dir tree renderer the agent reads to navigate the workspace was pure but
untested. Add a test (no source change): the no-children -> "is not a directory"
Error branch; first-page header + one entry per line with the directory trailing-slash
and "(symbolic link)" markers; the header omitted on a non-first page; and hasNextPage
appending a "(N results remaining...)" elbow line while the last shown entry keeps the
tee prefix. Box-drawing prefixes are asserted via \u escapes so the source stays ASCII.
Measured full suite: 759 passing, 0 failing; tsgo 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…timator

estimateQualityTier biases model selection (cheap_fast / standard / escalate) before
any capability scoring, so its boundaries are a routing contract -- and it was inline +
untested in the 1000-line ModelRouter. Move it (and the QualityTier type, which has no
external importers) to pure common/routing/qualityTier.ts over a structural
QualityTierContext; ModelRouter re-exports the type and calls the pure fn (the private
method is removed, its single caller updated). Byte-identical.

Tested (golden table): simple+no-media -> cheap_fast; images/PDFs demote a simple
question to standard; complex-reasoning/multi-step/security/>100k-context -> escalate;
the 100k context-size boundary is strict (100000 == standard, 100001 == escalate);
ordinary -> standard. Measured full suite: 765 passing, 0 failing; tsgo 0.
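
The golden table above corresponds to a decision function along these lines. The field names of the context and the precedence (escalation signals checked before the simple-question shortcut) are assumptions made for this sketch; the real QualityTierContext is not shown in this message.

```typescript
type QualityTierSketch = 'cheap_fast' | 'standard' | 'escalate';

// Hypothetical structural context; real field names may differ.
interface QualityTierContextSketch {
  isSimpleQuestion?: boolean;
  hasImages?: boolean;
  hasPDFs?: boolean;
  requiresComplexReasoning?: boolean;
  isMultiStepTask?: boolean;
  isSecuritySensitive?: boolean;
  contextSize?: number;
}

function estimateQualityTierSketch(ctx: QualityTierContextSketch): QualityTierSketch {
  // Assumption: escalation signals take precedence over everything else.
  if (
    ctx.requiresComplexReasoning ||
    ctx.isMultiStepTask ||
    ctx.isSecuritySensitive ||
    (ctx.contextSize ?? 0) > 100000 // strict: 100000 itself does not escalate
  ) {
    return 'escalate';
  }
  // Media demotes an otherwise simple question to standard.
  if (ctx.isSimpleQuestion && !ctx.hasImages && !ctx.hasPDFs) {
    return 'cheap_fast';
  }
  return 'standard';
}
```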

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both hybrid rerank variants (_rerankHybrid with computed cosine similarity,
_rerankHybridWithVectorStore with vector-store lookups) shared the exact same
normalize+blend math; only the per-item vector-score SOURCE differed. Extract one pure
common/hybridRerank.blendScores(items, vectorScoreOf, weights); both callers pass their
vector-score closure. This decouples the fragile, object-identity docId derivation (kept
in the caller, documented: a chunk copy would yield indexOf=-1 and silently drop the
vector signal -- not triggered today since the chunk reference is preserved) from the
pure blend. Byte-identical, including the score clamp.

Tested: min-max normalization clamped to include 0/1, weighted blend, a vector score
lifts a low-BM25 item, missing-vector -> BM25-only, the documented dead 0.5 fallback
(all-equal positive scores normalize to 1.0 not 0.5 due to the 0-floor clamp), no input
mutation. Measured full suite: 772 passing, 0 failing; tsgo 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…memory storage stub

The routing evaluation loop (win-rate / escalation-rate / per-model success-rate that
feeds learned routing) was untested. Add a test that runs the REAL service against a
10-line Map-backed IStorageService stub (no source change): getModelSuccessRate returns
the neutral 0.5 with no data and successes/total otherwise (keyed provider:model);
getQualityReport win/escalation/retry rates + avgLatency over the recent window;
modelPerformance map keyed provider:model with per-model count/successRate; and the
last-100 window (150 recorded -> only the last 100 count). Measured full suite: 778
passing, 0 failing; tsgo 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bring Skills to parity with custom agents and harden the parser (all additive; dispatch
unchanged):
  - parseSkillFile now reads allowed-tools / allowed_tools / tools into Skill.allowedTools
    (same tokenizer as parseCustomAgentFile; undefined when absent/empty). Enforcement at
    dispatch is a separate, not-yet-wired step.
  - parseSkillInvocation name is tightened to a single [A-Za-z0-9_-] run, so a file path
    (`/src/foo.ts`), a doubled slash (`//x`), or `/a.b` is no longer mistaken for a skill.
  - RESERVED_SLASH_COMMANDS + isReservedSkillName (skills must not shadow built-ins).
  - dedupeSkillsByName (first-wins, case-insensitive, reports conflicts), wired into the
    loader so the skill set is unambiguous (matches getSkill's first-match lookup).

Tested: allowed-tools/allowed_tools/tools parsing + undefined cases; the tightened
invocation (rejects paths/doubled-slash, accepts hyphen/underscore/digit names);
reserved-name detection; first-wins dedupe with conflict reporting. Measured full suite:
784 passing, 0 failing; tsgo 0.
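
The tightened invocation rule and the first-wins dedupe can be sketched as below. This is an illustration under assumptions: the regex, the return shapes, and the argument handling are hypothetical, and only the `[A-Za-z0-9_-]` name rule and the first-wins, case-insensitive dedupe come from the commit.

```typescript
// A skill name is a single run of [A-Za-z0-9_-] directly after one leading slash.
const SKILL_INVOCATION = /^\/([A-Za-z0-9_-]+)(?:\s+([\s\S]*))?$/;

function parseSkillInvocation(input: string): { name: string; args: string } | null {
  const match = SKILL_INVOCATION.exec(input.trim());
  if (!match) { return null; }
  return { name: match[1] ?? '', args: (match[2] ?? '').trim() };
}

// First-wins, case-insensitive dedupe that reports the names it dropped.
function dedupeSkillsByName<T extends { name: string }>(
  skills: readonly T[],
): { skills: T[]; conflicts: string[] } {
  const seen = new Set<string>();
  const kept: T[] = [];
  const conflicts: string[] = [];
  for (const skill of skills) {
    const key = skill.name.toLowerCase();
    if (seen.has(key)) {
      conflicts.push(skill.name);
      continue;
    }
    seen.add(key);
    kept.push(skill);
  }
  return { skills: kept, conflicts };
}
```

With this rule `/src/foo.ts`, `//x`, and `/a.b` all return null, because the character after the name run is neither whitespace nor end of input.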

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…matcher

Add contract tests to searchReplaceMatch (test-only): callers always feed getValue(LF),
so these document what would happen if CRLF ever reached the matcher -- a single-line LF
needle is still found in CRLF source, a MULTI-line LF needle misses the exact match (the
CR breaks it), and only the whitespace fallback recovers it by stripping CR. Plus the
sequential-compose contract that the multi_edit transaction relies on: applying B to the
output of A resolves text A produced, and an ORIGINAL that A CONSUMED returns Not found
(no stale edit against the pre-edit content). Measured full suite: 789 passing, 0
failing; tsgo 0.
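
The CRLF contract can be reproduced with plain string search. The two helpers below are hypothetical stand-ins, not the real searchReplaceMatch API; CR stripping here only approximates the whitespace fallback.

```typescript
// Exact match: the needle must appear byte-for-byte in the source.
function matchesExactly(source: string, needle: string): boolean {
  return source.includes(needle);
}

// Stand-in for the whitespace fallback: compare after dropping every CR.
function matchesAfterStrippingCR(source: string, needle: string): boolean {
  return source.replace(/\r/g, '').includes(needle.replace(/\r/g, ''));
}

const crlfSource = 'const a = 1;\r\nconst b = 2;\r\n';
const singleLineNeedle = 'const a = 1;';
const multiLineNeedle = 'const a = 1;\nconst b = 2;';

console.log(matchesExactly(crlfSource, singleLineNeedle));         // true
console.log(matchesExactly(crlfSource, multiLineNeedle));          // false, the CR sits before the LF
console.log(matchesAfterStrippingCR(crlfSource, multiLineNeedle)); // true
```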

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ntract)

Whether the indexer may compute embeddings is a privacy decision: embeddings go to an
opaque, unclassifiable provider, so under local-only mode they must be blocked (BM25
fallback). Extract RepoIndexerService._canComputeEmbeddings's pure decision to
common/embeddingsGate.canUseEmbeddings({hasEnabledProvider, routingPolicy, isOffline});
the service reads the three live inputs and delegates. Byte-identical (canEgress import
moved out of the browser service since it is now used only by the pure gate).

Tested (fail-closed truth table): no provider -> false; local-only -> false even with a
provider + online; offline -> false; all-clear -> true. Measured full suite: 793 passing,
0 failing; tsgo 0.
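
The fail-closed truth table maps onto a few early returns. This is a simplified sketch: the `RoutingPolicy` values other than local-only are hypothetical, and the real gate is described as delegating to canEgress rather than comparing a literal.

```typescript
// Hypothetical policy union; only 'local-only' is named in the commit.
type RoutingPolicy = 'local-only' | 'default';

interface EmbeddingsGateInput {
  hasEnabledProvider: boolean;
  routingPolicy: RoutingPolicy;
  isOffline: boolean;
}

// Fail closed: every blocking condition returns false before the single true.
function canUseEmbeddings(input: EmbeddingsGateInput): boolean {
  if (!input.hasEnabledProvider) { return false; }
  if (input.routingPolicy === 'local-only') { return false; }
  if (input.isOffline) { return false; }
  return true;
}
```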

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tajudeen and others added 16 commits June 14, 2026 22:46
…nd building block)

Add the raw-IP classifiers needed for an SSRF DNS-rebind preflight: resolve a hostname
to its IP, then classify THAT IP so a name pointing at loopback/private/cloud-metadata
is caught even though the hostname looked public. classifyResolvedAddress(ip) reuses the
exact IPv4 / IPv4-mapped-IPv6 (dotted AND hex-canonicalized) / IPv6 rules; a bare hostname
is 'unknown'. isPrivateResolvedIP(ip) = loopback || private (what an SSRF guard blocks).
classifyDestination's IP tail now delegates to it (DRY, byte-identical -- the 32 existing
egressPolicy/SSRF-parity tests still pass).

Tested: loopback/private/metadata/link-local/ULA incl. the hex IPv4-mapped form
(::ffff:7f00:1, ::ffff:a9fe:a9fe), public -> remote, bare hostname/empty -> unknown, and
the isPrivateResolvedIP block set. (The async dns.lookup + per-redirect-hop wiring on the
web-tool fetch stays the deferred cdp-only item.) Measured full suite: 797 passing, 0
failing; tsgo 0.
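
A rough sketch of the raw-IP classification, assuming a reduced class set. Folding link-local, cloud-metadata, and ULA addresses into 'private' is a simplification made for this example, and the exact ranges the real classifier checks are not shown in the commit.

```typescript
type AddressClass = 'loopback' | 'private' | 'remote' | 'unknown';

// Classify by the first two IPv4 octets.
function classifyIPv4(a: number, b: number): AddressClass {
  if (a === 127) { return 'loopback'; }
  if (a === 10) { return 'private'; }
  if (a === 172 && b >= 16 && b <= 31) { return 'private'; }
  if (a === 192 && b === 168) { return 'private'; }
  if (a === 169 && b === 254) { return 'private'; } // link-local, incl. 169.254.169.254
  return 'remote';
}

function parseDottedIPv4(text: string): [number, number] | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(text);
  if (!m) { return null; }
  const octets = m.slice(1).map(Number);
  if (octets.some(o => o > 255)) { return null; }
  const [a = 0, b = 0] = octets;
  return [a, b];
}

function classifyResolvedAddress(ip: string): AddressClass {
  const addr = ip.trim().toLowerCase();
  const dotted = parseDottedIPv4(addr);
  if (dotted) { return classifyIPv4(dotted[0], dotted[1]); }
  // IPv4-mapped IPv6, dotted form (::ffff:127.0.0.1).
  const mappedDotted = /^::ffff:(\d{1,3}(?:\.\d{1,3}){3})$/.exec(addr);
  if (mappedDotted) {
    const inner = parseDottedIPv4(mappedDotted[1] ?? '');
    return inner ? classifyIPv4(inner[0], inner[1]) : 'unknown';
  }
  // IPv4-mapped IPv6, hex form (::ffff:7f00:1); the first group holds octets 1 and 2.
  const mappedHex = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(addr);
  if (mappedHex) {
    const hi = parseInt(mappedHex[1] ?? '0', 16);
    return classifyIPv4(hi >> 8, hi & 0xff);
  }
  // Anything that is not hex-and-colons is a bare hostname or empty.
  if (!addr.includes(':') || !/^[0-9a-f:]+$/.test(addr)) { return 'unknown'; }
  if (addr === '::1') { return 'loopback'; }
  if (/^f[cd]/.test(addr)) { return 'private'; }   // ULA fc00::/7
  if (/^fe[89ab]/.test(addr)) { return 'private'; } // link-local fe80::/10
  return 'remote';
}

function isPrivateResolvedIP(ip: string): boolean {
  const cls = classifyResolvedAddress(ip);
  return cls === 'loopback' || cls === 'private';
}
```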

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…to the tested detector

The auto-mode failover router rebuilt isCodebaseQuestion with a hand-written regex that
had DRIFTED to a narrow subset of the tested looksLikeCodebaseQuestion (already used for
the INITIAL routing + imported here) -- so the same message could be classified one way
initially and another on failover, steering the failover model differently. Replace the
inline regex with looksLikeCodebaseQuestion(content) so both paths agree (it is a strict
superset: its first two patterns ARE the inline ones), and drop the now-unused lowerMessage.

Added a PARITY test pinning that every phrasing the old inline regex matched is still
detected. Measured full suite: 798 passing, 0 failing; tsgo 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l (+ differential fuzz vs the original)

#12b. The ~680-line capability-scoring arithmetic that decides which model Auto picks lived in the
private ModelRouter.scoreModel and was untestable. Extracted it VERBATIM into a pure, node-tested
common/routing/computeModelScore.ts; ModelRouter.scoreModel is now a thin wrapper that resolves the
impure inputs (getCachedCapabilities, realParamSize from settingsState, this.isVisionCapable, the
freeTierQuotaService lookup, the two globalSettings reads) and delegates.

Proven byte-identical to git HEAD: a whitespace-normalized diff of the extracted scoring body against
HEAD's scoreModel yields ZERO content differences modulo the 5 documented input substitutions (the
injected capabilities/realParamSize/isVisionCapable/getFreeTierRemaining + routingPolicy/localFirstAI
scalars, and 3 em-dash->'--' hygiene edits). A 4-lens adversarial-verify Workflow independently
re-derived this (all lenses: byte-identical; blast-radius clean; oracle is a true independent copy).

Validation: 50k differential fuzz vs a GENERATED oracle (test/common/computeModelScore.oracle.ts =
HEAD's scoreModel body, script-extracted with the same substitutions -> catches any future drift of
the extracted fn from the original) + 11 hand-traced goldens pinning each scoring axis (quality tier,
privacy, code/codebase-question, local-first heavy/light, low-latency, vision, free-tier exhaustion)
+ invariants (auto->0, score>=0, quota-throw swallowed). Removed now-unused imports from modelRouter.
tsgo 0; common suite 798 -> 814 (+16); hygiene clean (0 non-ASCII on added src lines).
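
The differential-fuzz idea, reduced to a generic harness. This is an illustrative sketch rather than the suite's actual test code: `differentialFuzz`, the seeded `mulberry32` generator, and the toy scorers are all stand-ins.

```typescript
// Small seeded PRNG (mulberry32) so any failing case can be replayed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface FuzzResult<I> { casesRun: number; firstMismatch: I | null }

// Run the oracle and the candidate on the same random inputs; stop at the first disagreement.
function differentialFuzz<I, O>(
  oracle: (input: I) => O,
  candidate: (input: I) => O,
  generate: (rand: () => number) => I,
  cases: number,
  seed = 1,
): FuzzResult<I> {
  const rand = mulberry32(seed);
  for (let n = 0; n < cases; n++) {
    const input = generate(rand);
    if (!Object.is(oracle(input), candidate(input))) {
      return { casesRun: n + 1, firstMismatch: input };
    }
  }
  return { casesRun: cases, firstMismatch: null };
}

// Toy scorers: the "drifted" one disagrees with the oracle only for large inputs.
const toyOracle = (x: number): number => x * 2;
const toyDrifted = (x: number): number => (x > 90 ? x * 2 + 1 : x * 2);
const toyInput = (rand: () => number): number => Math.floor(rand() * 100);
```

Running `differentialFuzz(toyOracle, toyOracle, toyInput, 1000)` reports no mismatch, while swapping in `toyDrifted` surfaces an input above 90.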

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…scalations now reset nMessagesSent too

#21. A successful mid-task model escalation (tryEscalateModel) hands the SAME task to a fresh, more
capable model, which must start with a clean per-attempt budget. The iteration-cap escalation site
(chatThreadService ~3552) reset BOTH nMessagesSent and consecutiveToolErrors, but the two tool-error
escalation sites (~4747 unparseable, ~4865 failed-tool-calls) reset only consecutiveToolErrors -- so a
tool-error-escalated model silently INHERITED a spent iteration budget and could hit the iteration cap
and stop before it had a fair chance to finish (e.g. a weak local model that burned 25/30 steps then
failed tool calls would hand the strong cloud model only 5 steps). The total work stays bounded by the
unchanged global escalationCount cap (MAX_MODEL_ESCALATIONS), exactly as the iter-cap site already
accepted.

Fix: centralize the reset in pure common/agentLoopDecisions.ts computePostEscalationCounters(triggerSite)
(section 2, escalation owner) -> {nMessagesSent:0, consecutiveToolErrors:0} uniformly; route all three
reset sites through it. Golden table pins the contract that every trigger site resets BOTH counters and
that the two sites agree (regression guard against the old non-uniformity). The global escalation budget
is intentionally NOT reset (it bounds total cross-model work). The llmError escalation path (resets its
own nAttempts) is a different loop and out of scope.
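
A minimal sketch of the uniform reset. The trigger-site names in the union are hypothetical labels for the three sites described above.

```typescript
// Hypothetical labels for the three escalation sites.
type EscalationTriggerSite = 'iterationCap' | 'unparseableToolCall' | 'failedToolCalls';

interface PostEscalationCounters {
  nMessagesSent: number;
  consecutiveToolErrors: number;
}

// Every site gets the same clean per-attempt budget; the global
// escalation count is deliberately not part of this result.
function computePostEscalationCounters(_triggerSite: EscalationTriggerSite): PostEscalationCounters {
  return { nMessagesSent: 0, consecutiveToolErrors: 0 };
}
```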

tsgo 0; common suite 814 -> 817 (+3). The chatThreadService wiring is browser-layer (not node-testable);
the reset logic + its uniformity are node-tested in the pure fn, and the wiring is type-checked.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… log (+ truncation tolerance)

#20b. The write side (serializeEvents/shouldRotate/rotatedLogPath) already lives in pure
common/auditLogFormat.ts; the inverse was missing. Added parseJsonl(content) -> { events, skipped }:
one JSON object per line, blank/whitespace-only lines ignored (serializeEvents always trails a newline),
and a non-blank line that fails JSON.parse is SKIPPED + counted rather than thrown. The audit log is
append-only and can be cut mid-write by a crash, so a single truncated trailing line must NOT lose the
whole tamper-evident record -- every well-formed line before the corruption is still recovered.

This is the read-side building block the deferred audit-view/export will consume (the file read itself
stays in AuditLogService); it also pins the serialize<->parse round-trip bidirectionally. Tests: golden
round-trips, a truncated trailing line (skipped=1, prior events survive), a corrupt MIDDLE line (later
valid lines still parse), blank/whitespace tolerance, empty content. The pre-existing round-trip test now
delegates to parseJsonl instead of a hand-rolled split/JSON.parse.
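
A sketch of the tolerant read side, assuming only what the description above states (one JSON object per line, blank lines ignored, bad lines counted). The generic parameter is an addition for the example.

```typescript
// Parse JSONL, skipping (and counting) lines that fail JSON.parse instead of throwing.
function parseJsonl<T = unknown>(content: string): { events: T[]; skipped: number } {
  const events: T[] = [];
  let skipped = 0;
  for (const line of content.split('\n')) {
    if (line.trim() === '') { continue; } // trailing newline and blank lines
    try {
      events.push(JSON.parse(line) as T);
    } catch {
      skipped++; // e.g. a line cut mid-write by a crash
    }
  }
  return { events, skipped };
}

const truncatedLog = '{"action":"apply"}\n{"action":"rollback"}\n{"action":"pro';
console.log(parseJsonl(truncatedLog)); // two events recovered, skipped: 1
```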

tsgo 0; common suite 817 -> 822 (+5); hygiene clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…+ pin it

#25b. TreeSitterService._getLanguageFromUri held an inline extension->tree-sitter-language map and was
browser-only/untestable. Extracted it VERBATIM to pure node-tested common/treeSitterLanguageMap.ts
(TREE_SITTER_LANGUAGE_BY_EXTENSION + languageIdFromPath(path)); the private method now delegates. Tree-
sitter activation is still deferred (the parser load is a separate item), but this map decides which
grammar a file routes to once it is wired, so pinning it keeps that routing stable.

Byte-identical: same lower-cased trailing-segment logic (`path.split('.').pop()?.toLowerCase()`), same
15 entries, same `map[ext] || null` fallback. Golden table covers every mapped extension + a no-drift
assertion on the exported map size; plus case-insensitivity, multi-dot paths (last segment), unknown/
missing/trailing-dot -> null, and the dotfile case (/.gitignore -> null).
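
The lookup can be sketched as below. The three entries are a hypothetical subset of the 15-entry table, and a `Map` is used here to sidestep inherited object keys, whereas the original is described as a plain `map[ext] || null` lookup.

```typescript
// Hypothetical subset of the extension table, for illustration only.
const TREE_SITTER_LANGUAGE_BY_EXTENSION = new Map<string, string>([
  ['ts', 'typescript'],
  ['py', 'python'],
  ['go', 'go'],
]);

// Route a path to a language id by its lower-cased trailing segment.
function languageIdFromPath(path: string): string | null {
  const ext = path.split('.').pop()?.toLowerCase() ?? '';
  return TREE_SITTER_LANGUAGE_BY_EXTENSION.get(ext) ?? null;
}
```

A dotfile such as `/.gitignore` yields the segment `gitignore`, which is not in the table, so it falls through to null like any unknown extension.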

tsgo 0; common suite 822 -> 828 (+6); hygiene clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e tested isAutoModelSelection

Wave 4 leftover. chatThreadService computed "is this the Auto selection?" by hand at two sites:
~5621 `userModelSelection?.providerName === 'auto' && userModelSelection?.modelName === 'auto'` and
~3904 the broader `!modelSelection || (modelSelection.providerName === 'auto' && ...modelName === 'auto')
|| (Chat-feature selection is auto)`. The exact `=== 'auto' && === 'auto'` predicate already exists as
the exported isAutoModelSelection(selection: ModelSelection | null) in cortexideSettingsTypes, but it
was untested and not used here.

Routed both sites through isAutoModelSelection (byte-identical: the helper IS `sel?.providerName ===
'auto' && sel?.modelName === 'auto'`, so site A == isAutoModelSelection(userModelSelection), and site B
keeps its site-specific `!modelSelection ||` + Chat-feature OR, each auto-term now a helper call; the
two synchronous reads of the immutable settings snapshot collapse to one without changing the value).
Added a test suite pinning isAutoModelSelection (literal auto, null, concrete, half-auto demands both
fields, complementary with isValidProviderModelSelection) so the now-shared contract can't drift.
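
For reference, the predicate as quoted above, wrapped in a self-contained sketch. The `ModelSelection` shape here is a minimal assumption and not the full cortexideSettingsTypes definition.

```typescript
// Minimal assumed shape; the real type carries more provider detail.
interface ModelSelection {
  providerName: string;
  modelName: string;
}

// Auto requires BOTH fields to be the literal 'auto'.
function isAutoModelSelection(sel: ModelSelection | null): boolean {
  return sel?.providerName === 'auto' && sel?.modelName === 'auto';
}
```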

tsgo 0 (confirms both call sites are ModelSelection|null); common suite 828 -> 833 (+5); hygiene clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Grammar (+ fragment-stream tests)

#6 (the riskiest extraction -- a streaming parser with cross-chunk buffering). extractXMLToolsWrapper
turns a model's streamed XML ("<read_file><uri>/a</uri></read_file>") into a structured tool call while
hiding the markup from the user-visible text. Models without native function-calling (every local model
in practice) rely on it, so it is load-bearing for agentic use -- and it lived in electron-main, untested.

Extracted the state machine + its 4 private helpers (findPartiallyWrittenToolTagAtEnd, findIndexOfAny,
parseXMLPrefixToToolCall, trimBeforeAndAfterNewLines) plus ToolOfToolName VERBATIM into pure node-tested
common/xmlToolCallScanner.ts as createXmlToolCallScanner(tools, toolId) -> { push(accumulatedFullText),
trimAndGetFinal() }. extractXMLToolsWrapper is now a thin adapter keeping the onText/onFinalMessage
plumbing + tool-set resolution (availableTools) and delegating; toolId is injected (generateUuid) so the
scanner is deterministic. Removed the now-unused imports (SurroundingsRemover/RawToolCallObj/
RawToolParamsObj/ToolName/ToolParamName); extractReasoningWrapper is untouched.

Behavior-preserving: the 4 helpers + push body are byte-identical to git HEAD (the only deltas are the
param rename params.fullText->accumulatedFullText, the trailing onText({...}) replaced by a return the
adapter re-adds, and two dropped commented-out console.logs). A 4-lens adversarial-verify Workflow
confirmed byte-identical via THREE independent differential fuzzes that drove HEAD's verbatim wrapper vs
the compiled scanner+adapter over random inputs x random chunkings (60k + 23k + 6.7k cases, 0 mismatches),
including the first-tag-by-tool-order latching quirk and the onFinalMessage double-call timing. The one
real finding it surfaced (a test-soundness gap: only finalized displayText was asserted, never an
intermediate push() return -- which is what onText forwards live) is fixed by a mid-stream displayText
test that catches mid-stream markup-hiding regressions (e.g. dropping the partial-tag buffer).

Tests: complete calls, fragment splits, char-by-char convergence, monotonic growth, 9 malformed inputs
(never throw, one-shot + fragmented), JSON-blob passthrough, unknown-tag passthrough, first-tag latching.
tsgo 0; common suite 833 -> 855 (+22); hygiene clean.
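
The partial-tag buffer is the subtle part of the stream handling, so here is a reduced sketch of that one idea. It is not the extracted scanner: `findPartialToolTagAtEnd` and `withholdPartialTag` are hypothetical helpers that only cover an opening tag cut off at the end of the accumulated text, not the state machine that runs once a full tag has arrived.

```typescript
// Longest suffix of `text` that is a proper prefix of some "<toolName>" opening tag.
function findPartialToolTagAtEnd(text: string, toolNames: readonly string[]): string | null {
  let longest: string | null = null;
  for (const name of toolNames) {
    const openTag = `<${name}>`;
    const maxLen = Math.min(openTag.length - 1, text.length);
    for (let len = maxLen; len >= 1; len--) {
      const prefix = openTag.slice(0, len);
      if (text.endsWith(prefix)) {
        if (longest === null || prefix.length > longest.length) { longest = prefix; }
        break; // longest match for this tool found
      }
    }
  }
  return longest;
}

// Hide a possibly-incomplete tag from the user-visible text until more chunks arrive.
function withholdPartialTag(accumulatedFullText: string, toolNames: readonly string[]): string {
  const partial = findPartialToolTagAtEnd(accumulatedFullText, toolNames);
  return partial === null
    ? accumulatedFullText
    : accumulatedFullText.slice(0, accumulatedFullText.length - partial.length);
}
```

For example, with tools `read_file` and `edit_file`, the accumulated text `Reading <read_` displays as `Reading ` until the next chunk settles whether a tool tag is being written.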

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n the budget table

Wave 5 (a testable portion of the provider send-path). sendLLMMessage.impl.ts decided the per-call
output-token budget (OpenAI max_tokens / Ollama num_predict) inline and untested, yet it is a real
responsiveness lever: local models are slow per token, so autocomplete asks for a tiny budget (96, fast
suggestions), quick edits (Ctrl+K / Apply) a medium one (200), and cloud calls a flat 300. Used by the
FIM, Ollama-FIM, and Ollama-chat paths.

Extracted computeMaxTokensForLocalProvider VERBATIM to pure node-tested common/localProviderMaxTokens.ts;
the 3 call sites now import it. The surrounding SDK/streaming/abort plumbing stays in electron-main (only
CDP-testable). Golden table pins every branch: cloud->300 regardless of feature, local Autocomplete->96,
local Ctrl+K/Apply->200, local Chat/SCM/unknown/undefined->300.
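
The budget table as a small function. The parameter list is an assumption (the real signature is not shown); the numbers and feature names are the ones pinned above.

```typescript
// Per-call output-token budget: small for slow local models on latency-sensitive features.
function computeMaxTokensForLocalProvider(isLocalProvider: boolean, featureName?: string): number {
  if (!isLocalProvider) { return 300; }              // cloud: flat budget
  if (featureName === 'Autocomplete') { return 96; } // fast suggestions
  if (featureName === 'Ctrl+K' || featureName === 'Apply') { return 200; } // quick edits
  return 300; // local Chat / SCM / unknown / undefined
}
```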

tsgo 0; common suite 855 -> 859 (+4); hygiene clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ted isLoopbackEndpoint

Wave 5 (a testable portion of the provider send-path). sendLLMMessage.impl.ts decided "is this
openAICompatible/liteLLM endpoint local?" with the SAME 8-line URL-parse-and-check-hostname block copied
verbatim FOUR times (FIM, OpenAI-compat chat, the SDK factory, Ollama chat). It drives local-provider
optimizations (shorter timeouts, streaming FIM, the local max-tokens budget) and -- importantly -- matches
by HOSTNAME not substring, so "localhost.evil.com" is correctly NOT local. That subtlety deserves one
tested home, not four copies a fix could miss.

Extracted isLoopbackEndpoint(endpoint) to pure node-tested common/loopbackEndpoint.ts (verbatim loopback
set localhost/127.0.0.1/0.0.0.0/::1, same URL parse + try/catch -> non-local on missing/empty/unparseable).
Each of the 4 sites collapses to `(providerName === 'openAICompatible' || providerName === 'liteLLM') &&
isLoopbackEndpoint(settingsOfProvider[providerName]?.endpoint)` -- byte-identical (false unless the guard
holds AND the endpoint is loopback). The explicit-provider list and the FIM site's hasFIMSupport use are
unchanged. Not a security boundary (egress stays gated by egressPolicy); this is a UX/responsiveness gate.
Tests pin the common local cases, case-insensitivity, the hostname-not-substring guard (localhost.evil.com
-> false, LAN -> false), and the safe non-local default.
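
A sketch of the hostname-based check using the standard `URL` parser. One detail is an addition for this example: the bracket strip, because the WHATWG parser reports an IPv6 host as `[::1]`; whether the extracted function does the same is not stated.

```typescript
const LOOPBACK_HOSTS = new Set(['localhost', '127.0.0.1', '0.0.0.0', '::1']);

// Match by parsed hostname, never by substring, and default to non-local on any parse failure.
function isLoopbackEndpoint(endpoint: string | null | undefined): boolean {
  if (!endpoint) { return false; }
  try {
    const hostname = new URL(endpoint).hostname.toLowerCase().replace(/^\[|\]$/g, '');
    return LOOPBACK_HOSTS.has(hostname);
  } catch {
    return false; // unparseable endpoint
  }
}
```

Substring matching would accept `http://localhost.evil.com`; comparing the parsed hostname against a fixed set rejects it.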

tsgo 0; common suite 859 -> 863 (+4); hygiene clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nch module loading

Found while bringing the dev build up for live verification. Two services re-exported an interface/type
as a VALUE, which the TS compiler elides but the esbuild dev-transpile (node build/next/index.ts) cannot,
so the emitted out/ module had no such runtime export and the workbench threw at module load:

  "The requested module './editRiskScore.js' does not provide an export named 'EditContext'"

- editRiskScoringService.ts:18  `export { EditContext, EditRiskScore }`  (both are interfaces)
    -> `export type { EditContext, EditRiskScore }`
- ollamaInstallerService.ts:13  `export { MODEL_PACKS, ModelPackKey }`  (ModelPackKey is a type alias)
    -> `export { MODEL_PACKS, type ModelPackKey }`

These are dev-build-only breakages (the production gulp/tsc build elides type re-exports correctly), but
they made the cortexide UI fail to load under the esbuild dev transpile that the smoke harness uses. Swept
the whole cortexide tree for other bare re-exports; the only other ones (imageQA/index.ts) already use the
inline `type` modifier. tsgo 0; LIVE-verified: re-transpile + relaunch -> the module-load SyntaxErrors are
gone and cdp-smoke is 11/11 (was failing "no fatal console errors").

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…calling a non-existent API)

TreeSitterService.extractSymbols always returned [] -- the audit's "tree-sitter dead (wrong API)": it did
`await import('@vscode/tree-sitter-wasm')` then called `wasmModule.createParser(language)`, a method that
does not exist, and never ran Parser.init()/Language.load(). So RAG was BM25-only and the cortexide.index.ast
config did nothing.

Rewired it to delegate WASM loading to the editor's own ITreeSitterLibraryService (the same service the
terminal command parser uses) -- getParserClass() (owns Parser.init), getLanguagePromise(grammarId) (owns
Language.load + wasm-path resolution, and bypasses the editor's prefer-treesitter support gate), then
`new Parser(); parser.setLanguage(language); parser.parse(content)`. Parsers are cached per grammar (a
missing/failed grammar caches null so it is not retried per file), and Tree objects are released via
tree.delete() in a finally (they hold WASM memory; the extracted ASTSymbol[] is plain data). The existing
AST traversal (_traverseAST etc.) was already correct standard web-tree-sitter API -- it just never received
a real tree.

Added pure node-tested treeSitterGrammarId(languageId) mapping our language ids to the shipped
@vscode/tree-sitter-wasm grammar ids: identity for ts/tsx/js/python/java/go/rust/cpp/php/ruby, csharp ->
'c-sharp' (the on-disk grammar name), and c/swift/kotlin -> null (no grammar ships -> graceful BM25/LSP
fallback, no throw).
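
The grammar-id mapping can be sketched as follows. The spelled-out language ids (`typescript`, `javascript`) are an assumption; the commit abbreviates them as ts/js.

```typescript
// Languages whose grammar id equals the language id (ids assumed for illustration).
const IDENTITY_GRAMMARS = new Set([
  'typescript', 'tsx', 'javascript', 'python', 'java', 'go', 'rust', 'cpp', 'php', 'ruby',
]);

// null means no grammar ships, so the caller falls back without throwing.
function treeSitterGrammarId(languageId: string): string | null {
  if (languageId === 'csharp') { return 'c-sharp'; } // on-disk grammar name
  return IDENTITY_GRAMMARS.has(languageId) ? languageId : null;
}
```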

LIVE-VERIFIED via a temporary command-palette probe against the running dev build (now removed): real
symbols extracted -- TS: function:alpha, class:Beta, variable:delta; Python: function:foo, class:Bar; Go:
function:Hello + struct field; Swift (no grammar): 0 symbols, no throw. tsgo 0; common suite 863 -> 867
(+4); cdp-smoke 11/11; hygiene clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r the tamper-evident log)

Builds the user-facing half of the audit deferral on the parseJsonl read-side landed earlier. The audit
log records every dangerous action but there was no way to inspect it. Added:
- IAuditLogService.readEvents() (flushes buffered writes, reads the file, parseJsonl -> {events, skipped},
  tolerant of a truncated trailing line) + getLogPath().
- Pure node-tested formatAuditEvents(events, skipped) -> a readable, copy-able report (ISO timestamps from
  each event's epoch-ms ts, OK/ERR status, action, and only the present optional fields; header surfaces any
  skipped corrupt lines).
- A "CortexIDE: Show Audit Log" Action2 that opens the rendered log in an editor (so it can be saved/exported);
  shows an info notice when auditing is disabled.

LIVE-VERIFIED against the running dev build (audit enabled, a seeded audit.jsonl with 3 events + 1 truncated
trailing line): the command opened an editor reading "CortexIDE Audit Log -- 3 events / (1 corrupt/truncated
line skipped)" with each event rendered (prompt+model, apply+files+diffstats+latency, rollback ERR) -- proving
readEvents -> parseJsonl -> formatAuditEvents -> editor end to end. +5 formatter tests; updated the
MockAuditLogService stub for the 2 new interface methods.

NOTE on the OTHER half of this item (wrap the global ILogService in RedactingLogService): deliberately NOT
done. ILogService is constructed in desktop.main.ts BEFORE IConfigurationService exists, but the redaction
service is config-dependent, so it cannot wrap the root logger cleanly there; and secret-bearing logs (LLM
dispatch) are already redacted at the source (sendLLMMessageService) -- matching the project's prior
high-risk/low-value assessment. There is no scoped cortexide logger to wrap instead. Left as an honest defer.

tsgo 0; common suite 867 -> 872 (+5); cdp-verified; hygiene clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…o a pure tested predicate

The safe testable slice of the aborted/erroring multi-block apply path. During a streaming SEARCH/REPLACE
edit, each new block is located in the ORIGINAL file; if it can't be located (a 'Not found'/'Not unique'
error) or its target range OVERLAPS a block already applied this stream, the WHOLE edit is reverted and the
model is re-prompted from the first block. A wrong overlap test risks silent data loss (a good edit thrown
away, or a conflicting edit applied), but the logic was inline in editCodeService and untested.

Extracted the DECISION (not the side effects) to pure node-tested common/editStreamRevertDecision.ts:
rangesOverlap(a,b) (touching endpoints count as overlap, matching the inline rule) + decideStreamRevert
({originalBoundsError, thisBlockRange, existingRanges}) -> {revert, errorMessage}. Byte-identical to the old
`if (typeof originalBounds === 'string' || hasOverlap)` check incl. the 'Has overlap' message and
error-takes-precedence ordering. The revert side effects (delete tracking zones, rewrite the file to the
original, abort the stream) stay inline. Re-narrowed originalBounds with an unreachable
`if (typeof originalBounds === 'string') return` after the revert guard (the old condition's
`typeof === 'string'` provided the narrowing the pure call drops; decideStreamRevert reverts for every
string, so the guard never fires).

Tests: overlap (disjoint/touching/contained/identical/partial/gap), locate-error precedence, 'Has overlap',
no-revert, first-block, some-semantics across existing ranges. tsgo 0; common suite 872 -> 881 (+9); hygiene
clean. (The revert branch itself needs a forced malformed multi-block stream to drive live, so it stays
unit-verified; the surrounding apply path is covered by the existing cdp atomic-edit harness.)
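
A sketch of the pure decision, assuming ranges are inclusive `[start, end]` line pairs and that a locate error is passed as a string or null. Both shapes are hypothetical; the 'Has overlap' message and the error-first ordering follow the description above.

```typescript
type LineRange = readonly [number, number]; // assumed inclusive [start, end]

// Touching endpoints count as overlap.
function rangesOverlap(a: LineRange, b: LineRange): boolean {
  return a[0] <= b[1] && b[0] <= a[1];
}

interface StreamRevertInput {
  originalBoundsError: string | null; // e.g. 'Not found' / 'Not unique'
  thisBlockRange: LineRange | null;
  existingRanges: readonly LineRange[];
}

// Decision only: the caller performs the actual revert side effects.
function decideStreamRevert(input: StreamRevertInput): { revert: boolean; errorMessage: string | null } {
  if (input.originalBoundsError !== null) {
    return { revert: true, errorMessage: input.originalBoundsError }; // error takes precedence
  }
  const range = input.thisBlockRange;
  if (range !== null && input.existingRanges.some(r => rangesOverlap(range, r))) {
    return { revert: true, errorMessage: 'Has overlap' };
  }
  return { revert: false, errorMessage: null };
}
```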

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rovider

RAG was BM25-only because ZERO embedding providers were ever registered with IAiEmbeddingVectorService
(the whole vector pipeline -- gate, hybrid rerank, vector store adapters -- was wired but inert). Added a
local Ollama embedding provider so semantic (BM25 + vector) retrieval activates when the user configures
a model.

Wiring (the renderer can't reach Ollama, so embeds route through electron-main, mirroring the LLM path):
- electron-main sendOllamaEmbed (ollama SDK .embed) -> a new request-response 'ollamaEmbed' command on
  LLMMessageChannel, egress-gated defense-in-depth (refuses a non-loopback endpoint under local-only).
- renderer LLMMessageService.ollamaEmbed (same loopback egress gate as ollamaList) returns the vectors.
- OllamaEmbeddingProviderContribution (WorkbenchPhase.AfterRestored): when cortexide.rag.embeddingModel is
  set AND dispatchable, PROBES the model (so isEnabled() never lies), then registers an
  IAiEmbeddingVectorProvider; re-syncs on config / privacy-state change; unregisters when ineligible.
- New cortexide.rag.embeddingModel setting (default '' = BM25-only; opt-in).
- Pure node-tested common/ollamaEmbeddings.ts: extractEmbeddingVectors (throws on empty/ragged/non-finite
  so cosine similarity never gets garbage) + canUseOllamaEmbeddings eligibility.

Privacy: Ollama is loopback, so chunks stay on-machine and the call is allowed under local-only (secrets
are already redacted before embedding upstream). Any failure (no model / Ollama down) leaves retrieval
gracefully on BM25.

LIVE-VERIFIED against the running dev build (cortexide.rag.embeddingModel=nomic-embed-text, model pulled):
the contribution logged "registered local embedding provider 'nomic-embed-text' -- hybrid retrieval
active"; IAiEmbeddingVectorService.isEnabled()=true; getEmbeddingVector returned a real 768-dim vector; the
electron-main round-trip returned 2x768 vectors. tsgo 0; common suite 881 -> 888 (+7); cdp-smoke 11/11;
hygiene clean.
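
A sketch of the vector validation, assuming the embed response has already been narrowed to an array of number arrays. The error messages and the exact input type are illustrative, not the shipped common/ollamaEmbeddings.ts code.

```typescript
// Reject empty, ragged, or non-finite vectors so cosine similarity never sees garbage.
function extractEmbeddingVectors(embeddings: readonly (readonly number[])[]): number[][] {
  if (embeddings.length === 0) {
    throw new Error('no embedding vectors returned');
  }
  const dim = embeddings[0]?.length ?? 0;
  if (dim === 0) {
    throw new Error('empty embedding vector');
  }
  return embeddings.map((vec, i) => {
    if (vec.length !== dim) {
      throw new Error(`ragged embeddings: vector ${i} has ${vec.length} dims, expected ${dim}`);
    }
    if (!vec.every(v => Number.isFinite(v))) {
      throw new Error(`non-finite value in vector ${i}`);
    }
    return [...vec]; // copy so callers cannot mutate the response
  });
}
```

Throwing here, rather than returning a partial result, lets the caller treat any malformed response as a failure and stay on BM25.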

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n the working notes)

PR #64 removed docs/ from main. This branch still carried the stale modernization/session docs
(MODERNIZATION-BASELINE/HANDOFF, NEXT-SESSION-PROMPT, PHASE2-WIRING-PLAN, + the comparison docs), which
a PR from this branch would have re-added. Removing them so the branch matches main and the modernization
PR stays code-only. The consolidated, current status lives in the working notes outside the repo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Pterjudin Pterjudin merged commit 4010c5f into main Jun 15, 2026
11 of 23 checks passed