Modernization continuation: feature activations (tree-sitter AST, hybrid RAG) + tested extractions + reliability fixes #65
Merged
RepoIndexerService.query()/queryWithMetrics() only ever returned formatted
citation STRINGS ("File: <path>:<lines>\nSymbols...\nContent preview..."), so
the two consumers that need the matched file -- codebaseQueryCommands (Query
Codebase quick-pick) and the agent's search_for_files indexer path -- were
doing URI.file(<whole formatted blob>), producing a bogus URI that opened/paged
nothing.
Add queryStructured(text,k): Promise<RetrievalResult[]> returning the discrete
hits (each with a real file URI). It shares one private _queryWithStructured()
core with queryWithMetrics(): a `structured` array is pushed in lockstep with
the string `results` at both assembly sites (common-query-cache + main path),
threaded through the query/common caches, and returned. query()/queryWithMetrics()
stay byte-identical (string assembly untouched; structured is purely additive).
The context-cache fallback has no per-result URI, so it returns no structured hits.
Point both consumers at r.uri (deduped by file -- a file can match via multiple
chunks/symbols). Pins a structured<->formatted consistency test (the citation
embeds the discrete uri verbatim). tsgo 0; 611 node tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ransaction
multi_edit's replace_all was DEAD for the multi-occurrence case it exists for.
The tool expanded a replace_all edit into N IDENTICAL Search/Replace blocks and
fed them to the span engine (computeSearchReplaceResult), whose findTextInCode
rejects a non-unique ORIGINAL as 'Not unique' -- so block 0 failed immediately
(and even past that, all N identical blocks resolve to the same indexOf span ->
'Has overlap'). So replace_all=true on any string occurring 2+ times applied
nothing. The model-facing prompt was also wrong: it claimed replace_all=false
"replaces only the first", but the engine actually requires the match to be unique.
Fix: a pure, node-tested common/multiEdit.ts computeMultiEditResult(content, edits)
that applies the edits as the standard multi-edit transaction:
- SEQUENTIAL: each edit operates on the text the prior edits produced.
- replace_all=false: old_string must be UNIQUE (same indexOf===lastIndexOf test
as findTextInCode, so a single edit matches edit_file's behavior); else
'Not unique' / 'Not found'.
- replace_all=true: replace every left-to-right non-overlapping occurrence (>=1).
- ALL-OR-NOTHING: validates+rewrites a local copy; the first failure returns
ok:false with no newContent, so the caller throws before any write.
toolsService.multi_edit now computes the final content via this and applies it
through the existing instantlyRewriteFile diff path (the same one rewrite_file
uses); computeSearchReplaceResult / edit_file are untouched. Prompt corrected to
document the sequential + unique-or-replace_all semantics.
Tests: 15 unit cases + a differential fuzz (20k single verbatim-substring edits +
5k independent full-line multi-edits) proving byte-identical results vs
computeSearchReplaceResult on the inputs the old engine accepted (0 mismatch), so
the swap is non-regressing while additionally fixing replace_all. tsgo 0; 628 node
tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
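The transaction semantics listed in the commit above (sequential, unique-or-replace_all, all-or-nothing) can be sketched as follows. This is a minimal illustration, not the code in common/multiEdit.ts: the `MultiEdit` and `MultiEditResult` shapes and the `editIndex` field are assumptions, and treating an empty old_string as 'Not found' is a choice made here for safety.

```typescript
// Hypothetical shapes: the commit names computeMultiEditResult(content, edits)
// but does not show its types.
interface MultiEdit { old_string: string; new_string: string; replace_all?: boolean }
type MultiEditResult =
  | { ok: true; newContent: string }
  | { ok: false; editIndex: number; reason: 'Not found' | 'Not unique' };

function computeMultiEditResult(content: string, edits: MultiEdit[]): MultiEditResult {
  let text = content; // local copy: nothing is written until every edit succeeds
  for (let i = 0; i < edits.length; i++) {
    const { old_string, new_string, replace_all } = edits[i];
    // SEQUENTIAL: search the text produced by the previous edits.
    // Assumption: an empty old_string is rejected as 'Not found'.
    const first = old_string.length === 0 ? -1 : text.indexOf(old_string);
    if (first === -1) return { ok: false, editIndex: i, reason: 'Not found' };
    if (replace_all) {
      // split/join replaces every left-to-right non-overlapping occurrence
      // and avoids the `$` replacement patterns of String.prototype.replace.
      text = text.split(old_string).join(new_string);
    } else {
      // Uniqueness test: indexOf === lastIndexOf.
      if (first !== text.lastIndexOf(old_string)) {
        return { ok: false, editIndex: i, reason: 'Not unique' };
      }
      text = text.slice(0, first) + new_string + text.slice(first + old_string.length);
    }
  }
  return { ok: true, newContent: text };
}
```

Because failures return no newContent, a caller can throw before touching the file, which is the all-or-nothing property the commit describes.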
…pic tool-name concat
Four bug-hunt backlog items (re-verified in code):
1. chatLatencyAudit context leak on llmError/abort. onFinalMessage released the context
(stopping the 60Hz render-monitor interval), but onError and onAbort never did -- so every
errored or aborted request leaked its context AND kept the interval running forever.
Release in both.
2. chatLatencyAudit failover ORPHAN. finalRequestId is `const`, declared before the retry
loop, so the retry reuses it -- but the model-failover path started a context under a
throwaway newRequestId the retry never used, orphaning a context (+ interval) that was
never released. Re-arm under finalRequestId instead, which also restores latency tracking
for the failover attempt (onError released the prior attempt's context just before).
3. startBackgroundAgent hidden-thread leak. The hidden thread was added to allThreads but
never removed, so every background-agent run left a thread object behind forever.
Confirmed it is NOT user-inspectable (in-memory only, never in openTabs; the
Running-agents panel renders the record's resultSummary, never the thread), and the
summary is captured before cleanup -> drop the thread in a .finally() once the run settles
(completed / errored / cancelled).
4. Anthropic streamed tool-name concat. content_block_start did `fullToolName += name` per
tool_use block, so parallel tool calls concatenated into a garbage streamed name like
"read_filelist_dir"; finalMessage already uses tools[0]. Keep only the first block's name.
All four are in the browser agent loop / electron-main provider (excluded from the node
runner), so they are verified by code reasoning, not unit tests; chatLatencyAudit
releaseContext is idempotent and the lifecycle is now release-on-every-terminal-exit.
tsgo 0; 628 node tests still pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…patch
Add user-invokable Skills, mirroring Claude Code's Agent Skills and the sibling
custom-agents feature. Each `.cortexide/skills/<name>/SKILL.md` (Markdown +
optional name/description frontmatter) becomes a `/<skill-name> [args]` slash
command in chat: invoking it expands the skill's instruction body (plus the
user's args) into a normal chat turn.
Pure testable core (common/cortexideSkillsService.ts, mirrors cortexideAgentsService):
- parseSkillFile(dirName, text, uri): frontmatter + instruction body, no YAML dep.
- parseSkillInvocation(input): "/name args" -> { name (lower-cased), args } | null.
- buildSkillInvocationMessage(skill, args): the expanded turn text.
- CortexideSkillsService: discovers each sub-dir of .cortexide/skills that holds a
SKILL.md (recursive FS watch, 64 skills / 64 KB caps), getSkill() by name.
Wiring: chatThreadService imports the service (registering the singleton) and
exposes getSkillExpansion(input) + listSkillNames(); the chat input's slash handler
(SidebarChat) dispatches in its default case AFTER the built-in commands (so a
built-in like /help is never shadowed by a same-named skill), and /help now lists
the available skills.
Tests: 16 pure unit cases (parse / invocation / message-build, incl. CRLF, quoted
frontmatter, multi-line args, lone-slash, dir-name default). tsgo 0; buildreact
clean; 644 node tests pass. The discovery service (FS watch) and the React slash
handler are browser-layer (not node-testable); verified by tsgo + buildreact +
the pure-core tests -- a live CDP drive needs a workspace skill file (follow-up).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "what can leave my machine" report covered model dispatch, catalog refresh,
web tools, embeddings, vector store, MCP, and update-check -- but NOT product
telemetry, even though it's a real off-machine channel. Add it as a 7th channel.
Its status is computed via the telemetryConsent SSOT (isTelemetryEnabled), using
the SAME local-only resolution the electron-main metrics gate uses
(routingPolicy==='local-only' OR localFirstAI), so the report can't disagree with
what actually ships:
- opt-IN by default: OPT_OUT_KEY absent/'true' -> 'not-configured' (nothing sent)
- explicitly opted in ('false') and not local-only -> 'open'
- opted in but local-only -> 'blocked' (forced off, with a reason)
The privacy-report command reads OPT_OUT_KEY from IStorageService at APPLICATION
scope (shared with main) and threads it into the report config. Added 'telemetry'
to EgressModality (+ a canEgress case to keep the switch exhaustive/total).
Tests: a dedicated telemetry-channel test (default off / opt-in open / local-only
forced off via both routingPolicy and localFirstAI) + updated the minimal-report
count. tsgo 0; 645 node tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
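The three-way status resolution described in the commit above can be sketched as a small pure function. The names `TelemetryInputs`, `optOutValue`, and `telemetryChannelStatus` are hypothetical; the real report goes through isTelemetryEnabled, whose signature is not shown here. Treating any value other than 'false' as "not opted in" is an assumption (fail closed).

```typescript
type ChannelStatus = 'not-configured' | 'open' | 'blocked';

// Hypothetical input shape; optOutValue stands for the stored OPT_OUT_KEY value.
interface TelemetryInputs {
  optOutValue: string | undefined;
  routingPolicy?: string;
  localFirstAI?: boolean;
}

function telemetryChannelStatus(inputs: TelemetryInputs): ChannelStatus {
  // Opt-IN by default: only an explicit 'false' counts as consent.
  const optedIn = inputs.optOutValue === 'false';
  if (!optedIn) return 'not-configured';
  // Same local-only resolution the commit describes for the metrics gate.
  const localOnly = inputs.routingPolicy === 'local-only' || inputs.localFirstAI === true;
  return localOnly ? 'blocked' : 'open';
}
```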
mcpChannel._addUniquePrefix prepended `Math.random().toString(36)` to every MCP tool
name -- so each tool got a DIFFERENT prefix and all of them changed on every
reconnect/reload, meaning the model never saw a stable tool name across sessions (and
tools from one server shared no common prefix).
Replace it with a pure, deterministic, server-keyed prefix (new
common/mcpServiceTypes.mcpToolNamePrefix, FNV-1a -> 6 chars of [0-9a-z]). All tools from
one server now share one stable prefix. Safe because routing uses the separate serverName
(not the prefix) and the prefix is '_'-free, so the call-time strip
removeMCPToolNamePrefix still recovers the original tool name. Threaded serverName through
the 5 call sites.
Tests: determinism, distinctness across server names, 6-char base36 / no '_', and
round-trip through removeMCPToolNamePrefix for tool names containing underscores. tsgo 0;
649 node tests pass. (mcpChannel is electron-main / not in the node runner; the prefix
logic is the pure tested fn.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
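A deterministic, server-keyed prefix of this kind might look like the sketch below. It is an illustration only: hashing UTF-16 code units (rather than bytes), taking the last six base36 digits, and the `addPrefix` / `stripPrefix` helpers are all assumptions, not the contents of mcpServiceTypes.

```typescript
// 32-bit FNV-1a over the server name, rendered as 6 chars of [0-9a-z].
function mcpToolNamePrefix(serverName: string): string {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < serverName.length; i++) {
    hash ^= serverName.charCodeAt(i);          // assumption: code units, not UTF-8 bytes
    hash = Math.imul(hash, 0x01000193) >>> 0;  // multiply by the FNV prime, keep uint32
  }
  // A uint32 needs at most 7 base36 digits; pad short values, keep the last 6.
  return hash.toString(36).padStart(6, '0').slice(-6);
}

// Hypothetical helpers showing why a '_'-free prefix round-trips.
function addPrefix(serverName: string, toolName: string): string {
  return `${mcpToolNamePrefix(serverName)}_${toolName}`;
}
function stripPrefix(prefixedName: string): string {
  // The first '_' is always the separator because the prefix contains none.
  return prefixedName.slice(prefixedName.indexOf('_') + 1);
}
```

Since base36 output never contains '_', splitting on the first underscore recovers tool names such as `read_file` intact.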
…o tested code
CortexIDE's positioning makes concrete promises. This test ties each marketing-level
claim to the pure module that actually enforces it, so a regression that would make a
claim FALSE fails a test named after the claim (traceability layer over the detailed
module tests):
- "local-first / private" -> buildEgressReport: local-only opens 0 off-machine channels
- "telemetry is opt-IN" -> isTelemetryEnabled: off by default, local-only forces off
- "never leaks a secret" -> detectSecrets: flags + redacts an API key
- "no SSRF to internal/metadata" -> canEgress+classifyDestination: loopback ok, metadata/private/
hex-IPv4-mapped-IPv6 blocked
- "dangerous actions are gated" -> classifyCommandRisk: `rm -rf /` requiresApproval
- "resistant to prompt injection"-> wrapUntrustedContent: fences + neutralizes a forged end-marker
- "model-agnostic failover" -> buildFailoverCandidates: failed local model -> configured cloud model
tsgo 0; 656 node tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e does NOT govern
The privacy report listed CortexIDE's own channels but said nothing about egress
inherited from the VS Code platform -- which risks implying local-only blocks literally
everything. Add a PLATFORM_INHERITED_EGRESS notes section (the Phase 8 egress-leak audit's
secondary findings) rendered under a clear heading:
- webview UI assets from the VS Code CDN (vscode-cdn.net)
- the built-in GitHub Copilot chat agent's own endpoints (if enabled)
- the on-demand "curl | sh" local-model installer (only on explicit click)
These are out of CortexIDE's routing control (or fire only on user action); naming them
keeps the report honest rather than over-claiming. Added platformNotes to the report + a
test. tsgo 0; node tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mmon module
The decision that gates whether a tool (edit/terminal/MCP) runs WITHOUT user
confirmation -- the core "never silently does something destructive" boundary --
was an inline mutated `shouldAutoApprove` boolean in chatThreadService with ZERO
tests. Extract it to pure common/autoApprovePolicy.ts (computeAutoApproveBaseline +
decideAutoApprove) preserving the EXACT override order: catastrophic-command
hard-block -> setting baseline ('edits' defaults true) -> dangerous-command force
approval -> cwd-escape force approval -> YOLO NL-safe approve -> HIGH-risk-edit
force approval -> YOLO edit approve.
The caller computes the scalar inputs (classifyCommandRisk, the NL heuristic,
scoreEdit) then makes ONE decision; telemetry / the hard-block message / the
auto-apply notification stay inline, gated on the decision's per-rule flags so they
fire under identical conditions with identical payloads. The NL+edit telemetry keeps
its original try/catch swallow; the terminal telemetry stays un-wrapped (parity).
Tested: golden table over every override + a 30k differential fuzz that re-implements
the OLD inline mutation order independently and asserts the extracted decision matches
on every input combination (0 mismatch). A 3-lens adversarial verification workflow
(telemetry / control-flow / computation-order) confirmed full side-effect + control-flow
parity; the one exception-propagation delta it found is now fixed. tsgo 0; 672 tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tested core
The edit risk/confidence score gates auto-apply (HIGH can never be silently
auto-approved; YOLO keys its silent-apply threshold off the 0.2/0.7 boundaries), but
EditRiskScoringService.scoreEdit was untested. Extract everything except factor #6 (the
count of pre-existing Error markers, which needs the live model/markers) into pure
common/editRiskScore.ts scoreEditFromContext(context, existingErrorCount). The service now
computes only that count -- passing 0 when the model is unavailable or there's no
newContent, mirroring the old `if (model && newContent)` guard -- and delegates.
EditContext/EditRiskScore types moved to the pure module and re-exported from the service
so existing import paths are unchanged. Byte-identical logic.
Tested: factor-by-factor golden cases (deletion=HIGH+1.0, critical +0.5, >50% rewrite
crosses HIGH, test-file +0.2, multi-file cap, >5-errors +0.2, create floor 0.05, tiny edit
very-low) + the LOW/MEDIUM/HIGH classifier boundaries incl. the silent-auto-apply boundary
(riskScore<0.2 AND confidence>0.7). tsgo 0; 685 tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… fuzz) findDiffs is the line-diff engine every apply/accept/reject hunk derives from, and it had no test. It is node-importable as-is (only the pure diffLines bundle + a type). Add golden cases pinning each hunk type + the internal trailing-\n bookkeeping (insertion / deletion / edit / identical / "E vs E\n is an insertion" / empty old+new), plus a 15k reconstruction property fuzz: re-applying every returned hunk to the old text must rebuild the new text EXACTLY. The reconstruction oracle is sanity-checked against the goldens before being used. Test-only; tsgo 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…re, tested)
editCodeService.acceptDiff / rejectDiff carried the off-by-one-prone line/range math
inline + untested. Extract it to pure common/perHunkAccept.ts:
- computeAcceptedOriginalCode(originalCode, diff): folds an accepted hunk into the
diff-area baseline (deletion/insertion/edit splices), byte-identical to acceptDiff.
- computeRejectWrite(diff, diffAreaEndLine) -> {writeText, toRange}: the write+range
that undoes a hunk, incl. the two end-of-zone rounding cases (deletion past the zone
end, insertion of the final newline). Range is a plain IRange-shaped object (no editor
import); the service passes it straight to _writeURIText.
Tested: golden splices + golden range math for every variant and both end-of-zone
boundaries, PLUS a 12k accept-convergence property fuzz -- repeatedly accepting the
first hunk and re-diffing (exactly as the service does) must fold originalCode all the
way to the new code. That sequential-accept path is where the boundary off-by-ones
live. tsgo 0; 704 tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ead corpus The streaming SEARCH/REPLACE parser (extractSearchReplaceBlocks) already had golden + a single fixed streaming-monotonicity test. Add a randomized streaming fuzz: 2k random multi-block SR strings fed prefix-by-prefix must never regress (block count non-decreasing, per-block state never goes done->writingFinal->writingOriginal), and the complete stream parses to exactly N done blocks. The fuzz surfaced a real parser CONTRACT: a block with an EMPTY ORIGINAL is mis-parsed (the `=======` on the line right after `<<<<<<< ORIGINAL` is swallowed as content), while an empty UPDATED parses fine -- pinned as an explicit test so callers know ORIGINAL must be non-empty (realistic: you never search for nothing). Also deleted the ~200-line commented-out test corpus from the source (it lives in the runnable suite now). tsgo 0; 706 tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hes logs" path) RedactingLogService held the redact-message/redact-args logic inline, reachable only through its injected ISecretDetectionService (not node-constructible), so the secret- in-logs guarantee was effectively untested at the log layer. Extract it to pure common/logRedaction.ts (redactLogMessage + redactLogArgs over a SecretDetectionConfig, built on the already-pure detectSecrets/redactSecretsInObject). The service now delegates to these -- byte-identical, since its injected service's detectSecrets/ redactSecretsInObject are thin wrappers over the same free functions + getConfig(). Tested: redacts API keys out of message + string args + deeply-nested object args; clean lines pass through; non-string/object args untouched; disabled config is a pass-through. (The production ILogService DI swap remains the deferred cdp-only item.) tsgo 0; 713 tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…edToolProperties toAnthropicTool and toOpenAICompatibleTool built their `paramsWithType` map with the identical inline loop, and toGeminiFunctionDecl built an equivalent one by hand -- the "every property gets a JSON-Schema type" contract was duplicated 3x (and a type-less variant once shipped, fixed in ff1718a). Extract the one pure buildTypedToolProperties(params) into common/providerToolFormat.ts; OpenAI + Anthropic use it directly, Gemini maps its typed properties to the SDK's Type.STRING at the electron-main boundary. Byte-identical output; tsgo confirms the `satisfies Anthropic.Messages.Tool / FunctionDeclaration` assertions still hold. Tested: empty/single/multi params, every property typed (the regression guard), input not mutated, and the OpenAI tool embeds the typed properties. tsgo 0; 719 tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ol-capture reducers
The OpenAI non-streaming response handler and the Gemini streaming chunk loop captured
the model's text + tool call with inline logic in electron-main (untestable in the node
runner). Extract the pure cores to common/providerToolFormat.ts:
- extractToolCallFromNonStreamingChoice(choice): {empty, hasToolCall, text, name,
args, id}. The caller keeps the original `if (toolCalls.length>0)` guard via
hasToolCall, so a prior streaming attempt's tool vars aren't clobbered.
- reduceGeminiChunk(state, chunk): text appends, a functionCall REPLACES (last wins,
unlike OpenAI's concatenation) -- byte-identical to the inline loop.
- finalizeGeminiToolId(toolId, uuidGen): the empty-id -> generated-id fallback.
Tested: OpenAI missing-choice/empty, content-no-tools, one tool_call, nullish-coerce,
first-only; Gemini text-append, functionCall-capture, last-wins replacement, undefined
args -> "{}", no-op chunk; and the id fallback. tsgo 0; 729 tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
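The Gemini reducer contract above (text appends, a functionCall replaces, undefined args become "{}", an empty id gets a generated one) can be sketched like this. The `GeminiCaptureState` and `GeminiChunkLike` shapes are hypothetical simplifications; the real chunk type comes from the Gemini SDK and is not reproduced here.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface GeminiCaptureState { text: string; toolName: string; toolArgs: string; toolId: string }
interface GeminiChunkLike {
  text?: string;
  functionCall?: { name?: string; args?: unknown; id?: string };
}

function reduceGeminiChunk(state: GeminiCaptureState, chunk: GeminiChunkLike): GeminiCaptureState {
  const next = { ...state }; // never mutate the incoming state
  if (chunk.text) next.text += chunk.text; // text accumulates across chunks
  if (chunk.functionCall) {
    // A later functionCall REPLACES the earlier one (last wins).
    next.toolName = chunk.functionCall.name ?? '';
    next.toolArgs = JSON.stringify(chunk.functionCall.args ?? {}); // undefined args -> "{}"
    next.toolId = chunk.functionCall.id ?? '';
  }
  return next;
}

// Empty id -> generated id; the generator is injected so tests stay deterministic.
function finalizeGeminiToolId(toolId: string, uuidGen: () => string): string {
  return toolId || uuidGen();
}
```

Injecting `uuidGen` rather than calling a UUID library directly is what keeps the fallback testable in a plain node runner.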
….ok(true))
rollbackSnapshotService.test.ts was 4 vacuous assert.ok(true) placeholders. Extract the
snapshot byte budget into pure common/snapshotBudget.ts (snapshotFileBytes + planSnapshot
greedy include-until-overage) and have the service read-then-plan. The included set +
skipped flag are identical to the old streaming loop (only a past-budget file may now be
read before exclusion -- harmless for a pre-edit snapshot).
Replace the placeholders with real golden tests: all-fit, over-budget truncation at the
boundary, exactly-at-budget (strict >), empty, single-oversized, and greedy "skip stops
scanning". The model/file reads stay in the service (not node-testable). tsgo 0; 732
tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
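A greedy include-until-overage plan of the kind described above could be sketched as follows. The `SnapshotCandidate` and `SnapshotPlan` shapes and the `maxBytes` parameter are assumptions; only the behavior (strict `>` comparison, stop scanning at the first overage) is taken from the commit text.

```typescript
// Hypothetical shapes for illustration.
interface SnapshotCandidate { path: string; bytes: number }
interface SnapshotPlan { included: SnapshotCandidate[]; skipped: boolean }

function planSnapshot(files: SnapshotCandidate[], maxBytes: number): SnapshotPlan {
  const included: SnapshotCandidate[] = [];
  let total = 0;
  for (const file of files) {
    // Strict '>': a set that lands exactly on the budget still fits.
    if (total + file.bytes > maxBytes) {
      // Greedy: the first overage stops the scan; later, smaller files are not considered.
      return { included, skipped: true };
    }
    included.push(file);
    total += file.bytes;
  }
  return { included, skipped: false };
}
```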
…(true))
autostash.flow.test.ts was 4 vacuous placeholders. restoreStash + dropStash both
parsed `stash@{N}` inline identically; extract that to pure common/gitStashRef.ts
parseStashIndex (defaults to 0 / latest on a missing or malformed ref, byte-identical),
have both sites delegate, and replace the placeholders with real tests: well-formed
indices, malformed/empty -> 0, embedded ref. The stash create/restore/drop flows need
the live git command service (not node-testable). The placeholder's "dirty-only mode
skips stash" case is dropped -- createStash has no such mode (would have asserted dead
behavior). The inferred isBenignStashFailure helper does not exist in the code, so it
was not invented. tsgo 0; 735 tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
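The stash-ref parsing described above is small enough to sketch in full. This is an illustration under the stated contract (default to 0 on a missing or malformed ref, accept a ref embedded in a longer string), not the contents of common/gitStashRef.ts; the exact regex is an assumption.

```typescript
// Returns the N in `stash@{N}`, or 0 (the latest stash) when no well-formed ref is present.
function parseStashIndex(ref: string | undefined): number {
  const match = /stash@\{(\d+)\}/.exec(ref ?? '');
  return match ? parseInt(match[1], 10) : 0;
}
```

For example, `parseStashIndex('stash@{3}')` yields 3, while `parseStashIndex('stash@{x}')` and `parseStashIndex(undefined)` both fall back to 0.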
…der test
applyAll.rollback.flow.test.ts was 3 assert.ok(true) placeholders. Its meaningful
scenarios are ALREADY covered by REAL tests in applyEngineV2.test.ts with a concrete
MockRollbackSnapshotService: "on apply failure, snapshot restore is called" by the
atomicity test (:237, asserts createdCount/restoredIds/no-discard) and "success path
discards snapshot" by the snapshot-lifecycle test (:295). The third placeholder
("snapshot skipped -> git restore invoked") asserted nothing, so deleting loses no
coverage. Removing fake tests (false confidence) serves the no-fake-safety goal.
Follow-up: a REAL skipped-snapshot -> git-stash-restore test in applyEngineV2.test.ts
(needs the mock's createSnapshot to return skipped=true) is a genuine remaining gap.
Measured full suite: 728 passing, 0 failing; tsgo 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ert.ok(true))
auditLog.append.p0.test.ts was 3 vacuous placeholders. Extract the audit log's on-disk
format + rotation policy into pure common/auditLogFormat.ts and have AuditLogService
delegate (byte-identical):
- serializeEvents(events): JSONL -- one compact JSON object per line + trailing \n
- shouldRotate(currentSize, addBytes, rotationSizeMB): strict > MB threshold
- rotatedLogPath(jsonlPath, n, compressed): audit.jsonl -> audit.<n>.jsonl[.gz]
The append/flush/file I/O stays in the service (not node-testable). Replace the
placeholders with real tests: JSONL shape + round-trip, the strict > rotation boundary,
and the rotated-name format (only trailing .jsonl rewritten). Completes the
placeholder-test cleanup (snapshot/gitStash/auditLog now real; applyAll.rollback deleted).
Measured full suite: 732 passing, 0 failing; tsgo 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
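The three format helpers named above can be sketched as below. The function names and parameters follow the commit text; the bodies are a guess at the simplest implementation consistent with it. In particular, 1 MB = 1024 * 1024 bytes and an empty event list serializing to the empty string are assumptions.

```typescript
// JSONL: one compact JSON object per line, each line terminated by '\n'.
function serializeEvents(events: object[]): string {
  return events.map(e => JSON.stringify(e) + '\n').join('');
}

// Strict '>': a write that lands exactly on the threshold does not rotate.
function shouldRotate(currentSize: number, addBytes: number, rotationSizeMB: number): boolean {
  return currentSize + addBytes > rotationSizeMB * 1024 * 1024; // assumption: binary megabytes
}

// audit.jsonl -> audit.<n>.jsonl[.gz]; only the trailing ".jsonl" is rewritten.
function rotatedLogPath(jsonlPath: string, n: number, compressed: boolean): string {
  return jsonlPath.replace(/\.jsonl$/, `.${n}.jsonl`) + (compressed ? '.gz' : '');
}
```

Anchoring the regex with `$` is what keeps a directory named `x.jsonl/` earlier in the path untouched.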
…alation deciders
checkEarlyTokenQuality + shouldUseSpeculativeEscalation were already pure but untested;
they gate abandoning a streaming response mid-flight for a stronger model. Add a test file
(no source change): <20 tokens -> neutral 0.5/no-escalate, repetition penalty,
generic-refusal+error -> score < 0.5 AND >=50 tokens -> escalate, escalation requires BOTH
conditions, incomplete code fence penalty, a clean balanced-fence response stays 1.0; and
the speculative truth table (confidence 0.59 vs 0.6 boundary, qualityTier 'escalate'
forces it). tsgo 0; measured full suite 741 passing, 0 failing.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extract RepoIndexerService._partialSort (the O(n log k) top-k-by-score min-heap behind the
BM25 rerank) verbatim to pure common/partialSort.ts; the service delegates. Add a
differential fuzz: the SET of scores it returns must equal a full sort's top-k (it
tolerates 0.1 ties, so compare the score multiset, not order) over 20k random arrays with
k spanning 0..n+3.
The fuzz surfaced a REAL latent crash: with k === 0 and a non-empty input the heap stayed
empty and `heap[0].score` threw. k === 0 is reachable (the rerank pool size
Math.min(k*3, ...) is 0 when a caller passes k=0). Fixed: k <= 0 -> [] (byte-identical for
the k > 0 values used in practice). Measured full suite: 745 passing, 0 failing; tsgo 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
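For readers unfamiliar with the technique, a top-k-by-score selection with a bounded min-heap, including the `k <= 0` guard that the fuzz motivated, might look like this. It is a generic sketch, not the extracted _partialSort: the `Scored` shape and the name `topKByScore` are hypothetical, and the 0.1 tie tolerance mentioned above is deliberately not reproduced.

```typescript
interface Scored<T> { item: T; score: number }

function topKByScore<T>(items: Scored<T>[], k: number): Scored<T>[] {
  if (k <= 0) return []; // without this guard, heap[0] below is undefined when k === 0
  const heap: Scored<T>[] = []; // min-heap on score, never larger than k
  const swap = (a: number, b: number) => { [heap[a], heap[b]] = [heap[b], heap[a]]; };
  const siftUp = (i: number) => {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (heap[parent].score <= heap[i].score) break;
      swap(parent, i);
      i = parent;
    }
  };
  const siftDown = (i: number) => {
    for (;;) {
      const left = 2 * i + 1, right = left + 1;
      let smallest = i;
      if (left < heap.length && heap[left].score < heap[smallest].score) smallest = left;
      if (right < heap.length && heap[right].score < heap[smallest].score) smallest = right;
      if (smallest === i) break;
      swap(smallest, i);
      i = smallest;
    }
  };
  for (const entry of items) {
    if (heap.length < k) {
      heap.push(entry);
      siftUp(heap.length - 1);
    } else if (entry.score > heap[0].score) {
      heap[0] = entry; // evict the current minimum of the top-k set
      siftDown(0);
    }
  }
  return heap.sort((a, b) => b.score - a.score); // highest score first
}
```

Each of the n items costs at most O(log k) heap work, which is where the O(n log k) bound comes from.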
…e RAG ranking)
BM25/keyword retrieval is the only live RAG path (embeddings are gated/optional), so its
scoring is what ranks the code context the model sees -- and it was untested. Extract
tokenize / scoreEntry / naiveScore verbatim from RepoIndexerService _tokenize /
_scoreEntryFast / _score into pure common/bm25Score.ts; the service keeps its tokenization
LRU cache and delegates the math. ScorableEntry is the structural subset of IndexEntry the
scorer reads.
Tested: tokenize (lowercases, splits on non-[a-z0-9_], underscores stay in-token), the
relevance ranking exact-symbol(10) > partial(4) > token-only(2) > no-match(0),
case-insensitive symbol match, URI binary +3, snippet-overlap cap at 5, snippet phrase.
The tests pinned two real (byte-identical) quirks: tokenize does NOT split underscores,
and naiveScore does not lowercase (uppercase chars act as separators). Measured full
suite: 755 passing, 0 failing; tsgo 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
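The tokenize contract pinned above (lowercase, split on anything outside [a-z0-9_], underscores stay inside a token) fits in one line. This sketch assumes empty fragments are dropped; the extracted function in common/bm25Score.ts may differ in that detail.

```typescript
function tokenize(text: string): string[] {
  // Lowercase first, then split on every run of characters outside [a-z0-9_].
  return text.toLowerCase().split(/[^a-z0-9_]+/).filter(token => token.length > 0);
}
```

With this rule, `user_id` stays a single token, so a query for `user` alone does not token-match it, which is the underscore quirk the commit calls out.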
The ls_dir tree renderer the agent reads to navigate the workspace was pure but untested. Add a test (no source change): the no-children -> "is not a directory" Error branch; first-page header + one entry per line with the directory trailing-slash and "(symbolic link)" markers; the header omitted on a non-first page; and hasNextPage appending a "(N results remaining...)" elbow line while the last shown entry keeps the tee prefix. Box-drawing prefixes are asserted via \u escapes so the source stays ASCII. Measured full suite: 759 passing, 0 failing; tsgo 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…timator estimateQualityTier biases model selection (cheap_fast / standard / escalate) before any capability scoring, so its boundaries are a routing contract -- and it was inline + untested in the 1000-line ModelRouter. Move it (and the QualityTier type, which has no external importers) to pure common/routing/qualityTier.ts over a structural QualityTierContext; ModelRouter re-exports the type and calls the pure fn (the private method is removed, its single caller updated). Byte-identical. Tested (golden table): simple+no-media -> cheap_fast; images/PDFs demote a simple question to standard; complex-reasoning/multi-step/security/>100k-context -> escalate; the 100k context-size boundary is strict (100000 == standard, 100001 == escalate); ordinary -> standard. Measured full suite: 765 passing, 0 failing; tsgo 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both hybrid rerank variants (_rerankHybrid with computed cosine similarity, _rerankHybridWithVectorStore with vector-store lookups) shared the exact same normalize+blend math; only the per-item vector-score SOURCE differed. Extract one pure common/hybridRerank.blendScores(items, vectorScoreOf, weights); both callers pass their vector-score closure. This decouples the fragile, object-identity docId derivation (kept in the caller, documented: a chunk copy would yield indexOf=-1 and silently drop the vector signal -- not triggered today since the chunk reference is preserved) from the pure blend. Byte-identical, including the score clamp. Tested: min-max normalization clamped to include 0/1, weighted blend, a vector score lifts a low-BM25 item, missing-vector -> BM25-only, the documented dead 0.5 fallback (all-equal positive scores normalize to 1.0 not 0.5 due to the 0-floor clamp), no input mutation. Measured full suite: 772 passing, 0 failing; tsgo 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…memory storage stub The routing evaluation loop (win-rate / escalation-rate / per-model success-rate that feeds learned routing) was untested. Add a test that runs the REAL service against a 10-line Map-backed IStorageService stub (no source change): getModelSuccessRate returns the neutral 0.5 with no data and successes/total otherwise (keyed provider:model); getQualityReport win/escalation/retry rates + avgLatency over the recent window; modelPerformance map keyed provider:model with per-model count/successRate; and the last-100 window (150 recorded -> only the last 100 count). Measured full suite: 778 passing, 0 failing; tsgo 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bring Skills to parity with custom agents and harden the parser (all additive; dispatch
unchanged):
- parseSkillFile now reads allowed-tools / allowed_tools / tools into Skill.allowedTools
(same tokenizer as parseCustomAgentFile; undefined when absent/empty). Enforcement at
dispatch is a separate, not-yet-wired step.
- parseSkillInvocation name is tightened to a single [A-Za-z0-9_-] run, so a file path
(`/src/foo.ts`), a doubled slash (`//x`), or `/a.b` is no longer mistaken for a skill.
- RESERVED_SLASH_COMMANDS + isReservedSkillName (skills must not shadow built-ins).
- dedupeSkillsByName (first-wins, case-insensitive, reports conflicts), wired into the
loader so the skill set is unambiguous (matches getSkill's first-match lookup).
Tested: allowed-tools/allowed_tools/tools parsing + undefined cases; the tightened
invocation (rejects paths/doubled-slash, accepts hyphen/underscore/digit names);
reserved-name detection; first-wins dedupe with conflict reporting. Measured full suite:
784 passing, 0 failing; tsgo 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
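The tightened invocation rule above (the name is a single [A-Za-z0-9_-] run, lower-cased; paths, doubled slashes, and dotted names are rejected) can be sketched with one anchored regex. This is an illustration, not the code in cortexideSkillsService: trimming the input and the args is an assumption.

```typescript
interface SkillInvocation { name: string; args: string }

function parseSkillInvocation(input: string): SkillInvocation | null {
  // "/" + one run of name characters, then either end-of-input or whitespace + args.
  // [\s\S]* lets the args span multiple lines.
  const match = /^\/([A-Za-z0-9_-]+)(?:\s+([\s\S]*))?$/.exec(input.trim());
  if (!match) return null;
  return { name: match[1].toLowerCase(), args: (match[2] ?? '').trim() };
}
```

Because the name run must be followed by whitespace or the end of input, `/src/foo.ts`, `//x`, and `/a.b` all fail to match and are left for the normal chat path.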
…matcher Add contract tests to searchReplaceMatch (test-only): callers always feed getValue(LF), so these document what would happen if CRLF ever reached the matcher -- a single-line LF needle is still found in CRLF source, a MULTI-line LF needle misses the exact match (the CR breaks it), and only the whitespace fallback recovers it by stripping CR. Plus the sequential-compose contract that the multi_edit transaction relies on: applying B to the output of A resolves text A produced, and an ORIGINAL that A CONSUMED returns Not found (no stale edit against the pre-edit content). Measured full suite: 789 passing, 0 failing; tsgo 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ntract)
Whether the indexer may compute embeddings is a privacy decision: embeddings go to an
opaque, unclassifiable provider, so under local-only mode they must be blocked (BM25
fallback). Extract RepoIndexerService._canComputeEmbeddings's pure decision to
common/embeddingsGate.canUseEmbeddings({hasEnabledProvider, routingPolicy, isOffline});
the service reads the three live inputs and delegates. Byte-identical (canEgress import
moved out of the browser service since it is now used only by the pure gate).
Tested (fail-closed truth table): no provider -> false; local-only -> false even with a
provider + online; offline -> false; all-clear -> true. Measured full suite: 793 passing,
0 failing; tsgo 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
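The fail-closed truth table above can be sketched as a pure gate. The parameter names come from the commit text; the body is an assumption, and the real gate is described as going through canEgress, which this sketch replaces with a direct policy comparison.

```typescript
interface EmbeddingsGateInputs {
  hasEnabledProvider: boolean;
  routingPolicy: string | undefined;
  isOffline: boolean;
}

// Fail closed: every check must pass before embeddings may leave the machine.
function canUseEmbeddings(inputs: EmbeddingsGateInputs): boolean {
  if (!inputs.hasEnabledProvider) return false;            // nothing to call
  if (inputs.routingPolicy === 'local-only') return false; // privacy mode blocks the opaque provider
  if (inputs.isOffline) return false;                      // no network, fall back to BM25
  return true;
}
```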
…nd building block) Add the raw-IP classifiers needed for an SSRF DNS-rebind preflight: resolve a hostname to its IP, then classify THAT IP so a name pointing at loopback/private/cloud-metadata is caught even though the hostname looked public. classifyResolvedAddress(ip) reuses the exact IPv4 / IPv4-mapped-IPv6 (dotted AND hex-canonicalized) / IPv6 rules; a bare hostname is 'unknown'. isPrivateResolvedIP(ip) = loopback || private (what an SSRF guard blocks). classifyDestination's IP tail now delegates to it (DRY, byte-identical -- the 32 existing egressPolicy/SSRF-parity tests still pass). Tested: loopback/private/metadata/link-local/ULA incl. the hex IPv4-mapped form (::ffff:7f00:1, ::ffff:a9fe:a9fe), public -> remote, bare hostname/empty -> unknown, and the isPrivateResolvedIP block set. (The async dns.lookup + per-redirect-hop wiring on the web-tool fetch stays the deferred cdp-only item.) Measured full suite: 797 passing, 0 failing; tsgo 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…to the tested detector The auto-mode failover router rebuilt isCodebaseQuestion with a hand-written regex that had DRIFTED to a narrow subset of the tested looksLikeCodebaseQuestion (already used for the INITIAL routing + imported here) -- so the same message could be classified one way initially and another on failover, steering the failover model differently. Replace the inline regex with looksLikeCodebaseQuestion(content) so both paths agree (it is a strict superset: its first two patterns ARE the inline ones), and drop the now-unused lowerMessage. Added a PARITY test pinning that every phrasing the old inline regex matched is still detected. Measured full suite: 798 passing, 0 failing; tsgo 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l (+ differential fuzz vs the original)
#12b. The ~680-line capability-scoring arithmetic that decides which model Auto picks lived in the private
ModelRouter.scoreModel and was untestable. Extracted it VERBATIM into a pure, node-tested
common/routing/computeModelScore.ts; ModelRouter.scoreModel is now a thin wrapper that resolves the impure
inputs (getCachedCapabilities, realParamSize from settingsState, this.isVisionCapable, the
freeTierQuotaService lookup, the two globalSettings reads) and delegates.
Proven byte-identical to git HEAD: a whitespace-normalized diff of the extracted scoring body against HEAD's
scoreModel yields ZERO content differences modulo the 5 documented input substitutions (the injected
capabilities/realParamSize/isVisionCapable/getFreeTierRemaining + routingPolicy/localFirstAI scalars, and 3
em-dash->'--' hygiene edits). A 4-lens adversarial-verify Workflow independently re-derived this (all
lenses: byte-identical; blast-radius clean; oracle is a true independent copy).
Validation: 50k differential fuzz vs a GENERATED oracle (test/common/computeModelScore.oracle.ts = HEAD's
scoreModel body, script-extracted with the same substitutions -> catches any future drift of the extracted
fn from the original) + 11 hand-traced goldens pinning each scoring axis (quality tier, privacy,
code/codebase-question, local-first heavy/light, low-latency, vision, free-tier exhaustion) + invariants
(auto->0, score>=0, quota-throw swallowed). Removed now-unused imports from modelRouter.
tsgo 0; common suite 798 -> 814 (+16); hygiene clean (0 non-ASCII on added src lines).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…scalations now reset nMessagesSent too
#21. A successful mid-task model escalation (tryEscalateModel) hands the SAME task to a fresh, more capable
model, which must start with a clean per-attempt budget. The iteration-cap escalation site
(chatThreadService ~3552) reset BOTH nMessagesSent and consecutiveToolErrors, but the two tool-error
escalation sites (~4747 unparseable, ~4865 failed-tool-calls) reset only consecutiveToolErrors -- so a
tool-error-escalated model silently INHERITED a spent iteration budget and could hit the iteration cap and
stop before it had a fair chance to finish (e.g. a weak local model that burned 25/30 steps then failed
tool calls would hand the strong cloud model only 5 steps). The total work stays bounded by the unchanged
global escalationCount cap (MAX_MODEL_ESCALATIONS), exactly as the iter-cap site already accepted.
Fix: centralize the reset in pure common/agentLoopDecisions.ts computePostEscalationCounters(triggerSite)
(section 2, escalation owner) -> {nMessagesSent:0, consecutiveToolErrors:0} uniformly; route all three
reset sites through it. Golden table pins the contract that every trigger site resets BOTH counters and
that the two sites agree (regression guard against the old non-uniformity). The global escalation budget is
intentionally NOT reset (it bounds total cross-model work). The llmError escalation path (resets its own
nAttempts) is a different loop and out of scope.
tsgo 0; common suite 814 -> 817 (+3). The chatThreadService wiring is browser-layer (not node-testable);
the reset logic + its uniformity are node-tested in the pure fn, and the wiring is type-checked.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
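The uniform reset described in the escalation commit above is small enough to sketch in full. The function name and return shape come from the commit; the trigger-site labels and the type names are hypothetical.

```typescript
// Hypothetical labels for the three reset sites named in the commit.
type EscalationTriggerSite = 'iterationCap' | 'unparseableToolCall' | 'failedToolCalls';

interface PostEscalationCounters {
  nMessagesSent: number;
  consecutiveToolErrors: number;
}

// Every trigger site gets the same clean per-attempt budget. The global
// escalation count is deliberately not part of this result, so it is never reset here.
function computePostEscalationCounters(_triggerSite: EscalationTriggerSite): PostEscalationCounters {
  return { nMessagesSent: 0, consecutiveToolErrors: 0 };
}

// Golden-table style check: all sites reset BOTH counters.
const ALL_SITES: EscalationTriggerSite[] = ['iterationCap', 'unparseableToolCall', 'failedToolCalls'];
const allUniform = ALL_SITES.every(site => {
  const c = computePostEscalationCounters(site);
  return c.nMessagesSent === 0 && c.consecutiveToolErrors === 0;
});
```

Keeping the site as a parameter, even though the result does not depend on it, leaves room for a test to pin that no site ever diverges again.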
… log (+ truncation tolerance)
#20b. The write side (serializeEvents/shouldRotate/rotatedLogPath) already lives in pure
common/auditLogFormat.ts; the inverse was missing. Added parseJsonl(content) -> { events, skipped }:
one JSON object per line, blank/whitespace-only lines ignored (serializeEvents always trails a newline),
and a non-blank line that fails JSON.parse is SKIPPED + counted rather than thrown. The audit log is
append-only and can be cut mid-write by a crash, so a single truncated trailing line must NOT lose the
whole tamper-evident record -- every well-formed line before the corruption is still recovered.
This is the read-side building block the deferred audit-view/export will consume (the file read itself
stays in AuditLogService); it also pins the serialize<->parse round-trip bidirectionally. Tests: golden
round-trips, a truncated trailing line (skipped=1, prior events survive), a corrupt MIDDLE line (later
valid lines still parse), blank/whitespace tolerance, empty content. The pre-existing round-trip test now
delegates to parseJsonl instead of a hand-rolled split/JSON.parse.
tsgo 0; common suite 817 -> 822 (+5); hygiene clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
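The truncation-tolerant read side from the parseJsonl commit above might look roughly like this. It is a sketch under the stated contract (blank lines ignored, unparseable lines skipped and counted); the generic parameter and the result type name are assumptions, and it does not check that each parsed value is an object.

```typescript
interface ParsedJsonl<T> {
  events: T[];
  skipped: number;
}

// Parse JSONL content line by line; a corrupt line is counted, never thrown.
function parseJsonl<T = unknown>(content: string): ParsedJsonl<T> {
  const events: T[] = [];
  let skipped = 0;
  for (const line of content.split('\n')) {
    // Blank or whitespace-only lines (including the trailing newline) are ignored.
    if (line.trim() === '') { continue; }
    try {
      events.push(JSON.parse(line) as T);
    } catch {
      // A line cut mid-write by a crash lands here; earlier and later lines still parse.
      skipped++;
    }
  }
  return { events, skipped };
}

// Usage: two good events followed by a truncated trailing line.
const recovered = parseJsonl('{"action":"apply"}\n{"action":"rollback"}\n{"action":');
```

Here `recovered.events` holds the two complete events and `recovered.skipped` is 1.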
…+ pin it
#25b. TreeSitterService._getLanguageFromUri held an inline extension->tree-sitter-language map and was
browser-only/untestable. Extracted it VERBATIM to pure node-tested common/treeSitterLanguageMap.ts
(TREE_SITTER_LANGUAGE_BY_EXTENSION + languageIdFromPath(path)); the private method now delegates. Tree-
sitter activation is still deferred (the parser load is a separate item), but this map decides which
grammar a file routes to once it is wired, so pinning it keeps that routing stable.
Byte-identical: same lower-cased trailing-segment logic (`path.split('.').pop()?.toLowerCase()`), same
15 entries, same `map[ext] || null` fallback. Golden table covers every mapped extension + a no-drift
assertion on the exported map size; plus case-insensitivity, multi-dot paths (last segment), unknown/
missing/trailing-dot -> null, and the dotfile case (/.gitignore -> null).
tsgo 0; common suite 822 -> 828 (+6); hygiene clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
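The extension lookup from the language-map commit above can be illustrated as below. The three map entries and their language ids are a hypothetical subset (the real map has 15 entries that the commit does not list), and the own-property check is an added precaution in this sketch rather than the `map[ext] || null` form the commit describes.

```typescript
// Hypothetical subset of the map; ids are illustrative only.
const TREE_SITTER_LANGUAGE_BY_EXTENSION: Record<string, string> = {
  ts: 'typescript',
  py: 'python',
  go: 'go',
};

// Route a path to a language id by its lower-cased trailing dot segment.
function languageIdFromPath(path: string): string | null {
  const ext = path.split('.').pop()?.toLowerCase();
  if (ext === undefined) { return null; }
  // Own-property check so inherited keys such as 'constructor' never match.
  if (!Object.prototype.hasOwnProperty.call(TREE_SITTER_LANGUAGE_BY_EXTENSION, ext)) { return null; }
  return TREE_SITTER_LANGUAGE_BY_EXTENSION[ext] || null;
}
```

With this subset, `/src/App.TS` and `a.b.py` resolve to a language, while `/.gitignore`, `notes.` and `x.unknown` fall through to null.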
…e tested isAutoModelSelection
Wave 4 leftover. chatThreadService computed "is this the Auto selection?" by hand at two sites: ~5621
`userModelSelection?.providerName === 'auto' && userModelSelection?.modelName === 'auto'` and ~3904 the
broader `!modelSelection || (modelSelection.providerName === 'auto' && ...modelName === 'auto') ||
(Chat-feature selection is auto)`. The exact `=== 'auto' && === 'auto'` predicate already exists as the
exported isAutoModelSelection(selection: ModelSelection | null) in cortexideSettingsTypes, but it was
untested and not used here.
Routed both sites through isAutoModelSelection (byte-identical: the helper IS
`sel?.providerName === 'auto' && sel?.modelName === 'auto'`, so site A ==
isAutoModelSelection(userModelSelection), and site B keeps its site-specific `!modelSelection ||` +
Chat-feature OR, each auto-term now a helper call; the two synchronous reads of the immutable settings
snapshot collapse to one without changing the value).
Added a test suite pinning isAutoModelSelection (literal auto, null, concrete, half-auto demands both
fields, complementary with isValidProviderModelSelection) so the now-shared contract can't drift.
tsgo 0 (confirms both call sites are ModelSelection|null); common suite 828 -> 833 (+5); hygiene clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Grammar (+ fragment-stream tests)
#6 (the riskiest extraction -- a streaming parser with cross-chunk buffering). extractXMLToolsWrapper turns
a model's streamed XML ("<read_file><uri>/a</uri></read_file>") into a structured tool call while hiding
the markup from the user-visible text. Models without native function-calling (every local model in
practice) rely on it, so it is load-bearing for agentic use -- and it lived in electron-main, untested.
Extracted the state machine + its 4 private helpers (findPartiallyWrittenToolTagAtEnd, findIndexOfAny,
parseXMLPrefixToToolCall, trimBeforeAndAfterNewLines, ToolOfToolName) VERBATIM into pure node-tested
common/xmlToolCallScanner.ts as createXmlToolCallScanner(tools, toolId) ->
{ push(accumulatedFullText), trimAndGetFinal() }. extractXMLToolsWrapper is now a thin adapter keeping the
onText/onFinalMessage plumbing + tool-set resolution (availableTools) and delegating; toolId is injected
(generateUuid) so the scanner is deterministic. Removed the now-unused imports
(SurroundingsRemover/RawToolCallObj/RawToolParamsObj/ToolName/ToolParamName); extractReasoningWrapper is
untouched.
Behavior-preserving: the 4 helpers + push body are byte-identical to git HEAD (the only deltas are the
param rename params.fullText->accumulatedFullText, the trailing onText({...}) replaced by a return the
adapter re-adds, and two dropped commented-out console.logs). A 4-lens adversarial-verify Workflow
confirmed byte-identical via THREE independent differential fuzzes that drove HEAD's verbatim wrapper vs
the compiled scanner+adapter over random inputs x random chunkings (60k + 23k + 6.7k cases, 0 mismatches),
including the first-tag-by-tool-order latching quirk and the onFinalMessage double-call timing.
The one real finding it surfaced (a test-soundness gap: only finalized displayText was asserted, never an
intermediate push() return -- which is what onText forwards live) is fixed by a mid-stream displayText test
that catches mid-stream markup-hiding regressions (e.g. dropping the partial-tag buffer).
Tests: complete calls, fragment splits, char-by-char convergence, monotonic growth, 9 malformed inputs
(never throw, one-shot + fragmented), JSON-blob passthrough, unknown-tag passthrough, first-tag latching.
tsgo 0; common suite 833 -> 855 (+22); hygiene clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n the budget table
Wave 5 (a testable portion of the provider send-path). sendLLMMessage.impl.ts decided the per-call
output-token budget (OpenAI max_tokens / Ollama num_predict) inline and untested, yet it is a real
responsiveness lever: local models are slow per token, so autocomplete asks for a tiny budget (96, fast
suggestions), quick edits (Ctrl+K / Apply) a medium one (200), and cloud calls a flat 300. Used by the FIM,
Ollama-FIM, and Ollama-chat paths.
Extracted computeMaxTokensForLocalProvider VERBATIM to pure node-tested common/localProviderMaxTokens.ts;
the 3 call sites now import it. The surrounding SDK/streaming/abort plumbing stays in electron-main (only
CDP-testable).
Golden table pins every branch: cloud->300 regardless of feature, local Autocomplete->96, local
Ctrl+K/Apply->200, local Chat/SCM/unknown/undefined->300.
tsgo 0; common suite 855 -> 859 (+4); hygiene clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
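The budget table in the commit above maps directly onto a small pure function. The sketch below reproduces the documented branches only; the parameter names and the two-argument signature are assumptions, since the commit does not show the real one.

```typescript
// Assumed signature: a locality flag plus an optional feature name.
function computeMaxTokensForLocalProvider(isLocalProvider: boolean, featureName?: string): number {
  // Cloud calls get a flat budget regardless of feature.
  if (!isLocalProvider) { return 300; }
  switch (featureName) {
    case 'Autocomplete':
      return 96; // tiny budget so local suggestions stay fast
    case 'Ctrl+K':
    case 'Apply':
      return 200; // medium budget for quick edits
    default:
      return 300; // Chat, SCM, unknown, or undefined
  }
}
```

A golden table over this function needs only one row per branch, which is what makes the extraction cheap to pin.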
…ted isLoopbackEndpoint
Wave 5 (a testable portion of the provider send-path). sendLLMMessage.impl.ts decided "is this
openAICompatible/liteLLM endpoint local?" with the SAME 8-line URL-parse-and-check-hostname block copied
verbatim FOUR times (FIM, OpenAI-compat chat, the SDK factory, Ollama chat). It drives local-provider
optimizations (shorter timeouts, streaming FIM, the local max-tokens budget) and -- importantly -- matches
by HOSTNAME not substring, so "localhost.evil.com" is correctly NOT local. That subtlety deserves one
tested home, not four copies a fix could miss.
Extracted isLoopbackEndpoint(endpoint) to pure node-tested common/loopbackEndpoint.ts (verbatim loopback
set localhost/127.0.0.1/0.0.0.0/::1, same URL parse + try/catch -> non-local on missing/empty/unparseable).
Each of the 4 sites collapses to `(providerName === 'openAICompatible' || providerName === 'liteLLM') &&
isLoopbackEndpoint(settingsOfProvider[providerName]?.endpoint)` -- byte-identical (false unless the guard
holds AND the endpoint is loopback). The explicit-provider list and the FIM site's hasFIMSupport use are
unchanged.
Not a security boundary (egress stays gated by egressPolicy); this is a UX/responsiveness gate.
Tests pin the common local cases, case-insensitivity, the hostname-not-substring guard
(localhost.evil.com -> false, LAN -> false), and the safe non-local default.
tsgo 0; common suite 859 -> 863 (+4); hygiene clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
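The hostname-not-substring check from the commit above could be written along these lines. The loopback set and the fail-to-non-local behavior come from the commit; stripping the brackets that the URL parser puts around IPv6 literals is an assumption of this sketch, and the real helper may treat `::1` differently.

```typescript
const LOOPBACK_HOSTS = new Set(['localhost', '127.0.0.1', '0.0.0.0', '::1']);

// True only when the endpoint parses as a URL whose hostname is exactly a loopback name.
function isLoopbackEndpoint(endpoint: string | undefined): boolean {
  if (!endpoint) { return false; }
  try {
    // URL.hostname is lower-cased by the parser; IPv6 literals come back as "[::1]".
    const hostname = new URL(endpoint).hostname.replace(/^\[|\]$/g, '');
    return LOOPBACK_HOSTS.has(hostname);
  } catch {
    // Unparseable endpoints are treated as non-local (the safe default).
    return false;
  }
}
```

Because the comparison is on the parsed hostname, `http://localhost.evil.com` is rejected even though it contains the substring `localhost`.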
…nch module loading
Found while bringing the dev build up for live verification. Two services re-exported an interface/type
as a VALUE, which the TS compiler elides but the esbuild dev-transpile (node build/next/index.ts) cannot,
so the emitted out/ module had no such runtime export and the workbench threw at module load:
"The requested module './editRiskScore.js' does not provide an export named 'EditContext'"
- editRiskScoringService.ts:18 `export { EditContext, EditRiskScore }` (both are interfaces)
-> `export type { EditContext, EditRiskScore }`
- ollamaInstallerService.ts:13 `export { MODEL_PACKS, ModelPackKey }` (ModelPackKey is a type alias)
-> `export { MODEL_PACKS, type ModelPackKey }`
These are dev-build-only breakages (the production gulp/tsc build elides type re-exports correctly), but
they made the cortexide UI fail to load under the esbuild dev transpile that the smoke harness uses. Swept
the whole cortexide tree for other bare re-exports; the only other ones (imageQA/index.ts) already use the
inline `type` modifier. tsgo 0; LIVE-verified: re-transpile + relaunch -> the module-load SyntaxErrors are
gone and cdp-smoke is 11/11 (was failing "no fatal console errors").
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…calling a non-existent API)
TreeSitterService.extractSymbols always returned [] -- the audit's "tree-sitter dead (wrong API)": it did
`await import('@vscode/tree-sitter-wasm')` then called `wasmModule.createParser(language)`, a method that
does not exist, and never ran Parser.init()/Language.load(). So RAG was BM25-only and the cortexide.index.ast
config did nothing.
Rewired it to delegate WASM loading to the editor's own ITreeSitterLibraryService (the same service the
terminal command parser uses) -- getParserClass() (owns Parser.init), getLanguagePromise(grammarId) (owns
Language.load + wasm-path resolution, and bypasses the editor's prefer-treesitter support gate), then
`new Parser(); parser.setLanguage(language); parser.parse(content)`. Parsers are cached per grammar (a
missing/failed grammar caches null so it is not retried per file), and Tree objects are released via
tree.delete() in a finally (they hold WASM memory; the extracted ASTSymbol[] is plain data). The existing
AST traversal (_traverseAST etc.) was already correct standard web-tree-sitter API -- it just never received
a real tree.
Added pure node-tested treeSitterGrammarId(languageId) mapping our language ids to the shipped
@vscode/tree-sitter-wasm grammar ids: identity for ts/tsx/js/python/java/go/rust/cpp/php/ruby, csharp ->
'c-sharp' (the on-disk grammar name), and c/swift/kotlin -> null (no grammar ships -> graceful BM25/LSP
fallback, no throw).
LIVE-VERIFIED via a temporary command-palette probe against the running dev build (now removed): real
symbols extracted -- TS: function:alpha, class:Beta, variable:delta; Python: function:foo, class:Bar; Go:
function:Hello + struct field; Swift (no grammar): 0 symbols, no throw. tsgo 0; common suite 863 -> 867
(+4); cdp-smoke 11/11; hygiene clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r the tamper-evident log)
Builds the user-facing half of the audit deferral on the parseJsonl read-side landed earlier. The audit
log records every dangerous action but there was no way to inspect it. Added:
- IAuditLogService.readEvents() (flushes buffered writes, reads the file, parseJsonl -> {events, skipped},
tolerant of a truncated trailing line) + getLogPath().
- Pure node-tested formatAuditEvents(events, skipped) -> a readable, copy-able report (ISO timestamps from
each event's epoch-ms ts, OK/ERR status, action, and only the present optional fields; header surfaces any
skipped corrupt lines).
- A "CortexIDE: Show Audit Log" Action2 that opens the rendered log in an editor (so it can be saved/exported);
shows an info notice when auditing is disabled.
LIVE-VERIFIED against the running dev build (audit enabled, a seeded audit.jsonl with 3 events + 1 truncated
trailing line): the command opened an editor reading "CortexIDE Audit Log -- 3 events / (1 corrupt/truncated
line skipped)" with each event rendered (prompt+model, apply+files+diffstats+latency, rollback ERR) -- proving
readEvents -> parseJsonl -> formatAuditEvents -> editor end to end. +5 formatter tests; updated the
MockAuditLogService stub for the 2 new interface methods.
NOTE on the OTHER half of this item (wrap the global ILogService in RedactingLogService): deliberately NOT
done. ILogService is constructed in desktop.main.ts BEFORE IConfigurationService exists, but the redaction
service is config-dependent, so it cannot wrap the root logger cleanly there; and secret-bearing logs (LLM
dispatch) are already redacted at the source (sendLLMMessageService) -- matching the project's prior
high-risk/low-value assessment. There is no scoped cortexide logger to wrap instead. Left as an honest defer.
tsgo 0; common suite 867 -> 872 (+5); cdp-verified; hygiene clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…o a pure tested predicate
The safe testable slice of the aborted/erroring multi-block apply path. During a streaming SEARCH/REPLACE
edit, each new block is located in the ORIGINAL file; if it can't be located (a 'Not found'/'Not unique'
error) or its target range OVERLAPS a block already applied this stream, the WHOLE edit is reverted and the
model is re-prompted from the first block. A wrong overlap test risks silent data loss (a good edit thrown
away, or a conflicting edit applied), but the logic was inline in editCodeService and untested.
Extracted the DECISION (not the side effects) to pure node-tested common/editStreamRevertDecision.ts:
rangesOverlap(a,b) (touching endpoints count as overlap, matching the inline rule) + decideStreamRevert
({originalBoundsError, thisBlockRange, existingRanges}) -> {revert, errorMessage}. Byte-identical to the old
`if (typeof originalBounds === 'string' || hasOverlap)` check incl. the 'Has overlap' message and
error-takes-precedence ordering. The revert side effects (delete tracking zones, rewrite the file to the
original, abort the stream) stay inline. Re-narrowed originalBounds with an unreachable
`if (typeof originalBounds === 'string') return` after the revert guard (the old condition's
`typeof === 'string'` provided the narrowing the pure call drops; decideStreamRevert reverts for every
string, so the guard never fires).
Tests: overlap (disjoint/touching/contained/identical/partial/gap), locate-error precedence, 'Has overlap',
no-revert, first-block, some-semantics across existing ranges. tsgo 0; common suite 872 -> 881 (+9); hygiene
clean. (The revert branch itself needs a forced malformed multi-block stream to drive live, so it stays
unit-verified; the surrounding apply path is covered by the existing cdp atomic-edit harness.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
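The revert decision from the commit above separates cleanly from its side effects. The sketch below follows the described contract (touching endpoints overlap, a locate error takes precedence, the overlap message is 'Has overlap'); the tuple range shape and the interface names are assumptions.

```typescript
// Hypothetical range shape: inclusive [start, end] positions in the original file.
type BlockRange = [number, number];

// Touching endpoints count as overlap, matching the rule stated in the commit.
function rangesOverlap(a: BlockRange, b: BlockRange): boolean {
  return a[0] <= b[1] && b[0] <= a[1];
}

interface StreamRevertInput {
  originalBoundsError: string | null; // e.g. 'Not found' or 'Not unique'
  thisBlockRange: BlockRange | null;
  existingRanges: BlockRange[];
}

interface StreamRevertDecision {
  revert: boolean;
  errorMessage: string | null;
}

function decideStreamRevert(input: StreamRevertInput): StreamRevertDecision {
  // A locate error wins over any overlap check.
  if (input.originalBoundsError !== null) {
    return { revert: true, errorMessage: input.originalBoundsError };
  }
  const range = input.thisBlockRange;
  const hasOverlap = range !== null && input.existingRanges.some(r => rangesOverlap(range, r));
  return hasOverlap
    ? { revert: true, errorMessage: 'Has overlap' }
    : { revert: false, errorMessage: null };
}
```

Only the decision is returned; deleting tracking zones, restoring the file, and aborting the stream would remain with the caller.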
…rovider
RAG was BM25-only because ZERO embedding providers were ever registered with IAiEmbeddingVectorService (the
whole vector pipeline -- gate, hybrid rerank, vector store adapters -- was wired but inert). Added a local
Ollama embedding provider so semantic (BM25 + vector) retrieval activates when the user configures a model.
Wiring (the renderer can't reach Ollama, so embeds route through electron-main, mirroring the LLM path):
- electron-main sendOllamaEmbed (ollama SDK .embed) -> a new request-response 'ollamaEmbed' command on
LLMMessageChannel, egress-gated defense-in-depth (refuses a non-loopback endpoint under local-only).
- renderer LLMMessageService.ollamaEmbed (same loopback egress gate as ollamaList) returns the vectors.
- OllamaEmbeddingProviderContribution (WorkbenchPhase.AfterRestored): when cortexide.rag.embeddingModel is
set AND dispatchable, PROBES the model (so isEnabled() never lies), then registers an
IAiEmbeddingVectorProvider; re-syncs on config / privacy-state change; unregisters when ineligible.
- New cortexide.rag.embeddingModel setting (default '' = BM25-only; opt-in).
- Pure node-tested common/ollamaEmbeddings.ts: extractEmbeddingVectors (throws on empty/ragged/non-finite so
cosine similarity never gets garbage) + canUseOllamaEmbeddings eligibility.
Privacy: Ollama is loopback, so chunks stay on-machine and the call is allowed under local-only (secrets are
already redacted before embedding upstream). Any failure (no model / Ollama down) leaves retrieval
gracefully on BM25.
LIVE-VERIFIED against the running dev build (cortexide.rag.embeddingModel=nomic-embed-text, model pulled):
the contribution logged "registered local embedding provider 'nomic-embed-text' -- hybrid retrieval active";
IAiEmbeddingVectorService.isEnabled()=true; getEmbeddingVector returned a real 768-dim vector; the
electron-main round-trip returned 2x768 vectors.
tsgo 0; common suite 881 -> 888 (+7); cdp-smoke 11/11; hygiene clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
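The validation step named in the embedding commit above (throw on empty, ragged, or non-finite vectors) can be sketched as follows. The input shape, an object with an `embeddings` array of number arrays, is an assumption about what the embed call returns, and the error messages are illustrative.

```typescript
// Validate and return embedding vectors so downstream cosine similarity never sees garbage.
function extractEmbeddingVectors(response: { embeddings?: unknown }): number[][] {
  const raw = response.embeddings;
  if (!Array.isArray(raw) || raw.length === 0) {
    throw new Error('embedding response contained no vectors');
  }
  const vectors: number[][] = [];
  let dimension = -1;
  for (const candidate of raw) {
    if (!Array.isArray(candidate) || candidate.length === 0) {
      throw new Error('embedding vector is empty or not an array');
    }
    // All vectors must share one dimension; a ragged batch is rejected.
    if (dimension === -1) {
      dimension = candidate.length;
    } else if (candidate.length !== dimension) {
      throw new Error('embedding vectors have inconsistent dimensions');
    }
    for (const value of candidate) {
      if (typeof value !== 'number' || !Number.isFinite(value)) {
        throw new Error('embedding vector contains a non-finite value');
      }
    }
    vectors.push(candidate as number[]);
  }
  return vectors;
}
```

Throwing here lets the caller catch the failure and stay on BM25, which matches the graceful-fallback behavior the commit describes.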
…n the working notes)
PR #64 removed docs/ from main. This branch still carried the stale modernization/session docs
(MODERNIZATION-BASELINE/HANDOFF, NEXT-SESSION-PROMPT, PHASE2-WIRING-PLAN, + the comparison docs), which a
PR from this branch would have re-added. Removing them so the branch matches main and the modernization PR
stays code-only. The consolidated, current status lives in the working notes outside the repo.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Continues the modernization drive (reliability before features; no fake safety; no fake tests; every claim maps to tested code). Branch health: tsgo 0, node `common/` suite 888 passing / 0 failing, CDP smoke 11/11.
Highlights
Two inert features ACTIVATED (live-verified via the CDP harness)
- Tree-sitter AST symbol extraction: TreeSitterService called a non-existent API (`createParser`) and never ran `Parser.init`/`Language.load`, so `extractSymbols` always returned `[]` (RAG was BM25-only on this axis). Rewired to delegate to the editor's `ITreeSitterLibraryService` (the same service the terminal command parser uses). Live: extracted real symbols from `.ts`/`.py`/`.go`; languages with no shipped grammar (c/swift/kotlin) fall back with no throw.
- Hybrid RAG (BM25 + vector): zero embedding providers were ever registered with `IAiEmbeddingVectorService`, so the whole vector pipeline was inert. Added a local Ollama embedding provider routed through electron-main IPC (the renderer can't reach Ollama), opt-in via the new `cortexide.rag.embeddingModel` setting, registered by a `WorkbenchContribution` after a startup probe. Privacy-safe (Ollama is loopback = on-machine; allowed under local-only). Live: `isEnabled()` true and `getEmbeddingVector` returned a real 768-dim vector.
Real bug fixes
- `export { Type }` re-exports that the esbuild dev-transpile couldn't elide broke workbench loading; fixed to `export type {` / inline `type` (cdp-smoke went from failing to 11/11).
- Tool-error escalations now reset `nMessagesSent` as well as `consecutiveToolErrors`, centralized in `computePostEscalationCounters`.
- `partialSort` k=0 crash and other reliability fixes from earlier in the drive.
Behavior-preserving extractions to pure, node-tested `common/` modules
`computeModelScore` (50k differential fuzz vs a generated oracle + 11 goldens + a 4-lens adversarial-verify workflow), the streaming `xmlToolCallScanner` (4-lens workflow + differential fuzzes over random chunkings), `editStreamRevertDecision`, audit `parseJsonl` + a "Show Audit Log" command, `computeMaxTokensForLocalProvider`, `isLoopbackEndpoint`, the treeSitter language map, and more. Each proven byte-identical (textual diff vs HEAD + fuzz vs an oracle) and committed atomically with the full suite measured.
Notes
- `docs/` is removed to match `main` (PR #64, "chore: remove the docs/ folder (internal tracking notes, not product docs)", removed it); modernization docs are kept out of the repo per convention.
- Deferred: the `RedactingLogService` `ILogService` wrap (constructed before `IConfigurationService`; secrets already redacted at source), the SSRF DNS-rebind preflight (renderer can't `dns.lookup`; needs an electron-main resolve-IPC; building blocks landed), and the `AgentLoopController` class refactor (risky; the pure decisions are already extracted).
🤖 Generated with Claude Code