Private local AI for codebases. Local by default. Remote inference only when explicitly configured.
AndesCode runs Gemma 4 26B on your hardware in LOCAL mode. It indexes your codebase, understands your project structure, and answers questions through its own native desktop interface. It can also run in REMOTE_INFERENCE mode when explicitly configured for private/self-hosted infrastructure.
- `LOCAL` mode (default): repository, index, retrieval, prompts, and inference stay on the same machine.
- `REMOTE_INFERENCE` mode: repository and index stay local, but selected retrieved chunks plus metadata are sent to your configured private remote server for inference.
- AndesCode does not provide a hosted SaaS.
Every cloud coding assistant has the same architecture: your code leaves your machine, hits someone else's server, and comes back as a suggestion. For most developers, that's a fine trade-off.
For some, it isn't.
| | AndesCode | GitHub Copilot | Cursor | Claude |
|---|---|---|---|---|
| Code stays on your machine | ✅ | ❌ | ❌ | ❌ |
| Works fully offline | ✅ | ❌ | ❌ | ❌ |
| No token bills | ✅ | ❌ | ❌ | ❌ |
| Local audit log | ✅ | ❌ | ❌ | ❌ |
| Frontier-class model | ✅ | ✅ | ✅ | ✅ |
| Deterministic / no outages | ✅ | ❌ | ❌ | ❌ |
AndesCode is built for developers who work with client code under NDA, operate in regulated industries (healthcare, legal, finance, defense), or simply believe their code is their own.
- Teams working with sensitive or proprietary code (NDA, IP-heavy projects)
- Companies in regulated environments (finance, healthcare, legal)
- Developers who want full control over their AI tooling and data flow
- 🧠 Gemma 4 26B — high-capability open-weight model running entirely on your hardware
- 🔍 Codebase-aware — indexes your project, builds a project map, injects relevant context automatically
- 🗺️ Project intelligence — detects language, stack, entry points, domain, and key symbols on indexing
- 🔎 Smart retrieval — two-step planning (model selects relevant files first), query routing by filename/symbol/intent, and 4-axis re-ranking
- 🕸️ Optional graph-aware hybrid retrieval — when enabled, AndesCode builds local code graph artifacts and can expand baseline retrieval with symbol, import, and reference neighbors for complex multi-file questions
  - Model-free A/B eval guide: `docs/hybrid-retrieval-eval.md`
- 🎯 Token-aware context packing — prompt assembly is budgeted against model context window, with deterministic priority-based truncation instead of overflow failures
- 🧱 Multi-layer caching — repo-fingerprint-scoped workspace/retrieval/neighborhood/prompt-prefix/patch-plan caches with strict invalidation
- 📌 Deterministic routing for repo questions — config/dependency/manifest questions use a source-of-truth config-first path before inferred code usage
- 🛠️ Safe edit/apply primitive (v1) — deterministic single-file exact-match edits with hash stale-context protection and unified diff preview
- ⚠️ Coverage warnings — the model is told when it has a partial view of a file, so it never pretends to have context it doesn't
- 🔒 Execution-mode privacy controls — `LOCAL` keeps inference on-device with offline flags; `REMOTE_INFERENCE` sends only selected retrieved chunks + metadata to your configured private server
- ⚡ Fast — KV cache warm-up on startup, 30–40 tokens/second on Apple Silicon, streaming responses
- 🖥️ Native desktop app — runs as a native window on macOS and Windows via the built-in web UI
- 📋 Audit log — every request logged locally with metadata only; proof of isolation for compliance
| Platform | Hardware | RAM / VRAM |
|---|---|---|
| Apple Silicon Mac | M1 / M2 / M3 / M4 | 32GB unified memory |
| Windows / Linux | NVIDIA RTX 3090, 4090, 5090 | 24–32GB VRAM |
- Python 3.10+
- ~18GB free disk space
1. Clone

       git clone https://github.com/buster92/andes-code
       cd andes-code

2. Run the launcher

       python3 launch.py

That's it. On first run the launcher:
- Detects your hardware (Apple Silicon → Metal, NVIDIA → CUDA)
- Installs all dependencies with the correct GPU flags
- Opens the AndesCode native window, which automatically:
  - Downloads Gemma 4 26B (~16GB) from Hugging Face — progress shown on screen, resumes if interrupted
  - Loads the model into memory
  - Starts the local server
From there, the app guides you through indexing your project and you can start asking questions immediately. On subsequent runs, python3 launch.py just starts the app — model already cached, ready in seconds.
Index your project
↓
Files are chunked with language-aware boundary detection
Embeddings stored in ChromaDB (local)
Project map built: language, stack, domain, entry points, symbol index
Optional local code graph artifacts built under the existing index directory:
symbol_graph.json, import_graph.json, repo_graph_state.json
Workspace intelligence cached to disk (schema-versioned, artifact-level reuse)
↓
You ask a question in the AndesCode window
↓
Diagnosis + patch-plan stages run before final generation
Safe descriptive queries can reuse scoped semantic cache
Prompt built from deterministic sections for prefix/KV reuse
Config/dependency questions take a fast path (skip patch-planning flow)
↓
Step 1 — Planning: model scans your project map and identifies
the most relevant files for your question
↓
Step 2 — Retrieval: those files are loaded in full, plus
semantic search fills any gaps the planner missed
Default retrieval is unchanged. If ANDESCODE_HYBRID_RETRIEVAL=1,
AndesCode can also blend semantic candidates with graph neighbors
from imports, symbols, filename matches, references, and existing
source-of-truth chunks.
↓
Token-aware packing keeps anchor/planned/neighbor files first and
truncates lower-priority context when needed to stay under model limits
↓
Project map + code context injected into system prompt
Coverage warnings added if any file is only partially retrieved
↓
Gemma 4 generates a response grounded in your actual codebase
Streams to the UI with timing metadata
↓
Everything logged locally. Code never uploaded.
Edit Suggestion Mode v1 is a read-only investigation path for requests that clearly ask for code changes, bug fixes, performance improvements, concrete updates, or implementation guidance. Examples include “fix this bug,” “make this faster,” “suggest one update,” “change this behavior,” “why is this failing?”, and “what code should I edit?”
Broad analysis questions such as “explain the performance path,” “where is AddToCart defined?”, or “how does updateSchedule work?” stay in normal Q&A unless the user also asks for a concrete edit. When triggered, AndesCode changes from normal Q&A into a stricter repo-grounded workflow:
- Classifies the request as `edit_suggestion` before retrieval.
- Combines semantic retrieval with filename/path hints, symbol/reference signals, import-neighborhood expansion, and related test/config discovery.
- Loads full indexed contents for likely edit targets before recommending a patch-level change.
- Traces the likely call or data path from retrieved imports, symbols, methods, classes, and references.
- Checks whether the requested mechanism already exists in the retrieved files so it does not suggest an existing cache/retry/validation/test mechanism as if it were new.
Edit Suggestion Mode responses must use this contract:
- Finding — the current behavior grounded in retrieved files.
- Evidence — concrete file paths plus symbols, methods, or classes used to reach the finding.
- Recommended change — one minimal change, not a broad option list.
- Patch plan — file-by-file changes with method/function names; snippets only when they come from retrieved context.
- Validation — specific commands inferred from repo structure, or an explicit note that no test command could be inferred.
- Confidence — high/medium/low based on retrieved-context completeness.
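The six-part contract above lends itself to a mechanical check. The following is a minimal sketch, not AndesCode's implementation: it assumes each section starts on its own line with its label (optionally preceded by Markdown markers such as `##` or `**`), and the helper name `missing_sections` is hypothetical.

```python
import re

# Section labels taken from the contract above, in the required order.
SECTIONS = [
    "Finding",
    "Evidence",
    "Recommended change",
    "Patch plan",
    "Validation",
    "Confidence",
]


def missing_sections(response: str) -> list[str]:
    """Return contract sections that do not appear, in order, at a line start."""
    missing = []
    position = 0
    for name in SECTIONS:
        # Allow leading markup such as "## " or "**" but stay on one line.
        pattern = re.compile(
            rf"^[^\w\n]*{re.escape(name)}\b", re.IGNORECASE | re.MULTILINE
        )
        match = pattern.search(response, position)
        if match is None:
            missing.append(name)
        else:
            # Later sections must appear after this one.
            position = match.end()
    return missing
```

A caller could reject or regenerate a response whenever the returned list is non-empty.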
What it does not do yet:
- It does not automatically modify files.
- It does not run risky shell commands as part of answering.
- It does not replace Safe Edit/Apply v1; it only improves investigation and patch recommendations before any future apply flow.
This differs from normal Q&A because AndesCode must read concrete files first. If it cannot identify relevant files or symbols, it must say: “I do not have enough repo-grounded context to propose a safe edit.” It then lists the missing files or symbols instead of giving generic architecture advice.
AndesCode includes a minimal, deterministic file edit primitive for controlled code updates.
    EditOperation(
        file_path="src/example.py",
        old_content="return old_value",
        new_content="return new_value",
    )

- Exact-match only: `old_content` must match exactly in the target file.
- No fuzzy matching or fallback behavior.
- Stale-context protection: apply is blocked when on-disk file hash differs from indexed hash.
- Writes are blocked when the file is missing or not indexed.
- Successful writes trigger single-file re-index only (no full rebuild).
Use unified diff preview before apply:

    generate_diff_preview(old_text, new_text, file_path="src/example.py")

- Single-file operations only.
- One exact match required for deterministic replacement.
- No autonomous planning or multi-file orchestration.
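To illustrate the rules above, here is a minimal sketch of an exact-match edit guarded by a stale-context hash check. It is an assumption-laden illustration rather than AndesCode's code: the names `apply_exact_edit`, `file_sha256`, `preview_diff`, and `EditRejected` are hypothetical, and SHA-256 is assumed because the hash algorithm used for this check is not stated here.

```python
import difflib
import hashlib
from pathlib import Path


class EditRejected(Exception):
    """Raised when an edit cannot be applied deterministically."""


def file_sha256(path: Path) -> str:
    # Hash raw bytes so newline style does not affect the comparison.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def preview_diff(old_text: str, new_text: str, file_path: str) -> str:
    """Build a unified diff between two versions of one file."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"a/{file_path}",
            tofile=f"b/{file_path}",
        )
    )


def apply_exact_edit(path: Path, old: str, new: str, indexed_hash: str) -> str:
    """Replace exactly one occurrence of `old`; return the diff that was applied."""
    if not path.is_file():
        raise EditRejected("file is missing")
    if file_sha256(path) != indexed_hash:
        raise EditRejected("stale context: file changed since it was indexed")
    current = path.read_bytes().decode("utf-8")
    if not old or current.count(old) != 1:
        raise EditRejected("old_content must match exactly once")
    updated = current.replace(old, new, 1)
    path.write_bytes(updated.encode("utf-8"))
    return preview_diff(current, updated, path.name)
```

In this sketch a second edit against the same `indexed_hash` is rejected, which mirrors the requirement that the file be re-indexed after every successful write.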
- Your source code (never read by any external server)
- ChromaDB vector embeddings of your code
- Every query and every response
- Runtime logs in `~/Documents/AndesCode/` (`server.log`, `app.log`)
- Project map, symbol index, optional code graph artifacts, and file hash cache
Offline environment flags are set at process startup before model libraries initialize, preventing outbound network calls during inference.
    import os

    # Must run before any model library is imported.
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    os.environ["HF_DATASETS_OFFLINE"] = "1"
    os.environ["HF_HUB_OFFLINE"] = "1"

| Item | Size | Source |
|---|---|---|
| Gemma 4 26B Q4 model | ~16 GB | Hugging Face |
| `all-MiniLM-L6-v2` embeddings | ~90 MB | Hugging Face |
Both are cached permanently after first run.
Runtime logs are written outside the repository to:
- `~/Documents/AndesCode/server.log` (chat + indexing pipeline phases)
- `~/Documents/AndesCode/app.log` (desktop wrapper lifecycle/setup)
Logs record metadata only — no code content, no query text, no responses. Absolute paths and usernames are stripped from log entries.
2026-04-08 09:15:33 | CHAT d24024dd | phase=request_received | max_tokens=1024 | message_count=1
2026-04-08 09:15:34 | CHAT d24024dd | phase=context_build_start | path=direct_retrieval
2026-04-08 09:15:42 | CHAT d24024dd | phase=generation_completed | context_s=1.1 | think_s=2.3 | ttft_s=2.1 | total_s=8.4 | chunks=47
Logged: request ID, token count, file names of retrieved chunks, timing.
Never logged: query text, response text, code content, absolute file paths, usernames.
If the UI shows status updates but no answer:
- Open `~/Documents/AndesCode/server.log`.
- Find the request by `request_id` and inspect the final `phase=...` line.
- For failures, look for `phase=pipeline_failed` plus `failed_phase=...` and `error=...`.
The frontend now surfaces backend stream failures as visible assistant errors, and streams always terminate with [DONE] to prevent indefinite spinner/status hangs.
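The lookup steps above can be scripted. This is a minimal sketch that assumes every entry follows the `timestamp | KIND request_id | key=value | ...` layout seen in the sample log lines, and that values never contain a `|` character. The helper names `parse_log_line` and `final_phase` are hypothetical.

```python
from typing import Optional


def parse_log_line(line: str) -> dict:
    """Split one pipe-delimited log line into a flat dict of fields."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) < 3:
        return {}
    kind_and_id = parts[1].split()
    if len(kind_and_id) != 2:
        return {}
    entry = {
        "timestamp": parts[0],
        "kind": kind_and_id[0],
        "request_id": kind_and_id[1],
    }
    for field in parts[2:]:
        key, sep, value = field.partition("=")
        if sep:
            entry[key] = value
    return entry


def final_phase(log_text: str, request_id: str) -> Optional[dict]:
    """Return the last parsed entry for a request, or None if it never appears."""
    last = None
    for line in log_text.splitlines():
        entry = parse_log_line(line)
        if entry.get("request_id") == request_id:
            last = entry
    return last
```

For a failed request, the returned entry would carry `phase` equal to `pipeline_failed` along with the `failed_phase` and `error` fields.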
AndesCode now computes a real prompt budget from the model context window before injecting retrieved code. If retrieval is too large, context is truncated by deterministic priority:
- Anchor files explicitly mentioned in the question
- Planner-selected files
- Neighborhood-expanded files
- Semantic fallback chunks
Relevant environment knobs:
- `MODEL_CONTEXT_WINDOW` (default `8192`)
- `CONTEXT_RESERVED_RESPONSE_TOKENS` (default `1400`)
- `CONTEXT_SAFETY_MARGIN_TOKENS` (default `256`)
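As a rough illustration of how these knobs interact, the sketch below assumes the budget for retrieved context is simply the context window minus the reserved response tokens and the safety margin (8192 − 1400 − 256 = 6536 with the defaults). The real budget may also account for the system prompt and the question. The `Chunk` type, the tier names, and the stop-at-first-overflow rule are assumptions made for illustration.

```python
from dataclasses import dataclass

# Lower number = higher priority, following the order listed above.
PRIORITY = {"anchor": 0, "planned": 1, "neighbor": 2, "semantic": 3}


@dataclass
class Chunk:
    path: str
    tier: str
    tokens: int


def prompt_budget(window: int = 8192, reserved: int = 1400, margin: int = 256) -> int:
    """Tokens left for retrieved context under the stated assumption."""
    return max(window - reserved - margin, 0)


def pack(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Keep chunks in priority order and stop at the first one that does not fit."""
    ordered = sorted(chunks, key=lambda c: PRIORITY[c.tier])  # stable sort
    packed, used = [], 0
    for chunk in ordered:
        if used + chunk.tokens > budget:
            break
        packed.append(chunk)
        used += chunk.tokens
    return packed
```

Because the sort is stable and the cutoff rule is fixed, the same inputs always yield the same packed context.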
| Phase | Network | Notes |
|---|---|---|
| First-run model download | ✅ Once | ~16GB from Hugging Face |
| First-run embedding download | ✅ Once | ~90MB from Hugging Face |
| Indexing | ❌ Never | Fully local |
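| Answering queries | ❌ Never in `LOCAL` mode | Fully local in `LOCAL` mode; `REMOTE_INFERENCE` sends selected retrieved chunks + metadata to your configured server |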
- Real dotenv files are skipped by default during indexing to reduce accidental secret ingestion (`.env`, `.env.local`, `.env.development`, `.env.production`, `.env.test`, `.env.staging`).
- Dotenv examples/templates are indexed (`.env.example`, `.env.sample`, `.env.template`, `example.env`, `sample.env`, `template.env`) so AndesCode can understand configuration shape without storing real secret values.
- Other env-like files (for example `config.env` or `secrets.env`) are skipped by default unless they match the explicit template/example allowlist.
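The dotenv policy above amounts to an allowlist check applied before a broader "looks like an env file" test. The following sketch shows one way that ordering could work. The function name `env_policy_allows`, the case-insensitive comparison, and the exact definition of "env-like" are assumptions; `docs/indexing-policy.md` remains the authoritative description.

```python
from pathlib import PurePath

# Template/example names listed above; these are safe to index.
TEMPLATE_ALLOWLIST = {
    ".env.example",
    ".env.sample",
    ".env.template",
    "example.env",
    "sample.env",
    "template.env",
}


def env_policy_allows(path: str) -> bool:
    """Return False for env-like files that are not on the template allowlist."""
    name = PurePath(path).name.lower()
    if name in TEMPLATE_ALLOWLIST:
        return True
    env_like = name == ".env" or name.startswith(".env.") or name.endswith(".env")
    # Files that are not env-like fall outside this policy and pass through.
    return not env_like
```

Checking the allowlist first keeps `example.env` indexable even though it also matches the env-like pattern.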
| Hardware | Model | Speed |
|---|---|---|
| Apple M1/M2 Pro 32GB | Gemma 4 26B Q4 | ~20–30 t/s |
| Apple M3/M4 Pro 32GB | Gemma 4 26B Q4 | ~30–40 t/s |
| Apple M2/M3 Max 64GB | Gemma 4 31B Q4 | ~25–35 t/s |
| NVIDIA RTX 3090/4090 24GB | Gemma 4 26B Q4 | ~35–50 t/s |
| NVIDIA RTX 5090 32GB | Gemma 4 31B Q4 | ~50–70 t/s |
All configuration lives in .env:
MODEL_PATH=models/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf
PORT=8080
CONTEXT_CHUNKS=5 # code chunks injected per query
MODEL_CONTEXT_WINDOW=8192
CONTEXT_RESERVED_RESPONSE_TOKENS=1400
CONTEXT_SAFETY_MARGIN_TOKENS=256
CACHE_SIZE_GB=2.0 # KV cache size allocated at startup
TRANSFORMERS_OFFLINE=1
HF_DATASETS_OFFLINE=1
HF_HUB_OFFLINE=1
TOKENIZERS_PARALLELISM=false
ANDESCODE_EXECUTION_MODE=LOCAL # LOCAL (default) or REMOTE_INFERENCE
ANDESCODE_REMOTE_SERVER_URL=http://127.0.0.1:8080 # used only in REMOTE_INFERENCE mode
ANDESCODE_HYBRID_RETRIEVAL=0 # set to 1 to enable experimental graph-aware hybrid retrieval

For large projects or architectural questions, increase `CONTEXT_CHUNKS` to 7–10. The retrieval pipeline automatically widens its candidate pool for broad queries; this setting controls how many final chunks land in the prompt.
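For readers scripting around these settings, here is a minimal sketch of reading a few of them with validation. It is not AndesCode's loader: the `Settings` class and `load_settings` function are hypothetical, the default of 5 for `CONTEXT_CHUNKS` is taken from the sample file above rather than a documented default, and requiring a URL in `REMOTE_INFERENCE` mode is an assumed rule.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_MODES = {"LOCAL", "REMOTE_INFERENCE"}


@dataclass(frozen=True)
class Settings:
    execution_mode: str
    remote_server_url: Optional[str]
    context_chunks: int
    hybrid_retrieval: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    mode = env.get("ANDESCODE_EXECUTION_MODE", "LOCAL").strip().upper()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown execution mode: {mode!r}")
    url = env.get("ANDESCODE_REMOTE_SERVER_URL", "").strip() or None
    if mode == "REMOTE_INFERENCE" and url is None:
        raise ValueError("REMOTE_INFERENCE requires ANDESCODE_REMOTE_SERVER_URL")
    return Settings(
        execution_mode=mode,
        # The URL is ignored unless remote inference is explicitly selected.
        remote_server_url=url if mode == "REMOTE_INFERENCE" else None,
        context_chunks=int(env.get("CONTEXT_CHUNKS", "5")),
        hybrid_retrieval=env.get("ANDESCODE_HYBRID_RETRIEVAL", "0").strip() == "1",
    )
```

Passing a plain dict instead of the process environment makes the behavior easy to test without touching real configuration.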
AndesCode does not continuously watch your files or run background automatic index refresh. This keeps idle CPU near baseline when the app is open but no query or indexing job is running.
- Index Project creates or incrementally refreshes the current project index.
- Reindex Project is the explicit full rebuild path; it rebuilds vectors, workspace metadata, symbol index, and graph artifacts.
- When you return focus to AndesCode, it may do a throttled freshness check and show a non-blocking prompt if the project changed. Choosing Refresh Index runs a normal incremental refresh; dismissing does nothing.
- When you ask a question, AndesCode checks whether the indexed files changed before retrieval/model generation. If files changed, it runs an incremental index refresh first, then continues with retrieval and answer generation. If that refresh fails, AndesCode stops instead of answering from stale context.
ANDESCODE_HYBRID_RETRIEVAL is disabled by default. Set it to 1 when testing complex multi-file questions that may benefit from local graph expansion. It uses only local graph artifacts stored in the existing index directory and does not change AndesCode's privacy behavior. See the model-free A/B eval guide for pilot measurement: docs/hybrid-retrieval-eval.md.
- `LOCAL` (default): existing end-to-end behavior (local indexing, retrieval, and inference).
- `REMOTE_INFERENCE`: local indexing + local retrieval remain on the client host. Before building the remote payload, the client checks index freshness and performs any needed local incremental refresh. The client then builds a strict structured payload (query, workspace metadata, retrieval metadata, retrieved chunks, options) and sends it to `${ANDESCODE_REMOTE_SERVER_URL}/v1/ask` for inference-only generation. Server-side `/v1/ask` answers only from that payload and must not scan, watch, or index the repository.
- Remote payload contract reference: `docs/remote-inference-contract.md`.
- The repository and index stay local on the client.
- AndesCode sends only selected retrieved chunks + retrieval/workspace metadata to the configured remote server.
- AndesCode does not send the full repository to the server in `REMOTE_INFERENCE` mode.
- v1 is inference-only (Q&A). Code editing/patch application is not included.
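To make the payload boundary concrete, the sketch below assembles a payload with the five parts named earlier (query, workspace metadata, retrieval metadata, retrieved chunks, options). All key names and the `build_remote_payload` helper are hypothetical; the actual schema is defined in `docs/remote-inference-contract.md`. Only the protocol string `andes.remote.v1` and the `empty_retrieval` error code come from this document.

```python
import json
from typing import Optional

PROTOCOL = "andes.remote.v1"


def build_remote_payload(
    query: str,
    chunks: list[dict],
    workspace_id: str,
    options: Optional[dict] = None,
) -> dict:
    """Assemble an inference-only payload from already-retrieved chunks."""
    if not chunks:
        # Mirrors the `empty_retrieval` error code: nothing to send.
        raise ValueError("empty_retrieval")
    return {
        "protocol": PROTOCOL,
        "query": query,
        "workspace": {"workspace_id": workspace_id},
        "retrieval": {"chunk_count": len(chunks)},
        # Copy only the fields meant to leave the machine.
        "chunks": [{"path": c["path"], "text": c["text"]} for c in chunks],
        "options": dict(options or {}),
    }


if __name__ == "__main__":
    demo = build_remote_payload(
        "Where is the retry logic?",
        [{"path": "src/http.py", "text": "def retry(): ...", "local_only": True}],
        workspace_id="demo",
    )
    print(json.dumps(demo, indent=2))
```

Building the chunk list field by field, rather than forwarding whole chunk objects, keeps any local-only attributes out of the request.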
When ANDESCODE_EXECUTION_MODE=REMOTE_INFERENCE, AndesCode emits lightweight metadata logs (no full chunk dumps) in andes_cache/audit.log:
- Client/proxy path: `request_id`, execution mode enabled, `workspace_id`, branch/commit (if available), retrieved chunk count, payload send start/success/failure.
- Server inference path (`/v1/ask`): `request_id`, protocol version, received chunk count, validation failures, generation start/end, stream completion/failure.
Common remote error codes surfaced to clients:
- `remote_unreachable` — remote server cannot be reached.
- `validation_error` — payload failed schema validation.
- `unsupported_protocol` — protocol mismatch (currently only `andes.remote.v1`).
- `empty_retrieval` — no chunks available for remote inference.
- `remote_stream_interrupted` — streaming ended unexpectedly.
AndesCode indexes a broad set of text-based project files:
- Code: Python, JavaScript/TypeScript, Java, Kotlin, Swift, Go, Rust, C/C++, C#, Ruby, PHP
- Data/statistics: R, SQL, Jupyter notebooks
- Docs: Markdown, MDX, TXT
- Config/build: TOML, YAML/YML, XML, `.properties`, Gradle files, package manifests
- Scripts/web: SH, Bash, HTML, CSS
Binary, oversized, and generated files are skipped.
See docs/indexing-policy.md for the exact indexing and skip policy.
- Lightweight explicit freshness UX improvements
- Full AST-aware chunking and richer tree-sitter extraction across languages — deeper boundary detection beyond regex
- KVTC context compression — fit larger codebases in context
- Private tunnel (Tailscale/WireGuard) for mobile access
- iOS/Android chat client
- Cryptographic egress proof for SOC 2 compliance
- Pre-configured hardware bundle (Mac Mini)
AndesCode is designed to run fully locally and offline during inference.
However, users are responsible for validating their own environment and dependencies for compliance requirements. AndesCode does not claim formal certification (e.g., SOC 2, ISO) at this stage.
See docs/security-threat-model.md for detailed threat model and enterprise controls.
Does any code leave my machine?
In LOCAL mode, no repository content leaves your machine during inference. In REMOTE_INFERENCE mode, AndesCode sends only the retrieved chunks and metadata required for answer generation to your configured remote server (ANDESCODE_REMOTE_SERVER_URL), not the full repository.
Does it integrate with VS Code, Cursor, or other IDEs?
Not at this time. AndesCode is a standalone desktop app with its own interface. IDE plugin integration is on the roadmap but not currently supported.
Can I use a different model?
Yes — any GGUF model compatible with llama.cpp. Update MODEL_PATH in .env.
Does it work on Windows or Linux?
Yes, with an NVIDIA GPU. launch.py detects nvidia-smi and compiles llama-cpp-python with CUDA automatically. Metal acceleration is Apple Silicon only.
Answers seem generic or miss important files. What's wrong?
Check that indexing completed — you should see ✅ Done — X files. For large projects, increase CONTEXT_CHUNKS in .env. You can also reference a specific file by name in your question — AndesCode will load all indexed chunks from that file directly.
How do I re-index after changing files?
Run python3 indexer.py /path/to/your/project again. MD5 hashing ensures only changed files are re-processed — unchanged files are reused from the existing index instantly.
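The idea behind hash-based incremental re-indexing can be sketched as follows. This is an illustration of the general technique, not the code in `indexer.py`: the function names and the shape of the stored hash map (relative path to MD5 hex digest) are assumptions.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, block_size: int = 65536) -> str:
    """Hash a file in blocks so large files are not loaded into memory at once."""
    # MD5 is used here for change detection only, not for security.
    digest = hashlib.md5(usedforsecurity=False)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def diff_against_index(root: Path, previous: dict) -> tuple:
    """Return (current_hashes, changed_paths, removed_paths) for a project root."""
    root = Path(root)
    current = {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    changed = sorted(name for name, h in current.items() if previous.get(name) != h)
    removed = sorted(set(previous) - set(current))
    return current, changed, removed
```

Only the paths in `changed` would need re-chunking and re-embedding; everything else can be reused from the existing index.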
How do I inspect cache behavior?
See docs/cache-debugging.md for cache layout, metrics, and invalidation troubleshooting. You can run python3 benchmark_cache.py for cold vs warm cache instrumentation.
How do I enable structured retrieval debug mode?
Debug mode is off by default. You can enable it via:
- Environment variable: `ANDESCODE_DEBUG_MODE=1`
- API flag: include `"debug_mode": true` in `/chat/completions` or `/v1/debug/explain`
- Function parameter: `search(query, debug_mode=True)` in `indexer.py`
When enabled, AndesCode emits a deterministic debug payload with intent, source-of-truth selection, retrieval/ranking decisions, and failure signals. The web UI shows it in a collapsible panel.
When graph-aware hybrid retrieval is enabled, debug payloads also include fields that show whether graph retrieval actually changed the selected context:
- `retrieval_routes_used`: retrieval components that contributed candidates, such as semantic vector search, source-of-truth retrieval, exact symbol lookup, filename lookup, import neighbors, or reference neighbors.
- `graph_neighbors_added`: graph-expanded files added beyond the semantic seed files.
- `symbols_matched`: exact symbol matches found in `symbol_graph.json`.
- `files_selected_by_graph`: files selected by symbol/import/reference/filename graph logic.
- `files_selected_by_semantic`: files selected by semantic vector search before graph expansion.
- `files_selected_by_authority`: source-of-truth files force-included for config/dependency/declaration questions.
- `context_sufficiency_notes`: concise notes such as whether graph artifacts were missing, graph neighbors were added, or no high-confidence graph neighbors were found.
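When comparing runs with and without hybrid retrieval, a small helper can answer whether the graph contributed anything. The sketch below assumes the debug payload is a dict and that the file-related fields hold lists of path strings; the field names come from the list above, while the value shapes and the helper name `graph_changed_context` are assumptions.

```python
def graph_changed_context(debug_payload: dict) -> bool:
    """Return True if graph retrieval added files beyond the semantic selection."""
    added = debug_payload.get("graph_neighbors_added") or []
    by_graph = set(debug_payload.get("files_selected_by_graph") or [])
    by_semantic = set(debug_payload.get("files_selected_by_semantic") or [])
    # Either explicit neighbors were added, or the graph picked files
    # that semantic search alone did not select.
    return bool(added) or bool(by_graph - by_semantic)
```

A result of `False` on a multi-file question suggests checking `context_sufficiency_notes` for missing graph artifacts.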
For declaration/config/dependency questions, debug payloads now include authoritative retrieval guarantees:
- `authoritative_files_detected`: authoritative paths found in workspace metadata.
- `authoritative_files_required`: authoritative paths that must be retrieved for this query.
- `authoritative_files_retrieved` / `authoritative_files_missing`: explicit split of indexed vs missing authoritative sources.
- `forced_authoritative_file`: whether authoritative context was force-included.
- `authority_selection_reason`: why authoritative paths were selected.
- `authority_retrieval_mode`: `direct_chunk_load`, `integrity_repair`, `workspace_only_detected_not_indexed`, `semantic_fallback_blocked`, or `runtime_fallback_used`.
- `declaration_answer_mode`: `declared_only`, `declared_plus_runtime`, `declared_partial_only`, `runtime_only_fallback`, or `missing_declarations`.
If AndesCode detects authoritative files in workspace metadata but cannot retrieve chunks from the index, it now emits an explicit limitation (workspace_only_detected_not_indexed) instead of silently degrading to inferred/runtime-only answers.
Is there a hosted version?
No official AndesCode-hosted SaaS is available. Remote inference is supported only for self-hosted/private-network deployments that you configure and operate.
AndesCode is source-available.
- Free for personal use and internal company use
- Commercial redistribution, resale, managed hosting, or offering AndesCode as a service requires a commercial license
See LICENSE for full terms.
This licensing model allows teams to use AndesCode freely inside their organization, while preventing third parties from reselling or hosting it as a competing service.
PRs welcome.
Highest-value contributions right now:
- Windows / Linux setup testing and documentation
- Full AST-aware chunking and richer tree-sitter extraction across languages (beyond PR #53's v1 graph-aware hybrid retrieval)
- Lightweight explicit freshness checks and indexing UX improvements
AndesCode tests are split by execution tier:
- `tests/unit/` — deterministic pure-logic tests (default CI suite).
- `tests/integration/` — indexer/embedding/server/model tests (opt-in).
- `tests/eval/` — quality evaluation suites (opt-in).
Default CI runs only tests/unit/ using requirements-ci.txt, so it does not require:
- a running AndesCode server
- Hugging Face/network downloads
- cached embedding models
- a loaded LLM model
- local `audit.log`
- full runtime dependencies from `requirements.txt` (`llama-cpp-python`, `chromadb`, `sentence-transformers`, `huggingface_hub`)
Full runtime dependencies from requirements.txt are only required for local app usage plus opt-in integration/model/eval test tiers.
Run full validation locally when you have model + server dependencies available:

    ANDESCODE_RUN_INTEGRATION_TESTS=1 ANDESCODE_RUN_MODEL_TESTS=1 python3 -m pytest tests/integration -v
    ANDESCODE_RUN_EVAL_TESTS=1 python3 tests/eval/eval_runner.py --suite fast --fixture android

Note 1: eval `fast` is model-free, but it is not embedding-free. It still depends on retrieval/index embedding availability.

Note 2: answer eval (`--suite eval` or `--suite full` phase 2) is currently Android-only (`--fixture android`).

Note 3: integration/model/server/eval tiers are all opt-in and intentionally excluded from default CI.
Optional future CI split:
- Keep default PR CI on `tests/unit/` for speed and determinism.
- Add a separate model-enabled workflow/job (manual, nightly, or protected-branch) for integration + answer eval.
AndesCode is built by an independent developer from Latin America. It exists because some teams require full control over their code, infrastructure, and data flow.
Source-available. Free for personal use and internal company use. Commercial redistribution, resale, managed hosting, or offering AndesCode as a service requires a commercial license.
Private AI for your codebase — local by default, with optional private remote inference when explicitly configured.