context

Universal Go core for project-scoped context management, retrieval, indexing, lexical evidence, and agent orchestration.

github.com/fastygo/context is not a chat application, a generic RAG wrapper, or a product companion. It is a reusable context operating layer for systems that need to turn user intent, files, documents, logs, tool outputs, rules, lexicons, and external sources into precise, inspectable, auditable context for automated work.

Why

Large language models are useful, but they do not solve context management. A serious agent system needs to know what evidence was selected, where it came from, why other evidence was rejected, which tool was allowed to run, which model saw which context, and how the final result can be replayed or debugged.

This module exists because plain RAG is not enough for long-lived projects:

vector search misses exact facts, citations, wordforms, morphology, source authority, and operational boundaries;
unbounded chat history is noisy, lossy, and hard to audit;
normalized text cannot replace original source text, snippets, quotes, or attestations;
tool calls without typed policy create hidden side effects;
background agents need ownership, cancellation, traces, and verification;
source-backed work must preserve spans, versions, checksums, and decisions;
model, vector, storage, language, lexicon, and tool providers must remain replaceable.

The engine is designed to combine deterministic information retrieval, source-backed context packs, typed tools, replaceable adapters, and agent/subagent orchestration without baking product identity, language-specific grammar, dictionary content, or scenario-specific products into the core.

What It Provides

The module is intended to provide foundations for:

project-scoped source and artifact memory;
deterministic indexing with manifests and incremental updates;
hybrid retrieval over dense vectors, sparse/keyword indexes, exact matching, morphology-aware lexical paths, graph traversal, source filters, recency, and tool outputs;
language-neutral contracts for tokens, lemmas, lexemes, wordforms, morphology features, analyses, and query expansion;
lexicographic contracts for senses, concepts, attestations, variants, registers, regions, time periods, and lexicon sources;
focus policies that keep retrieval and context packing scoped to the current task;
replayable ContextPack objects for model calls, tools, and subagents;
typed tool registration with permissions, risk levels, and structured results;
foreground and background AgentRun traces;
verification and evaluation loops for retrieval quality and factuality;
adapter boundaries for LLMs, embeddings, rerankers, vector stores, metadata stores, artifact stores, language analyzers, lexicon resources, crawlers, and product integrations.

Design Principles

Project scoped by default: every index, artifact, run, and decision belongs to an explicit project or workspace.
Evidence before generation: model calls receive selected context, not unbounded history.
Provenance is mandatory: facts point back to source spans, versions, checksums, attestations, or tool outputs.
Original text is preserved: normalization, lemmatization, query expansion, and concept mapping never replace source text.
Hybrid retrieval wins: dense vectors, sparse search, exact matching, morphology, graph traversal, recency, sense/concept filters, and citation signals cooperate.
Multilingual by contract: the core defines stable contracts; language complexity lives in context-lang-* adapters.
Lexicons are evidence resources: dictionaries, thesauri, historical lexicons, slang, and regional vocabularies live in resource adapters, not in the core.
Models are replaceable: LLMs, embedding models, and rerankers are adapters, not hardcoded infrastructure.
Tools are typed: every tool has a name, schema, permission policy, risk level, and structured result.
Agents are configurations: orchestration policy, rules, skills, tools, and model preferences are data-driven where possible.
Background work is explicit: scheduled or event-triggered agents are observable, cancellable, and auditable.
Brand neutrality is required: downstream products and companions configure identity on top of the core; core packages stay generic.

Core Runtime Model

TaskIntent
  -> PolicySnapshot
  -> FocusProfile
  -> RetrievalPlan
  -> RetrieverCalls
  -> CandidateSet
  -> RerankedEvidence
  -> ContextPack
  -> ModelCall | ToolCall | SubagentRun
  -> Verification
  -> Decision | Artifact | Result
  -> EvaluationTrace

The central object is ContextPack: the selected, ranked, budget-aware, source-backed context handed to a model, tool, verifier, or subagent. It should be versioned and replayable so bad retrieval, bad generation, or bad tool decisions can be debugged later.
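
To make "versioned and replayable" concrete, here is a minimal sketch of one possible ContextPack shape with a content checksum. The field names, the Evidence type, and the Checksum method are assumptions made for illustration; the module does not define this API yet.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Evidence is a hypothetical source-backed item selected for a pack.
type Evidence struct {
	SourceID string  `json:"source_id"`
	Start    int     `json:"start"` // byte offset into the original source
	End      int     `json:"end"`
	Text     string  `json:"text"` // original text, never a normalized form
	Score    float64 `json:"score"`
}

// ContextPack is a hypothetical shape for the context handed to a model,
// tool, verifier, or subagent.
type ContextPack struct {
	Version     int        `json:"version"`
	ProjectID   string     `json:"project_id"`
	TokenBudget int        `json:"token_budget"`
	Evidence    []Evidence `json:"evidence"`
}

// Checksum returns a SHA-256 digest of the pack's JSON encoding.
// encoding/json emits struct fields in declaration order, so two packs
// with equal contents produce the same digest.
func (p ContextPack) Checksum() (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	pack := ContextPack{
		Version:     1,
		ProjectID:   "demo",
		TokenBudget: 2048,
		Evidence: []Evidence{
			{SourceID: "README.md", Start: 0, End: 7, Text: "context", Score: 0.9},
		},
	}
	sum, err := pack.Checksum()
	if err != nil {
		panic(err)
	}
	fmt.Println("pack checksum:", sum)
}
```

Storing the checksum next to a run trace would let a later replay confirm that it sees exactly the same evidence as the original call.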

Core Concepts

Project: an isolated workspace or tenant boundary.
Source: a file, document, URL, log stream, database snapshot, spec, chat history, or tool output.
Artifact: stored source material or generated intermediate output.
Chunk: an indexed source span with metadata and provenance.
TokenOccurrence: original token text with stable source offsets.
Lexeme, Lemma, WordForm: language-neutral references for lexical forms.
MorphAnalysis: one possible morphology analysis; ambiguity stays explicit.
QueryExpansion: explainable lexical or morphology-driven expansion.
Sense: a specific meaning of a lexeme.
Concept: a language-independent or domain concept connected to labels and senses.
Attestation: witnessed usage in a source, with quote, span, date, region, register, authority, and confidence.
Variant: orthographic, historical, regional, slang, spelling, or script variant.
MultiwordExpression: lexical unit spanning multiple tokens or syntactic words.
LexiconSource: dictionary, corpus, thesaurus, glossary, authority list, or community vocabulary source.
FocusProfile: the task-specific lens that defines scope, freshness, exactness, citation strictness, budgets, allowed tools, and irrelevant areas.
ContextPack: selected evidence and instructions for a model/tool/agent step.
AgentRun: a foreground, background, scheduled, or event-triggered execution trace.
ToolCall: a typed invocation with input, output, status, permissions, and side-effect metadata.
Evaluation: a reproducible check for retrieval quality or task correctness.
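
As an illustration of the TokenOccurrence concept above, the following sketch tokenizes on whitespace while keeping byte offsets into the untouched source. The struct fields and the Tokenize function are hypothetical names, and whitespace splitting stands in for whatever a real language adapter would do.

```go
package main

import (
	"fmt"
	"unicode"
)

// TokenOccurrence is a hypothetical shape: original token text plus
// stable byte offsets into the source it came from.
type TokenOccurrence struct {
	Text  string
	Start int // inclusive byte offset
	End   int // exclusive byte offset
}

// Tokenize splits src on Unicode whitespace and records where each
// token sits in the original string. The source is never modified.
func Tokenize(src string) []TokenOccurrence {
	var out []TokenOccurrence
	start := -1 // -1 means "not inside a token"
	for i, r := range src {
		if unicode.IsSpace(r) {
			if start >= 0 {
				out = append(out, TokenOccurrence{Text: src[start:i], Start: start, End: i})
				start = -1
			}
			continue
		}
		if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		out = append(out, TokenOccurrence{Text: src[start:], Start: start, End: len(src)})
	}
	return out
}

func main() {
	src := "Кот  sat down"
	for _, t := range Tokenize(src) {
		// Slicing the source with the recorded offsets yields the token again.
		fmt.Printf("%q [%d:%d] -> %q\n", t.Text, t.Start, t.End, src[t.Start:t.End])
	}
}
```

Offsets are byte positions, so multi-byte text such as Cyrillic still round-trips through src[Start:End].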

Linguistic And Lexicon Boundaries

The core is multilingual by contract, not by embedding every language inside the repository.

fastygo/context
  -> language-neutral contracts
  -> source spans, snapshots, retrieval, ContextPack, traces
  -> no language-specific dictionaries or grammar rules

context-lang-*
  -> normalization
  -> tokenization
  -> lexeme and wordform analysis
  -> morphology generation
  -> query expansion
  -> language-specific eval fixtures

Language adapters may support Russian, English, German, Spanish, French, Hindi and other Indic languages, and future languages. They must preserve source offsets, analyzer versions, dictionary versions, ambiguity candidates, and expansion provenance. Core contracts should stay compatible with portable schemes such as Universal Dependencies and UniMorph while allowing adapter-owned raw metadata.

Lexicon resource adapters are separate from language analyzers. They map dictionaries, TEI resources, SKOS/ISO 25964 concept schemes, historical lexicons, regional vocabularies, slang, and community terminology to neutral contracts such as Sense, Concept, Attestation, and LexiconSource.

TokenOccurrence
  -> WordForm
  -> Lemma
  -> Lexeme
  -> Sense
  -> Concept
  -> Attestation
  -> SourceSpan
  -> ContextPackEvidence

Lexeme and morphology answer "which form." Sense, concept, and attestation answer "which meaning, where, when, in which register, and according to which evidence."
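
The "which form" side of that split can be sketched with a fixture analyzer that returns every candidate analysis instead of picking one. The MorphAnalysis fields, the Analyzer interface, and the fixture table are assumptions for illustration; the tag and feature names follow Universal Dependencies conventions.

```go
package main

import "fmt"

// MorphAnalysis is a hypothetical contract: one possible reading of a
// surface form. Ambiguous forms yield several of these.
type MorphAnalysis struct {
	Surface         string
	Lemma           string
	UPOS            string            // Universal Dependencies part-of-speech tag
	Features        map[string]string // e.g. Tense=Past
	AnalyzerVersion string
}

// Analyzer is a hypothetical language-adapter interface.
type Analyzer interface {
	Analyze(surface string) []MorphAnalysis
}

// fixtureAnalyzer is a fake adapter backed by a fixed lookup table,
// the kind of fixture a proof of concept could use.
type fixtureAnalyzer struct {
	version string
	table   map[string][]MorphAnalysis
}

// Analyze returns all known readings and stamps each with the surface
// form and analyzer version so results stay traceable.
func (a fixtureAnalyzer) Analyze(surface string) []MorphAnalysis {
	found := a.table[surface]
	out := make([]MorphAnalysis, len(found))
	for i, m := range found {
		m.Surface = surface
		m.AnalyzerVersion = a.version
		out[i] = m
	}
	return out
}

func newFixture() Analyzer {
	return fixtureAnalyzer{
		version: "fixture-0",
		table: map[string][]MorphAnalysis{
			"saw": {
				{Lemma: "see", UPOS: "VERB", Features: map[string]string{"Tense": "Past"}},
				{Lemma: "saw", UPOS: "NOUN", Features: map[string]string{"Number": "Sing"}},
			},
		},
	}
}

func main() {
	for _, m := range newFixture().Analyze("saw") {
		fmt.Printf("%s -> lemma=%s upos=%s feats=%v (%s)\n",
			m.Surface, m.Lemma, m.UPOS, m.Features, m.AnalyzerVersion)
	}
}
```

Choosing between the two readings is left to later stages (sense, concept, attestation), which matches the split described above.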

Indexing Pipeline

The indexing pipeline is source-agnostic:

source adapter
  -> artifact store
  -> parser
  -> chunker
  -> tokenizer
  -> language adapter
  -> enricher
  -> manifest
  -> dense vector index
  -> sparse/exact index
  -> graph index
  -> metadata store
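
The manifest stage is what makes incremental updates possible. A minimal sketch, assuming a manifest is simply a map from source ID to content digest (the Manifest, Delta, Digest, and Diff names are hypothetical):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Manifest is a hypothetical index manifest: source ID -> content digest.
type Manifest map[string]string

// Digest returns the hex SHA-256 of a source's raw bytes.
func Digest(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// Delta lists what an incremental indexing pass has to touch.
type Delta struct {
	Added, Changed, Removed []string
}

// Diff compares the previous manifest with the next one. Results are
// sorted so the same inputs always produce the same plan.
func Diff(prev, next Manifest) Delta {
	var d Delta
	for id, sum := range next {
		old, ok := prev[id]
		switch {
		case !ok:
			d.Added = append(d.Added, id)
		case old != sum:
			d.Changed = append(d.Changed, id)
		}
	}
	for id := range prev {
		if _, ok := next[id]; !ok {
			d.Removed = append(d.Removed, id)
		}
	}
	sort.Strings(d.Added)
	sort.Strings(d.Changed)
	sort.Strings(d.Removed)
	return d
}

func main() {
	prev := Manifest{"a.md": Digest([]byte("one")), "b.md": Digest([]byte("two"))}
	next := Manifest{"a.md": Digest([]byte("one")), "b.md": Digest([]byte("TWO")), "c.md": Digest([]byte("new"))}
	d := Diff(prev, next)
	fmt.Println("added:", d.Added, "changed:", d.Changed, "removed:", d.Removed)
}
```

Only the added and changed sources would need to be re-parsed and re-embedded; unchanged entries keep their existing chunks and vectors.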

Different source types need different chunking strategies. Source code, technical documentation, scientific text, legal text, dictionary entries, usage citations, chat history, logs, web captures, and tool output should not be split with the same rules.
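
One way to keep chunking rules per source type is a small registry keyed by source kind. The sketch below is an assumption-laden illustration: SourceKind, Chunk, Chunker, and ChunkFor are hypothetical names, and the two splitting rules (blank lines for prose, single lines for logs) are deliberately simplistic.

```go
package main

import (
	"fmt"
	"strings"
)

// SourceKind is a hypothetical label for the type of source being indexed.
type SourceKind string

const (
	KindProse SourceKind = "prose"
	KindLog   SourceKind = "log"
)

// Chunk is a source span with byte offsets into the original text.
type Chunk struct {
	Start, End int
	Text       string
}

// Chunker splits one source into chunks.
type Chunker func(src string) []Chunk

// splitOn cuts src at every occurrence of sep, records offsets, and
// skips pieces that contain only whitespace.
func splitOn(src, sep string) []Chunk {
	var out []Chunk
	pos := 0
	for _, part := range strings.Split(src, sep) {
		if strings.TrimSpace(part) != "" {
			out = append(out, Chunk{Start: pos, End: pos + len(part), Text: part})
		}
		pos += len(part) + len(sep)
	}
	return out
}

// chunkers maps each kind to its own strategy.
var chunkers = map[SourceKind]Chunker{
	KindProse: func(s string) []Chunk { return splitOn(s, "\n\n") }, // paragraphs
	KindLog:   func(s string) []Chunk { return splitOn(s, "\n") },   // one entry per line
}

// ChunkFor picks the strategy registered for kind.
func ChunkFor(kind SourceKind, src string) ([]Chunk, error) {
	c, ok := chunkers[kind]
	if !ok {
		return nil, fmt.Errorf("no chunker registered for kind %q", kind)
	}
	return c(src), nil
}

func main() {
	src := "a\nb\n\nc"
	for _, kind := range []SourceKind{KindProse, KindLog} {
		chunks, err := ChunkFor(kind, src)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d chunks %v\n", kind, len(chunks), chunks)
	}
}
```

An unregistered kind returns an error rather than falling back to a default, so a new source type cannot be silently chunked with the wrong rules.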

Retrieval Pipeline

Retrieval is a planning problem, not a single vector query:

task intent
  -> focus profile
  -> retrieval plan
  -> parallel retriever calls
  -> candidate merge
  -> deduplication
  -> reranking
  -> evidence validation
  -> context pack

Supported retrieval paths should include:

dense vector search;
sparse/BM25-style search;
exact phrase and source-span search;
lemma, wordform, and morphology-expanded search;
sense, concept, attestation, register, region, and time-period filters;
entity and metadata filters;
citation lookup;
graph traversal;
recent activity retrieval;
tool result retrieval;
external source retrieval through explicit adapters.

Every retrieval contribution should be explainable. A candidate that matched through a generated wordform, fuzzy variant, concept label, or attestation must preserve the original surface text and source span.
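
The candidate merge and deduplication steps could be sketched as follows. This example uses reciprocal rank fusion as the merge rule, which is a swapped-in technique chosen for illustration and not something this README prescribes. The Candidate, Merged, and Fuse names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical retriever hit tied to a source span.
type Candidate struct {
	SourceID   string
	Start, End int
	Surface    string // original surface text, kept as-is
}

// Merged is a deduplicated candidate plus the reasons it was selected.
type Merged struct {
	Candidate
	Score     float64
	MatchedBy []string // names of the retrievers that returned this span
}

type spanKey struct {
	id         string
	start, end int
}

// Fuse merges ranked lists with reciprocal rank fusion: each retriever
// adds 1/(k+rank) to a span's score. Spans are deduplicated by
// (source, start, end) and every contributing retriever is recorded.
func Fuse(results map[string][]Candidate, k float64) []Merged {
	names := make([]string, 0, len(results))
	for n := range results {
		names = append(names, n)
	}
	sort.Strings(names) // fixed iteration order keeps output deterministic

	byKey := map[spanKey]*Merged{}
	var order []spanKey
	for _, name := range names {
		for rank, c := range results[name] {
			key := spanKey{c.SourceID, c.Start, c.End}
			m, ok := byKey[key]
			if !ok {
				m = &Merged{Candidate: c}
				byKey[key] = m
				order = append(order, key)
			}
			m.Score += 1 / (k + float64(rank+1))
			m.MatchedBy = append(m.MatchedBy, name)
		}
	}

	out := make([]Merged, 0, len(order))
	for _, key := range order {
		out = append(out, *byKey[key])
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	a := Candidate{SourceID: "doc", Start: 0, End: 10, Surface: "alpha"}
	b := Candidate{SourceID: "doc", Start: 10, End: 20, Surface: "beta"}
	c := Candidate{SourceID: "doc", Start: 20, End: 30, Surface: "gamma"}
	merged := Fuse(map[string][]Candidate{"dense": {a, b}, "sparse": {b, c}}, 60)
	for _, m := range merged {
		fmt.Printf("%-5s score=%.4f matched_by=%v\n", m.Surface, m.Score, m.MatchedBy)
	}
}
```

The MatchedBy list is the explainability hook: a reranker or verifier can see which retrieval paths contributed to each piece of evidence.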

Tool And Agent Runtime

Tools are registered through typed metadata:

tool
  name
  description
  input schema
  output schema
  permission policy
  risk level
  side-effect class
  timeout
  background support

Agents coordinate retrieval, context packs, model calls, tools, verifiers, and subagents. Subagents run with isolated context and return structured summaries or artifacts to the parent run. Concrete products should integrate through adapters, tools, graph projections, rules, skills, contracts, or companion configuration.
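
A minimal sketch of typed tool registration with a policy check that runs outside the model. It covers only a subset of the metadata listed above, and the Tool, Policy, and Registry types, along with the map-based input and output, are simplifying assumptions instead of the module's real contracts.

```go
package main

import (
	"errors"
	"fmt"
)

// Risk is a hypothetical ordered risk level.
type Risk int

const (
	RiskLow Risk = iota
	RiskMedium
	RiskHigh
)

// Tool carries typed metadata next to the function that does the work.
type Tool struct {
	Name        string
	Description string
	Risk        Risk
	SideEffect  string // e.g. "none", "filesystem-write"
	Run         func(input map[string]string) (map[string]string, error)
}

// Policy is a hypothetical permission policy: the highest risk allowed.
type Policy struct {
	MaxRisk Risk
}

// ErrDenied is returned when policy blocks a call.
var ErrDenied = errors.New("tool call denied by policy")

// Registry holds registered tools by name.
type Registry struct {
	tools map[string]Tool
}

func NewRegistry() *Registry {
	return &Registry{tools: map[string]Tool{}}
}

// Register rejects incomplete or duplicate tools.
func (r *Registry) Register(t Tool) error {
	if t.Name == "" || t.Run == nil {
		return errors.New("tool needs a name and a Run function")
	}
	if _, dup := r.tools[t.Name]; dup {
		return fmt.Errorf("tool %q already registered", t.Name)
	}
	r.tools[t.Name] = t
	return nil
}

// Call checks the policy before the tool runs, so a denied tool never
// executes regardless of what the model asked for.
func (r *Registry) Call(p Policy, name string, in map[string]string) (map[string]string, error) {
	t, ok := r.tools[name]
	if !ok {
		return nil, fmt.Errorf("unknown tool %q", name)
	}
	if t.Risk > p.MaxRisk {
		return nil, fmt.Errorf("%w: %s", ErrDenied, name)
	}
	return t.Run(in)
}

func main() {
	reg := NewRegistry()
	echo := Tool{Name: "echo", Risk: RiskLow, SideEffect: "none",
		Run: func(in map[string]string) (map[string]string, error) { return in, nil }}
	if err := reg.Register(echo); err != nil {
		panic(err)
	}
	out, err := reg.Call(Policy{MaxRisk: RiskLow}, "echo", map[string]string{"msg": "hi"})
	fmt.Println(out, err)
}
```

Wrapping ErrDenied with %w lets callers test for it with errors.Is and record the denial in a run trace.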

Architecture Guidance

The canonical planning documents live in .project/:

.project/roadmap-context-core.md: architectural baseline and phased roadmap.
.project/progress.md: copy-paste plan chunks from baseline to PoC.
.project/future-layer.md: deferred production-grade layers and review gates.
.project/plugins/language-adapters.md: roadmap for context-lang-* language adapters.
.project/plugins/lexicon-resources.md: roadmap for dictionaries, thesauri, attestations, historical lexicons, and controlled vocabulary resources.

The project skill lives in .cursor/skills/context-core-steward/ and should be used when planning, implementing, reviewing, or debugging this repository. It keeps work aligned with DDD, Clean Architecture, SOLID, DRY, TDD, traceability, brand-neutral API boundaries, and the current roadmap.

Suggested Package Direction

The public API should stay small until boundaries are stable.

cmd/
  context-dev/              # local developer CLI for indexing, search, evals

internal/
  agentruntime/             # agent runs, orchestration, subagents, scheduling
  artifacts/                # artifact metadata and stores
  config/                   # project config, rules, ignore patterns
  corpus/                   # projects, sources, chunks, provenance
  evals/                    # retrieval and task evaluation harnesses
  graph/                    # entity, citation, co-occurrence, dependency edges
  indexing/                 # parsing, chunking, enrichment, manifests
  lexicon/                  # sense, concept, attestation, resource contracts
  linguistic/               # language-neutral contracts and simple adapters
  models/                   # LLM, embedding, reranker interfaces
  policy/                   # permissions, risks, approvals
  retrieval/                # planners, retrievers, rerankers, context packs
                            # and focus profiles
  storage/                  # metadata store abstractions and adapters
  tools/                    # registry, schemas, execution
  tracing/                  # append-only runtime events and redaction

pkg/
  contextkit/               # stable public interfaces, added only when proven

Prefer internal while interfaces are changing. Move packages to pkg only when another module needs a stable import surface.

First Proof Target

The current hypothesis-validation path is:

local project corpus
  -> deterministic indexing
  -> PostgreSQL + pgvector-backed metadata/vector search path
  -> real CLI ingestion and retrieval
  -> context pack creation
  -> fake model/tool agent run
  -> source-backed verification trace

The first proof is not a polished product. It is a working CLI loop that shows the architecture can ingest project sources, retrieve relevant evidence, build a context pack, execute a typed model/tool step, verify source-backed claims, and replay the trace. It should prove neutral contracts with simple/fake language and lexicon fixtures before adding production language or dictionary adapters.

Non-Goals For The First Version

Generic autonomous control of arbitrary systems.
Unlimited web crawling.
Plugin marketplace.
Scenario-specific products in the core package tree.
Multi-tenant billing.
Complex UI framework ownership.
Hard dependency on one model provider.
Hard dependency on one vector database.
Language-specific dictionaries, grammar rules, or morphology engines in core.
TEI/SKOS importers, historical dictionaries, regional vocabularies, slang lexicons, or community lexicon resources in the first PoC.
Implicit background writes without audit and approval policy.
Full distributed worker orchestration.
Production-grade query language.

Engineering Notes

Keep raw source storage, embeddings, metadata, indexes, traces, and generated artifacts separate.
Version embedding models, parsers, chunkers, tokenizers, analyzers, dictionaries, sparse indexes, and graph schemas.
Never replace original source text with normalized text.
Do not collapse senses into lemmas or concepts into labels.
Treat generated wordforms as expansion candidates, not as attestations unless they are witnessed in a source.
Treat long tool outputs as artifacts that can be searched and read in slices.
Prefer deterministic verification for claims that depend on source material.
Record enough run data to reproduce bad retrieval and bad tool decisions.
Enforce permission and side-effect decisions outside the model.
Design every background action as an event with policy, trace, owner, and cancellation.
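
The last note, background actions with owner, trace, and cancellation, can be sketched with the standard context package. BackgroundRun, Start, and the status values are hypothetical names, and the in-memory trace stands in for a real append-only event store.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Status is a hypothetical run state.
type Status string

const (
	StatusRunning   Status = "running"
	StatusDone      Status = "done"
	StatusFailed    Status = "failed"
	StatusCancelled Status = "cancelled"
)

// BackgroundRun is a hypothetical handle for one background action.
type BackgroundRun struct {
	ID, Owner string

	cancel context.CancelFunc
	done   chan struct{}

	mu     sync.Mutex
	status Status
	trace  []string // append-only list of events
}

// Start launches work in a goroutine with a cancellable context.
func Start(parent context.Context, id, owner string, work func(ctx context.Context) error) *BackgroundRun {
	ctx, cancel := context.WithCancel(parent)
	r := &BackgroundRun{ID: id, Owner: owner, cancel: cancel, done: make(chan struct{}), status: StatusRunning}
	r.record("started by " + owner)
	go func() {
		defer close(r.done)
		err := work(ctx)
		r.mu.Lock()
		defer r.mu.Unlock()
		switch {
		case ctx.Err() != nil:
			r.status = StatusCancelled
		case err != nil:
			r.status = StatusFailed
		default:
			r.status = StatusDone
		}
		r.trace = append(r.trace, string(r.status))
	}()
	return r
}

func (r *BackgroundRun) record(event string) {
	r.mu.Lock()
	r.trace = append(r.trace, event)
	r.mu.Unlock()
}

// Cancel records the request, then cancels the run's context.
func (r *BackgroundRun) Cancel() {
	r.record("cancel requested")
	r.cancel()
}

// Wait blocks until the work returns and reports the final status
// together with a copy of the trace.
func (r *BackgroundRun) Wait() (Status, []string) {
	<-r.done
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.status, append([]string(nil), r.trace...)
}

// demo starts a run that works until cancelled, then cancels it.
func demo() (Status, []string) {
	run := Start(context.Background(), "run-1", "indexer", func(ctx context.Context) error {
		<-ctx.Done() // pretend to work until cancellation arrives
		return ctx.Err()
	})
	run.Cancel()
	return run.Wait()
}

func main() {
	status, trace := demo()
	fmt.Println(status, trace)
}
```

A policy check, such as an approval gate before Start, is left out to keep the sketch short; it would sit in front of Start in the same way the permission check sits in front of a tool call.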

Status

Early design-stage module. Public APIs are expected to change until the core runtime and proof-of-concept workflows stabilize.

License: MIT
