Memscribe

Deterministic, zero-LLM conversation capture for AI coding agents.

Memscribe tails the transcript logs your AI coding agents already write (Claude Code, Codex, Gemini, Cursor, Windsurf, Zed, VS Code / Copilot, and any OpenTelemetry-instrumented agent) and normalizes them into typed, queryable nodes. No model calls. Same bytes in, same nodes out, every time.

What it is

A coding agent's session is a stream of decisions and edits: "use Postgres instead of MySQL," followed by the diffs that implement it. That stream is gold for memory, audit, analytics, and replay, but it's buried in five different churning log formats. Memscribe is the boring, deterministic half of a memory system: it reads those logs, normalizes them, and emits typed nodes. Because it never calls a model, its output is an exact function of its input.

That single property is the whole point. It makes capture golden-file, property, and fuzz testable, so the day a tool changes its format, the test suite fails loudly instead of silently corrupting your memory.

Where Memscribe sits

Memscribe is the foundation of a three-layer stack. Each layer uses the one below it, and the dependency only ever points downward:

Layer	Role	Calls a model?
Memtrace	The product — a code-intelligence graph with agent memory	—
MemCortex	Inference & governance — the judgment calls on top of the captured data	yes
Memscribe	Deterministic capture — normalizes transcripts into typed nodes (this repo)	no

Memtrace builds on MemCortex; MemCortex builds on Memscribe. Because Memscribe sits at the bottom, depends on nothing above it, and never calls a model, the boundary between the layers is a single stable data type — which is exactly what keeps this layer small, auditable, and exhaustively testable.

The pipeline

One linear, deterministic pipeline. Every stage is a trait, so it can be tested in isolation and swapped.

  Source (memscribe-io)          Adapter (memscribe-adapters)
  tail JSONL / hook stdin   ─►   parse_line ─► CaptureEvent[]
  / OTLP receiver                (version-tolerant)
         │  RawRecord(bytes + provenance)        │  normalized events
         ▼                                       ▼
  Gate ─► Segmenter ─► Binder ─► NodePrep   ─►   Sink (memscribe-sink)
  admit?   arc / turn  decision   Prepared        MemDB · ndjson · sqlite
  markers  spans       ↔ edit     Node

Source → Adapter produces a normalized CaptureEvent stream — the system of record. Gate → Segmenter → Binder → NodePrep turn that into PreparedNodes. The Sink writes them. Everything between Source and Sink is pure and synchronous given the event stream, which is what makes the whole thing golden-testable end to end.
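The stage boundaries above can be pictured as a few small traits composed by one pure function. The sketch below is illustrative only: the names RawRecord, CaptureEvent, and PreparedNode come from this README, but every field, every trait signature, the toy implementations, and the `run_pipeline` helper are assumptions rather than the actual memscribe-core API. The Segmenter and Binder stages are omitted to keep it short.

```rust
// Illustrative stand-ins only; the real types live in memscribe-core.
#[derive(Debug, Clone, PartialEq)]
pub struct RawRecord {
    pub bytes: Vec<u8>,
    pub source: String, // provenance; hypothetical field
}

#[derive(Debug, Clone, PartialEq)]
pub struct CaptureEvent {
    pub text: String, // hypothetical field
}

#[derive(Debug, Clone, PartialEq)]
pub struct PreparedNode {
    pub kind: &'static str, // hypothetical field
    pub text: String,
}

// Hypothetical stage traits: each is small enough to test in isolation.
pub trait Adapter {
    fn parse_line(&self, record: &RawRecord) -> Vec<CaptureEvent>;
}
pub trait Gate {
    fn admit(&self, event: &CaptureEvent) -> bool;
}
pub trait NodePrep {
    fn prepare(&self, events: &[CaptureEvent]) -> Vec<PreparedNode>;
}

/// Pure composition: no I/O, no clock, no randomness, so the output
/// depends only on `records` and the stage implementations.
pub fn run_pipeline(
    adapter: &dyn Adapter,
    gate: &dyn Gate,
    prep: &dyn NodePrep,
    records: &[RawRecord],
) -> Vec<PreparedNode> {
    let admitted: Vec<CaptureEvent> = records
        .iter()
        .flat_map(|r| adapter.parse_line(r))
        .filter(|e| gate.admit(e))
        .collect();
    prep.prepare(&admitted)
}

// Toy implementations, just enough to exercise the composition.
pub struct LineAdapter;
impl Adapter for LineAdapter {
    fn parse_line(&self, record: &RawRecord) -> Vec<CaptureEvent> {
        let text = String::from_utf8_lossy(&record.bytes).trim().to_string();
        if text.is_empty() {
            Vec::new()
        } else {
            vec![CaptureEvent { text }]
        }
    }
}

pub struct KeywordGate;
impl Gate for KeywordGate {
    fn admit(&self, event: &CaptureEvent) -> bool {
        // Naive substring check standing in for the real marker table.
        event.text.to_lowercase().contains("use ")
    }
}

pub struct ConversationPrep;
impl NodePrep for ConversationPrep {
    fn prepare(&self, events: &[CaptureEvent]) -> Vec<PreparedNode> {
        events
            .iter()
            .map(|e| PreparedNode { kind: "conversation", text: e.text.clone() })
            .collect()
    }
}
```

Because each stage sits behind a trait object, a test can swap in a stub gate or adapter and still call the same `run_pipeline` function, which is the property the golden tests rely on.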

Quick start

# Parse a transcript to NDJSON (the workhorse — great for trying it out)
cargo run -p memscribe-cli -- parse ~/.claude/projects/<slug>/<session>.jsonl --as claude_code

# Tail your agents live and write prepared nodes to a local SQLite store
cargo run -p memscribe-cli -- watch --tools claude,codex,gemini --sink sqlite --out memory.db

# See exactly what the redaction pass would strip from a file
cargo run -p memscribe-cli -- redact session.jsonl

Every tool's transcript normalizes to the same shape. Here a Claude Code decision-and-edits session becomes four kinds of node:

// memscribe parse fixtures/claude_code/2.0/happy_path_decision_then_edits.jsonl --as claude_code
{"node":"conversation","text":"Let's use Postgres instead of MySQL for the orders service.",
 "markers":[{"rule_id":"decision_verb.use",...},{"rule_id":"rejection.instead_of",...}], "fact_status":"observed"}
{"node":"decision","epitome":"Let's use Postgres instead of MySQL ...",
 "considered_options":[{"text":"MySQL","chosen":false},{"text":"Postgres","chosen":true}],"is_ban":false}
{"node":"episode","path":"src/db/config.rs","diff":{"added_lines":1,"removed_lines":1,...}}
{"node":"binding","relation":"produced","prov":{"t_use":"...10:00:00Z","t_gen":"...10:00:03Z"},
 "fact_status":"deterministically_derived","correlation":{...}}

Supported tools

Nine version-tolerant adapters, each behind a Cargo feature flag. Parsers pattern-match the fields they need and route anything unrecognized to Unknown — they never panic and never drop a record.
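To make "never panic, never drop a record" concrete, here is a minimal sketch of a version-tolerant parser. It is not one of the real adapters: the `CaptureEvent` variants, the `string_field` helper, and the `"type"` key are all hypothetical, and the naive string search stands in for a real JSON parser so the example stays dependency-free.

```rust
// Hypothetical event shape; the real CaptureEvent is defined in memscribe-core.
#[derive(Debug, Clone, PartialEq)]
pub enum CaptureEvent {
    UserMessage { raw: String },
    AssistantMessage { raw: String },
    /// Anything the parser does not recognize is kept verbatim.
    Unknown { raw: String },
}

/// Naive lookup of a `"key":"value"` pair. A stand-in for a real JSON
/// parser: it ignores nesting, escapes, and whitespace after the colon.
fn string_field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{key}\":\"");
    let start = line.find(&needle)? + needle.len();
    let end = line[start..].find('"')? + start;
    Some(&line[start..end])
}

/// Total function over arbitrary bytes: every input yields exactly one
/// event, and unrecognized input is routed to `Unknown` instead of failing.
pub fn parse_line(bytes: &[u8]) -> CaptureEvent {
    // Lossy decoding means invalid UTF-8 cannot cause an error path.
    let raw = String::from_utf8_lossy(bytes).into_owned();
    let kind = string_field(&raw, "type").map(str::to_owned);
    match kind.as_deref() {
        Some("user") => CaptureEvent::UserMessage { raw },
        Some("assistant") => CaptureEvent::AssistantMessage { raw },
        _ => CaptureEvent::Unknown { raw },
    }
}
```

The design choice worth noting is that the fallback arm carries the raw record along, so a format change shows up as a rise in `Unknown` events rather than as missing data.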

Tool	Transcript source	Status
Claude Code	`~/.claude/projects/<slug>/<session>.jsonl` (append-only JSONL, DAG via `parentUuid`)	✅ native
Codex CLI	`~/.codex/sessions/.../rollout-*.jsonl[.zst]` (`apply_patch` V4A diffs, transparent zstd)	✅ native
Gemini CLI	`~/.gemini/tmp/<hash>/chats/session-*.jsonl` (`$set` / `$rewindTo` control lines)	✅ native
OpenTelemetry	OTLP / GenAI semconv records — the universal fallback for any instrumented agent	✅ native
Cursor · Windsurf · Zed · VS Code · Copilot	exported chat JSON (desktop stores are SQLite/undocumented — export-based, per the whitepaper)	✅ export-shape

All five CLI/OTel scenarios and the cross-tool conformance suite prove these adapters are interchangeable behind the contract.

Usable with MemDB — and fully usable without it

The seam is the Sink trait. Nothing in the pipeline knows what a sink does with a node:

pub trait Sink: Send {
    fn emit(&mut self, node: &PreparedNode) -> Result<(), SinkError>;
    fn flush(&mut self) -> Result<(), SinkError>;
}

Sink	Feature	Use
`NdjsonSink`	default	One JSON node per line — the canonical, audit-friendly default.
`SqliteSink`	default	A queryable local store with zero external services.
`MemDbSink`	`--features memdb`	Writes nodes into MemDB with bi-temporal headers, for Memtrace. Off by default.

Remove the memdb feature and Memscribe is a complete, auditable, local capture tool. See crates/memscribe-sink/MEMDB.md for the integration design.
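Since the seam is a two-method trait, a custom sink can be very small. The sketch below repeats the trait as quoted above and adds an in-memory sink of the kind a test might use. The `PreparedNode` fields, the `SinkError` shape, `MemorySink`, and `write_all` are hypothetical stand-ins, not types from memscribe-sink.

```rust
use std::fmt;

// Hypothetical stand-ins for the real memscribe types.
#[derive(Debug, Clone, PartialEq)]
pub struct PreparedNode {
    pub kind: String,
    pub text: String,
}

#[derive(Debug)]
pub struct SinkError(pub String);

impl fmt::Display for SinkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "sink error: {}", self.0)
    }
}

impl std::error::Error for SinkError {}

// The trait as shown in this README.
pub trait Sink: Send {
    fn emit(&mut self, node: &PreparedNode) -> Result<(), SinkError>;
    fn flush(&mut self) -> Result<(), SinkError>;
}

/// A sink that keeps nodes in memory and counts flushes.
#[derive(Default)]
pub struct MemorySink {
    pub nodes: Vec<PreparedNode>,
    pub flushes: usize,
}

impl Sink for MemorySink {
    fn emit(&mut self, node: &PreparedNode) -> Result<(), SinkError> {
        self.nodes.push(node.clone());
        Ok(())
    }

    fn flush(&mut self) -> Result<(), SinkError> {
        self.flushes += 1;
        Ok(())
    }
}

/// The caller only sees `dyn Sink`, so it cannot depend on what a
/// particular sink does with a node.
pub fn write_all(sink: &mut dyn Sink, nodes: &[PreparedNode]) -> Result<(), SinkError> {
    for node in nodes {
        sink.emit(node)?;
    }
    sink.flush()
}
```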

The output contract

Memscribe only ever emits nodes with Observed or DeterministicallyDerived fact-status. Anything that would require inference (fine-grained decision typing, statistical ranking) is flagged for a downstream layer, never guessed.

Node	Meaning	Fact status
`Conversation`	A gated, verbatim dialogue span with the commitment markers that fired	`Observed`
`Decision`	Parsed deterministically (IBIS/QOC/MADR/Kruchten): epitome, options, `is_ban`	`Observed`
`Episode`	The edit(s): path, diff, git sha	`DeterministicallyDerived`
`Binding`	decision → episode, with PROV (`t_use ≤ t_gen`) + correlation tuple	`DeterministicallyDerived`
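The PROV constraint in the last row (`t_use ≤ t_gen`) is easy to enforce at construction time. The following is a sketch under stated assumptions: the two `FactStatus` variants are named in this README, while the `Prov` and `Binding` fields, the integer timestamps, `BindError`, and the `bind` function are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FactStatus {
    Observed,
    DeterministicallyDerived,
}

/// Hypothetical provenance pair. Timestamps are plain integers here
/// (for example seconds since the epoch) to keep the sketch dependency-free.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Prov {
    pub t_use: u64, // when the decision was used
    pub t_gen: u64, // when the edit was generated
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Binding {
    pub decision_id: u64,
    pub episode_id: u64,
    pub prov: Prov,
    pub fact_status: FactStatus,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BindError {
    /// The edit is timestamped before the decision it would be bound to.
    EditPrecedesDecision { t_use: u64, t_gen: u64 },
}

/// Binds a decision to an episode only when `t_use <= t_gen` holds.
/// The result is derived from observed timestamps, never guessed, so it
/// is tagged `DeterministicallyDerived`.
pub fn bind(
    decision_id: u64,
    t_use: u64,
    episode_id: u64,
    t_gen: u64,
) -> Result<Binding, BindError> {
    if t_use > t_gen {
        return Err(BindError::EditPrecedesDecision { t_use, t_gen });
    }
    Ok(Binding {
        decision_id,
        episode_id,
        prov: Prov { t_use, t_gen },
        fact_status: FactStatus::DeterministicallyDerived,
    })
}
```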

The commitment-marker gate is a config-driven, unit-tested rule table over decision verbs, rejections, bans, and imperatives. It is the gate-before-store step that the production audits showed to be the difference between a working memory and one that is 97.8% junk.
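A rule table of this kind can be sketched in a few lines. The rule ids `decision_verb.use` and `rejection.instead_of` appear in the example output earlier in this README; the `ban.never` rule, the `Rule` struct, the trigger phrases, and the substring matching are assumptions made for illustration. A real table would presumably match on word boundaries and be loaded from configuration.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Rule {
    pub rule_id: &'static str,
    pub phrase: &'static str, // hypothetical lower-case trigger phrase
}

pub const RULES: &[Rule] = &[
    Rule { rule_id: "decision_verb.use", phrase: "use " },
    Rule { rule_id: "rejection.instead_of", phrase: "instead of" },
    Rule { rule_id: "ban.never", phrase: "never " }, // hypothetical rule id
];

/// Returns the ids of every rule whose phrase occurs in `text`, in table
/// order. Substring matching is deliberately naive: it would also fire on
/// words that merely end in "use".
pub fn fired_markers(text: &str, rules: &[Rule]) -> Vec<&'static str> {
    let lower = text.to_lowercase();
    rules
        .iter()
        .filter(|rule| lower.contains(rule.phrase))
        .map(|rule| rule.rule_id)
        .collect()
}

/// Gate-before-store: a span is admitted only if at least one marker fired.
/// The function is pure, so the same text and table always give the same answer.
pub fn admit(text: &str, rules: &[Rule]) -> bool {
    !fired_markers(text, rules).is_empty()
}
```

Keeping the table as data rather than code is what lets it be unit-tested rule by rule and overridden from memscribe.toml.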

Why deterministic matters

	Memscribe	LLM-based capture
Output is a function of input	✅ exact	❌ varies run to run
Golden / property / fuzz testable	✅	❌
Cost per session	$0.00	API tokens
Reads your prompts & secrets	locally, redacted, auditable	sent to a model
Fails when a format changes	loudly (a test)	silently (bad data)

Testing

Because the pipeline is zero-LLM, it's tested the way a compiler is — fixtures in, exact expected output. The test corpus is a first-class deliverable.

Golden-file / snapshot tests per tool, version, and scenario (insta)
Cross-tool conformance — all 9 §8.2 scenarios (happy path, rejected alternative, ban, interleaved arcs, multi-edit, tool failure, rewind/compaction, subagent, no-marker) must normalize to the same shape regardless of tool
Property tests (proptest): determinism, idempotency, monotonic seq, losslessness, gate purity, offset resumption
Fuzzing (cargo-fuzz): one target per adapter — never panic, never loop, skip-and-continue
Redaction & privacy, crash/resume, and a cross-version corpus

cargo test --workspace --all-features          # 260 tests
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo deny check
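The determinism and idempotency properties listed above can be illustrated without any test framework. The sketch below is a hand-rolled stand-in for what the proptest suite checks, not a copy of it: `prepare` is a toy replacement for the pipeline, and `Lcg`, `random_transcript`, and `check_properties` are hypothetical helpers.

```rust
/// Toy stand-in for the pipeline: keeps trimmed, non-empty lines.
pub fn prepare(input: &str) -> Vec<String> {
    input
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Small seeded generator so the "random" inputs are themselves reproducible.
pub struct Lcg(pub u64);

impl Lcg {
    pub fn next_u32(&mut self) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

/// Builds a transcript of `lines` lines drawn from a tiny vocabulary that
/// includes blank and padded entries.
pub fn random_transcript(rng: &mut Lcg, lines: usize) -> String {
    const WORDS: [&str; 4] = ["use", "postgres", "", "  edit  "];
    let mut out = String::new();
    for _ in 0..lines {
        out.push_str(WORDS[(rng.next_u32() % 4) as usize]);
        out.push('\n');
    }
    out
}

/// Checks two properties over `cases` generated inputs:
/// determinism (same input gives same output) and idempotency
/// (feeding the output back in changes nothing).
pub fn check_properties(cases: usize) -> bool {
    let mut rng = Lcg(42);
    (0..cases).all(|_| {
        let input = random_transcript(&mut rng, 8);
        let first = prepare(&input);
        let second = prepare(&input);
        let again = prepare(&first.join("\n"));
        first == second && first == again
    })
}
```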

CLI

Command	What it does
`memscribe watch [--tools …] [--sink …] [--out …]`	The steady-state capture daemon: tail transcripts, write nodes.
`memscribe parse <file> [--as <tool>]`	One-shot parse a transcript to NDJSON (the workhorse for tests/debugging).
`memscribe replay <file>`	Re-run preparation over a historical session.
`memscribe verify [--capture <file> --as <tool>]`	Run the conformance summary; `--capture` snapshots a live session into a fixture.
`memscribe redact <file> [--no-content]`	Show what the redaction pass would strip.
`memscribe hook`	The hook handler agents invoke (reads stdin, records, returns immediately).

Configure per-tool path overrides, a custom commitment-marker table, redaction patterns, and the sink target in memscribe.toml — see memscribe.example.toml.

Workspace layout

Crate	Responsibility
`memscribe-core`	The contract: model, traits, gate, segmenter, binder, node-prep, redaction. Depends on nothing in the workspace.
`memscribe-adapters`	The 9 per-tool parsers, behind feature flags.
`memscribe-io`	Sources: file reader, crash-safe offset tailer, live notify watcher, hook handler, OTLP receiver.
`memscribe-sink`	NDJSON, SQLite, and the feature-gated MemDB sink.
`memscribe-cli`	The `memscribe` binary.
`memscribe-testkit`	Golden harness, conformance suite, synthetic generators, invariant checks.

See ARCHITECTURE.md for the deep dive and CONTRIBUTING.md to add an adapter.
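As a rough illustration of what offset resumption means for the tailer in memscribe-io, the sketch below reads only complete lines from a byte buffer and returns the offset to persist. It is an assumption about the general technique, not the crate's implementation; `read_complete_lines` is a hypothetical name, and a real tailer would read from a file and store the offset durably.

```rust
/// Returns every complete (newline-terminated) line found at or after
/// `offset`, plus the offset just past the last complete line.
/// A trailing partial line is left unread, so a writer that is still
/// appending never causes a half-record to be emitted.
pub fn read_complete_lines(data: &[u8], offset: usize) -> (Vec<&[u8]>, usize) {
    let mut lines = Vec::new();
    // Clamp so a stale offset past the end cannot cause a slice panic.
    let mut pos = offset.min(data.len());
    while let Some(rel) = data[pos..].iter().position(|&b| b == b'\n') {
        lines.push(&data[pos..pos + rel]);
        pos += rel + 1; // skip the newline itself
    }
    (lines, pos)
}
```

Persisting only the returned offset means that restarting after a crash and calling the function again with the saved value neither repeats nor skips a complete line.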

Requirements

Rust ≥ 1.96 (pinned in rust-toolchain.toml)
Git — for repo/branch binding on episodes
No network in the core path; the optional OTLP receiver binds to loopback only.

License

Dual-licensed under either of MIT or Apache-2.0, at your option.

Built by Memrack / Syncable · the deterministic half of the memory system.

