feat(knowledge): expand category taxonomy + single source of truth (P3 stage 1) by mkreyman · Pull Request #206 · mkreyman/loopctl

mkreyman · 2026-06-29T21:44:17Z

What

P3 stage 1 of the second-brain taxonomy work: expand the category set and centralize it as a single source of truth.

The corpus had only 5 categories (pattern/convention/decision/finding/reference). This PR adds the richer taxonomy you approved and, just as importantly, eliminates the 6-way duplication of the category list, whose copies were silently drifting apart.

Taxonomy

The new module `Loopctl.Knowledge.Categories` is the canonical source:

- `active/0`: pattern, decision, finding, reference, playbook, insight, entity, idea, quote, question (the 4 kept + 6 new). These are the categories new content is classified into.
- `retired/0`: convention. Still DB-valid (the `category` column is a plain string and ~13% of the corpus is convention; dropping the enum value before the reclassification backfill would make those rows fail to load), but not offered to new content.
- `definition/1` + `prompt_fragment/0`: per-category definitions injected into the LLM extraction prompts so the model classifies into the full taxonomy correctly (e.g. playbook = the steps vs. pattern = the shape).
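A minimal sketch of what such a module could look like, assuming plain atom lists held in module attributes. The function names (`active/0`, `retired/0`, `all/0`, `all_strings/0`, `definition/1`, `prompt_fragment/0`) come from this PR; the definition strings are hypothetical wording (only the playbook-vs-pattern contrast is from the PR), and the compile-time sync check is an added assumption, not necessarily what the module does.

```elixir
defmodule Loopctl.Knowledge.Categories do
  # Sketch only: illustrative contents, not the merged implementation.

  # Offered to new content (the 4 kept + 6 new).
  @active [:pattern, :decision, :finding, :reference, :playbook,
           :insight, :entity, :idea, :quote, :question]

  # Still loadable from the DB, never offered to new content.
  @retired [:convention]

  # Hypothetical definition wording used to build the extraction prompt.
  @definitions %{
    pattern: "the reusable shape of a solution",
    playbook: "the concrete steps to carry something out",
    decision: "a choice that was made, with its rationale",
    finding: "something observed or discovered",
    reference: "a pointer to or summary of an external resource",
    insight: "a non-obvious realization",
    entity: "a person, tool, project, or other named thing",
    idea: "an unvalidated proposal worth revisiting",
    quote: "a verbatim statement worth keeping",
    question: "an open question without an answer yet"
  }

  # Assumed compile-time guard: every active category needs a definition.
  if Enum.sort(Map.keys(@definitions)) != Enum.sort(@active) do
    raise ArgumentError, "category definitions are out of sync with @active"
  end

  def active, do: @active
  def retired, do: @retired

  # Everything the DB may contain: active first, then retired.
  def all, do: @active ++ @retired
  def all_strings, do: Enum.map(all(), &Atom.to_string/1)

  # Only active categories have prompt definitions.
  def definition(category) when category in @active,
    do: Map.fetch!(@definitions, category)

  # One "- name: definition" line per active category; retired ones are excluded.
  def prompt_fragment do
    Enum.map_join(@active, "\n", fn c -> "- #{c}: #{definition(c)}" end)
  end
end
```

Keeping the retired list separate is what lets the Ecto enum accept `convention` rows while the prompt never offers that category to new content.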

Wired through (drift eliminated)

| Before (own copy of the list) | After |
|---|---|
| `Article.@category_values` | `Categories.all()` (Ecto.Enum can't drift) |
| `ClaudeContentExtractor` prompt + `@valid_categories` | prompt rebuilt from `Categories` at compile time; gate → `Categories.all_strings()` |
| `LlmExtractor` prompt + `@valid_categories` | same |
| `ReviewKnowledgeWorker.@valid_categories` | `Categories.all()` |
| `ContentIngestionWorker.@valid_categories` | `Categories.all()` |
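The compile-time prompt and the lenient gate could be wired roughly as follows. `CategoriesStub` is a hypothetical, trimmed stand-in for `Loopctl.Knowledge.Categories`, `HypotheticalExtractor` is not the real `ClaudeContentExtractor` or `LlmExtractor`, and the prompt text is illustrative.

```elixir
defmodule CategoriesStub do
  # Hypothetical stand-in: two active categories and the retired one.
  def active, do: [:pattern, :playbook]
  def retired, do: [:convention]
  def all, do: active() ++ retired()
  def all_strings, do: Enum.map(all(), &Atom.to_string/1)
  def prompt_fragment, do: Enum.map_join(active(), ", ", &Atom.to_string/1)
end

defmodule HypotheticalExtractor do
  # Evaluated once at compile time, so the prompt and the gate both
  # come from the canonical module instead of a local copy.
  @prompt "Classify the snippet into exactly one of: " <> CategoriesStub.prompt_fragment()
  @valid_categories CategoriesStub.all_strings()

  def prompt, do: @prompt

  # Lenient gate: accepts any DB-valid category (retired included),
  # even though the prompt only offers active ones.
  def parse_category(raw) when is_binary(raw) do
    normalized = raw |> String.trim() |> String.downcase()

    if normalized in @valid_categories do
      # Safe: the atom already exists because the stub module defines it.
      {:ok, String.to_existing_atom(normalized)}
    else
      {:error, {:unknown_category, raw}}
    end
  end
end
```

Because the attributes are computed when the extractor compiles, editing the canonical list updates both the prompt and the gate on the next build. `String.to_existing_atom/1` keeps untrusted model output from minting new atoms.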

`scale_seed.ex` is intentionally left alone: it's synthetic scale-test data whose category round-robin feeds index-selectivity gates. Changing its list length is orthogonal to the product taxonomy and would risk scale-gate flakiness.

No DB migration: category is a :string column, so new enum values need no schema change.

Tests

- New `categories_test.exs`: composition (`all = active ++ retired`, disjoint), the expanded active set, string helpers, definitions covering every active category, the prompt fragment including all active and excluding retired categories, and an enum-vs-canonical drift guard (`Ecto.Enum.values(Article, :category) == Categories.all()`).
- `article_test.exs`: the changeset accepts every new category.
- `knowledge_test.exs` + `knowledge_stats_controller_test.exs`: the dense `by_category` stats maps grew from 5 to 11 keys. This is correct, since stats derives the dense map from the enum. The expectations now derive from `Categories` instead of re-hardcoding the list, so a future taxonomy change can't silently break them again.
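A stdlib-only sketch of deriving both the dense `by_category` map and its test expectation from the canonical list. `CategoriesStub` and `StatsSketch.dense_by_category/1` are hypothetical names; the real stats code derives its keys from the Ecto enum, which this sketch replaces with a plain list.

```elixir
ExUnit.start()

defmodule CategoriesStub do
  # Hypothetical stand-in: 10 active + 1 retired, per the PR.
  def all do
    [:pattern, :decision, :finding, :reference, :playbook, :insight,
     :entity, :idea, :quote, :question, :convention]
  end
end

defmodule StatsSketch do
  # "Dense" map: one key per category, zero-filled, then real counts merged in.
  # Counts for keys outside the canonical list are dropped.
  def dense_by_category(counts) when is_map(counts) do
    CategoriesStub.all()
    |> Map.new(&{&1, 0})
    |> Map.merge(Map.take(counts, CategoriesStub.all()))
  end
end

defmodule StatsSketchTest do
  use ExUnit.Case, async: true

  test "dense map keys come from the canonical list, not a hardcoded one" do
    stats = StatsSketch.dense_by_category(%{pattern: 3, convention: 1})

    # Expected keys are derived, so a taxonomy change updates this test too.
    assert Enum.sort(Map.keys(stats)) == Enum.sort(CategoriesStub.all())
    assert stats.pattern == 3
    assert stats.question == 0
  end
end
```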

Full local gate green: format, credo --strict, dialyzer (0 errors), 2987 tests / 0 failures.

Next P3 stages (separate PRs)

- CategoryClassifier behaviour + config-DI (Claude-backed) + Mox.
- Resumable, cost-ceilinged, dry-run-first backfill worker over the 77k corpus: write-on-confident-change, always reclassify convention; emits a proposed-distribution report before committing.
- Cross-repo: synth.py / publish.py category enum (claude-config) + user CLAUDE.md category list.
- After the backfill drains convention to 0, a follow-up drops it from active/retired.
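The per-row rule of the planned backfill (a later stage, not part of this PR) might reduce to something like the following. The 0.8 threshold, the `%{category: ..., confidence: ...}` proposal shape, and the `BackfillSketch` module are assumptions for illustration only.

```elixir
defmodule BackfillSketch do
  # Assumed threshold; the PR does not specify one.
  @min_confidence 0.8

  # Retired rows are always rewritten, regardless of confidence.
  def decide(%{category: :convention}, %{category: proposed}) when proposed != :convention,
    do: {:write, proposed}

  # Otherwise write only on a confident change.
  def decide(%{category: current}, %{category: proposed, confidence: conf})
      when proposed != current and conf >= @min_confidence,
      do: {:write, proposed}

  def decide(_row, _proposal), do: :skip

  # Dry run: the distribution the corpus would have after the backfill,
  # computed without writing anything.
  def proposed_distribution(pairs) do
    Enum.frequencies_by(pairs, fn {row, proposal} ->
      case decide(row, proposal) do
        {:write, new} -> new
        :skip -> row.category
      end
    end)
  end
end
```

Keeping `decide/2` pure means the dry-run report and the real write path can share exactly the same rule.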

…ce of truth

P3 stage 1 of the second-brain taxonomy work. Adds the richer category set and, crucially, ends the 6-way duplication of the category list that was silently drifting (Article schema, both LLM extractor prompts + their @valid_categories guards, and two ingestion workers each carried their own copy).

New module Loopctl.Knowledge.Categories is the canonical source:
- active/0 -> pattern, decision, finding, reference, playbook, insight, entity, idea, quote, question (the 4 kept + 6 new). What new content is classified as.
- retired/0 -> convention. Still DB-valid (the category column is a plain string and ~13% of the corpus is convention; dropping the enum value before the 77k reclassification backfill would make those rows fail to load), but not offered to new content.
- definition/1 + prompt_fragment/0 -> per-category definitions injected into the extraction prompts so the model classifies into the full taxonomy correctly (e.g. playbook = the steps vs pattern = the shape).

Wired through:
- Article.@category_values -> Categories.all() (Ecto.Enum now can't drift)
- ClaudeContentExtractor / LlmExtractor: prompt rebuilt from Categories at compile time; @valid_categories -> Categories.all_strings() (lenient gate; prompt only offers active categories)
- ReviewKnowledgeWorker / ContentIngestionWorker: @valid_categories -> Categories.all()

scale_seed.ex is intentionally left alone: it is synthetic scale-test data whose category round-robin feeds index-selectivity gates; changing its list length is orthogonal to the product taxonomy and risks scale-gate flakiness.

No DB migration: category is a :string column, so new enum values need no schema change.

Tests: Categories unit (composition, string helpers, definitions, prompt fragment, and an enum-vs-canonical drift guard via Ecto.Enum.values/2) + Article changeset accepts every new category.

Full local gate green (format, credo --strict, dialyzer 0 errors).

Next P3 stages (separate PRs): CategoryClassifier behaviour + DI; resumable, cost-ceilinged, dry-run-first backfill worker over the 77k corpus (write-on-confident-change, always reclassify convention); cross-repo synth.py / publish.py + user CLAUDE.md category list.

mkreyman merged commit 41fdcbd into master Jun 29, 2026
9 checks passed

mkreyman deleted the knowledge-taxonomy-expansion branch June 29, 2026 21:46
