feat(knowledge): nightly KnowledgeLintWorker - scheduled lint + orphan self-heal (#204)
Merged
…als orphans

The knowledge-wiki lint engine (Knowledge.lint/2) already existed but was only ever invoked on demand via the API. Add the scheduled "nightly refinement" loop that both the Karpathy llm-wiki pattern and the Dan Martell second-brain workflow converge on: run lint on a cron and take a safe automated action on the one finding that can be auto-repaired.

- KnowledgeLintWorker (:knowledge queue) with an all_tenants fan-out mirroring ComputeSthWorker: lint is per-tenant, so each tenant lints in its own job.
- Per tenant: run Knowledge.lint/2, then re-enqueue ArticleLinkingWorker for each orphan (published article with zero links). Re-linking re-runs the proven pgvector similarity pass against the CURRENT corpus, so an article orphaned in January can find neighbors ingested months later. Deterministic, no embedding-API cost.
- Emit an immutable knowledge.lint_completed audit event carrying the full lint summary so contradictions / coverage gaps / broken sources / stale counts are observable in the change feed (those need human judgment, so they are surfaced, not auto-repaired).
- Scale-bounded: orphan re-link enqueues are capped by :knowledge_lint_max_orphan_relink (default 500); when the true orphan count exceeds the cap, the gap is logged, never silently dropped.
- Scheduled nightly at 04:00 (after the 02:00-03:00 maintenance window).

Tests cover per-tenant lint + audit, orphan re-linking outcome, empty-tenant, all_tenants fan-out (skips suspended tenants), and tenant isolation.
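The per-tenant flow in the commit message (fan-out over active tenants, capped orphan re-link, summary for the audit event) can be illustrated without Oban or Ecto. This is a minimal, dependency-free sketch under stated assumptions: the module name `KnowledgeLintSketch`, the tenant map shape (`%{id: ..., status: ...}`), the `"tenant_id"` args key, and every summary key except `orphans_relinked` (which echoes the test assertion) are hypothetical. The real worker enqueues `ArticleLinkingWorker` jobs and writes the audit event rather than returning plain data.

```elixir
defmodule KnowledgeLintSketch do
  # Hypothetical sketch: plain maps and ids stand in for Ecto records and
  # Oban jobs. This is NOT the PR's implementation.
  require Logger

  # Mirrors the documented default of :knowledge_lint_max_orphan_relink.
  @default_cap 500

  # all_tenants fan-out: one job-args map per active tenant; suspended or
  # inactive tenants are skipped. The "tenant_id" key is an assumption.
  def fan_out_args(tenants) do
    tenants
    |> Enum.filter(fn tenant -> tenant.status == :active end)
    |> Enum.map(fn tenant -> %{"tenant_id" => tenant.id} end)
  end

  # Caps the orphan re-link batch. Returns {ids_to_enqueue, deferred_count}
  # and logs the gap instead of dropping it silently.
  def plan_relinks(orphan_ids, cap \\ @default_cap) do
    {batch, rest} = Enum.split(orphan_ids, cap)
    deferred = length(rest)

    if deferred > 0 do
      Logger.warning("orphan re-link capped at #{cap}; #{deferred} orphan(s) left for the next run")
    end

    {batch, deferred}
  end

  # Summary in the spirit of the knowledge.lint_completed payload. Here
  # orphans_relinked counts enqueued re-links; other keys are hypothetical.
  def summary(tenant_id, orphan_ids, cap \\ @default_cap) do
    {batch, deferred} = plan_relinks(orphan_ids, cap)

    %{
      tenant_id: tenant_id,
      orphans_found: length(orphan_ids),
      orphans_relinked: length(batch),
      orphans_deferred: deferred
    }
  end
end
```

In this sketch the deferred count is returned alongside the batch, so a caller could record it next to the log line instead of relying on logs alone. In a real Oban setup, the queue, retry, and uniqueness settings from the PR would typically be declared as `use Oban.Worker, queue: :knowledge, max_attempts: 3, unique: [period: 60, fields: [:worker, :args]]`, and the 04:00 trigger as an `Oban.Plugins.Cron` crontab entry such as `{"0 4 * * *", Loopctl.Workers.KnowledgeLintWorker, args: %{"all_tenants" => true}}`; the args shape is an assumption.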
What
Adds `Loopctl.Workers.KnowledgeLintWorker`, the scheduled "nightly refinement" loop for the knowledge wiki. The lint engine (`Knowledge.lint/2`) already existed but was only ever invoked on demand via the API. This wires it onto a cron and takes a safe, automated repair action on the one finding that can be auto-fixed: orphans.

Both the Karpathy `llm-wiki` pattern (lint-and-act) and the Dan Martell second-brain workflow (nightly Claude refinement) independently converge on this loop. The platform had the engine; it lacked the orchestration.

Why now
The second-brain corpus is at ~77k articles. Lint runs clean except for ~1.9% orphan articles (zero inbound/outbound links). Nothing was scheduled to act on them, and entropy (orphans, drift) is invisible under brute-force search.
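Taken together, these two figures exceed the default re-link cap of 500. As a back-of-envelope check, here is a minimal sketch assuming exactly 77,000 articles, a 1.9% orphan rate, the default `:knowledge_lint_max_orphan_relink` of 500, a static orphan set, and that each nightly run enqueues orphans not attempted in earlier runs (the module and function names are hypothetical):

```elixir
defmodule OrphanBacklogSketch do
  # Hypothetical back-of-envelope helper, not part of the PR.
  # Assumes a static orphan set and that each run picks orphans
  # not attempted in earlier runs.
  def estimate(total_articles, orphan_rate, cap) do
    # Round the float product to a whole number of articles.
    orphans = round(total_articles * orphan_rate)
    # Integer ceiling division: runs needed to enqueue every orphan once.
    runs = div(orphans + cap - 1, cap)
    %{orphans: orphans, runs: runs}
  end
end

OrphanBacklogSketch.estimate(77_000, 0.019, 500)
```

Under those assumptions the backlog is about 1,463 orphans, so at least three nightly runs are needed before every current orphan has been enqueued once. Orphans that lack an embedding, or whose re-link finds no neighbors, could stay orphaned and be selected again.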
How it works
- Runs on the `:knowledge` queue with `max_attempts: 3`, unique on `{worker, args}` within 60s.
- `all_tenants` fan-out mirroring `ComputeSthWorker`: lint is inherently per-tenant, so each active tenant lints in its own job (independent retries, no cross-tenant coupling). Suspended/inactive tenants are skipped.
- Per tenant: run `Knowledge.lint/2`, then re-enqueue the proven `ArticleLinkingWorker` for each orphan. Re-linking re-runs the pgvector similarity pass against the current corpus, so an article orphaned in January can find neighbors ingested months later. It is deterministic and makes no embedding-API calls (orphans missing an embedding simply no-op in the linking worker; backfilling those is a separate concern).
- Emits an immutable `knowledge.lint_completed` audit event carrying the full lint summary. Contradictions, coverage gaps, broken sources, and stale counts need human judgment, so they are made observable in the change feed rather than auto-repaired.
- Scale-bounded: orphan re-link enqueues are capped by `:knowledge_lint_max_orphan_relink` (default 500). When the true orphan count exceeds the cap, the gap is logged, never silently dropped; the remaining orphans are picked up on the next run.

Tests
`test/loopctl/workers/knowledge_lint_worker_test.exs` (async, 5 cases):

- Per-tenant lint emits a `knowledge.lint_completed` audit event with the summary
- Orphan re-linking outcome (a `relates_to` link is created; `orphans_relinked == 2`)
- Empty tenant
- `all_tenants` fan-out lints both active tenants and skips a suspended one
- Tenant isolation

Verification
Full pre-commit gate green locally: compile (`--warnings-as-errors`), `mix format`, `credo --strict`, `dialyzer` (0 errors), and the full suite (2977 tests, 0 failures).

Notes / follow-ups (not in this PR)
- …`ArticleLink` edges. Tracked separately.