feat(knowledge): reclassify run hardening — concurrency, configurable model, outage snooze #211
Merged
…ifier model
Tuning the reclassification engine for the real 77k run:
- process_batch now classifies a batch CONCURRENTLY via Task.async_stream (max_concurrency :knowledge_reclassify_max_concurrency, default 10; 30s per-call timeout; on_timeout: :kill_task, so a timed-out call is counted as a processed error and left for a later idempotent run). Writes stay SERIAL on the worker process so the small BYPASSRLS admin pool is never hit by N concurrent updates. At ~10x the per-batch throughput, this turns a ~16h sequential 77k pass into ~1-2h.
- The classifier model is now independently configurable via :knowledge_classifier_model (ANTHROPIC_CLASSIFIER_MODEL in prod), falling back to the shared :anthropic_provider model (Haiku 4.5), so a one-time high-accuracy reclassification can use a stronger model WITHOUT changing content extraction.
Mox's $callers mechanism covers the Task.async_stream-spawned classify calls, so the existing reclassify tests pass unchanged. Full local gate green.
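A minimal sketch of the batch shape described above: classify concurrently, then write serially from the calling process. It assumes hypothetical classify_fun and write_fun callbacks in place of the real classifier call and BYPASSRLS Repo update, and a plain keyword list standing in for the application env; module and function names are illustrative, not the PR's code. It also assumes classify_fun returns {:ok, category} or {:error, reason} and never raises, because Task.async_stream links its tasks to the caller (Task.Supervisor.async_stream_nolink is the usual alternative when a task may crash).

```elixir
defmodule ReclassifySketch do
  @moduledoc """
  Hypothetical sketch: concurrent classify + serial write for one batch.
  classify_fun/write_fun stand in for the real classifier call and DB update.
  """

  @default_max_concurrency 10
  @per_call_timeout_ms 30_000

  # Assumes classify_fun returns {:ok, category} | {:error, reason} and does not raise:
  # Task.async_stream links its tasks, so a crash would take the caller down too.
  def process_batch(articles, classify_fun, write_fun, opts \\ []) do
    results =
      articles
      |> Task.async_stream(classify_fun,
        max_concurrency: Keyword.get(opts, :max_concurrency, @default_max_concurrency),
        timeout: Keyword.get(opts, :timeout, @per_call_timeout_ms),
        # A timed-out call is killed and yields {:exit, :timeout} instead of crashing the caller.
        on_timeout: :kill_task
      )
      # Results come back in input order (ordered: true is the default), so zip with articles.
      |> Enum.zip(articles)
      |> Enum.map(fn
        {{:ok, {:ok, category}}, article} -> {:ok, article, category}
        {{:ok, {:error, reason}}, article} -> {:error, article, reason}
        {{:exit, reason}, article} -> {:error, article, reason}
      end)

    # Writes run one at a time in the calling (worker) process, never from the tasks.
    for {:ok, article, category} <- results, do: write_fun.(article, category)

    %{processed: length(results), errors: Enum.count(results, &match?({:error, _, _}, &1))}
  end

  # Classifier model: the dedicated override if set, else the shared provider model.
  def classifier_model(env) do
    Keyword.get(env, :knowledge_classifier_model) ||
      get_in(env, [:anthropic_provider, :model])
  end
end
```

As a back-of-envelope check on the estimate: ~16h is about 57,600 s over 77k articles, roughly 0.75 s per sequential classify call; with ten calls in flight that drops to about 5,760 s (~1.6h), in line with ~1-2h, provided the serial writes stay cheap relative to the API calls.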
… (no skipped articles)
Before: if the classifier upstream (Anthropic, or the host's egress) was
unreachable, every classify in a batch errored, the batch still 'succeeded', the
cursor ADVANCED, and those articles were silently SKIPPED — a permanent gap until
a full re-run.
Now: when a non-empty batch comes back with an error rate >= the configured
threshold (:knowledge_reclassify_snooze_error_rate, default 0.5), perform returns
{:snooze, seconds} instead of advancing. Oban re-runs the SAME job (same cursor)
later WITHOUT consuming a max_attempts slot and WITHOUT logging an audit event, so
an outage simply PAUSES the migration and it resumes cleanly when connectivity
returns: nothing skipped, no audit spam. A few genuine per-article errors (below
the threshold) do not trigger a snooze: the batch proceeds and the errored articles
are left unchanged (recoverable by an idempotent re-run).
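The snooze rule reduces to a small decision on a batch's processed and error counts. A minimal sketch, assuming a hypothetical SnoozeSketch.decide/4 helper and a hypothetical 300-second snooze interval (the PR does not state the interval it uses); only the 0.5 default threshold comes from the description above. In the real worker the {:snooze, seconds} tuple would be returned from the Oban worker's perform/1.

```elixir
defmodule SnoozeSketch do
  # Hypothetical helper, not the PR's code. The threshold mirrors the documented default
  # of :knowledge_reclassify_snooze_error_rate; the snooze interval is an assumption.
  @default_threshold 0.5
  @default_snooze_seconds 300

  @spec decide(non_neg_integer(), non_neg_integer(), float(), pos_integer()) ::
          :advance | {:snooze, pos_integer()}
  def decide(processed, errors, threshold \\ @default_threshold, snooze_seconds \\ @default_snooze_seconds)

  # An empty batch has no error rate, so it never snoozes.
  def decide(0, _errors, _threshold, _snooze_seconds), do: :advance

  # Error rate at or above the threshold: keep the cursor and run the same job again later.
  def decide(processed, errors, threshold, snooze_seconds) when errors / processed >= threshold,
    do: {:snooze, snooze_seconds}

  # Below the threshold: advance; errored articles stay unchanged for an idempotent re-run.
  def decide(_processed, _errors, _threshold, _snooze_seconds), do: :advance
end

# Inside perform/1 this might be used roughly as follows (hypothetical names):
#   case SnoozeSketch.decide(summary.processed, summary.errors, threshold) do
#     {:snooze, _seconds} = snooze -> snooze
#     :advance -> enqueue_next_batch(next_cursor)
#   end
```

Keeping the decision a pure function of the counts makes the outage path easy to test without a real upstream: a 3-of-3 error batch snoozes, a 1-of-3 batch advances.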
Note: the migration executes on the Fly host, so an outage at the operator's
location never affects it once it has been kicked off; the snooze covers the
Anthropic/egress-side case.
Tests: a 100%-error batch snoozes and writes/audits nothing; a 1-of-3 error batch
(below the threshold) proceeds, reclassifying the two good articles and leaving the
errored one alone. Full local gate green.
… playbook-lean
A prod sample dry-run showed the classifier pulling principles/essays into 'playbook' (9 of 15 sampled proposals). Tightened the discriminators:
- Category definitions (shared by the classifier AND extractor prompts): playbook now requires EXPLICIT, ORDERED STEPS (not a tip, principle, or shape); pattern is a recurring shape rather than a procedure; insight is the 'why' or principle, not a how-to; idea is a thing to build or try.
- The classifier prompt gains a most-specific-fit tie-breaker that explicitly steers principles -> insight, shapes -> pattern, build/try -> idea, and tells it not to over-use playbook.
To be re-validated against the prod sample before the full 77k commit.
Tuning/hardening the reclassification engine for the real 77k run (found via prod verification).
1. Concurrent batch classification
process_batch now classifies a batch concurrently via Task.async_stream (:knowledge_reclassify_max_concurrency, default 10; 30s per-call timeout, on_timeout: :kill_task). Writes stay serial on the worker process so the small BYPASSRLS admin pool is never hit by N concurrent updates. This turns a ~16h sequential 77k pass into ~1–2h.
2. Configurable classifier model
:knowledge_classifier_model (ANTHROPIC_CLASSIFIER_MODEL in prod) overrides the classifier model independently of content extraction, so a one-time high-accuracy reclassification can use a stronger model (e.g. Sonnet) without changing extraction. It falls back to the shared provider model (Haiku 4.5).
3. Outage resilience (snooze, never skip)
If a non-empty batch comes back with an error rate ≥ :knowledge_reclassify_snooze_error_rate (default 0.5), which signals that the classifier upstream is unreachable, perform returns {:snooze, seconds} instead of advancing. Oban re-runs the same cursor later without consuming an attempt or logging an audit event, so an outage pauses the migration and it resumes cleanly with no skipped articles. (The migration runs on the Fly host, so a local operator outage never touches it; this covers the Anthropic/egress side.)
Tests
Concurrent path (Mox $callers covers the spawned tasks); 100%-error batch snoozes and writes/audits nothing; 1-of-3 error batch (below the threshold) proceeds and reclassifies the good ones. Full local gate green (3007 tests, credo, dialyzer 0).
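The Mox note rests on Elixir's Task module recording the spawning process under the :"$callers" process-dictionary key; per Mox's documentation, its expectation lookup follows that chain, so calls made from spawned tasks can see the test's expectations without explicit allowances. A small self-contained check of that mechanism for Task.async_stream (no Mox involved; CallersSketch is a hypothetical name):

```elixir
defmodule CallersSketch do
  # Returns the :"$callers" list that each Task.async_stream task sees.
  # Elixir's Task module stores the spawning process(es) under this key.
  def callers_seen_by_tasks(items) do
    items
    |> Task.async_stream(fn _item -> Process.get(:"$callers") end)
    |> Enum.map(fn {:ok, callers} -> callers end)
  end
end
```

Calling CallersSketch.callers_seen_by_tasks([1, 2, 3]) from a test process should yield one list per item, each containing that test process's pid.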