Skip to content

fix(knowledge): classifier silently no-op'd every article (echoed-template + brittle parse)#209

Merged
mkreyman merged 1 commit into
masterfrom
fix-classifier-json-parse
Jun 29, 2026
Merged

fix(knowledge): classifier silently no-op'd every article (echoed-template + brittle parse)#209
mkreyman merged 1 commit into
masterfrom
fix-classifier-json-parse

Conversation

@mkreyman

Copy link
Copy Markdown
Owner

The bug

Dry-running the reclassification classifier against real prod articles before committing the 77k migration revealed every call returned {:error, :unparseable_classification}. The worker treats an error as a no-op, so a blind commit run would have 'completed' having changed nothing — the exact 'does nothing at all' failure mode.

Mocked unit tests passed; only a real-data dry-run caught it.

Causes + fixes

  1. The system prompt contained a literal JSON template {"category": "<one of: ...>"}. Haiku echoed it verbatim, and <one of: ...> isn't valid JSON. Reworded to describe the two keys without an echo-able brace template.
  2. Parsing required the entire trimmed response to be pure JSON. Replaced with parse_text/1 — a pure, unit-testable function that extracts the first {...} object (tolerating markdown fences / preamble / trailing prose) and validates an active category + numeric confidence.

Tests

parse_text/1: clean JSON, ```json fence, prose-wrapped, integer confidence, the echoed-template failure mode, retired/unknown category rejection, non-numeric confidence, non-JSON/nil. Full local gate green (3004 tests, credo, dialyzer 0).

…emplate + brittle parse)

Found by dry-running the reclassification classifier against real prod articles
BEFORE committing the 77k migration: every call returned
{:error, :unparseable_classification}, which the worker treats as a no-op -- so a
blind commit run would have 'completed' having changed nothing.

Two causes, both fixed:
- The prompt contained a literal JSON template '{"category": "<one of: ...>"}'.
  Haiku echoed it verbatim, and '<one of: ...>' is not valid JSON. Reworded the
  prompt to describe the two keys without an echo-able brace template.
- parse_response required the ENTIRE trimmed text to be pure JSON. Replaced with
  parse_text/1, a pure (unit-testable) function that extracts the first {...}
  object, tolerating markdown fences / preamble / trailing prose, then validates
  the category is active and confidence is numeric.

Tests: parse_text/1 covers clean JSON, fenced JSON, prose-wrapped, integer
confidence, the echoed-template failure mode, retired/unknown category rejection,
non-numeric confidence, and non-JSON/nil. Full local gate green.
@mkreyman mkreyman merged commit f5f86c7 into master Jun 29, 2026
9 checks passed
@mkreyman mkreyman deleted the fix-classifier-json-parse branch June 29, 2026 23:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant