v0.11.0: hook hardening, commit guard, consistency guards, comment lens, routing eval #5
Merged
The meridian routing skill ended with leftover </content></invoke> tags that were injected into context whenever the skill loaded.
- Resolver exits cleanly when neither CLAUDE_PLUGIN_ROOT nor PLUGIN_ROOT is
set, instead of spawning 'undefined/hooks/*.mjs' and crashing the session.
- touch() and clear() wrap their fs calls so a permission/disk error degrades
to a no-op rather than crashing the hook.
- Failure-signal matching collapses internal whitespace, so an accidental
double-space ('still  broken') still reroutes to debug.
- Tests cover the unset-env exit, the WSL /mnt fallback at runtime, and the
double-spaced signal.
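The three hardening steps above can be sketched roughly as follows. This is a minimal sketch, not the plugin's actual code: `resolvePluginRoot`, `safely`, `normalizeSignal`, `isFailureSignal`, and the `FAILURE_SIGNALS` list are hypothetical names chosen for illustration. Only the two environment variable names and the 'still broken' signal come from the notes above.

```javascript
// Hypothetical helpers; names and the signal list are illustrative assumptions.

// Return the plugin root from the environment, or null when neither variable
// is set, so the caller can exit cleanly instead of building a path that
// starts with the string 'undefined'.
function resolvePluginRoot(env) {
  return env.CLAUDE_PLUGIN_ROOT || env.PLUGIN_ROOT || null;
}

// Run a filesystem operation and swallow any error it throws, so a
// permission or disk failure degrades to a no-op. Returns whether it succeeded.
function safely(operation) {
  try {
    operation();
    return true;
  } catch {
    return false;
  }
}

// Collapse runs of whitespace so 'still  broken' and 'still broken' compare equal.
function normalizeSignal(text) {
  return text.toLowerCase().replace(/\s+/g, ' ').trim();
}

// Assumed signal list, for illustration only.
const FAILURE_SIGNALS = ['still broken'];

// True when the normalized prompt contains any known failure signal.
function isFailureSignal(prompt) {
  const normalized = normalizeSignal(prompt);
  return FAILURE_SIGNALS.some((signal) => normalized.includes(signal));
}
```

Passing the operation in as a callback keeps the wrapper independent of any particular fs call, which is one way to cover both touch() and clear() with the same guard.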
SessionStart fires with source "compact" after auto/manual compaction and is the only compaction-time event that can inject context (PostCompact output is ignored). The empty matcher already routes every source through session-start, so orientation is restored when compaction drops it. Pin that behavior: a test that the hook re-emits on a compact-source payload, and a guard that the SessionStart matcher stays empty.
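A guard of the kind described could look like the sketch below. It assumes hooks.json nests entries as `hooks.SessionStart[].matcher`. The function name `sessionStartMatchersAreEmpty` and the example command string are hypothetical.

```javascript
// Assumed hooks.json shape: { hooks: { SessionStart: [ { matcher, hooks: [...] } ] } }.
// A missing matcher and an empty string are both treated as "match every source".
function sessionStartMatchersAreEmpty(config) {
  const entries = (config.hooks && config.hooks.SessionStart) || [];
  return entries.length > 0 && entries.every((entry) => !entry.matcher);
}

// Illustrative config; the command string is a placeholder, not the real one.
const exampleConfig = {
  hooks: {
    SessionStart: [
      {
        matcher: '',
        hooks: [{ type: 'command', command: 'node resolver.mjs session-start' }],
      },
    ],
  },
};
```

If someone later narrows the matcher to a single source, the check returns false and the compact-source path is flagged before it silently stops firing.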
Adds a Claude-only PreToolUse hook (matcher Bash) that denies, with feedback to the model, two things the output style only asked for in prose:
- git commits carrying AI attribution (Co-Authored-By: Claude, 'Generated with Claude', claude.ai/code, or a Claude-Session trailer)
- staging .meridian/ working artifacts (git add/stage of a .meridian path)
Everything else defers to normal permissions. Registered in hooks.json only; Cursor and Copilot have no PreToolUse event.
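The attribution check might be sketched as below. The patterns, the `evaluateCommit` name, and the `{ decision, reason }` return shape are assumptions for illustration. The real hook's output format and its exact matching rules are not shown in this PR text. The sketch also only recognizes a plain `git commit` invocation.

```javascript
// Illustrative patterns for the four attribution markers listed above.
// Treating the Claude-Session trailer as 'Claude-Session:' is an assumption.
const ATTRIBUTION_PATTERNS = [
  /co-authored-by:\s*claude/i,
  /generated with claude/i,
  /claude\.ai\/code/i,
  /claude-session:/i,
];

// Simplified detection: matches 'git commit' but not forms like 'git -C dir commit'.
function isGitCommit(command) {
  return /\bgit\s+commit\b/.test(command);
}

// Deny a commit command that carries an attribution marker; defer otherwise.
function evaluateCommit(command) {
  if (!isGitCommit(command)) {
    return { decision: 'defer' };
  }
  const hit = ATTRIBUTION_PATTERNS.find((pattern) => pattern.test(command));
  if (!hit) {
    return { decision: 'defer' };
  }
  return {
    decision: 'deny',
    reason: 'Remove the AI attribution from the commit message and retry.',
  };
}
```

Returning a reason alongside the denial reflects the "with feedback to the model" part of the description: the model learns what to change rather than just seeing a blocked command.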
Strengthens the review enforcement layer that durably counters comment slop. Judges comments added in the diff by whether they capture non-obvious why, and names the dominant first-draft failure mode — chain-of-thought narrated as comments — alongside self-evident-signature restatement and oversized blocks.
The Claude and Cursor manifests had drifted (0.10.9 vs 0.10.8) because 'claude plugin validate' only inspects the Claude manifest. Set all three per-host manifests to 0.11.0 and add a test asserting they agree. The test runner now globs test/*.test.mjs, so new suites are picked up without editing package.json or CI.
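The agreement test reduces to comparing the `version` field across the parsed manifests. A minimal sketch, assuming each manifest is already loaded as an object. The host keys and version values used in the usage note are made up for illustration, and `distinctVersions` and `versionsAgree` are hypothetical names.

```javascript
// Collect the distinct version strings across all manifests.
function distinctVersions(manifests) {
  return [...new Set(Object.values(manifests).map((manifest) => manifest.version))];
}

// The manifests agree when exactly one distinct version is present.
function versionsAgree(manifests) {
  return distinctVersions(manifests).length === 1;
}
```

For example, `versionsAgree({ a: { version: '0.11.0' }, b: { version: '0.11.0' } })` is true, while mixing '0.10.9' and '0.10.8' makes it false and `distinctVersions` reports both values for the failure message.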
Two checks against the drift the routing tables, dispatches, and HARD-GATE duplication invite: every meridian:<x> reference across skills, agents, hooks context, output style, and README must resolve to a real skill or agent, and each skill's frontmatter name must match its directory.
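Both checks can be sketched as pure functions over text and a list of known names. The allowed name characters in the regular expression are an assumption, and `findReferences`, `unresolvedReferences`, and `nameMatchesDirectory` are hypothetical names. Real code would read the files and directory listing from disk.

```javascript
// Find every 'meridian:<name>' reference in a text.
// The name character class (lowercase letters, digits, hyphens) is an assumption.
function findReferences(text) {
  return [...text.matchAll(/\bmeridian:([a-z][a-z0-9-]*)/g)].map((match) => match[1]);
}

// Return the referenced names that do not correspond to a known skill or agent.
function unresolvedReferences(text, knownNames) {
  const known = new Set(knownNames);
  return [...new Set(findReferences(text))].filter((name) => !known.has(name));
}

// A skill's frontmatter name must equal the name of the directory that holds it.
function nameMatchesDirectory(frontmatterName, directoryName) {
  return frontmatterName === directoryName;
}
```

Because the pattern requires a letter right after the colon, a literal placeholder such as `meridian:<x>` in documentation is not reported as a dangling reference.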
The skills table omitted sketch (a core workflow) and gave no pointer to the meta/lens/modifier skills (meridian, triangulate, auto), leaving four of the shipped skills undiscoverable from the README.
Records releases in Keep a Changelog format instead of relying on commit subjects as the de-facto changelog.
promptfoo-based eval that checks prompts route to the correct skill against the real plugin, across Opus/Sonnet/Haiku. Uses the anthropic:claude-agent-sdk provider with skill-used assertions and a local plugin load; corpus under eval/scenarios covers one positive per routable skill, trivial negatives, and the failure-signal reroute. Run on demand via 'pnpm eval' (needs ANTHROPIC_API_KEY); an optional workflow_dispatch job runs it in CI. Not part of the offline gates.
max_turns:1 made the agent-sdk provider error before returning a result with skillCalls; 6 lets each scenario conclude (0 errors). Run on Sonnet only — the routing baseline. The reroute hook fires through the provider (a probe routed 'still broken' to meridian:debug).
Rewrite the research and execute scenarios so the route is determined by the prompt alone (verify-before-coding against an external API; locked requirements ready to implement) — both now route correctly. Remove the cold single-turn reroute scenarios: a first-message 'still broken' has no prior fix to debug, so the route is only meaningful mid-flow (deferred); the hook firing is covered by the unit tests.
The PreToolUse guard matched .meridian anywhere in a git add argument, so a file named e.g. data.meridian-export.json was wrongly denied. Require a path boundary on both sides so only the .meridian directory matches.
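One way to express "a path boundary on both sides" is a pattern that only accepts `.meridian` as a whole path segment. The sketch below is an illustration under that assumption, not the guard's actual regular expression. `MERIDIAN_SEGMENT` and `stagesMeridianArtifacts` are hypothetical names, and for brevity it tests the whole command string rather than parsing individual arguments.

```javascript
// '.meridian' must be preceded by start of string, whitespace, '/', or a quote,
// and followed by '/', whitespace, a quote, or end of string.
const MERIDIAN_SEGMENT = /(^|[\s/"'])\.meridian(?=$|[\s/"'])/;

// True when a 'git add' or 'git stage' command names the .meridian directory.
function stagesMeridianArtifacts(command) {
  if (!/\bgit\s+(add|stage)\b/.test(command)) {
    return false;
  }
  return MERIDIAN_SEGMENT.test(command);
}
```

With this boundary, `git add .meridian/plan.md` is caught while `git add data.meridian-export.json` passes, because the `.meridian` in the file name is preceded by a letter and followed by a hyphen.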
…artifact The routing eval workflow writes the pass/fail matrix to the GitHub job summary and uploads the promptfoo HTML report as an artifact. Scenario failures no longer fail the run (continue-on-error) — routing is non-deterministic and the job is informational, not a gate. Still workflow_dispatch only, no schedule.
v0.11.0

Hooks
- The hooks.json resolver exits cleanly when neither CLAUDE_PLUGIN_ROOT nor PLUGIN_ROOT is set, instead of spawning undefined/hooks/*.mjs and crashing the session. touch() and clear() wrap their filesystem calls so an I/O error degrades to a no-op. The failure-signal match collapses internal whitespace, so an accidental double-space ("still  broken") still reroutes to debug.
- A PreToolUse guard (hooks/pre-tool-use.mjs, matcher Bash) denies git commits carrying AI attribution (Co-Authored-By: Claude, "Generated with Claude", claude.ai/code, or a Claude-Session trailer) and denies staging the gitignored .meridian/ artifacts. Everything else defers to normal permissions.
- Tests pin the compaction path: SessionStart fires with source compact, and the empty matcher must keep catching it.

Review & consistency
- The per-host manifest versions must agree (0.11.0), every meridian:<skill> / meridian:<agent> reference must resolve to something that exists, and each skill's frontmatter name must match its directory. The test runner discovers test/*.test.mjs by glob.

Docs
- The README skills table now lists the sketch workflow and points to the composing meridian / triangulate / auto skills. A CHANGELOG starts at 0.11.0.

Skill-routing eval (dev tooling)
- eval/ adds a promptfoo-based routing eval (the anthropic:claude-agent-sdk provider with skill-used assertions) that checks prompts route to the correct skill against the real plugin on Sonnet. Run on demand with pnpm eval; an optional workflow_dispatch job runs it in CI. promptfoo and the agent SDK are devDependencies, so the plugin still ships no runtime dependencies. Not part of the offline CI gates.