Skip to content

feat(campaign): add typed policy edit loop#286

Open
drewstone wants to merge 2 commits into
mainfrom
feat/policy-edit-loop
Open

feat(campaign): add typed policy edit loop#286
drewstone wants to merge 2 commits into
mainfrom
feat/policy-edit-loop

Conversation

@drewstone

Copy link
Copy Markdown
Contributor

Summary

  • add a PolicyEdit contract for typed analyst-proposed edits with deterministic IDs, expected gain, risk, source evidence, and optional canonical AgentProfileCell target
  • add policyEditProposer so campaigns can turn admitted typed edits or compatible analyst findings into candidate surfaces
  • add focused tests for strong/weak edit calibration, judge-derived leakage rejection, profile-cell targets, JSON/text application, and campaign proposer behavior
  • exclude active .claude/worktrees/** from Vitest discovery so normal test runs do not execute duplicate worktree copies

Checks

  • pnpm exec biome check --write src/analyst/policy-edit.ts src/campaign/proposers/policy-edit.ts src/analyst/index.ts src/campaign/index.ts src/index.ts tests/policy-edit.test.ts tests/campaign/policy-edit-proposer.test.ts
  • pnpm vitest run tests/policy-edit.test.ts tests/campaign/policy-edit-proposer.test.ts
  • pnpm typecheck
  • pnpm lint
  • pnpm test
  • NODE_OPTIONS=--max-old-space-size=8192 pnpm build
  • pnpm verify:package
  • git merge-tree --write-tree origin/main HEAD

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Value Audit — sound

Verdict sound
Concerns 0 (none)
Heuristic 0.0s
Duplication 0.0s
Interrogation 108.3s (2 bridge agents)
Total 108.3s

💰 Value — sound

Adds a schema-versioned, hash-identified PolicyEdit contract + admission gate + SurfaceProposer so analyst findings become typed, measurable candidate surfaces; plugs cleanly into the existing firewall/grammar/profile-cell primitives with no real duplication.

  • What it does: Introduces a typed PolicyEdit envelope (src/analyst/policy-edit.ts:86-100) with deterministic sha256 IDs (computePolicyEditId, line 155), full validate-round-trip (validatePolicyEdit, line 162), and a fail-loud admission gate (admitPolicyEdit, line 271) that scores edits on evidence/confidence/expected-gain/target-specificity and rejects judge-derived findings, evidence-less edits, sub-threshold
  • Goals it achieves: (1) Make analyst-proposed edits auditable and hash-stable — every edit carries schema_version + editId + source findings + expected-gain contract, so the improvement loop can diff and attribute candidates. (2) Fail-loud pre-admission gate so unmeasurable edits (no expected gain), evidence-less guesses, judge-leak findings, and high-risk changes never become candidates — the loop measures only edit
  • Assessment: Built firmly in the codebase's grain. It mirrors AgentProfileCell's shape exactly (schemaVersion + sha256 id + validate round-trip + ValidationError subclass). It uses the existing steer firewall (assertNoJudgeVerdict, src/analyst/steer-firewall.ts:69) at both the fromFinding and fromFindings entry points, the existing parseFindingSubject grammar for routing, the existing AgentProfileCell
  • Better / existing approach: Looked for overlap with (a) SkillPatch (src/campaign/skill-patch.ts:55) — same anchor-replace idea but on skill-doc surfaces with op-bundles, not prompts/runtime-config; the ~15-line text-replace overlap is sub-scale and the surfaces are genuinely different siblings. (b) analysisEditProposer (src/campaign/proposers/analysis-edit.ts:43) — applies findings via ONE LLM call; PolicyEdit is its determi
  • Model: opencode/zai-coding-plan/glm-5.2
  • Bridge attempts: 2
  • Bridge warning: opencode/kimi-for-coding/k2p7: bridge stream ended without value-audit content

🎯 Usefulness — sound

A typed, content-addressed, admission-gated PolicyEdit contract plus a policyEditProposer that conforms to the existing SurfaceProposer pattern and reuses the canonical FindingSubject grammar, AgentProfileCell, and steer-firewall primitives — net-new capability, no dead surface, no competing pattern

  • Assessment: Net-new, coherent capability that fills a real gap: the codebase had LLM-rewrite proposers and best-effort text curators but no typed, validated, content-addressed, admission-gated edit envelope an analyst could emit and the held-out loop could measure. Will be used — it is a new option in the existing proposer menu, drivable by the same selfImprove/runOptimization entry points, with a clear first
  • Integration: Fully wired and reachable. Exported from src/campaign/index.ts:160-163 and src/index.ts:93-122. Consumed through the standard runOptimization loop (src/campaign/presets/run-optimization.ts:159-170) which calls proposer.propose({ findings }) like it does for every other proposer; no new loop wiring required. The proposer reads ctx.findings and accepts both pre-typed PolicyEdit objects and legacy An
  • Fit with existing patterns: Built in the grain of the codebase. Routes via the existing typed FindingSubject grammar (parseFindingSubject) rather than inventing a new locus scheme — exactly the fix finding-subject.ts documents as necessary. Reuses AgentProfileCell as the canonical target identity, assertNoJudgeVerdict (steer-firewall.ts:69) plus a defense-in-depth derivedFromJudge admission check (policy-edit.ts:281-283), an
  • Real-world viability: Holds up past the happy path. Validation recomputes the content-addressed ID and rejects mismatches (policy-edit.ts:188-191), enforces enum/range/JSON-compat invariants, idempotent text append/prepend via includes guard (policy-edit.ts:472,476), sameSurface no-op skip in the proposer (policy-edit.ts:53). Steer firewall at two layers. No-silent-fallback doctrine honored: findings lacking typed expe
  • Model: opencode/zai-coding-plan/glm-5.2
  • Bridge attempts: 1

No concerns — sound change, no better or existing approach found. ✅


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260628T193619Z

@tangletools

Copy link
Copy Markdown
Contributor

✅ No Blockers — a99e2245

Review health 100/100 · Reviewer score 32/100 · Confidence 90/100 · 14 findings (3 medium, 11 low)

deepseek: Correctness 32 · Security 32 · Testing 32 · Architecture 32

Reviewer score is advisory once the run is complete and the verdict has no blockers.

Full multi-shot audit completed 6/6 planned shots over 8 changed files. Global verifier still owns final merge decision.

🟠 MEDIUM normalizeChange saves untrimmed find, breaking content-hash integrity — src/analyst/policy-edit.ts

Line 594: if (change.find?.trim()) out.find = change.find — saves the ORIGINAL untrimmed find string, but value is trimmed on line 592. If change.find is ' target ', the hash incorporates the whitespace-padded string while the untrimmed version is the one used at lines 479-483 (expectNonEmpty + surface.replace). Two semantically-identical edits with different find whitespace produce different edi

🟠 MEDIUM setJsonAtPath creates empty intermediate objects on remove, corrupting data — src/analyst/policy-edit.ts

Lines 546-553: The path traversal loop unconditionally creates {} for non-existent or non-object intermediate keys (next = {} when existing is falsy or an array). This runs BEFORE the mode is checked (line 555). For remove mode on a non-existent path like a.b.c where a doesn't exist: creates {a:{}}, then {a:{b:{}}}, then deletes c from {}, returning {a:{b:{}}} — a phantom empty structure. For remove on a path through an array like arr.0 where arr = [1,2,3]: the array is replaced with {} ([line 550](https://githu

🟠 MEDIUM Legacy-finding test doesn't verify routing correctness (false positive risk) — tests/campaign/policy-edit-proposer.test.ts

Test 'materializes legacy AnalystFinding rows only when they carry typed expected gain' (line 52) constructs a finding with subject 'system-prompt:tool-use' but only asserts String(out[0]!.surface).toContain(...) — it never checks the candidate label, axis, or target path. If routeFindingSubject or routeParsedSubject breaks (wrong axis, wrong target surface, wrong path), this test still passes because any text-append change to any surface would include the recommended_action text. Fix: add assertions on out[0]!.label (expect 'policy-edit:representation') and verify the surface is a string containing the path-derived context.

🟡 LOW Tests hang — cannot verify test coverage — src/analyst/policy-edit.ts

Both tests/policy-edit.test.ts and tests/campaign/policy-edit-proposer.test.ts hang under vitest 3.2.4 in this environment (tried default pool, forks pool, verbose reporter — all hang after 'RUN' line with no test output; other package tests resolve). The async buildAgentProfileCell call in the 'validates canonical AgentProfileCell' test may deadlock on a crypto primitive. Could not confirm pass/fail. The test file itself covers happy-path admission, judge-verdict rejection, text/JSON apply, and legacy-finding materialization — but has zero coverage for setJsonAtPath remove mode, merge mode, deep paths, or whitespace-edge find normalization.

🟡 LOW makePolicyEdit normalizes source twice — src/analyst/policy-edit.ts

Line 146: source: normalizeSource(init.source) — pre-normalizes source before passing to normalizePolicyEdit, which also calls normalizeSource(input.source) on line 570. Idempotent (trim on trimmed strings, Set→sort on sorted), so no correctness impact, but indicative of redundant control flow. Remove either the line 146 inline call or the [line 570](https://github.com/tangle-network/agent-eval/blob/a99e22454cb7c

🟡 LOW scorePolicyEditReadiness validates twice inside admitPolicyEdit — src/analyst/policy-edit.ts

Line 275 validates the edit; line 276 calls scorePolicyEditReadiness(validated, opts) which re-validates on line 252. The second validation computes the content hash again (non-trivial: canonicalize + SHA-256). This is a pure waste — scorePolicyEditReadiness should accept an already-validated edit or the call site should skip pre-validation. No correctness impact since validation is idempotent.

🟡 LOW onAdmission callback may not fire for all edits when candidate limit reached — src/campaign/proposers/policy-edit.ts

At line 63, if (out.length >= limit) break exits the for-loop once the candidate cap is hit. Any remaining edits in the edits array are never passed through admitPolicyEdit, so onAdmission (line 49) never fires for them. The callback is documented as for 'audit UIs/tests that need rejected edit reasons' — but unprocessed edits are silently skipped, not rejected. An auditor expecting to see decisions for ALL edits would miss the remainder. Fix: either document this behavior explicitly in the callback docs, or move t

🟡 LOW No boundary test for empty ctx.findings — tests/campaign/policy-edit-proposer.test.ts

No test passes ctx([]) — zero edits. While the implementation handles this cleanly (empty loop, returns []), the boundary is untested. A future change that adds a guard or early-return with a default candidate could introduce a behavior change here unnoticed.

🟡 LOW No test for maxCandidates capping — tests/campaign/policy-edit-proposer.test.ts

policyEditProposer caps output at opts.maxCandidates ?? ctx.populationSize. When multiple admitted edits exceed this limit, later edits are silently dropped. No test covers this capping behavior — an off-by-one or overflow in the limit computation (line 41-44) would go undetected.

🟡 LOW No test for onAdmission callback with 'admit' decision — tests/campaign/policy-edit-proposer.test.ts

Test 3 (line 78) verifies onAdmission fires with 'reject' for weak edits, but no test verifies the callback fires with 'admit' when an edit passes admission. The callback contract (PolicyEditProposerOptions.onAdmission) promises both decisions. Absence risks: a future change could break the callback for admitted edits without any test catching it.

🟡 LOW No error-path test for validatePolicyEdit — tests/policy-edit.test.ts

All 6 tests exercise only valid inputs. validatePolicyEdit has 15+ distinct throw sites (bad schemaVersion, malformed editId, invalid axis, bad surface, missing required fields, non-finite confidence, mismatched content hash, etc.). None of the throw paths are covered. An accidental relaxation of validation would go undetected.

🟡 LOW Untested JSON change modes: merge and remove — tests/policy-edit.test.ts

Line 154-156 only tests mode: 'set'. The merge mode (which conditionally spreads objects) and remove mode are untested. The merge mode has a branch where prior is a non-object → falls through to set, and a branch where it spreads — both untested.

🟡 LOW Untested policyEditFromFinding with missing recommended_action — tests/policy-edit.test.ts

Line 222 of src/analyst/policy-edit.ts checks finding.recommended_action?.trim() — if absent/empty it returns null. No test exercises this branch. The test only covers the metadata-absent path.

🟡 LOW Untested text change modes: prepend and replace — tests/policy-edit.test.ts

Line 134-137 only tests applyPolicyEditToSurface with mode: 'append'. The prepend mode and replace mode (including the find-not-found error path) have zero test coverage. A regression in applyTextChange for these modes would go undetected.


tangletools · 2026-06-28T19:47:35Z · trace

tangletools
tangletools previously approved these changes Jun 28, 2026

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Approved — 14 non-blocking findings — a99e2245

Full multi-shot audit completed 6/6 planned shots over 8 changed files. Global verifier still owns final merge decision.

Full immutable report for this review: trace

Summary comment for this run: full summary


tangletools · 2026-06-28T19:47:35Z · immutable trace

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Refreshed approval after new commits — abdd9f59

A previous trusted approval on this PR was invalidated by new commits.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: stale_approval_refresh · 2026-06-28T19:53:13Z

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Value Audit — sound

Verdict sound
Concerns 1 (1 weak-concern)
Heuristic 0.0s
Duplication 0.0s
Interrogation 115.5s (2 bridge agents)
Total 115.5s

💰 Value — sound

Adds a typed, hash-addressed PolicyEdit contract + deterministic SurfaceProposer that turns analyst findings into measured candidates without an LLM apply step — reuses every existing primitive (AgentProfileCell, canonicalize, parseFindingSubject, assertNoJudgeVerdict) and conforms exactly to the Su

  • What it does: Introduces a PolicyEdit envelope (src/analyst/policy-edit.ts, 798 lines) — a schema-versioned, sha256-content-addressed contract for analyst-proposed edits. Each edit carries an axis (representation/tool_contract/budget/...), a target (surface + path + optional canonical AgentProfileCell), a typed change (text append/prepend/replace or JSON set/merge/remove at a dotted path), an expected gain,
  • Goals it achieves: Give campaigns a DETERMINISTIC, AUDITABLE path from analyst findings to measured candidate surfaces — the counterpart to the LLM-brokered apply step in analysisEditProposer/traceAnalystProposer/haloProposer. Today those proposers hand findings to an LLM and accept whatever text comes back; this PR lets an analyst emit a typed edit (or a finding carrying typed metadata.policyEdit) and have
  • Assessment: Fits the codebase's grain precisely. Every needed primitive already exists and is reused rather than reinvented: AgentProfileCell + validateAgentProfileCell for deployment identity (src/agent-profile-cell.ts:47-56), canonicalize from pre-registration for the content hash (src/pre-registration.ts:83), parseFindingSubject + the FindingSubject grammar for routing (src/analyst/finding-subjec
  • Better / existing approach: none — this is the right approach. I checked for overlap with two nearby primitives: (1) src/campaign/skill-patch.ts (SkillPatchOp add/delete/replace) — it is line-anchored, single-surface, multi-line-splice with partial-apply semantics for skillOptProposer; PolicyEditChange is substring-anchored with dedup-via-includes for many surfaces plus a JSON-path mode skill-patch has no equivalent
  • Model: opencode/zai-coding-plan/glm-5.2
  • Bridge attempts: 2
  • Bridge warning: opencode/kimi-for-coding/k2p7: bridge stream ended without value-audit content

🎯 Usefulness — sound

Adds a typed, content-addressed PolicyEdit contract + policyEditProposer that fills a real gap (no existing typed finding→edit pipeline exists), plugs into the established SurfaceProposer factory pattern, and respects the judge-firewall invariant.

  • Integration: Cleanly wired into the established surface. policyEditProposer returns a SurfaceProposer (src/campaign/proposers/policy-edit.ts:36-69), the same interface used by every other proposer (gepa, ace, skillOpt, memoryCuration, parameterSweep, fapo, halo, traceAnalyst, evolutionary) — verified against src/campaign/types.ts:286 and the proposer registry in src/campaign/index.ts:130-174. It is e
  • Fit with existing patterns: Fits the codebase grain precisely. It is the deterministic counterpart to the existing LLM-apply path (src/campaign/proposers/analysis-edit.ts:17APPLY_SYSTEM rewrites the prompt via an LLM from findings). The PR's typed-edit alternative lets an analyst emit a content-addressed edit and skip the LLM rewrite. It reuses the established steer firewall (assertNoJudgeVerdict at `src/analyst/pol
  • Real-world viability: Holds up beyond the happy path. Validation runs on every make/admit/apply path (policy-edit.ts:152,275,308); deterministic SHA-256 editId is asserted to match content at validatePolicyEdit:188. Edge cases are handled: missing expected gain drops the finding rather than fabricating one (policyEditFromFinding:225); idempotent append/prepend check surface.includes(change.value) (`applyTextCha
  • Model: opencode/zai-coding-plan/glm-5.2
  • Bridge attempts: 1

💰 Value Audit

🟡 Readiness-score weights and target-specificity bonuses are ungrounded magic numbers [maintenance] ``

scorePolicyEditReadiness (src/analyst/policy-edit.ts:262-268) uses fixed weights 0.30/0.25/0.25/0.20 with risk penalties -0.35/-0.20; targetSpecificityScore (line 730-737) uses 0.4/0.25/0.15/0.2/0.1 bonuses. These are heuristics with no documented empirical basis. The calibration test (tests/policy-edit.test.ts:46-71) only pins strong>0.7 and weak<0.3, which is the real contract; the internal weights could drift silently. Not blocking — defaults are tunable via PolicyEditAdmissionOptions and the


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260628T195703Z

@tangletools

Copy link
Copy Markdown
Contributor

✅ No Blockers — abdd9f59

Review health 100/100 · Reviewer score 21/100 · Confidence 90/100 · 13 findings (5 medium, 8 low)

deepseek: Correctness 21 · Security 21 · Testing 21 · Architecture 21

Reviewer score is advisory once the run is complete and the verdict has no blockers.

Full multi-shot audit completed 6/6 planned shots over 8 changed files. Global verifier still owns final merge decision.

🟠 MEDIUM Missing test coverage for candidate cap (maxCandidates / populationSize) — src/campaign/proposers/policy-edit.ts

src/campaign/proposers/policy-edit.ts:41-44 — The limit calculation Math.max(0, Math.min(opts.maxCandidates ?? ctx.populationSize, ctx.populationSize)) and the if (out.length >= limit) break at line 63 are untested. If ctx.populationSize is 0 (edge case from a misconfigured optimizer), zero candidates are returned — which may be correct behavior but the silent no-op should be verified. No test case provides maxCandidates explicitly or uses populationSize < length of admissible edits.

🟠 MEDIUM Untested CodeSurface passthrough path in coerceCandidateSurface — src/campaign/proposers/policy-edit.ts

src/campaign/proposers/policy-edit.ts:100-101 — The coerceCandidateSurface function has a branch that passes through CodeSurface objects (checking obj.kind === 'code' && typeof obj.worktreeRef === 'string'). No test exercises this path. A policy edit producing a CodeSurface-shaped return would be unusual (most edits return strings or JSON objects), but the path exists and should be verified — or removed if unreachable, since the applyPolicyEditToSurface in src/analyst/policy-edit.ts always returns string or AgentProfileJson (never CodeSurface). Either document the reachability with a test or remove dead code.

🟠 MEDIUM No test coverage for sameSurface dedup guard in proposer — tests/campaign/policy-edit-proposer.test.ts

src/campaign/proposers/policy-edit.ts:52-53 skips candidates where applyPolicyEditToSurface produces the same surface as ctx.currentSurface. No test exercises this guard. If the guard is broken, no-op edits enter the candidate pool as duplicates, wasting compute in the optimization loop. Add a test that passes a PolicyEdit whose change is already present in the current surface (e.g., an append whose value already exists in the string), verifying the resulting candidates array is empty.

🟠 MEDIUM No test for population limit bounding — tests/campaign/policy-edit-proposer.test.ts

src/campaign/proposers/policy-edit.ts:41-44 computes limit = max(0, min(maxCandidates ?? populationSize, populationSize)). No test supplies multiple edits with a bounded maxCandidates or populationSize to verify the limit truncates. If the clamping logic regresses (e.g., negative limit), propose() could return unbounded candidates or crash. Add a test with populationSize=1 and two admitted edits, asserting output length is 1.

🟠 MEDIUM applyPolicyEditToSurface error paths untested — tests/policy-edit.test.ts

applyPolicyEditToSurface (policy-edit.ts:307) has three thrown-error branches that lack test coverage: (1) text change on non-string surface → PolicyEditValidationError at applyTextChange line 469, (2) replace with find not in surface → PolicyEditValidationError at line 481, (3) invalid JSON string surface → PolicyEditValidationError at parseJsonSurface line 513. Given the repo's 'fail loud' philosophy, a refactor that ac

🟡 LOW append idempotency check uses substring match — src/analyst/policy-edit.ts

Line 472: if (surface.includes(change.value)) return surface — the idempotency guard for text-append changes uses String.includes, which matches substrings. If a recommended_action happens to be a substring of the existing surface (e.g., change='fetch' inside surface='Always fetch current state'), the edit is silently skipped without actually being applied. Impact is minimal: policy edit recommended_actions are typically full sentences, not fragments, so false matches are extremely unlikely. Fix: use word-boundary or exact-line matching, or simply always append (letting duplicate pre-push gating handle dedup).

🟡 LOW expectConfidence accepts NaN — src/analyst/policy-edit.ts

Line 768-772: expectConfidence checks typeof value !== 'number' || value < 0 || value > 1. NaN passes all three checks: typeof NaN === 'number' (true), NaN < 0 (false), NaN > 1 (false). A PolicyEdit constructed programmatically with confidence=NaN passes full validation. JSON.stringify converts NaN to null so the content hash stays consistent. Downstream scorePolicyEditReadiness clamps NaN to 0 via clamp01, making this a validation gap with negligible functional impact. Fix: add Number.isNaN(value) or use !Number.isFinite(value) in the condition.

🟡 LOW Rationale string renders empty findingIds as '[]' — src/campaign/proposers/policy-edit.ts

src/campaign/proposers/policy-edit.ts:61 — The rationale template source findings [${edit.source.findingIds.join(', ')}] produces source findings [] when findingIds is empty. While validatePolicyEdit requires non-empty findingIds for validated PolicyEdits (src/analyst/policy-edit.ts:688 — expectStringArray checks each item is a non-empty string, but permits empty array), the cosmetic output is awkward. Either require non-empty in validation or suppress the clause when empty.

🟡 LOW Weak duck-type guard isAnalystFindingLike — src/campaign/proposers/policy-edit.ts

src/campaign/proposers/policy-edit.ts:90-94 — isAnalystFindingLike classifies any object with finding_id and analyst_id string fields as an AnalystFinding. A non-finding object that coincidentally has these field names would pass, then get silently skipped by policyEditFromFinding (null return when recommended_action is missing). This is not a crash, but it silently discards input. A stricter check (verifying at least one more required field like claim or severity) would make false positives impossible.

🟡 LOW No test for coerceCandidateSurface error path or code surface coercion — tests/campaign/policy-edit-proposer.test.ts

src/campaign/proposers/policy-edit.ts:96-106 coerces surfaces: strings pass through, CodeSurface objects (kind:'code', worktreeRef) pass through, plain objects get JSON.stringify'd, and unsupported types throw. Only the JSON.stringify path is tested (test 4). The CodeSurface path and the error-throw path are untested. A malformed edit producing a non-string, non-object surface would throw an unhandled error. Add tests for the CodeSurface coercion and the error case.

🟡 LOW No test for opts.edits static path bypassing ctx.findings — tests/campaign/policy-edit-proposer.test.ts

src/campaign/proposers/policy-edit.ts:40 uses opts.edits ?? ctx.findings. When opts.edits is provided, ctx.findings is ignored entirely. No test verifies this branch. This is the path callers use when an analyst has already emitted typed PolicyEdit objects, bypassing the finding→edit conversion. Add a test passing edits via opts.edits with an empty ctx.findings array, verifying the proposer still produces candidates.

🟡 LOW Missing tests for text prepend and JSON merge modes — tests/policy-edit.test.ts

The 'applies text and JSON changes deterministically' test covers text append/replace and JSON set/remove, but does not test text prepend mode or JSON merge mode. While the implementations are structurally similar to append/set, the normalizeChange transform at line 593 and applyJsonChange merge logic at line 494-504 are exercised only indirectly by the type system, not by assertions.

🟡 LOW admitPolicyEdit option variants untested — tests/policy-edit.test.ts

admitPolicyEdit supports allowHighRisk, requireEvidence, and minScore options, but the admission calibration test only exercises defaults. A high-risk edit with allowHighRisk:true should admit, and a valid edit with a raised minScore should reject. Adding these cases would lock in the configurable admission surface.


tangletools · 2026-06-28T20:08:03Z · trace

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Approved — 13 non-blocking findings — abdd9f59

Full multi-shot audit completed 6/6 planned shots over 8 changed files. Global verifier still owns final merge decision.

Full immutable report for this review: trace

Summary comment for this run: full summary


tangletools · 2026-06-28T20:08:03Z · immutable trace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants