SelfAskRefusalScorer honors partial_content on blocked pieces by tejas0077 · Pull Request #2083 · microsoft/PyRIT

tejas0077 · 2026-06-25T19:36:22Z

Fixes #2044 (sub-issue #2)

SelfAskRefusalScorer unconditionally returned refusal=True when response_error == "blocked", even when partial_content was available in prompt_metadata. This silently dropped potentially successful jailbreaks from red-team results, and the most evasive successes were exactly the ones being missed.

The fix sets score_blocked_content = True on SelfAskRefusalScorer so the base Scorer class handles partial content substitution via the existing _apply_blocked_content_substitution mechanism. When a blocked piece has partial_content, it is now scored via the LLM instead of being unconditionally treated as a clean refusal.
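A minimal sketch of the decision described above, under stated assumptions: `Piece` and `text_to_score` are hypothetical stand-ins written for illustration and are not PyRIT's actual classes or the real `_apply_blocked_content_substitution` implementation. It only models which text (if any) would reach the LLM scorer for a given piece.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Piece:
    """Hypothetical stand-in for a response piece; not PyRIT's real class."""
    converted_value: str
    response_error: str = "none"
    prompt_metadata: dict = field(default_factory=dict)


def text_to_score(piece: Piece, score_blocked_content: bool) -> Optional[str]:
    """Return the text an LLM scorer should see, or None for a direct refusal.

    Assumption: partial output of a blocked response lives under the
    "partial_content" key of prompt_metadata, as the PR description states.
    """
    if piece.response_error != "blocked":
        # Normal responses are scored on their own content.
        return piece.converted_value
    partial = piece.prompt_metadata.get("partial_content")
    if score_blocked_content and partial:
        # Blocked, but some text leaked out before the filter fired: score it.
        return str(partial)
    # Blocked with nothing to score: treat as a refusal without calling the LLM.
    return None
```

With the flag off, a blocked piece never reaches the LLM, which corresponds to the old behavior; with it on, only blocked pieces that carry no partial content take the short-circuit path.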

The rationale string for blocked responses with no partial content has also been updated to be more descriptive.

Tests and Documentation

Updated the existing test_score_async_filtered_response test to match the new rationale string. Added a new test, test_score_async_blocked_with_partial_content_scores_partial, which verifies that blocked pieces with partial content are forwarded to the LLM scorer instead of immediately returning refusal=True.
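The shape of such a test might look like the sketch below. It is not the test added in this PR: `score_refusal` is a hypothetical toy scorer and `llm_judge` a hypothetical async callable, used only to show how `unittest.mock.AsyncMock` can assert that the LLM path was actually taken for a blocked piece with partial content.

```python
import asyncio
from unittest.mock import AsyncMock


async def score_refusal(response_error: str, metadata: dict, llm_judge) -> bool:
    """Toy scorer (hypothetical): returns True when the response is a refusal."""
    partial = metadata.get("partial_content")
    if response_error == "blocked" and not partial:
        # Nothing to score, so report a refusal without awaiting the LLM.
        return True
    text = partial if response_error == "blocked" else metadata.get("text", "")
    return await llm_judge(text)


def run_blocked_with_partial_content_check() -> bool:
    # The mocked judge reports "not a refusal" for whatever text it receives.
    llm_judge = AsyncMock(return_value=False)
    result = asyncio.run(
        score_refusal("blocked", {"partial_content": "Sure, first you"}, llm_judge)
    )
    # The partial content, not an empty string, must have been sent to the judge.
    llm_judge.assert_awaited_once_with("Sure, first you")
    return result
```

Asserting on the awaited call, rather than only on the returned score, is what distinguishes "the LLM judged this a non-refusal" from "the scorer never asked".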

fix: SelfAskRefusalScorer honors partial_content on blocked pieces

8f19cb6

tejas0077 mentioned this pull request Jun 25, 2026

Scorers conflate couldn't-score / errored / blocked / hedged with attack-did-not-succeed, under-reporting jailbreaks #2044

Open

tejas0077 wants to merge 1 commit into microsoft:main from tejas0077:fix/refusal-scorer-partial-content