[DRAFT] FEAT add Policy Puppetry converter (#2080) #2081
Draft
kenlacroix wants to merge 1 commit into
Add PolicyPuppetryConverter, a pure-template (no-LLM) converter implementing
HiddenLayer's Policy Puppetry technique: wraps a prompt in a fabricated
policy/config block (xml/json/ini, selectable via policy_format) so models
treat it as trusted developer instructions. Optional leetspeak composition.
Template ships as a SeedPrompt YAML with a benign {{ prompt }} placeholder.
Includes unit tests (8) and registration in prompt_converter/__init__.py.
Opened as a draft pending maintainer feedback on the design questions in microsoft#2080.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
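To make the `policy_format` switch and the optional leetspeak step concrete, here is a minimal stdlib-only sketch. It is not the PR's code: `wrap_in_policy_block`, `to_leetspeak`, the neutral `policy`/`request` field names, and the character table are all hypothetical. The real converter renders a `SeedPrompt` YAML template rather than building the block in Python; building it in Python corresponds to one of the packaging alternatives listed in the description.

```python
import configparser
import io
import json
from xml.sax.saxutils import escape

# Hypothetical leetspeak table for illustration only; PyRIT's
# LeetspeakConverter may use a different mapping.
_LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})


def to_leetspeak(text: str) -> str:
    """Substitute selected lowercase letters with look-alike digits."""
    return text.translate(_LEET)


def wrap_in_policy_block(prompt: str, policy_format: str = "xml", leetspeak: bool = False) -> str:
    """Embed `prompt` in a neutral config-style block of the chosen format.

    Field names ("policy", "request") are placeholders, not the PR's template.
    """
    if leetspeak:
        # Apply leetspeak to the payload only, leaving the block structure intact.
        prompt = to_leetspeak(prompt)
    if policy_format == "xml":
        # Escape &, <, > so the payload cannot break the surrounding tags.
        return f"<policy>\n  <request>{escape(prompt)}</request>\n</policy>"
    if policy_format == "json":
        # json.dumps handles quoting and escaping of the payload string.
        return json.dumps({"policy": {"request": prompt}}, indent=2)
    if policy_format == "ini":
        # interpolation=None so a literal "%" in the prompt is not rejected.
        parser = configparser.ConfigParser(interpolation=None)
        parser["policy"] = {"request": prompt}
        buf = io.StringIO()
        parser.write(buf)
        return buf.getvalue().rstrip("\n")
    raise ValueError(f"unsupported policy_format: {policy_format!r}")
```

For example, `wrap_in_policy_block("hello", "ini")` returns `"[policy]\nrequest = hello"`. With `leetspeak=True`, only the payload becomes `h3ll0`, while the tags or keys stay readable.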
Author comment: @microsoft-github-policy-service agree
Description
Implements the `PolicyPuppetryConverter` proposed in #2080: a converter for HiddenLayer's Policy Puppetry technique (Apr 2025), which wraps a request in a fabricated policy/config block (XML/JSON/INI) that many models treat as trusted developer instructions.

This is a [DRAFT] opened alongside #2080 so that the working implementation is visible while the design is still under discussion. I'd like maintainer guidance on the open questions in #2080 before finalizing:
- `policy_format` param (xml/json/ini): does this match how you'd want it scoped?
- Leetspeak: compose with `LeetspeakConverter` in a chain, or keep the optional `leetspeak` flag this draft currently exposes?
- Template packaging: the template ships as a `SeedPrompt` YAML. Because `SeedPrompt.from_yaml_file` eagerly pre-renders trusted templates (collapsing the `policy_format` branch), this draft loads the YAML and constructs `SeedPrompt(**data)` directly. Do you prefer that, selecting the format block in Python, or three per-format YAMLs?

The design intentionally favors a generic implementation, per `doc/contributing/2_incorporating_research.md`. The shipped template uses a benign `{{ prompt }}` placeholder and a generalized persona, not a weaponized payload.

Tests and Documentation

- `tests/unit/prompt_converter/test_policy_puppetry_converter.py`: 8 unit tests (placeholder substitution, each `policy_format`, formats differ, leetspeak toggle, input/output support). All pass locally (8 passed).
- Deferred: the `doc/code/converters/1_text_to_text_converters.py` demo cell, until the design (especially the leetspeak and template-packaging questions) is settled, since those change the example. I will add the Jupytext-paired cell before marking the PR ready for review.
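On the leetspeak question, one consideration is ordering. Assume chained converters run sequentially, with each one receiving the previous converter's output. Under that assumption, a leetspeak step placed before the wrapper alters only the payload, while one placed after it also rewrites the policy block's own tags. The sketch below uses plain functions, not PyRIT's converter classes. `run_chain`, `wrap_xml`, `to_leetspeak`, and the character table are hypothetical.

```python
from functools import reduce
from typing import Callable, Sequence
from xml.sax.saxutils import escape

# A converter here is just str -> str (hypothetical; not PyRIT's class API).
Converter = Callable[[str], str]

# Illustrative substitution table; PyRIT's LeetspeakConverter may differ.
_LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})


def to_leetspeak(text: str) -> str:
    return text.translate(_LEET)


def wrap_xml(prompt: str) -> str:
    # Simplified single-format wrapper with neutral, hypothetical tag names.
    return f"<policy><request>{escape(prompt)}</request></policy>"


def run_chain(prompt: str, converters: Sequence[Converter]) -> str:
    # Feed each converter the previous converter's output, in list order.
    return reduce(lambda text, convert: convert(text), converters, prompt)


payload_only = run_chain("hello", [to_leetspeak, wrap_xml])
whole_block = run_chain("hello", [wrap_xml, to_leetspeak])
print(payload_only)  # <policy><request>h3ll0</request></policy>
print(whole_block)   # <p0l1cy><r3qu357>h3ll0</r3qu357></p0l1cy>
```

A flag inside the converter can guarantee the payload-only behavior. A chain is more composable, but it leaves the ordering to the caller, and the wrong order mangles the block's structure.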