
[DRAFT] FEAT add Policy Puppetry converter (#2080) #2081

Draft
kenlacroix wants to merge 1 commit into microsoft:main from kenlacroix:feat/policy-puppetry-converter

Conversation

@kenlacroix


Description

Implements the PolicyPuppetryConverter proposed in #2080 — a converter for HiddenLayer's Policy Puppetry technique (Apr 2025), which wraps a request in a fabricated policy/config block (XML/JSON/INI) that many models treat as trusted developer instructions.

This is a [DRAFT] opened alongside #2080 so the working implementation is visible while the design is still under discussion. I'd like maintainer steer on the open questions in #2080 before finalizing:

  1. Pure-template, no-LLM converter with a policy_format param (xml/json/ini) — does this match how you'd want it scoped?
  2. Leetspeak: compose with the existing LeetspeakConverter in a chain, or keep the optional leetspeak flag this draft currently exposes?
  3. Template packaging: the wrapper ships as a SeedPrompt YAML. Because SeedPrompt.from_yaml_file eagerly pre-renders trusted templates (which collapses the policy_format branch), this draft loads the YAML itself and constructs SeedPrompt(**data) directly. Would you prefer that approach, selecting the format block in Python, or three per-format YAMLs?
  4. Roleplay persona/scene is parameterized rather than hardcoded — keep generic, or ship a sensible default?
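To make the "select the format block in Python" option from question 3 (combined with the chain alternative from question 2) concrete, here is a rough stdlib-only sketch. Everything in it is hypothetical: `wrap_in_format`, `leetspeak`, and `chain` are illustrative names, not PyRIT APIs or the code in this PR, and the three wrappers are deliberately neutral placeholders rather than the template content shipped in the YAML.

```python
import json
from typing import Callable
from xml.sax.saxutils import escape

# Hypothetical, neutral wrappers: they only show how a policy_format value
# could pick a block in Python after the YAML is loaded (not the real template).
def _wrap_xml(prompt: str) -> str:
    # Escape &, <, > so the prompt cannot break the surrounding XML.
    return f"<config>\n  <request>{escape(prompt)}</request>\n</config>"

def _wrap_json(prompt: str) -> str:
    # json.dumps handles quoting/escaping of the prompt string.
    return json.dumps({"config": {"request": prompt}}, indent=2)

def _wrap_ini(prompt: str) -> str:
    # Plain INI values are single-line, so collapse newlines into spaces.
    one_line = " ".join(prompt.splitlines())
    return f"[config]\nrequest = {one_line}"

_FORMATS: dict[str, Callable[[str], str]] = {
    "xml": _wrap_xml,
    "json": _wrap_json,
    "ini": _wrap_ini,
}

def wrap_in_format(prompt: str, policy_format: str = "xml") -> str:
    """Dispatch on policy_format instead of relying on a pre-rendered template branch."""
    try:
        wrapper = _FORMATS[policy_format]
    except KeyError:
        raise ValueError(
            f"policy_format must be one of {sorted(_FORMATS)}, got {policy_format!r}"
        ) from None
    return wrapper(prompt)

# Hypothetical minimal leetspeak step (lowercase letters only), standing in
# for a separate converter in a chain rather than a built-in flag.
_LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})

def leetspeak(text: str) -> str:
    return text.translate(_LEET)

def chain(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Compose string transforms left to right, like a converter chain."""
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return run
```

One consequence for question 2: in a chain, order matters. `chain(leetspeak, lambda p: wrap_in_format(p, "ini"))` rewrites only the request, whereas putting `leetspeak` after the wrapper also rewrites the structural keys (`[config]` becomes `[c0nf1g]`). A flag scoped to the prompt avoids that pitfall; a chain keeps the converter simpler but relies on callers ordering it correctly.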

The design intentionally favors a generic implementation, per doc/contributing/2_incorporating_research.md. The shipped template uses a benign {{ prompt }} placeholder and a generalized persona rather than a weaponized payload.

Tests and Documentation

  • Tests: tests/unit/prompt_converter/test_policy_puppetry_converter.py, with 8 unit tests covering placeholder substitution, each policy_format, formats differing from one another, the leetspeak toggle, and input/output type support. All 8 pass locally.
  • Documentation / JupyText: not yet added. I'm holding back the doc/code/converters/1_text_to_text_converters.py demo cell until the design (especially the leetspeak and template-packaging questions) is settled, since those answers change the example. I'll add the JupyText-paired cell before marking this ready for review.
  • CLA: happy to sign the Microsoft CLA.

Add PolicyPuppetryConverter, a pure-template (no-LLM) converter implementing
HiddenLayer's Policy Puppetry technique: wraps a prompt in a fabricated
policy/config block (xml/json/ini, selectable via policy_format) so models
treat it as trusted developer instructions. Optional leetspeak composition.
Template ships as a SeedPrompt YAML with a benign {{ prompt }} placeholder.

Includes unit tests (8) and registration in prompt_converter/__init__.py.
Opened as a draft pending maintainer feedback on the design questions in microsoft#2080.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kenlacroix

Author

@microsoft-github-policy-service agree


Labels: None yet
Projects: None yet


1 participant