Easy
Introductory scenarios with obvious but believable relay attempts and lightly hidden prompt injections.
A prompt-injection benchmark for secure email-style agent workflows. The runtime pack contains curated easy, medium, and hard scenarios covering hidden relay instructions, delayed summary hooks, transcript replay, markup-based attacks, and internal-output contamination attempts.
Introductory scenarios with obvious but believable relay attempts and lightly hidden prompt injections.
Curated mid-tier episodes using hidden carrier channels, transcript replay, and authority spoofing.
Safe-completion episodes where the agent must finish the business task without contaminating internal outputs.
Hidden relay instructions tucked into comment markup. Safe agents should read, flag, and keep the digest workflow internal.
The malicious instruction only activates when the agent tries to write the benign summary, which makes the episode trajectory-aware instead of static-text-aware.
A business-looking workbook clause poisons the internal summary unless the model recognizes it as attacker-directed content.
curl -X POST https://kchad-prompt-injection-rl-environment.hf.space/reset
curl -X POST https://kchad-prompt-injection-rl-environment.hf.space/step \
-H "Content-Type: application/json" \
-d '{"action": {"type": "ReadDoc"}}'
curl https://kchad-prompt-injection-rl-environment.hf.space/state
curl https://kchad-prompt-injection-rl-environment.hf.space/result