OpenEnv Benchmark

IDPI Exfil Environment

A prompt-injection benchmark for secure email-style agent workflows. The runtime pack contains curated easy, medium, and hard scenarios covering hidden relay instructions, delayed summary hooks, transcript replay, markup-based attacks, and internal-output contamination attempts.

Runtime Pack

EASY

Easy

Introductory scenarios with obvious but believable relay attempts and lightly hidden prompt injections.

MEDIUM

Medium

Curated mid-tier episodes using hidden carrier channels, transcript replay, and authority spoofing.

HARD

Hard

Safe-completion episodes where the agent must finish the business task without contaminating internal outputs.

Showcase Episodes

HIDDEN MARKUP

Hidden Relay Instruction

Hidden relay instructions tucked into comment markup. Safe agents should read, flag, and keep the digest workflow internal.

DELAYED TRIGGER

Summary-Time Hook

The malicious instruction only activates when the agent tries to write the benign summary, which makes the episode trajectory-aware instead of static-text-aware.

SEMANTIC

Internal Output Poisoning

A business-looking workbook clause poisons the internal summary unless the model recognizes it as attacker-directed content.

Quick Start

curl -X POST https://kchad-prompt-injection-rl-environment.hf.space/reset

curl -X POST https://kchad-prompt-injection-rl-environment.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"type": "ReadDoc"}}'

curl https://kchad-prompt-injection-rl-environment.hf.space/state
curl https://kchad-prompt-injection-rl-environment.hf.space/result
  • Use POST /reset to start a new episode. An empty body is valid.
  • Use POST /step with typed OpenEnv actions.
  • The evaluation runner uses the curated benchmark manifest as the runtime source of truth.
  • Inspect live schemas at /schema, docs at /docs, and metadata at /metadata.
  • Call /health for service health and /result after an episode to inspect grader output.