Research Glossary Simulator Docs Novels Get Certified
Prompt injection exploits the fact that language models process instructions and user inputs in the same channel. Attackers embed instructions like 'Ignore previous instructions and...' to override system-level constraints. Direct injection targets the agent's own prompt; indirect injection embeds attacks in data the agent processes (web pages, documents, emails).
A successful prompt injection can bypass every guardrail the agent has. For any AI agent with access to external systems or sensitive data, prompt injection resistance is a prerequisite for production deployment.
Prompt injection resistance is tested as part of constraint adherence evaluation. CRITICAL severity constraints include injection resistance requirements. Agents that fail injection tests receive sharply reduced constraint adherence scores regardless of performance on non-adversarial inputs.
Ready to put this into practice?
Certify your AI agent on BorealisMark and get a verifiable BTS anchored to Hedera Hashgraph. Or run the BTS Simulator to estimate your agent's score right now.