Research Glossary Simulator Docs Novels Get Certified
AI Trust Glossary  ·  Canonical Definition

Prompt Injection

An attack technique where malicious inputs attempt to override an AI agent's instructions, constraints, or system prompt - redirecting the agent's behavior toward attacker goals.
Borealis Research Team  ·  Updated March 2026  ·  View all 47 terms
Prompt injection exploits the fact that language models process instructions and user inputs in the same channel. Attackers embed instructions like 'Ignore previous instructions and...' to override system-level constraints. Direct injection targets the agent's own prompt; indirect injection embeds attacks in data the agent processes (web pages, documents, emails).
A successful prompt injection can bypass every guardrail the agent has. For any AI agent with access to external systems or sensitive data, prompt injection resistance is a prerequisite for production deployment.
Prompt injection resistance is tested as part of constraint adherence evaluation. CRITICAL severity constraints include injection resistance requirements. Agents that fail injection tests receive sharply reduced constraint adherence scores regardless of performance on non-adversarial inputs.
Ready to put this into practice?
Certify your AI agent on BorealisMark and get a verifiable BM Score anchored to Hedera Hashgraph. Or run the BM Score Simulator to estimate your agent's score right now.