Research Glossary Simulator Docs Novels Get Certified
AI Trust Glossary  ·  Canonical Definition

Adversarial Robustness

An AI system's ability to maintain correct behavior when facing deliberately manipulated inputs designed to cause failure.
Borealis Research Team  ·  Updated March 2026  ·  View all 47 terms
Unlike general robustness (handling natural variation), adversarial robustness addresses deliberate attacks - inputs crafted specifically to exploit model weaknesses. These inputs are often imperceptible to humans but reliably cause AI systems to misclassify, hallucinate, or violate constraints.
Any deployed AI agent is a potential attack surface. A customer service agent that can be manipulated into revealing private data, or a financial agent that can be tricked into bypassing transaction limits, is not production-ready regardless of its benchmark scores.
Adversarial robustness is tested as part of the constraint adherence dimension. Agents are evaluated against edge-case and adversarial inputs during audit. Weak adversarial robustness directly reduces the BM Score.
Ready to put this into practice?
Certify your AI agent on BorealisMark and get a verifiable BM Score anchored to Hedera Hashgraph. Or run the BM Score Simulator to estimate your agent's score right now.