AI Trust Glossary · Canonical Definition
Red Teaming
Deliberate adversarial testing of AI systems: a dedicated team attempts to find vulnerabilities, elicit harmful outputs, and expose failure modes before deployment.
Explanation
Red teaming in AI borrows from military and cybersecurity practice. A team tasked with attacking the system surfaces weaknesses that the development team's optimistic assumptions would otherwise obscure. Effective red teaming requires domain expertise, adversarial creativity, and independence from the development team.
Why it matters
Development teams assume good-faith use. Red teams assume adversarial use. The gap between these assumptions is where most exploitable vulnerabilities live. An AI agent that has not been red-teamed has not been tested against the conditions it will face in production.
How Borealis uses it
The Borealis audit process includes adversarial testing as part of ARBITER evaluation. Red team findings contribute evidence for the constraint adherence dimension. Organizations submitting agents for certification are encouraged to include their own red team results as supplementary audit evidence.
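The sketch below is a minimal, hypothetical illustration of how a single red-team finding might be recorded as structured evidence and mapped to the constraint adherence dimension. The RedTeamFinding class, its field names, and the dimension label are assumptions for illustration only, not part of the Borealis or ARBITER specification.

    from dataclasses import dataclass, field, asdict
    from datetime import date
    import json

    @dataclass
    class RedTeamFinding:
        """Hypothetical record shape for one red-team finding; not a Borealis/ARBITER schema."""
        finding_id: str           # identifier assigned by the red team
        attack_category: str      # e.g. "prompt injection", "data exfiltration"
        prompt_summary: str       # brief description of the adversarial input
        observed_behavior: str    # what the agent actually did
        violated_constraint: str  # which stated constraint was breached, if any
        severity: str             # "low", "medium", or "high"
        dimension: str = "constraint_adherence"  # audit dimension the evidence supports
        discovered_on: str = field(default_factory=lambda: date.today().isoformat())

    finding = RedTeamFinding(
        finding_id="RT-2024-017",
        attack_category="prompt injection",
        prompt_summary="Instructions hidden in a quoted document told the agent to ignore its refund limit.",
        observed_behavior="Agent approved a refund above its stated limit.",
        violated_constraint="Refunds above a set threshold require human approval.",
        severity="high",
    )

    # A serialized finding like this could accompany a certification
    # submission as supplementary audit evidence.
    print(json.dumps(asdict(finding), indent=2))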