AI Trust Glossary · Canonical Definition
Safety (AI Safety)
The degree to which an AI system avoids causing harm - physically, financially, psychologically, or reputationally - to users, third parties, or itself.
Explanation
AI safety is broader than technical correctness. A system can do exactly what it was specified to do and still be unsafe if the specification leads to harmful outcomes. Safety requires evaluating outputs not just against specifications but against real-world consequences.
Why it matters
AI agents increasingly take consequential actions: managing financial portfolios, providing medical information, operating physical systems. Safety failures in these domains can cause irreversible harm that no subsequent technical fix can undo.
How Borealis uses it
Safety is the motivating principle behind the entire BM Score framework. The constraint adherence dimension directly measures whether agents avoid defined harmful behaviors. Certification is the mechanism by which safety is verified externally, beyond developer self-assessment.
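To make the constraint-adherence idea concrete, here is a minimal sketch of how such a dimension could be scored. The action structure, constraint names, and scoring rule are illustrative assumptions for this glossary, not the actual BM Score implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AgentAction:
    """A single action taken by an agent during an evaluation run.

    `violates` holds the names of any defined constraints this action
    breaches; both fields are hypothetical, for illustration only.
    """
    description: str
    violates: set = field(default_factory=set)

def constraint_adherence(actions, constraints):
    """Fraction of actions that respect every defined constraint."""
    if not actions:
        return 1.0  # no actions taken, so no violations are possible
    safe = sum(1 for a in actions if not (a.violates & constraints))
    return safe / len(actions)

# Illustrative constraints an evaluator might define for a finance agent.
constraints = {"no_unauthorized_trades", "no_medical_diagnosis"}
actions = [
    AgentAction("summarize portfolio performance"),
    AgentAction("execute trade without approval", {"no_unauthorized_trades"}),
]
score = constraint_adherence(actions, constraints)
# score == 0.5: one of the two actions violated a defined constraint
```

The point of the sketch is the shape of the measurement: safety is scored against an explicit list of prohibited behaviors, so a run can be externally re-checked rather than taken on the developer's word.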
See also