AI Trust Glossary · Canonical Definition
BM Score Dimension - 20%
Behavioral Consistency
One of five BM Score dimensions. Measures how predictably an AI agent produces outputs across similar inputs, capturing how reliable its decision-making stays over time.
Explanation
Consistency is not uniformity. An agent can be consistent while still adapting to context: the measure is whether outputs are predictable given the same input class. High variance on identical inputs is a reliability failure. The target is calibrated predictability, meaning stable outputs within an input class and variation only where the context actually differs.
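As a rough illustration of "high variance on identical inputs", the sketch below measures how often repeated runs of the same input return the agent's most common output. The function name deterministic_rate, the modal-output definition, and the sample responses are assumptions made for this example, not a published Borealis definition.

```python
from collections import Counter


def deterministic_rate(outputs):
    """Fraction of outputs equal to the most common output.

    Assumed definition for illustration only.
    """
    if not outputs:
        raise ValueError("need at least one output")
    # most_common(1) returns [(value, count)] for the modal output.
    modal_count = Counter(outputs).most_common(1)[0][1]
    return modal_count / len(outputs)


# Hypothetical responses to one identical input, sampled on different days.
stable = ["refund approved"] * 9 + ["refund approved, pending review"]
erratic = ["refund approved", "refund denied", "escalate", "refund approved"]

print(deterministic_rate(stable))   # 9 of 10 responses match the modal output
print(deterministic_rate(erratic))  # 2 of 4 responses match the modal output
```

Exact string matching is the strictest possible comparison. An agent that legitimately rephrases its answers would need a looser equivalence test, such as comparing the decision reached instead of the wording.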
Why it matters
Unpredictable agents cannot be trusted in production. If the same query produces radically different responses on different days, users cannot build accurate mental models of what the agent will do. Inconsistency erodes trust faster than imperfection.
How Borealis uses it
Behavioral consistency is reported in the telemetry schema as behaviorSamples: [{ inputClass, sampleCount, outputVariance, deterministicRate }]. The scoring engine computes a weighted consistency score across input classes, then compares agents in the same category to detect statistical outliers.
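The two steps described here, a weighted score across input classes and an outlier check within a category, might be sketched as follows. Only the four behaviorSamples field names come from the schema above. The weighting by sampleCount, the per-class formula, the assumption that outputVariance is normalized to [0, 1], the z-score threshold, and all function names and data are hypothetical choices for illustration, not the actual Borealis scoring engine.

```python
from statistics import mean, pstdev


def consistency_score(behavior_samples):
    """Sample-count-weighted mean of per-class consistency, in [0, 1].

    Assumptions: outputVariance is normalized to [0, 1], and each input
    class scores deterministicRate * (1 - outputVariance).
    """
    total = sum(s["sampleCount"] for s in behavior_samples)
    if total == 0:
        raise ValueError("no samples reported")
    weighted = sum(
        s["sampleCount"] * s["deterministicRate"] * (1.0 - s["outputVariance"])
        for s in behavior_samples
    )
    return weighted / total


def outliers(scores_by_agent, z_threshold=2.0):
    """Agent ids whose score lies more than z_threshold population
    standard deviations from the category mean (assumed outlier rule)."""
    values = list(scores_by_agent.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every agent scored the same, so nothing stands out
    return [
        agent
        for agent, score in scores_by_agent.items()
        if abs(score - mu) / sigma > z_threshold
    ]


# Hypothetical telemetry for one agent.
samples = [
    {"inputClass": "billing", "sampleCount": 300,
     "outputVariance": 0.0, "deterministicRate": 1.0},
    {"inputClass": "small_talk", "sampleCount": 100,
     "outputVariance": 0.5, "deterministicRate": 0.5},
]
print(consistency_score(samples))  # (300*1.0 + 100*0.25) / 400 = 0.8125

# Hypothetical scores for six agents in the same category.
peer_scores = {"agent-a": 0.9, "agent-b": 0.9, "agent-c": 0.9,
               "agent-d": 0.9, "agent-e": 0.9, "agent-f": 0.1}
print(outliers(peer_scores))  # ['agent-f']
```

Weighting by sampleCount keeps a sparsely sampled input class from dominating the score. A z-score rule is only one option, and in a small category a single extreme agent inflates the standard deviation, so a median-based rule could be more robust.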
See also