AI Trust Glossary  ·  Canonical Definition
BM Score Dimension - 20%

Behavioral Consistency

One of five BM Score dimensions. Measures how predictably an AI agent produces outputs across similar inputs, as an indicator of how reliable its decision-making is over time.
Borealis Research Team  ·  Updated March 2026
Consistency is not uniformity. An agent can be consistent while still adapting to context: the measure is whether outputs are predictable given the same input class. High variance on identical inputs is a reliability failure. The target is calibrated predictability, meaning outputs may differ across input classes as context requires but stay stable within a class.
Unpredictable agents cannot be trusted in production. If the same query produces radically different responses on different days, users cannot build accurate mental models of what the agent will do. Inconsistency erodes trust faster than imperfection.
Behavioral consistency is reported in the telemetry schema as behaviorSamples: [{ inputClass, sampleCount, outputVariance, deterministicRate }]. The scoring engine computes a weighted consistency score across input classes, and agents in the same category are compared to detect statistical outliers.
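
To make the telemetry fields concrete, here is a minimal Python sketch of how a weighted consistency score and an outlier check could be computed from behaviorSamples. The actual BM Score formula is not given in this entry, so everything beyond the four field names is an assumption for illustration: the 50/50 blend of deterministicRate and outputVariance, the weighting by sampleCount, the assumption that both values lie in the range 0 to 1, and the z-score outlier rule with its threshold of 2.0. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import TypedDict


class BehaviorSample(TypedDict):
    """One entry of behaviorSamples; field names taken from the telemetry schema."""
    inputClass: str
    sampleCount: int
    outputVariance: float      # assumed to be normalized to [0, 1]
    deterministicRate: float   # assumed to be a fraction in [0, 1]


def class_score(sample: BehaviorSample) -> float:
    """Per-class score in [0, 1]. The equal blend is an assumption, not the real formula."""
    variance = min(max(sample["outputVariance"], 0.0), 1.0)  # clamp defensively
    return 0.5 * sample["deterministicRate"] + 0.5 * (1.0 - variance)


def consistency_score(samples: list[BehaviorSample]) -> float:
    """Average of per-class scores, weighted by sampleCount (assumed weighting)."""
    total = sum(s["sampleCount"] for s in samples)
    if total <= 0:
        raise ValueError("behaviorSamples must contain at least one observation")
    return sum(class_score(s) * s["sampleCount"] for s in samples) / total


def find_outliers(scores: dict[str, float], z_threshold: float = 2.0) -> list[str]:
    """Flag agents whose score is more than z_threshold population standard
    deviations from the category mean (a stand-in for the unspecified test)."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all agents scored identically, so nothing stands out
    return [agent for agent, v in scores.items() if abs(v - mu) / sigma > z_threshold]


# Hypothetical telemetry for one agent: stable on refunds, looser on open-ended chat.
example_samples: list[BehaviorSample] = [
    {"inputClass": "refund_request", "sampleCount": 300,
     "outputVariance": 0.1, "deterministicRate": 0.9},
    {"inputClass": "open_ended_chat", "sampleCount": 100,
     "outputVariance": 0.4, "deterministicRate": 0.6},
]
example_score = consistency_score(example_samples)  # approximately 0.825
```

Weighting by sampleCount keeps a sparsely observed input class from dominating the score, and scoring each class separately reflects the point that an agent may behave differently across classes while remaining predictable within each one. A z-score check needs a reasonably sized category to be meaningful; with only a handful of agents, no score can sit far from the mean.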