AI Trust Glossary  ·  Canonical Definition

Interpretability

The degree to which a human can understand the internal mechanisms of an AI model - how features, weights, and architecture combine to produce specific outputs.
Borealis Research Team  ·  Updated March 2026  ·  View all 47 terms
Interpretability asks: can a human inspect the model's workings and understand why it functions the way it does? A linear regression is highly interpretable - you can inspect every coefficient. A large language model is not - its behavior emerges from billions of parameters that resist simple inspection.
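The linear-regression case can be made concrete. Below is a minimal sketch (toy data and function names are illustrative, not from any particular library): the fitted model consists of exactly two numbers, a slope and an intercept, each of which a human can read and interpret directly.

```python
def fit_simple_regression(xs, ys):
    """Closed-form ordinary least squares for one feature: y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is just the covariance of x and y divided by the variance of x.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data generated by y = 2x + 1.
slope, intercept = fit_simple_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0 -- the model's entire "mechanism" is two inspectable numbers
```

Contrast this with a large language model: there is no analogous way to print its billions of parameters and read off why a given output was produced.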
Interpretability enables diagnosis. When an interpretable model behaves unexpectedly, engineers can trace the fault to its root cause. In safety-critical domains, interpretability is sometimes a regulatory prerequisite.
Interpretability informs how decision transparency is measured. Agents with lower interpretability face a higher burden on decision transparency - they must compensate through robust reasoning chains and confidence scoring since their internals cannot be inspected.
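One way to picture that compensating burden is as a weight on transparency evidence that grows as interpretability falls. The sketch below is purely illustrative (it is not the BM Score formula; the function name, the [0, 1] normalization, and the linear weighting are all assumptions made for this example):

```python
def transparency_weight(interpretability: float) -> float:
    """Hypothetical weighting: as interpretability drops from 1.0 (fully
    inspectable) to 0.0 (opaque), the weight on decision-transparency
    evidence rises linearly from 1.0 to 2.0."""
    if not 0.0 <= interpretability <= 1.0:
        raise ValueError("interpretability must be normalized to [0, 1]")
    return 2.0 - interpretability

# A fully interpretable agent carries the baseline burden; an opaque one
# must supply twice the transparency evidence under this toy scheme.
print(transparency_weight(1.0))  # 1.0
print(transparency_weight(0.0))  # 2.0
```

Any real scoring scheme would pick its own scale and curve; the point is only the direction of the relationship, with less inspectable internals demanding stronger reasoning chains and confidence reporting.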
Ready to put this into practice?
Certify your AI agent on BorealisMark and get a verifiable BM Score anchored to Hedera Hashgraph. Or run the BM Score Simulator to estimate your agent's score right now.