AI Trust Glossary · Canonical Definition
Interpretability
The degree to which a human can understand the internal mechanisms of an AI model - how features, weights, and architecture combine to produce specific outputs.
Explanation
Interpretability asks: can a human inspect the model's workings and understand why it produces the outputs it does? A linear regression is highly interpretable - you can inspect every coefficient and state what it contributes. A large language model is not - its behavior emerges from billions of parameters that resist simple inspection.
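The linear-regression contrast above can be made concrete. In the sketch below (the study-hours data is invented for illustration), the model's entire "internal mechanism" after fitting is two numbers a human can read directly:

```python
# Minimal sketch of an interpretable model: ordinary least squares for
# y = slope * x + intercept. The data (hours studied -> exam score) is
# hypothetical and chosen only to illustrate coefficient inspection.

def fit_simple_ols(xs, ys):
    """Closed-form simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
slope, intercept = fit_simple_ols(hours, scores)

# Every learned parameter is directly inspectable and has a plain meaning:
print(f"each extra hour adds ~{slope:.1f} points (baseline {intercept:.1f})")
```

A billion-parameter model offers no analogous readout: no single weight corresponds to a human-legible claim like "each extra hour adds 4.1 points."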
Why it matters
Interpretability enables diagnosis. When a model behaves unexpectedly, interpretable models allow engineers to find the root cause. In safety-critical domains, interpretability is sometimes a regulatory prerequisite.
How Borealis uses it
Interpretability informs how decision transparency is measured. Agents with lower interpretability face a higher burden on decision transparency - they must compensate through robust reasoning chains and confidence scoring since their internals cannot be inspected.
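One way to picture the compensation described above is a decision record that carries an explicit reasoning chain and a confidence score. This is a hypothetical sketch, not Borealis's actual schema; the field names and thresholds are assumptions:

```python
# Hypothetical sketch (NOT Borealis's real data model): a decision record
# for a low-interpretability agent, compensating with an explicit
# reasoning chain and a confidence score since internals can't be inspected.
from dataclasses import dataclass, field

@dataclass
class DecisionRecord:
    action: str
    reasoning_chain: list[str] = field(default_factory=list)
    confidence: float = 0.0  # agent-reported, 0.0 .. 1.0

    def is_transparent_enough(self, min_steps: int = 2,
                              min_confidence: float = 0.7) -> bool:
        """Illustrative check; both thresholds are invented for this sketch."""
        return (len(self.reasoning_chain) >= min_steps
                and self.confidence >= min_confidence)

record = DecisionRecord(
    action="approve_refund",
    reasoning_chain=[
        "order falls within refund policy window",
        "payment method verified",
    ],
    confidence=0.86,
)
print(record.is_transparent_enough())
```

The design intuition: since the model's weights cannot be audited, the burden shifts to auditable outputs - each decision must justify itself on the record.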
See also