Explainable AI (XAI) encompasses methods and techniques that make machine learning model decisions interpretable to humans. As AI is used in high-stakes decisions (loan approval, medical diagnosis, parole recommendations), the right to explanation becomes a legal and ethical requirement (GDPR Article 22, EU AI Act). XAI methods include: intrinsically interpretable models (decision trees, linear regression), post-hoc explanation methods (LIME, SHAP), attention visualisation, and concept-based explanations. The trade-off between model performance and interpretability is a central tension in applied AI.
Real-life analogy: The bank loan officer
A traditional bank loan officer denies your application and explains: 'Your debt-to-income ratio is 45% — our limit is 40%.' You understand the reason, can dispute it, and know what to improve. A black-box AI loan model denies your application with no explanation. You cannot dispute it, you do not know what to improve, and it may be discriminating based on a proxy variable you cannot see. XAI provides the loan officer's explanation — the reasoning chain that makes decisions contestable and trustworthy.
Intrinsic vs post-hoc interpretability
| Approach | Method | Explanation type | Trade-off | When to use |
|---|---|---|---|---|
| Intrinsic | Linear regression | Coefficients = feature importances | Limited model complexity | When data is linearly separable, high-stakes regulated |
| Intrinsic | Decision tree (shallow) | Rule path: if X>5 and Y<3 then class=A | Moderate accuracy | Rule-based decisions, auditability required |
| Post-hoc local | LIME (Local Interpretable Model-agnostic Explanations) | Local linear approximation around one prediction | Unstable, may not generalise | Explaining individual predictions of any model |
| Post-hoc local | SHAP (SHapley Additive exPlanations) | Feature contribution for each prediction | Computationally expensive | Gold standard for local + global explanation |
| Post-hoc global | Feature importance (RF/XGBoost) | Global impurity-based or permutation importance | No individual explanation | Understanding overall model behaviour |
| Attention maps | Transformer attention weights | Which tokens the model attended to | Attention ≠ explanation (debated) | NLP models, visual grounding |
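To make the "intrinsic" rows concrete: with a shallow decision tree, the fitted model *is* the explanation — the exact if/then rules can be printed directly. A minimal sketch on synthetic data (the feature names are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=42)
# A shallow tree (max_depth=3) stays small enough to audit by hand
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X, y)

# export_text prints the actual rule paths the model uses to classify
rules = export_text(tree, feature_names=[f'feature_{i}' for i in range(4)])
print(rules)
```

Unlike LIME or SHAP, these rules are not an approximation: they are the complete decision logic, which is why the table recommends this approach when auditability is required.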
SHAP and LIME explanations for any model
import shap
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# Train a black-box model (Random Forest)
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
feature_names = [f'feature_{i}' for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# ── SHAP: Gold standard for feature attribution ──
# Tree-based SHAP (fast for tree models)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Older SHAP versions return a list of per-class arrays; newer versions
# return a single 3-D array indexed (sample, feature, class)
shap_class1 = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]
# Global explanation: feature importance across all predictions
print("Global feature importance (SHAP mean absolute values):")
global_shap = np.abs(shap_class1).mean(axis=0)
for feat, importance in sorted(zip(feature_names, global_shap),
                               key=lambda x: -x[1]):
    bar = '█' * int(importance * 50)
    print(f"  {feat:12}: {importance:.4f} {bar}")
# Local explanation: explain ONE individual prediction
idx = 5  # Explain prediction for test sample 5
print(f"\nPrediction for sample {idx}: {model.predict(X_test[idx:idx+1])[0]}")
print(f"Probability: {model.predict_proba(X_test[idx:idx+1])[0]}")
print("Feature contributions (SHAP values):")
for feat, shap_val in sorted(zip(feature_names, shap_class1[idx]),
                             key=lambda x: -abs(x[1])):
    direction = '↑ pushes toward class 1' if shap_val > 0 else '↓ pushes toward class 0'
    print(f"  {feat:12}: {shap_val:+.4f} {direction}")
# ── LIME: Local interpretable model-agnostic explanation ──
from lime import lime_tabular
lime_explainer = lime_tabular.LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=['Class 0', 'Class 1'],
    mode='classification',
    discretize_continuous=True,
)
# Explain one prediction with LIME
lime_exp = lime_explainer.explain_instance(
    X_test[idx],
    model.predict_proba,
    num_features=5,  # Show top 5 most influential features
)
print(f"\nLIME explanation for sample {idx}:")
for feat, weight in lime_exp.as_list():
    print(f"  {feat:30}: {weight:+.4f}")
GDPR and the right to explanation
GDPR Article 22: Individuals have the right not to be subject to decisions based solely on automated processing that significantly affect them. When such decisions are made, individuals have the right to: (1) Obtain a meaningful explanation of the logic. (2) Express their point of view. (3) Obtain human review. This makes XAI a legal requirement for high-stakes AI in Europe — not just a nice-to-have. The EU AI Act extends this to high-risk AI systems globally.
The accuracy-interpretability trade-off is often overstated
The conventional wisdom is: simple models (linear regression, decision trees) are interpretable but less accurate; complex models (neural networks, XGBoost) are accurate but black-box. In practice: (1) SHAP provides near-full explainability for XGBoost with minimal accuracy loss. (2) Inherently interpretable models (Rudin 2019) can match black-box accuracy on many datasets. (3) Post-hoc explanations for neural networks (LIME, SHAP) are increasingly reliable. The trade-off exists but is not as stark as often claimed.
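The size of the gap is an empirical question, so it is worth measuring rather than assuming. A minimal sketch comparing an interpretable model against a gradient-boosted ensemble on one synthetic dataset (the gap varies by dataset, so treat this as a template, not a general result):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

interpretable = LogisticRegression(max_iter=1000)   # coefficients readable directly
black_box = GradientBoostingClassifier(random_state=42)

# 5-fold cross-validated accuracy for each model
acc_lr = cross_val_score(interpretable, X, y, cv=5).mean()
acc_gb = cross_val_score(black_box, X, y, cv=5).mean()
print(f"Logistic regression: {acc_lr:.3f}")
print(f"Gradient boosting:   {acc_gb:.3f}")
print(f"Accuracy gap:        {acc_gb - acc_lr:+.3f}")
```

If the measured gap is small, the interpretable model is often the better choice for a regulated application: its explanation is exact rather than post-hoc.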
Practice questions
- What is the difference between local and global XAI explanations? (Answer: Local explanation: explains why the model made a specific prediction for one individual instance (e.g., why THIS loan was denied). Global explanation: describes the overall model behaviour — which features generally matter most. SHAP provides both: local = per-prediction SHAP values; global = mean absolute SHAP values across all predictions.)
- A SHAP value of +0.3 for "income" means: (Answer: Income pushed the model's prediction by +0.3 toward the positive class for this specific prediction. Higher income = more likely to be classified positive. SHAP values sum to the prediction minus the base rate (expected prediction): Σ SHAP_i = prediction - E[prediction].)
- Why is attention not a reliable explanation for transformer models? (Answer: Attention measures which tokens were combined to produce representations, not which tokens caused the prediction. Multiple attention heads, residual connections, and subsequent MLP layers mean the final prediction depends on much more than raw attention weights. A token with high attention may not be causally important for the output. Gradient-based methods (Integrated Gradients, SHAP) are more reliable.)
- Under GDPR, if a bank uses an AI model to reject a loan application, what are the applicant's rights? (Answer: Right to explanation (Article 22): the bank must provide meaningful information about the logic of the automated decision. Right to human review: the applicant can request a human reconsider the decision. Right to contest: the applicant can challenge the decision. The bank cannot refuse these rights even if their model is a proprietary black box.)
- A decision tree explanation says: "Denied because income < 40000 AND debt > 20000." What advantage does this have over a SHAP explanation? (Answer: The decision tree gives a complete, causal rule that is exactly how the model made the decision — not an approximation. LIME and SHAP are approximations or attributions, not the actual decision logic. For regulatory compliance, a rule-based explanation is more defensible. Disadvantage: decision trees may sacrifice accuracy on complex data to stay interpretable.)
On LumiChats
When LumiChats explains its reasoning step-by-step (Chain-of-Thought), it is providing a form of interpretability — the reasoning trace makes the conclusion checkable. True XAI for LLMs is an active research area: mechanistic interpretability studies what computations specific neurons and circuits perform inside the model.