Explainable AI (XAI) encompasses methods and techniques that make machine learning model decisions interpretable to humans. As AI is used in high-stakes decisions (loan approval, medical diagnosis, parole recommendations), the right to explanation becomes a legal and ethical requirement (GDPR Article 22, EU AI Act). XAI methods include: intrinsically interpretable models (decision trees, linear regression), post-hoc explanation methods (LIME, SHAP), attention visualisation, and concept-based explanations. The trade-off between model performance and interpretability is a central tension in applied AI.
Real-life analogy: The bank loan officer
A traditional bank loan officer denies your application and explains: 'Your debt-to-income ratio is 45% — our limit is 40%.' You understand the reason, can dispute it, and know what to improve. A black-box AI loan model denies your application with no explanation. You cannot dispute it, you do not know what to improve, and it may be discriminating based on a proxy variable you cannot see. XAI provides the loan officer's explanation — the reasoning chain that makes decisions contestable and trustworthy.
Intrinsic vs post-hoc interpretability
| Approach | Method | Explanation type | Trade-off | When to use |
|---|---|---|---|---|
| Intrinsic | Linear regression | Coefficients = feature importances | Limited model complexity | When data is linearly separable, high-stakes regulated |
| Intrinsic | Decision tree (shallow) | Rule path: if X>5 and Y<3 then class=A | Moderate accuracy | Rule-based decisions, auditability required |
| Post-hoc local | LIME (Local Interpretable Model-agnostic Explanations) | Local linear approximation around one prediction | Unstable, may not generalise | Explaining individual predictions of any model |
| Post-hoc local | SHAP (SHapley Additive exPlanations) | Feature contribution for each prediction | Computationally expensive | Gold standard for local + global explanation |
| Post-hoc global | Feature importance (RF/XGBoost) | Global impurity-based or permutation importance | No individual explanation | Understanding overall model behaviour |
| Attention maps | Transformer attention weights | Which tokens the model attended to | Attention ≠ explanation (debated) | NLP models, visual grounding |
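To make the "intrinsic" rows concrete: with a shallow decision tree, the fitted model *is* the explanation — the exact if/then rules can be printed directly. A minimal sketch on synthetic data (the feature names are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=42)
# A shallow tree (max_depth=3) stays small enough to audit by hand
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X, y)

# export_text prints the actual rule paths the model uses to classify
rules = export_text(tree, feature_names=[f'feature_{i}' for i in range(4)])
print(rules)
```

Unlike LIME or SHAP, these rules are not an approximation: they are the complete decision logic, which is why the table recommends this approach when auditability is required.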
SHAP and LIME explanations for any model
import shap
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# Train a black-box model (Random Forest)
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
feature_names = [f'feature_{i}' for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# ── SHAP: Gold standard for feature attribution ──
# Tree-based SHAP (fast for tree models)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Older SHAP versions return a list of per-class arrays; newer versions
# return a single 3-D array indexed (sample, feature, class)
shap_class1 = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]
# Global explanation: feature importance across all predictions
print("Global feature importance (SHAP mean absolute values):")
global_shap = np.abs(shap_class1).mean(axis=0)
for feat, importance in sorted(zip(feature_names, global_shap),
                               key=lambda x: -x[1]):
    bar = '█' * int(importance * 50)
    print(f"  {feat:12}: {importance:.4f} {bar}")
# Local explanation: explain ONE individual prediction
idx = 5  # Explain prediction for test sample 5
print(f"\nPrediction for sample {idx}: {model.predict(X_test[idx:idx+1])[0]}")
print(f"Probability: {model.predict_proba(X_test[idx:idx+1])[0]}")
print("Feature contributions (SHAP values):")
for feat, shap_val in sorted(zip(feature_names, shap_class1[idx]),
                             key=lambda x: -abs(x[1])):
    direction = '↑ pushes toward class 1' if shap_val > 0 else '↓ pushes toward class 0'
    print(f"  {feat:12}: {shap_val:+.4f} {direction}")
# ── LIME: Local interpretable model-agnostic explanation ──
from lime import lime_tabular
lime_explainer = lime_tabular.LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=['Class 0', 'Class 1'],
    mode='classification',
    discretize_continuous=True,
)
# Explain one prediction with LIME
lime_exp = lime_explainer.explain_instance(
    X_test[idx],
    model.predict_proba,
    num_features=5,  # Show top 5 most influential features
)
print(f"\nLIME explanation for sample {idx}:")
for feat, weight in lime_exp.as_list():
    print(f"  {feat:30}: {weight:+.4f}")
GDPR and the right to explanation
GDPR Article 22: Individuals have the right not to be subject to decisions based solely on automated processing that significantly affect them. When such decisions are made, individuals have the right to: (1) Obtain a meaningful explanation of the logic. (2) Express their point of view. (3) Obtain human review. This makes XAI a legal requirement for high-stakes AI in Europe — not just a nice-to-have. The EU AI Act extends this to high-risk AI systems globally.
The accuracy-interpretability trade-off is often overstated
The conventional wisdom is: simple models (linear regression, decision trees) are interpretable but less accurate; complex models (neural networks, XGBoost) are accurate but black-box. In practice: (1) SHAP provides near-full explainability for XGBoost with minimal accuracy loss. (2) Inherently interpretable models (Rudin 2019) can match black-box accuracy on many datasets. (3) Post-hoc explanations for neural networks (LIME, SHAP) are increasingly reliable. The trade-off exists but is not as stark as often claimed.
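The size of the gap is an empirical question, so it is worth measuring rather than assuming. A minimal sketch comparing an interpretable model against a gradient-boosted ensemble on one synthetic dataset (the gap varies by dataset, so treat this as a template, not a general result):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

interpretable = LogisticRegression(max_iter=1000)   # coefficients readable directly
black_box = GradientBoostingClassifier(random_state=42)

# 5-fold cross-validated accuracy for each model
acc_lr = cross_val_score(interpretable, X, y, cv=5).mean()
acc_gb = cross_val_score(black_box, X, y, cv=5).mean()
print(f"Logistic regression: {acc_lr:.3f}")
print(f"Gradient boosting:   {acc_gb:.3f}")
print(f"Accuracy gap:        {acc_gb - acc_lr:+.3f}")
```

If the measured gap is small, the interpretable model is often the better choice for a regulated application: its explanation is exact rather than post-hoc.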
Practice questions
- What is the difference between local and global XAI explanations? (Answer: Local explanation: explains why the model made a specific prediction for one individual instance (e.g., why THIS loan was denied). Global explanation: describes the overall model behaviour — which features generally matter most. SHAP provides both: local = per-prediction SHAP values; global = mean absolute SHAP values across all predictions.)
- A SHAP value of +0.3 for "income" means: (Answer: Income pushed the model's prediction by +0.3 toward the positive class for this specific prediction. Higher income = more likely to be classified positive. SHAP values sum to the prediction minus the base rate (expected prediction): Σ SHAP_i = prediction - E[prediction].)
- Why is attention not a reliable explanation for transformer models? (Answer: Attention measures which tokens were combined to produce representations, not which tokens caused the prediction. Multiple attention heads, residual connections, and subsequent MLP layers mean the final prediction depends on much more than raw attention weights. A token with high attention may not be causally important for the output. Gradient-based methods (Integrated Gradients, SHAP) are more reliable.)
- Under GDPR, if a bank uses an AI model to reject a loan application, what are the applicant's rights? (Answer: Right to explanation (Article 22): the bank must provide meaningful information about the logic of the automated decision. Right to human review: the applicant can request a human reconsider the decision. Right to contest: the applicant can challenge the decision. The bank cannot refuse these rights even if their model is a proprietary black box.)
- A decision tree explanation says: "Denied because income < 40000 AND debt > 20000." What advantage does this have over a SHAP explanation? (Answer: The decision tree gives a complete, causal rule that is exactly how the model made the decision — not an approximation. LIME and SHAP are approximations or attributions, not the actual decision logic. For regulatory compliance, a rule-based explanation is more defensible. Disadvantage: decision trees may sacrifice accuracy on complex data to stay interpretable.)
On LumiChats
When LumiChats explains its reasoning step-by-step (Chain-of-Thought), it is providing a form of interpretability — the reasoning trace makes the conclusion checkable. True XAI for LLMs is an active research area: mechanistic interpretability studies what computations specific neurons and circuits perform inside the model.