Linear Discriminant Analysis (LDA) is both a supervised classification algorithm and a dimensionality reduction technique. It finds a projection of the data that maximises the ratio of between-class variance to within-class variance, maximally separating the classes while keeping each class compact. LDA assumes features are normally distributed with equal covariance matrices across classes (unlike QDA, which allows different covariances). It is tested in GATE DS&AI alongside PCA as the supervised counterpart to unsupervised dimensionality reduction.
Real-life analogy: Shining a flashlight
Imagine two groups of coloured balls scattered in 3D space. You want to find the angle to shine a flashlight so the shadows of the two groups on a wall are as separated as possible. LDA finds exactly this optimal projection direction — where the shadow separation between groups is maximum relative to how spread out each group shadow is.
Fisher criterion — what LDA maximises
Fisher criterion: choose the projection direction w that maximises the ratio of between-class scatter to within-class scatter, J(w) = wᵀS_B w / wᵀS_W w. Here S_B = Σₖ nₖ(μₖ−μ)(μₖ−μ)ᵀ and S_W = Σₖ Σᵢ∈classₖ (xᵢ−μₖ)(xᵢ−μₖ)ᵀ. The optimal w is the eigenvector of S_W⁻¹S_B with the largest eigenvalue.
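The scatter matrices and the optimal direction can be computed directly from their definitions. A minimal NumPy sketch on synthetic two-class data (the class means and spreads below are made up purely for illustration):

```python
import numpy as np

# Two synthetic 2D classes (illustrative values, not from any real dataset)
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0, 0], scale=1.0, size=(50, 2))  # class 0
X1 = rng.normal(loc=[3, 2], scale=1.0, size=(50, 2))  # class 1
X = np.vstack([X0, X1])

mu = X.mean(axis=0)                       # overall mean
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)  # class means

# Between-class scatter: S_B = sum_k n_k (mu_k - mu)(mu_k - mu)^T
S_B = (len(X0) * np.outer(mu0 - mu, mu0 - mu)
       + len(X1) * np.outer(mu1 - mu, mu1 - mu))

# Within-class scatter: S_W = sum_k sum_{i in class k} (x_i - mu_k)(x_i - mu_k)^T
S_W = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)

# Optimal w: eigenvector of S_W^{-1} S_B with the largest eigenvalue
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
w = eigvecs[:, np.argmax(eigvals.real)].real

# Fisher criterion J(w) = (w^T S_B w) / (w^T S_W w)
J = (w @ S_B @ w) / (w @ S_W @ w)
print("direction w:", w, "Fisher ratio J:", J)
```

Projecting both classes onto w (`X @ w`) collapses the 2D data to a single axis along which the class means are maximally separated relative to the within-class spread.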
LDA for classification and dimensionality reduction
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
iris = load_iris()
X, y = iris.data, iris.target # 4 features, 3 classes
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42)
# ── LDA as classifier ──
lda_clf = LinearDiscriminantAnalysis()
lda_clf.fit(X_train, y_train)
print(f"LDA accuracy: {accuracy_score(y_test, lda_clf.predict(X_test)):.3f}")
# ── LDA as dimensionality reduction (K-1 components for K classes) ──
# 3 classes → max 2 LDA components
lda_2d = LinearDiscriminantAnalysis(n_components=2)
X_train_2d = lda_2d.fit_transform(X_train, y_train)
X_test_2d = lda_2d.transform(X_test)
print(f"Shape: {X_train_2d.shape}") # (120, 2) — 4D → 2D
# Explained variance ratio
print(f"Explained variance: {lda_2d.explained_variance_ratio_}")
# Compare with PCA (unsupervised)
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
print(f"PCA explained variance: {pca.explained_variance_ratio_}")
# LDA finds better separating directions than PCA for classification tasks

| Property | LDA | PCA |
|---|---|---|
| Supervision | Supervised (uses class labels) | Unsupervised (no labels needed) |
| Objective | Maximise class separation | Maximise variance |
| Max components | K−1 (K = number of classes) | min(n, p) (n = samples, p = features) |
| Assumption | Gaussian classes, equal covariance | None (linear projections) |
| Best for | Classification + dimensionality reduction | Compression, visualisation, pre-processing |
QDA — Quadratic Discriminant Analysis
LDA assumes all classes share the same covariance matrix (Σ). QDA (Quadratic Discriminant Analysis) relaxes this — each class k has its own covariance matrix Σₖ. This creates quadratic decision boundaries (curved). QDA has more parameters and needs more data. LDA is a special case of QDA when all Σₖ are equal.
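To see the difference in practice, we can fit both models on synthetic data where the two classes have deliberately unequal covariances, violating LDA's shared-Σ assumption (all means and covariances below are illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

# Two classes with different covariance matrices (illustrative values)
rng = np.random.default_rng(42)
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=300)
X1 = rng.multivariate_normal([2, 2], [[3.0, 1.5], [1.5, 2.0]], size=300)
X = np.vstack([X0, X1])
y = np.array([0] * 300 + [1] * 300)

lda = LinearDiscriminantAnalysis().fit(X, y)      # one pooled covariance
qda = QuadraticDiscriminantAnalysis().fit(X, y)   # per-class covariances

print(f"LDA train accuracy: {lda.score(X, y):.3f}")
print(f"QDA train accuracy: {qda.score(X, y):.3f}")
# QDA's per-class covariances give it a curved (quadratic) boundary that
# can track the unequal spreads; LDA is restricted to a straight line.
```

With enough data per class the extra parameters of QDA pay off; with few samples, estimating a separate Σₖ per class is noisy and LDA's pooled estimate often generalises better.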
Practice questions (GATE-style)
- LDA for a 5-class problem can produce at most how many discriminant components? (Answer: K−1 = 4 components. The between-class scatter matrix S_B has rank at most K−1.)
- What assumption does LDA make that QDA does not? (Answer: LDA assumes all classes have the same covariance matrix (Σ). QDA allows each class to have its own covariance matrix Σₖ, leading to quadratic decision boundaries.)
- LDA maximises: (Answer: The Fisher criterion J(w) = wᵀS_B w / wᵀS_W w — the ratio of between-class variance to within-class variance in the projected space.)
- When would you choose PCA over LDA for preprocessing? (Answer: When you have no class labels (unsupervised setting), or when you want to preserve general variance for non-classification tasks like compression or anomaly detection.)
- LDA assumes Gaussian distributions. What happens when this assumption is violated? (Answer: The linear decision boundary may be suboptimal — non-linear classifiers (RBF SVM, neural networks) might outperform LDA. Kernel LDA can handle non-Gaussian data.)
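The K−1 cap from the first question can be checked directly: scikit-learn rejects any `n_components` above min(p, K−1). A quick sketch on iris (3 classes, so at most 2 components):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 3 classes -> at most 2 LDA components
err = None
try:
    LinearDiscriminantAnalysis(n_components=3).fit(X, y)  # exceeds K-1 = 2
except ValueError as e:
    err = e  # sklearn enforces the min(n_features, n_classes - 1) cap
print("Error raised:", err)
```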
On LumiChats
LDA is directly related to the concept of embedding separation in LLMs: fine-tuning objectives often try to increase between-class separation in the embedding space while keeping within-class embeddings compact — the same principle as Fisher's criterion.