Glossary/Support Vector Machine (SVM)
Machine Learning

Support Vector Machine (SVM)

Finding the widest possible boundary between classes — with or without kernels.


Definition

Support Vector Machine (SVM) is a powerful supervised learning algorithm that finds the optimal hyperplane separating two classes by maximising the margin — the gap between the hyperplane and the nearest training points (support vectors). For non-linearly separable data, the kernel trick implicitly maps data to higher dimensions where linear separation is possible. SVMs are effective in high-dimensional spaces, memory-efficient, and robust against overfitting. One of the highest-weightage GATE DS&AI classification topics.

Real-life analogy: The widest road between two neighbourhoods

Imagine two housing estates with a road between them. You want to build the widest possible road that still clearly separates the two estates. The edges of the road touch the nearest houses from each estate — these are the support vectors. SVM finds exactly this: the widest margin hyperplane. A wider road means drivers (new data points) can be misclassified less easily — better generalisation.

Hard margin SVM — linearly separable case

SVM primal form: maximise 2/||w|| (or equivalently minimise (1/2)||w||²) subject to all points being correctly classified with margin ≥ 1. The hyperplane is wᵀx + b = 0; support vectors satisfy yᵢ(wᵀxᵢ + b) = 1 exactly.

Support vectors are the data points that lie exactly on the margin boundaries. All other points — even if you remove them — do not change the decision boundary. This is a key property: the SVM decision is determined by a small subset of training examples.

Soft margin SVM — the C parameter

Soft margin SVM allows some misclassifications (slack variables ξᵢ ≥ 0). C controls the trade-off: large C = hard margin (low tolerance for misclassification, may overfit). Small C = wide margin (allows more misclassifications, better generalisation).

C valueMargin widthTraining errorsRisk
Very large (e.g., 1000)NarrowVery fewOverfitting — memorises training data
Medium (e.g., 1)ModerateSome allowedGood generalisation — typical default
Very small (e.g., 0.001)WideMany allowedUnderfitting — misclassifies training data

Kernel trick — non-linear SVM

Most real-world data is not linearly separable. The kernel trick computes the dot product in a high-dimensional feature space without explicitly transforming the data — saving enormous computation.

KernelFormulaWhen to use
LinearK(x,x') = xᵀx'Already linearly separable; high-dimensional text data
PolynomialK(x,x') = (xᵀx' + c)^dPolynomial decision boundaries; image recognition
RBF (Gaussian)K(x,x') = exp(−γ||x−x'||²)General-purpose; default choice; creates circular boundaries
SigmoidK(x,x') = tanh(αxᵀx' + c)Similar to neural network; rarely used in practice

SVM with RBF kernel and hyperparameter tuning

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report

# Non-linearly separable data (two interleaved half-moons)
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler  = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test  = scaler.transform(X_test)

# Grid search over C and gamma
param_grid = {
    'C':     [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1],
    'kernel': ['rbf']
}
grid = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print(f"Best params: {grid.best_params_}")
print(f"Best CV score: {grid.best_score_:.3f}")
print(classification_report(y_test, grid.predict(X_test)))

# Visualise decision boundary (key insight)
best_svm = grid.best_estimator_
support_vecs = best_svm.support_vectors_
print(f"Number of support vectors: {len(support_vecs)}")

GATE insight: SVM vs Logistic Regression

Both SVM and logistic regression find linear decision boundaries. Key differences: SVM only cares about support vectors (the margin); logistic regression uses ALL data points in the loss. SVM output is a class label (hard decision); logistic regression outputs probabilities. SVM with kernel handles non-linear boundaries; logistic regression requires feature engineering for non-linearity. SVM scales better to high-dimensional data (text, genomics).

Practice questions (GATE-style)

  1. What are support vectors in SVM? (Answer: The training data points closest to the decision boundary — they lie on the margin boundaries yᵢ(wᵀxᵢ + b) = 1. The decision boundary depends only on these points; removing all other training points leaves the boundary unchanged.)
  2. SVM with C=∞ is equivalent to: (Answer: Hard margin SVM — zero tolerance for misclassification. Works only when data is linearly separable; fails with noisy or overlapping data.)
  3. The RBF kernel parameter γ controls: (Answer: The influence radius of each support vector. High γ = small radius, complex wiggly boundary (overfitting). Low γ = large radius, smooth boundary (underfitting). Must be tuned via cross-validation.)
  4. Why must features be scaled before using SVM? (Answer: The margin 2/||w|| depends on the scale of w, which depends on feature magnitudes. Unscaled features with large ranges dominate the margin calculation, giving those features too much influence.)
  5. For a dataset with 1000 features and 500 samples, which kernel SVM would you prefer? (Answer: Linear kernel SVM — in high-dimensional spaces, data is often already linearly separable. RBF kernel tends to overfit with p >> n. Linear SVM also scales better.)

On LumiChats

SVM intuition (maximising margin, kernel trick) directly connects to modern deep learning: neural network training also involves finding good decision boundaries via gradient descent. Kernels inspired attention mechanisms — transformers can be viewed as kernel machines over token sequences.

Try it free

Try LumiChats for ₹69

39+ AI models. Study Mode with page-locked answers. Agent Mode with code execution. Pay only on days you use it.

Get Started — ₹69/day

Related Terms

4 terms