
Linear Regression — Simple & Multiple

Predicting a number by fitting a straight line through data points.


Definition

Linear regression is the foundational supervised learning algorithm for regression tasks — predicting a continuous output value. Simple linear regression fits one input feature to the output. Multiple linear regression extends this to many features. The model learns weights (coefficients) that minimise the Mean Squared Error between predictions and true values. Linear regression is the most tested regression topic in GATE DS&AI — expect numerical questions on OLS, residuals, R-squared, and assumptions.

Real-life analogy: The house price estimator

Imagine predicting a house price. You observe that every extra square metre of area adds roughly ₹5,000 to the price. Simple linear regression finds this exact rate automatically from historical data. Multiple linear regression adds more factors: bedrooms, location, age. The algorithm finds the weight of each factor that best explains the price variation — without you specifying the weights manually.

Simple linear regression — the equation

The model is ŷ = β₀ + β₁x, and the OLS closed-form solution for simple linear regression is:

    β₁ = ∑(xᵢ − x̄)(yᵢ − ȳ) / ∑(xᵢ − x̄)²
    β₀ = ȳ − β₁x̄

β₁ is the slope (how much y changes per unit x) and β₀ is the intercept (the y value when x = 0). Both are derived by minimising the sum of squared residuals.

The residual for data point i is eᵢ = yᵢ − ŷᵢ. OLS (Ordinary Least Squares) minimises ∑eᵢ². The solution is exact: unlike neural networks, linear regression needs no iterative training.

Simple linear regression from scratch + sklearn comparison

import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data: study hours vs exam score
X = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
y = np.array([52, 58, 62, 68, 73, 79, 84, 90])

# ── From scratch (OLS formula) ──
x_mean, y_mean = X.mean(), y.mean()
beta1 = np.sum((X.flatten() - x_mean) * (y - y_mean)) / np.sum((X.flatten() - x_mean)**2)
beta0 = y_mean - beta1 * x_mean
print(f"Scratch  → β₀={beta0:.2f}, β₁={beta1:.2f}")

# ── sklearn ──
model = LinearRegression().fit(X, y)
print(f"sklearn  → β₀={model.intercept_:.2f}, β₁={model.coef_[0]:.2f}")

# Predict for 9 hours of study
print(f"Predicted score for 9 hrs: {beta0 + beta1 * 9:.1f}")

# R-squared: proportion of variance explained
y_pred = beta0 + beta1 * X.flatten()
ss_res = np.sum((y - y_pred)**2)
ss_tot = np.sum((y - y_mean)**2)
r2 = 1 - ss_res / ss_tot
print(f"R² = {r2:.4f}")   # → ~0.998

Multiple linear regression and the matrix form

The matrix-form OLS solution is β̂ = (XᵀX)⁻¹Xᵀy. X is the design matrix (n × (p+1), with a column of 1s for the intercept) and β is the (p+1)-dimensional weight vector. XᵀX must be invertible: if features are perfectly correlated (multicollinearity), the inverse does not exist. Ridge regression solves this.
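As a minimal sketch, the normal equations can be applied directly with NumPy. This reuses the study-hours data from the earlier example; building the column of 1s by hand makes the role of the design matrix explicit.

```python
import numpy as np

# Same study-hours data as in the simple-regression example
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 58, 62, 68, 73, 79, 84, 90], dtype=float)

# Design matrix: a column of 1s for the intercept, then the feature
X = np.column_stack([np.ones_like(x), x])        # shape (8, 2)

# Normal equations: β̂ = (XᵀX)⁻¹ Xᵀ y
# np.linalg.solve is preferred over forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(f"β₀={beta_hat[0]:.2f}, β₁={beta_hat[1]:.2f}")  # → β₀=46.54, β₁=5.38
```

The same β₀ and β₁ fall out as from the scalar formulas, confirming that the matrix form generalises the simple case.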

Multiple linear regression on the California Housing dataset

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
import numpy as np

# Load dataset
data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target        # 8 features, target = median house value

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scale features (important for comparing coefficients)
scaler  = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test  = scaler.transform(X_test)

# Fit model
model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.3f}")
print(f"R²:   {r2_score(y_test, y_pred):.3f}")

# Feature importance (standardised coefficients)
for feat, coef in zip(data.feature_names, model.coef_):
    print(f"  {feat:<12} {coef:+.4f}")
Evaluation metrics for regression

| Metric | Formula | Perfect value | Interpretation |
|---|---|---|---|
| MSE | (1/n)∑(yᵢ−ŷᵢ)² | 0 | Average squared error; sensitive to outliers |
| RMSE | √MSE | 0 | Same units as y; easier to interpret |
| MAE | (1/n)∑\|yᵢ−ŷᵢ\| | 0 | Average absolute error; robust to outliers |
| R² | 1 − SS_res/SS_tot | 1 | Proportion of variance explained (0–1) |
| Adjusted R² | 1 − (1−R²)(n−1)/(n−p−1) | 1 | Penalises adding useless features |
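All five metrics are one-liners in NumPy. A quick sketch on hypothetical toy predictions (the y values are invented for illustration):

```python
import numpy as np

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])
n, p = len(y_true), 1                      # n samples, p features (assumed)

mse  = np.mean((y_true - y_pred) ** 2)     # average squared residual
rmse = np.sqrt(mse)                        # back in the units of y
mae  = np.mean(np.abs(y_true - y_pred))    # average absolute residual
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(mse, rmse, mae)       # → 0.25 0.5 0.5
print(r2, adj_r2)           # → 0.95 0.925
```

Note that adjusted R² is always ≤ R², with the gap growing as p approaches n.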

Assumptions of linear regression (GATE tested)

  • Linearity: The relationship between X and y is linear. Violated → use polynomial regression or non-linear models.
  • Independence: Observations are independent. Violated in time series → use autoregression.
  • Homoscedasticity: Residual variance is constant across all values of X. Violated (heteroscedasticity) → use weighted least squares.
  • Normality of residuals: Residuals follow N(0, σ²). Needed for statistical tests (p-values, confidence intervals), not for point predictions.
  • No multicollinearity: Features are not perfectly correlated. Violated → (XᵀX)⁻¹ is unstable or singular. Solution: Ridge regression (L2 regularisation) or remove correlated features.
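The multicollinearity failure can be shown directly on synthetic data: duplicate a feature and XᵀX becomes singular, while Ridge remains well-posed. This is a sketch with made-up data, not a prescribed recipe.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: x2 is an exact copy of x1 → perfect multicollinearity
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1.copy()
X = np.column_stack([np.ones(50), x1, x2])
y = 3 + 2 * x1 + rng.normal(scale=0.1, size=50)

# XᵀX is singular: its determinant is (numerically) zero,
# so the OLS normal equations have no unique solution
print(np.linalg.det(X.T @ X))              # ≈ 0

# Ridge adds λI to XᵀX, making it invertible; the weight of the
# duplicated feature gets split across the two copies
ridge = Ridge(alpha=1.0).fit(X[:, 1:], y)
print(ridge.coef_)                          # two coefficients summing to ≈ 2
```

Removing one of the two copies would restore a unique OLS solution; Ridge instead keeps both and shares the coefficient between them.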

GATE key fact: R² vs adjusted R²

R² always increases when you add more features — even irrelevant ones. Adjusted R² penalises for the number of predictors (p): Adj R² = 1 − (1−R²)(n−1)/(n−p−1). Always use Adjusted R² when comparing models with different numbers of features. This is a favourite GATE MCQ.
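This behaviour is easy to verify empirically. The sketch below (on synthetic data) fits a model with one informative feature, then appends 5 pure-noise features: R² on the training data never decreases, while adjusted R² applies the p-dependent penalty.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 100
x = rng.normal(size=(n, 1))
y = 2 * x[:, 0] + rng.normal(scale=1.0, size=n)   # one real signal + noise

def r2_and_adj(X, y):
    """Training R² and adjusted R² for an OLS fit on X."""
    r2 = LinearRegression().fit(X, y).score(X, y)
    p = X.shape[1]
    return r2, 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)

r2_base, adj_base = r2_and_adj(x, y)

# Append 5 features of pure random noise
X_noisy = np.hstack([x, rng.normal(size=(n, 5))])
r2_noisy, adj_noisy = r2_and_adj(X_noisy, y)

print(f"base:  R²={r2_base:.4f}  adj={adj_base:.4f}")
print(f"noisy: R²={r2_noisy:.4f}  adj={adj_noisy:.4f}")
# R² never decreases when features are added (nested OLS models);
# adjusted R² discounts the spurious gain.
```

Comparing r2_noisy against r2_base reproduces the MCQ fact: the raw R² can only go up, so it cannot be used to choose between models of different sizes.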

Practice questions (GATE-style)

  1. In simple linear regression, what does β₁ represent geometrically? (Answer: The slope of the best-fit line — the change in y per unit change in x.)
  2. R² = 0.85. What does this mean? (Answer: 85% of the variance in y is explained by the model. The remaining 15% is due to factors not captured by the features.)
  3. If two features X₁ and X₂ are perfectly correlated (r=1), what problem arises in OLS? (Answer: XᵀX becomes singular (non-invertible) — the OLS solution does not exist. This is multicollinearity. Solution: Remove one feature or use Ridge regression.)
  4. Adding 5 random noise features to a model will do what to R²? (Answer: R² will increase or stay the same — it never decreases with added features, even irrelevant ones. Adjusted R² will decrease, correctly penalising the spurious features.)
  5. The residual sum of squares (RSS) is 120 and the total sum of squares (TSS) is 400. Compute R². (Answer: R² = 1 − 120/400 = 1 − 0.30 = 0.70)

On LumiChats

LumiChats can solve linear regression problems step-by-step — paste your dataset as a table and ask for the OLS solution, residual analysis, or R² interpretation with full working shown.
