
Linear Regression — Simple & Multiple

Predicting a number by fitting a straight line through data points.


Definition

Linear regression is the foundational supervised learning algorithm for regression tasks — predicting a continuous output value. Simple linear regression fits one input feature to the output. Multiple linear regression extends this to many features. The model learns weights (coefficients) that minimise the Mean Squared Error between predictions and true values. Linear regression is the most tested regression topic in GATE DS&AI — expect numerical questions on OLS, residuals, R-squared, and assumptions.

Real-life analogy: The house price estimator

Imagine predicting a house price. You observe that every extra square metre of area adds roughly ₹5,000 to the price. Simple linear regression finds this exact rate automatically from historical data. Multiple linear regression adds more factors: bedrooms, location, age. The algorithm finds the weight of each factor that best explains the price variation — without you specifying the weights manually.

Simple linear regression — the equation

The model is ŷ = β₀ + β₁x, and the OLS closed-form solution for simple linear regression is:

    β₁ = ∑(xᵢ − x̄)(yᵢ − ȳ) / ∑(xᵢ − x̄)²
    β₀ = ȳ − β₁x̄

β₁ is the slope (how much y changes per unit x) and β₀ is the intercept (the y value when x = 0). Both are derived by minimising the sum of squared residuals.

The residual for data point i is eᵢ = yᵢ − ŷᵢ. OLS (Ordinary Least Squares) minimises ∑eᵢ². The solution is exact: unlike neural networks, linear regression needs no iterative training.

Simple linear regression from scratch + sklearn comparison

import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data: study hours vs exam score
X = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
y = np.array([52, 58, 62, 68, 73, 79, 84, 90])

# ── From scratch (OLS formula) ──
x_mean, y_mean = X.mean(), y.mean()
beta1 = np.sum((X.flatten() - x_mean) * (y - y_mean)) / np.sum((X.flatten() - x_mean)**2)
beta0 = y_mean - beta1 * x_mean
print(f"Scratch  → β₀={beta0:.2f}, β₁={beta1:.2f}")

# ── sklearn ──
model = LinearRegression().fit(X, y)
print(f"sklearn  → β₀={model.intercept_:.2f}, β₁={model.coef_[0]:.2f}")

# Predict for 9 hours of study
print(f"Predicted score for 9 hrs: {beta0 + beta1 * 9:.1f}")

# R-squared: proportion of variance explained
y_pred = beta0 + beta1 * X.flatten()
ss_res = np.sum((y - y_pred)**2)
ss_tot = np.sum((y - y_mean)**2)
r2 = 1 - ss_res / ss_tot
print(f"R² = {r2:.4f}")   # → ~0.998

Multiple linear regression and the matrix form

The matrix-form OLS solution is β̂ = (XᵀX)⁻¹Xᵀy. X is the design matrix (n × (p+1), with a column of 1s for the intercept) and β is the (p+1)-dimensional weight vector. XᵀX must be invertible: if features are perfectly correlated (multicollinearity), the inverse does not exist. Ridge regression solves this.
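As a minimal sketch, the normal equations can be applied directly with NumPy. This reuses the study-hours data from the earlier example; building the column of 1s by hand makes the role of the design matrix explicit.

```python
import numpy as np

# Same study-hours data as in the simple-regression example
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 58, 62, 68, 73, 79, 84, 90], dtype=float)

# Design matrix: a column of 1s for the intercept, then the feature
X = np.column_stack([np.ones_like(x), x])        # shape (8, 2)

# Normal equations: β̂ = (XᵀX)⁻¹ Xᵀ y
# np.linalg.solve is preferred over forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(f"β₀={beta_hat[0]:.2f}, β₁={beta_hat[1]:.2f}")  # → β₀=46.54, β₁=5.38
```

The same β₀ and β₁ fall out as from the scalar formulas, confirming that the matrix form generalises the simple case.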

Multiple linear regression on the California Housing dataset

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
import numpy as np

# Load dataset
data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target        # 8 features, target = median house value

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scale features (important for comparing coefficients)
scaler  = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test  = scaler.transform(X_test)

# Fit model
model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.3f}")
print(f"R²:   {r2_score(y_test, y_pred):.3f}")

# Feature importance (standardised coefficients)
for feat, coef in zip(data.feature_names, model.coef_):
    print(f"  {feat:<12} {coef:+.4f}")
Evaluation metrics for regression

| Metric | Formula | Perfect value | Interpretation |
|---|---|---|---|
| MSE | (1/n)∑(yᵢ−ŷᵢ)² | 0 | Average squared error; sensitive to outliers |
| RMSE | √MSE | 0 | Same units as y; easier to interpret |
| MAE | (1/n)∑\|yᵢ−ŷᵢ\| | 0 | Average absolute error; robust to outliers |
| R² | 1 − SS_res/SS_tot | 1 | Proportion of variance explained (0–1) |
| Adjusted R² | 1 − (1−R²)(n−1)/(n−p−1) | 1 | Penalises adding useless features |
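All five metrics are one-liners in NumPy. A quick sketch on hypothetical toy predictions (the y values are invented for illustration):

```python
import numpy as np

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])
n, p = len(y_true), 1                      # n samples, p features (assumed)

mse  = np.mean((y_true - y_pred) ** 2)     # average squared residual
rmse = np.sqrt(mse)                        # back in the units of y
mae  = np.mean(np.abs(y_true - y_pred))    # average absolute residual
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(mse, rmse, mae)       # → 0.25 0.5 0.5
print(r2, adj_r2)           # → 0.95 0.925
```

Note that adjusted R² is always ≤ R², with the gap growing as p approaches n.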

Assumptions of linear regression (GATE tested)

  • Linearity: The relationship between X and y is linear. Violated → use polynomial regression or non-linear models.
  • Independence: Observations are independent. Violated in time series → use autoregression.
  • Homoscedasticity: Residual variance is constant across all values of X. Violated (heteroscedasticity) → use weighted least squares.
  • Normality of residuals: Residuals follow N(0, σ²). Needed for statistical tests (p-values, confidence intervals), not for point predictions.
  • No multicollinearity: Features are not perfectly correlated. Violated → (XᵀX)⁻¹ is unstable or singular. Solution: Ridge regression (L2 regularisation) or remove correlated features.
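The multicollinearity failure can be shown directly on synthetic data: duplicate a feature and XᵀX becomes singular, while Ridge remains well-posed. This is a sketch with made-up data, not a prescribed recipe.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: x2 is an exact copy of x1 → perfect multicollinearity
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1.copy()
X = np.column_stack([np.ones(50), x1, x2])
y = 3 + 2 * x1 + rng.normal(scale=0.1, size=50)

# XᵀX is singular: its determinant is (numerically) zero,
# so the OLS normal equations have no unique solution
print(np.linalg.det(X.T @ X))              # ≈ 0

# Ridge adds λI to XᵀX, making it invertible; the weight of the
# duplicated feature gets split across the two copies
ridge = Ridge(alpha=1.0).fit(X[:, 1:], y)
print(ridge.coef_)                          # two coefficients summing to ≈ 2
```

Removing one of the two copies would restore a unique OLS solution; Ridge instead keeps both and shares the coefficient between them.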

GATE key fact: R² vs adjusted R²

R² always increases when you add more features — even irrelevant ones. Adjusted R² penalises for the number of predictors (p): Adj R² = 1 − (1−R²)(n−1)/(n−p−1). Always use Adjusted R² when comparing models with different numbers of features. This is a favourite GATE MCQ.
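This behaviour is easy to verify empirically. The sketch below (on synthetic data) fits a model with one informative feature, then appends 5 pure-noise features: R² on the training data never decreases, while adjusted R² applies the p-dependent penalty.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 100
x = rng.normal(size=(n, 1))
y = 2 * x[:, 0] + rng.normal(scale=1.0, size=n)   # one real signal + noise

def r2_and_adj(X, y):
    """Training R² and adjusted R² for an OLS fit on X."""
    r2 = LinearRegression().fit(X, y).score(X, y)
    p = X.shape[1]
    return r2, 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)

r2_base, adj_base = r2_and_adj(x, y)

# Append 5 features of pure random noise
X_noisy = np.hstack([x, rng.normal(size=(n, 5))])
r2_noisy, adj_noisy = r2_and_adj(X_noisy, y)

print(f"base:  R²={r2_base:.4f}  adj={adj_base:.4f}")
print(f"noisy: R²={r2_noisy:.4f}  adj={adj_noisy:.4f}")
# R² never decreases when features are added (nested OLS models);
# adjusted R² discounts the spurious gain.
```

Comparing r2_noisy against r2_base reproduces the MCQ fact: the raw R² can only go up, so it cannot be used to choose between models of different sizes.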

Practice questions (GATE-style)

  1. In simple linear regression, what does β₁ represent geometrically? (Answer: The slope of the best-fit line — the change in y per unit change in x.)
  2. R² = 0.85. What does this mean? (Answer: 85% of the variance in y is explained by the model. The remaining 15% is due to factors not captured by the features.)
  3. If two features X₁ and X₂ are perfectly correlated (r=1), what problem arises in OLS? (Answer: XᵀX becomes singular (non-invertible) — the OLS solution does not exist. This is multicollinearity. Solution: Remove one feature or use Ridge regression.)
  4. Adding 5 random noise features to a model will do what to R²? (Answer: R² will increase or stay the same — it never decreases with added features, even irrelevant ones. Adjusted R² will decrease, correctly penalising the spurious features.)
  5. The residual sum of squares (RSS) is 120 and the total sum of squares (TSS) is 400. Compute R². (Answer: R² = 1 − 120/400 = 1 − 0.30 = 0.70)

On LumiChats

LumiChats can solve linear regression problems step-by-step — paste your dataset as a table and ask for the OLS solution, residual analysis, or R² interpretation with full working shown.
