
Online vs Batch Learning & Parametric vs Non-Parametric Models

How models learn — all at once or continuously — and how they represent knowledge.


Definition

Batch learning trains a model on the entire dataset at once — the model is static after training. Online learning updates the model incrementally as each new example arrives, enabling adaptation to changing data distributions without full retraining. Parametric models (linear regression, neural networks) represent knowledge in a fixed set of parameters learned during training. Non-parametric models (KNN, kernel SVM, decision trees) do not fix the model structure in advance — complexity can grow with data. These distinctions drive fundamental architecture choices in ML systems.

Batch vs Online learning

| Property | Batch Learning | Online Learning |
| --- | --- | --- |
| Training data | Entire dataset at once | One example (or mini-batch) at a time |
| Model update | Full retrain on new data | Incremental update with each new example |
| Memory | Needs all data in memory | O(1) memory — only current example needed |
| Adaptation | Static after training — cannot adapt | Adapts continuously to new patterns |
| Compute | Expensive upfront, cheap inference | Cheap updates, runs continuously |
| Stability | Stable — learns from full distribution | Can drift if distribution changes rapidly |
| Use cases | Image classifiers, LLMs, offline models | Fraud detection, ad click prediction, IoT |
| Examples | Batch gradient descent, sklearn fit() | SGD, river library, Kafka-based systems |

Online learning with incremental fitting

from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
import numpy as np

# Online learning: partial_fit() updates the model with each new batch
# Simulating a data stream
np.random.seed(42)
n_total = 10000
batch_size = 100
n_features = 10

# SGDClassifier and StandardScaler both support online learning via partial_fit
model = SGDClassifier(loss='log_loss', learning_rate='optimal', random_state=42)
scaler = StandardScaler()

# All classes must be declared on the first partial_fit call
classes = np.array([0, 1])

# Simulate stream processing
accuracies = []
for batch_start in range(0, n_total, batch_size):
    # Simulate incoming batch of data
    X_batch = np.random.randn(batch_size, n_features)
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)  # Concept: label depends on features 0 and 1

    scaler.partial_fit(X_batch)           # Incrementally update running mean/std
    X_scaled = scaler.transform(X_batch)

    if batch_start == 0:
        model.partial_fit(X_scaled, y_batch, classes=classes)
    else:
        model.partial_fit(X_scaled, y_batch)  # Incremental update

    if batch_start % 1000 == 0:
        acc = model.score(X_scaled, y_batch)
        accuracies.append((batch_start, acc))
        print(f"Batch {batch_start}: accuracy = {acc:.3f}")

# Concept drift: if the data distribution shifted mid-stream, this online
# model would adapt; a batch model trained once at the start would degrade

Parametric vs Non-parametric models

Parametric models assume a specific functional form for the mapping function and learn a fixed, finite set of parameters. Once trained, the training data can be discarded. Examples: linear regression (parameters = β₀, β₁, ..., βₙ), logistic regression, neural networks. Non-parametric models do not fix the functional form — the model complexity can grow with the data. Some store the training data itself. Examples: KNN (stores all training data), kernel SVM (complexity grows with support vectors), decision trees (depth not fixed).
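A small sketch of this difference, using synthetic data (the dataset sizes and the peek at scikit-learn's private `_fit_X` attribute are illustrative assumptions): the linear model's learned state is the same size whether it saw 100 or 10,000 examples, while KNN's stored state grows with the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
results = []
for n in (100, 10_000):
    X = rng.normal(size=(n, 5))
    y = X.sum(axis=1) + rng.normal(scale=0.1, size=n)

    lin = LinearRegression().fit(X, y)
    knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

    n_params = lin.coef_.size + 1   # 5 coefficients + 1 intercept, fixed
    n_stored = knn._fit_X.shape[0]  # KNN keeps every training row
    results.append((n, n_params, n_stored))
    print(f"n={n:>6}: linear params={n_params}, KNN stored rows={n_stored}")
```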

| Property | Parametric | Non-Parametric |
| --- | --- | --- |
| Model complexity | Fixed regardless of data size | Can grow with data size |
| Training data | Can discard after training | Often kept (KNN, kernel SVM) |
| Memory at inference | Low (just parameters) | High (stores training data) |
| Assumptions | Strong (assumes functional form) | Fewer (more flexible) |
| Data needed | Less data needed if assumptions hold | More data needed for good fit |
| Examples | Linear/logistic regression, NN, Naive Bayes | KNN, kernel SVM, decision trees, random forests, GP |

Why neural networks are parametric despite being very flexible

A neural network with 1B parameters is still parametric — it has a fixed number of parameters determined at architecture design time. The architecture (number of layers, neurons) is fixed; only the parameter values are learned from data. Non-parametric means the number of "parameters" can grow with training data — a KNN model with 1M training points effectively has 1M "parameters" (the training examples themselves).

Practice questions

  1. A fraud detection system needs to adapt to new fraud patterns daily without full retraining. Should it use batch or online learning? (Answer: Online learning — fraud patterns evolve continuously. Online learning (SGD, Passive-Aggressive classifier) updates the model with each new transaction, adapting to concept drift without expensive full retraining.)
  2. Why is KNN considered non-parametric? (Answer: KNN has no fixed parameters learned during training — it stores the entire training set. The "model" IS the training data. Model complexity grows linearly with number of training examples (more data = larger model).)
  3. What is concept drift and how does online learning handle it? (Answer: Concept drift = the statistical properties of the target variable change over time (e.g., fraud patterns change as criminals adapt). Online learning continuously updates the model with recent data, giving higher weight to recent examples, allowing adaptation to drift.)
  4. Linear regression has p+1 parameters for p features. Is this parametric or non-parametric? (Answer: Parametric — the number of parameters (β₀, β₁, ..., βₚ) is fixed at p+1 regardless of how many training examples you have.)
  5. What is the "curse of dimensionality" and why does it affect non-parametric models more? (Answer: In high dimensions, all points become equidistant — nearest neighbours are no longer meaningfully close. Non-parametric models like KNN rely on distance in feature space, so they degrade badly in high dimensions. Parametric models encode structure in parameters rather than distance, handling high dimensions better.)
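The distance-concentration effect in question 5 is easy to see numerically. This quick illustration (synthetic uniform data; the point counts and dimensions are arbitrary choices) measures the ratio of nearest to farthest neighbour distance from a random query point: as dimension grows, the ratio approaches 1, so "nearest" stops being meaningfully near.

```python
import numpy as np

rng = np.random.default_rng(0)
ratios = {}
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))   # 500 random points in the unit cube
    q = rng.uniform(size=d)          # a random query point
    dists = np.linalg.norm(X - q, axis=1)
    ratios[d] = dists.min() / dists.max()
    print(f"d={d:>4}: nearest/farthest distance ratio = {ratios[d]:.2f}")
```

In low dimensions the nearest neighbour is much closer than the farthest; in high dimensions the two are nearly indistinguishable, which is why distance-based non-parametric methods like KNN degrade there.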

On LumiChats

Modern LLMs like Claude are trained with batch learning on massive corpora; later stages such as RLHF are still periodic fine-tuning jobs rather than true online learning. This distinction explains why LLMs have a knowledge cutoff date (batch training) and why adapting them to new information requires explicit fine-tuning cycles.

