Batch learning trains a model on the entire dataset at once — the model is static after training. Online learning updates the model incrementally as each new example arrives, enabling adaptation to changing data distributions without full retraining. Parametric models (linear regression, neural networks) represent knowledge in a fixed set of parameters learned during training. Non-parametric models (KNN, kernel SVM, decision trees) do not fix the model structure in advance — complexity can grow with data. These distinctions drive fundamental architecture choices in ML systems.
Batch vs Online learning
| Property | Batch Learning | Online Learning |
|---|---|---|
| Training data | Entire dataset at once | One example (or mini-batch) at a time |
| Model update | Full retrain on new data | Incremental update with each new example |
| Memory | Needs all data in memory | O(1) memory — only current example needed |
| Adaptation | Static after training — cannot adapt | Adapts continuously to new patterns |
| Compute | Expensive upfront, cheap inference | Cheap updates, runs continuously |
| Instability | Stable — learns from full distribution | Can drift if distribution changes rapidly |
| Use cases | Image classifiers, LLMs, offline models | Fraud detection, ad click prediction, IoT |
| Examples | Batch gradient descent, sklearn fit() | SGD, river library, Kafka-based systems |
Online learning with incremental fitting
```python
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
import numpy as np

# Online learning: partial_fit() updates the model with each new batch
# Simulating a data stream
np.random.seed(42)
n_total = 10000
batch_size = 100
n_features = 10

# SGDClassifier supports online learning via partial_fit
model = SGDClassifier(loss='log_loss', learning_rate='optimal', random_state=42)
scaler = StandardScaler()

# Simulate stream processing
classes = np.array([0, 1])  # partial_fit needs the full class list on the first call
accuracies = []
for batch_start in range(0, n_total, batch_size):
    # Simulate an incoming batch of data
    X_batch = np.random.randn(batch_size, n_features)
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)  # Concept: label depends on features 0 and 1

    if batch_start == 0:
        X_scaled = scaler.fit_transform(X_batch)
        model.partial_fit(X_scaled, y_batch, classes=classes)
    else:
        X_scaled = scaler.transform(X_batch)  # Reuse the scaler fitted on the first batch
        model.partial_fit(X_scaled, y_batch)  # Incremental update

    if batch_start % 1000 == 0:
        acc = model.score(X_scaled, y_batch)
        accuracies.append((batch_start, acc))
        print(f"Batch {batch_start}: accuracy = {acc:.3f}")
```
When the data distribution shifts mid-stream (concept drift), an online model adapts through continued partial_fit updates, while a batch model trained once at the start would degrade.
Parametric vs Non-parametric models
Parametric models assume a specific functional form for the mapping function and learn a fixed, finite set of parameters. Once trained, the training data can be discarded. Examples: linear regression (parameters = β₀, β₁, ..., βₙ), logistic regression, neural networks. Non-parametric models do not fix the functional form — the model complexity can grow with the data. Some store the training data itself. Examples: KNN (stores all training data), kernel SVM (complexity grows with support vectors), decision trees (depth not fixed).
| Property | Parametric | Non-Parametric |
|---|---|---|
| Model complexity | Fixed regardless of data size | Can grow with data size |
| Training data | Can discard after training | Often kept (KNN, kernel SVM) |
| Memory at inference | Low (just parameters) | High (stores training data) |
| Assumptions | Strong (assumes functional form) | Fewer (more flexible) |
| Data needed | Less data needed if assumptions hold | More data needed for good fit |
| Examples | Linear/logistic regression, NN, Naive Bayes | KNN, kernel SVM, decision trees, random forests, GP |
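The "model complexity" and "memory" rows can be made concrete. A rough sketch (assuming scikit-learn; `n_samples_fit_` is the fitted neighbors estimator's documented sample-count attribute):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(42)
p = 5  # number of features

for n in (100, 10_000):
    X = rng.standard_normal((n, p))
    y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

    lin = LinearRegression().fit(X, y)
    knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

    # Parametric: p coefficients + 1 intercept, regardless of n
    n_params = lin.coef_.size + 1
    # Non-parametric: the fitted model keeps all n training points
    n_stored = knn.n_samples_fit_

    print(f"n={n:>6}: linear params={n_params}, KNN stored samples={n_stored}")
```

The linear model's size is constant as `n` grows 100-fold; the KNN model's storage grows linearly with it.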
Why neural networks are parametric despite being very flexible
A neural network with 1B parameters is still parametric — it has a fixed number of parameters determined at architecture design time. The architecture (number of layers, neurons) is fixed; only the parameter values are learned from data. Non-parametric means the number of "parameters" can grow with training data — a KNN model with 1M training points effectively has 1M "parameters" (the training examples themselves).
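A quick way to see this: the parameter count of a fully connected network is a function of the layer sizes alone. A small illustrative helper (not tied to any framework) that counts weights and biases:

```python
def mlp_param_count(layer_sizes):
    """Total weights + biases for a fully connected network."""
    return sum(ins * outs + outs
               for ins, outs in zip(layer_sizes, layer_sizes[1:]))

# Architecture fixes the count: 10 inputs -> 64 -> 64 -> 1 output
print(mlp_param_count([10, 64, 64, 1]))  # 10*64+64 + 64*64+64 + 64*1+1 = 4929

# Training on 1k or 1M examples changes the VALUES of these parameters,
# never their number — that is what makes the network parametric.
```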
Practice questions
- A fraud detection system needs to adapt to new fraud patterns daily without full retraining. Should it use batch or online learning? (Answer: Online learning — fraud patterns evolve continuously. Online learning (SGD, Passive-Aggressive classifier) updates the model with each new transaction, adapting to concept drift without expensive full retraining.)
- Why is KNN considered non-parametric? (Answer: KNN has no fixed parameters learned during training — it stores the entire training set. The "model" IS the training data. Model complexity grows linearly with number of training examples (more data = larger model).)
- What is concept drift and how does online learning handle it? (Answer: Concept drift = the statistical properties of the target variable change over time (e.g., fraud patterns change as criminals adapt). Online learning continuously updates the model with recent data, giving higher weight to recent examples, allowing adaptation to drift.)
- Linear regression has p+1 parameters for p features. Is this parametric or non-parametric? (Answer: Parametric — the number of parameters (β₀, β₁, ..., βₚ) is fixed at p+1 regardless of how many training examples you have.)
- What is the "curse of dimensionality" and why does it affect non-parametric models more? (Answer: In high dimensions, all points become equidistant — nearest neighbours are no longer meaningfully close. Non-parametric models like KNN rely on distance in feature space, so they degrade badly in high dimensions. Parametric models encode structure in parameters rather than distance, handling high dimensions better.)
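The distance-concentration effect described in the last answer can be observed directly. A small simulation (illustrative setup: random points in the unit hypercube, one random query point) measures how much farther the farthest neighbour is than the nearest:

```python
import numpy as np

rng = np.random.default_rng(0)
contrasts = {}
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))   # 500 random points in the unit hypercube
    q = rng.uniform(size=d)          # a query point
    dists = np.linalg.norm(X - q, axis=1)
    # Relative contrast: how much farther is the farthest point than the nearest?
    contrasts[d] = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:>4}: relative contrast = {contrasts[d]:.2f}")
```

As `d` grows, the contrast shrinks toward zero: the nearest and farthest neighbours become nearly the same distance away, so distance-based methods like KNN lose their signal.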
On LumiChats
Modern LLMs like Claude are trained with batch learning on massive corpora, followed by further batch fine-tuning stages such as RLHF — not true online learning. This distinction explains why LLMs have a knowledge cutoff date and why real-time adaptation requires explicit retraining or fine-tuning cycles rather than incremental updates.