Semi-supervised learning combines a small amount of labelled data with a large amount of unlabelled data during training. Because labelling is expensive and time-consuming, leveraging abundant unlabelled data can substantially improve performance over training on the labelled set alone. Self-supervised learning goes further: it generates its own supervision signal from the structure of the raw data, requiring no human labels at all. Self-supervised learning powers GPT, BERT, DALL-E, and most modern foundation models. Understanding these paradigms is essential for modern ML practice.
Real-life analogy: learning French from a few translated dialogues
Semi-supervised: Imagine learning French with 10 labelled dialogues (English translation provided) and 10,000 unlabelled French texts. You use the labelled examples to build initial understanding, then let the patterns from unlabelled texts reinforce and extend your knowledge. Self-supervised: You learn by predicting the next word in every sentence — no teacher needed, the text itself is the teacher. This is exactly how GPT was trained.
Semi-supervised learning
In semi-supervised learning you have a dataset D = D_L ∪ D_U where |D_L| << |D_U|. D_L is the small labelled set, D_U is the large unlabelled set. Common techniques:
- Pseudo-labelling: Train on D_L, predict labels for D_U with high confidence (e.g., > 0.95 probability), add those pseudo-labelled examples to D_L, retrain. Repeat.
- Label Propagation: Build a graph where similar data points are connected. Propagate known labels through the graph to nearby unlabelled points.
- Co-training: Train two models on different feature subsets. Each model labels examples for the other when confident. Models teach each other.
- FixMatch / MixMatch: Strong modern methods that combine pseudo-labelling with consistency regularisation — the model should give the same prediction for a data point and its augmented version.
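The pseudo-labelling loop described above can be sketched with scikit-learn's built-in `SelfTrainingClassifier`. The 0.95 confidence threshold and the SVC base model are illustrative choices, not the only sensible ones:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Toy dataset: 20 labelled points, 980 marked unlabelled with -1
X, y_true = make_classification(n_samples=1000, n_features=20, random_state=0)
y_semi = y_true.copy()
y_semi[20:] = -1  # -1 means "unlabelled" in sklearn's semi-supervised API

# Self-training: fit on labelled data, pseudo-label unlabelled points the
# model is confident about (> 0.95), add them to the training set, repeat.
base = SVC(probability=True, random_state=0)  # base model needs predict_proba
self_training = SelfTrainingClassifier(base, threshold=0.95, max_iter=10)
self_training.fit(X, y_semi)

# labeled_iter_ records the iteration each point was labelled in
# (0 = originally labelled, >0 = pseudo-labelled, -1 = never labelled)
print("Pseudo-labelled points:", int((self_training.labeled_iter_ > 0).sum()))
print("Accuracy on unlabelled portion:",
      (self_training.predict(X[20:]) == y_true[20:]).mean())
```

This is the same iterate-and-retrain idea as the bullet above, with scikit-learn handling the confidence filtering and retraining loop.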
Label propagation and label spreading implementation
from sklearn.semi_supervised import LabelPropagation, LabelSpreading
from sklearn.datasets import make_classification
import numpy as np
# Create dataset: 10 labelled, 990 unlabelled
X, y_true = make_classification(n_samples=1000, n_features=20, random_state=42)
y_semi = y_true.copy()
y_semi[10:] = -1 # -1 means "unlabelled" in sklearn semi-supervised
# Label Propagation: spread labels through nearest-neighbour graph
lp = LabelPropagation(kernel='rbf', gamma=20, max_iter=1000)
lp.fit(X, y_semi)
y_pred_lp = lp.predict(X[10:]) # Predictions for unlabelled data
# Label Spreading (more robust to noise)
ls = LabelSpreading(kernel='rbf', alpha=0.2, max_iter=1000)
ls.fit(X, y_semi)
y_pred_ls = ls.predict(X[10:])
# Compare with supervised-only (only uses 10 labelled examples)
from sklearn.svm import SVC
svm_supervised = SVC().fit(X[:10], y_true[:10])
y_pred_sup = svm_supervised.predict(X[10:])
from sklearn.metrics import accuracy_score
print(f"Supervised only (10 labels): {accuracy_score(y_true[10:], y_pred_sup):.3f}")
print(f"Label Propagation (10 + 990): {accuracy_score(y_true[10:], y_pred_lp):.3f}")
print(f"Label Spreading (10 + 990): {accuracy_score(y_true[10:], y_pred_ls):.3f}")
# Semi-supervised typically beats supervised-only significantly
Self-supervised learning — the engine of modern AI
Self-supervised learning creates a pretext task from the data itself — a task where the supervision signal is automatically derived from the input, requiring zero human labels. The model learns rich representations by solving the pretext task. Those representations are then fine-tuned for downstream tasks.
| Domain | Self-supervised pretext task | Model trained | Downstream use |
|---|---|---|---|
| NLP (text) | Predict masked tokens (15% of words hidden) | BERT | Classification, NER, QA |
| NLP (text) | Predict next token (autoregressive) | GPT-4, Claude, Llama | Chat, completion, reasoning |
| Vision | Predict missing image patches | MAE, BEiT | Image classification, detection |
| Vision | Contrast similar vs dissimilar images | SimCLR, CLIP, DINO | Image search, zero-shot |
| Audio | Predict masked audio frames (contrastive / cluster targets) | Wav2Vec 2.0, HuBERT | Speech recognition (ASR) |
| Multimodal | Match image to its text caption | CLIP, ALIGN | Zero-shot image classification |
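The first two rows of the table can be made concrete with plain NumPy: next-token targets (GPT-style) are just the sequence shifted by one position, and masked-token targets (BERT-style) hide a random subset of positions. The 15% mask rate follows BERT; the token ids and the `MASK_ID` value are toy values for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = np.array([5, 17, 3, 42, 8, 99, 11, 23, 6, 31])  # toy token ids
MASK_ID = 0  # hypothetical id reserved for the [MASK] token

# GPT-style pretext task: predict token t+1 from tokens up to t.
inputs_ar = tokens[:-1]   # the model sees these...
targets_ar = tokens[1:]   # ...and must predict these

# BERT-style pretext task: hide ~15% of tokens, predict the originals.
mask = rng.random(tokens.shape) < 0.15
corrupted = np.where(mask, MASK_ID, tokens)  # the model sees this
# the loss is computed only at masked positions, against tokens[mask]

print("next-token inputs :", inputs_ar)
print("next-token targets:", targets_ar)
print("masked input      :", corrupted)
print("positions to fill :", np.flatnonzero(mask))
```

In both cases the targets come entirely from the raw sequence — no human ever labels anything, which is the defining property of a pretext task.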
Why self-supervised learning is revolutionary
Pre-2017, you typically needed on the order of a million labelled images to train a strong vision model. With self-supervised pre-training, CLIP trains on 400 million image-text pairs scraped from the web (no manual labelling) and achieves zero-shot image classification competitive with fully supervised models trained on ImageNet. GPT-4 was trained on trillions of tokens of raw web text with no human labelling of the training data (only RLHF for alignment).
Practice questions
- Why is semi-supervised learning practically important? (Answer: Labelling data is expensive — medical images require radiologists, legal documents require lawyers. Unlabelled data is cheap. Semi-supervised learning lets you use abundant cheap data + small expensive labelled set to achieve near-supervised performance.)
- What is a pretext task in self-supervised learning? (Answer: An automatically generated task derived from the data structure itself with no human labels. Examples: predict masked words (BERT), predict next word (GPT), predict missing image patches (MAE). The model learns representations while solving the pretext task.)
- BERT uses masked language modelling. What percentage of tokens are masked? (Answer: 15% of input tokens are randomly selected — 80% replaced with [MASK], 10% replaced with random word, 10% unchanged. This prevents the model from learning to always ignore [MASK] tokens.)
- What is the difference between semi-supervised and self-supervised learning? (Answer: Semi-supervised: uses small labelled set + large unlabelled set. Self-supervised: uses NO labelled data — creates supervision from data structure itself (masking, next-token prediction, contrastive pairs).)
- Contrastive learning (SimCLR, CLIP) — what does "contrastive" mean? (Answer: The model learns by contrasting similar pairs (positive: same image under two augmentations) against dissimilar pairs (negative: different images). Loss pulls positives together and pushes negatives apart in embedding space — teaching meaningful similarity.)
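The "pull positives together, push negatives apart" idea from the last answer can be written down as the InfoNCE loss used, in variants, by SimCLR and CLIP. The embeddings and temperature below are toy values, and this is a minimal sketch rather than a production implementation:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss: z1[i] and z2[i] are embeddings of two views of the
    same item (positives); all non-matching pairs act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature  # pairwise cosine similarities
    # cross-entropy with the matching index as the correct "class"
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
anchor = rng.normal(size=(4, 8))
aligned = anchor + 0.01 * rng.normal(size=(4, 8))  # views of the same items
shuffled = rng.normal(size=(4, 8))                 # unrelated items

print("aligned pairs loss :", info_nce(anchor, aligned))   # should be low
print("random pairs loss  :", info_nce(anchor, shuffled))  # should be higher
```

Minimising this loss makes matching pairs more similar than any non-matching pair, which is exactly the "meaningful similarity" the answer describes.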
On LumiChats
LumiChats itself was trained with self-supervised learning (next-token prediction) on trillions of text tokens, then fine-tuned with RLHF. Understanding self-supervised learning directly explains how modern LLMs acquire their broad knowledge before specialisation.