
In-Context Learning (ICL)

Teaching a model to do new tasks by showing examples inside the prompt — no training required.


Definition

In-context learning (ICL) is the ability of large language models to perform new tasks by observing examples provided within the prompt, without updating any model weights. Given several input-output pairs in the prompt (few-shot examples), the model can infer the underlying pattern and apply it to a new input — effectively 'learning' the task from context alone. ICL emerged as an unexpected capability of large-scale pretraining and is one of the key properties distinguishing large models from smaller ones.

How ICL works — and why it's surprising

Standard machine learning requires updating model weights through gradient descent to teach a model a new task. ICL requires no weight updates — the model reads examples at inference time and adapts its output distribution accordingly. This is counterintuitive: the weights are fixed, yet the model's behaviour changes based on what it reads in the prompt. The mechanism appears to be that large models, during pretraining, learned to recognise task structures and extrapolate patterns — so ICL is more like pattern completion than learning in the traditional sense.
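The pattern-completion view above can be made concrete: an ICL prompt is nothing more than demonstrations concatenated ahead of a new input, which the model then completes. A minimal sketch (the function name and the pluralisation task are illustrative, not from any library):

```python
# ICL as prompt construction: the "learning" happens entirely through
# conditioning on examples — no weights change anywhere.

def build_few_shot_prompt(instruction, examples, query):
    """Assemble an ICL prompt from (input, output) pairs and a new input."""
    parts = [instruction, ""]
    for inp, out in examples:
        parts += [f"Input: {inp}", f"Output: {out}", ""]
    # The trailing "Output:" invites the model to complete the pattern.
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Convert each word to its plural form.",
    [("cat", "cats"), ("box", "boxes"), ("child", "children")],
    "mouse",
)
print(prompt)
```

The model never sees a gradient; it simply continues the text in a way consistent with the demonstrated mapping.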

Few-shot in-context learning — English to SQL translation with examples

from anthropic import Anthropic
client = Anthropic()

# In-context learning: show examples → model infers the pattern
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,
    messages=[{
        "role": "user",
        "content": """Translate English questions to SQL queries.

Example 1:
English: How many users signed up last month?
SQL: SELECT COUNT(*) FROM users WHERE created_at >= DATE_TRUNC('month', NOW() - INTERVAL '1 month')

Example 2:
English: What are the top 5 products by revenue?
SQL: SELECT product_name, SUM(price * quantity) as revenue FROM orders GROUP BY product_name ORDER BY revenue DESC LIMIT 5

Example 3:
English: Which users have never placed an order?
SQL: SELECT u.* FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE o.id IS NULL

Now translate:
English: What is the average order value for customers from India?
SQL:"""
    }]
)
# Without any fine-tuning, Claude infers the SQL translation pattern from 3 examples
print(response.content[0].text)
# → SELECT AVG(total_amount) FROM orders o JOIN users u ON o.user_id = u.id WHERE u.country = 'India'
| Prompting mode | Examples provided | When to use |
| --- | --- | --- |
| Zero-shot | 0 examples — direct instruction only | Well-defined tasks; common task types the model was pretrained on |
| One-shot | 1 example | Clarifying format when zero-shot produces the wrong structure |
| Few-shot (2–5) | 2–5 examples | Most ICL applications — best balance of context use vs. guidance |
| Many-shot (10+) | 10–100+ examples | Complex structured tasks; domain-specific patterns; when few-shot is inconsistent |
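The modes in the table differ only in how many demonstrations are included; the mechanism is identical. A small sketch that reuses the SQL examples from the code above — the helper name `make_prompt` is an assumption for illustration:

```python
# One prompt builder covers zero-shot through many-shot: only k changes.

def make_prompt(task, examples, k, query):
    """k = 0 → zero-shot, k = 1 → one-shot, 2–5 → few-shot, 10+ → many-shot."""
    lines = [task, ""]
    for eng, sql in examples[:k]:
        lines += [f"English: {eng}", f"SQL: {sql}", ""]
    lines += [f"English: {query}", "SQL:"]
    return "\n".join(lines)

examples = [
    ("How many users signed up last month?",
     "SELECT COUNT(*) FROM users WHERE created_at >= DATE_TRUNC('month', NOW() - INTERVAL '1 month')"),
    ("What are the top 5 products by revenue?",
     "SELECT product_name, SUM(price * quantity) as revenue FROM orders GROUP BY product_name ORDER BY revenue DESC LIMIT 5"),
    ("Which users have never placed an order?",
     "SELECT u.* FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE o.id IS NULL"),
]

task = "Translate English questions to SQL queries."
query = "What is the average order value for customers from India?"

zero_shot = make_prompt(task, examples, 0, query)  # instruction only
few_shot = make_prompt(task, examples, 3, query)   # three demonstrations
```

If zero-shot already produces correct SQL, the extra example tokens buy nothing; move up the table only when the output format or accuracy is inconsistent.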

Scaling laws for ICL — why big models are better at it

ICL capability scales sharply with model size. GPT-2 (1.5B parameters) shows minimal ICL ability — adding examples barely improves performance. GPT-3 (175B) showed surprising few-shot performance that sparked the ICL research field. Models above ~7B parameters generally show reliable ICL; models above ~70B show strong generalisation to novel task structures from minimal examples. This threshold behaviour — where ICL appears almost discontinuously at scale — is one of the canonical examples of emergent capabilities.

ICL vs fine-tuning: when to use each

ICL (via prompting) is faster, cheaper, and requires no training infrastructure. Fine-tuning updates model weights and produces more reliable, consistent behaviour for high-volume production tasks. The practical rule: start with ICL to validate that the task is learnable. If ICL achieves sufficient quality and volume doesn't justify fine-tuning costs, ship with ICL. If you need consistent quality across millions of daily requests or the task has high stakes (medical, legal, financial), fine-tune.
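A back-of-envelope sketch of that trade-off: few-shot examples add prompt tokens to every request, while fine-tuning pays a one-time cost to remove them. All figures below are placeholder assumptions, not real pricing, and the function name is illustrative:

```python
# Break-even point between ICL's per-request token overhead and
# fine-tuning's one-time cost. Purely a cost model sketch.

def breakeven_requests(finetune_cost_usd, extra_prompt_tokens, usd_per_token):
    """Number of requests after which fine-tuning's one-time cost
    beats paying for few-shot example tokens on every call."""
    per_request_overhead = extra_prompt_tokens * usd_per_token
    return finetune_cost_usd / per_request_overhead

# Hypothetical numbers: a $500 fine-tuning run, 600 tokens of few-shot
# examples per request, $3 per million input tokens.
n = breakeven_requests(500.0, 600, 3e-6)
print(f"Break-even after ~{n:,.0f} requests")
```

At millions of daily requests the overhead dominates quickly, which is why the rule of thumb above reserves fine-tuning for high-volume or high-stakes workloads.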

Practice questions

  1. Why do large models (≥7B) exhibit in-context learning while small models (≤1B) largely do not? (Answer: ICL requires the model to perform Bayesian inference within the forward pass — inferring the task distribution from examples and updating its implicit prior. This requires sufficient model capacity to form and update task representations in context. Small models lack the representational capacity to maintain and manipulate multiple task hypotheses across long contexts. The capability emerges around 7B parameters and improves significantly at 65B+.)
  2. What is the optimal number of in-context examples (k) for few-shot prompting? (Answer: Empirically, more examples generally help up to a point; 8–32 is often optimal. Too few (1–2) gives insufficient signal and high variance. Too many exceeds the context window and shows diminishing returns — the model's capability is bounded by what was learned during pretraining. For regression-style tasks or tasks with many classes, additional examples help more; for simple binary tasks, 4–8 usually suffice.)
  3. What does it mean that ICL is a 'meta-learning' phenomenon? (Answer: ICL is learning to learn. During pretraining, the model encountered many tasks in context (documents that implicitly demonstrate tasks). It learned the meta-skill of recognising task structure from examples and applying it to new inputs. At inference, it 'meta-learns' the specific task from the prompt examples without gradient updates. Xie et al. (2022) showed ICL is equivalent to implicit Bayesian inference over a prior of latent concepts learned during pretraining.)
  4. When does ICL fail and what are the warning signs? (Answer: ICL fails when: (1) Task complexity exceeds model capacity (deep multi-step reasoning). (2) Task distribution is far from pretraining distribution (highly domain-specific technical tasks). (3) Provided examples are inconsistent or misleading (model averages across contradictory patterns). Warning signs: high variance across different example orderings, performance worse than random for classification, model ignoring example labels and using prior beliefs instead.)
  5. ICL example order matters — why? (Answer: Recency bias: LLMs give higher weight to examples that appear later in the context (closer to the query). Also primacy effects: the first few examples establish the task format strongly. These biases mean that the same examples in different orders can produce 20–30% accuracy differences. Calibration strategies: use calibrated label priors, random ordering with ensembling, or place the most representative examples last.)
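The order sensitivity described in question 5 can be reduced by one of the calibration strategies it mentions: querying under several random orderings and majority-voting the answers. A minimal sketch — `toy_classify` is a stand-in for a real model call, not an API:

```python
import random
from collections import Counter

def ensemble_over_orderings(examples, query, classify, n_orders=5, seed=0):
    """Vote over predictions from shuffled example orderings
    to average out recency/primacy bias."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_orders):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        votes.append(classify(shuffled, query))
    return Counter(votes).most_common(1)[0][0]

# Toy stand-in classifier that always returns the label of the LAST
# example — an exaggerated model of recency bias, purely illustrative.
def toy_classify(examples, query):
    return examples[-1][1]

examples = [
    ("great film", "positive"),
    ("dull plot", "negative"),
    ("loved it", "positive"),
]
answer = ensemble_over_orderings(examples, "fun watch", toy_classify, n_orders=11)
print(answer)
```

Because two of the three labels are positive, shuffling pushes the biased classifier's vote toward the majority label, illustrating how ensembling dampens any single ordering's influence.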
