
In-Context Learning (ICL)

Teaching a model to do new tasks by showing examples inside the prompt — no training required.


Definition

In-context learning (ICL) is the ability of large language models to perform new tasks by observing examples provided within the prompt, without updating any model weights. Given several input-output pairs in the prompt (few-shot examples), the model can infer the underlying pattern and apply it to a new input — effectively 'learning' the task from context alone. ICL emerged as an unexpected capability of large-scale pretraining and is one of the key properties distinguishing large models from smaller ones.

How ICL works — and why it's surprising

Standard machine learning requires updating model weights through gradient descent to teach a model a new task. ICL requires no weight updates — the model reads examples at inference time and adapts its output distribution accordingly. This is counterintuitive: the weights are fixed, yet the model's behaviour changes based on what it reads in the prompt. The mechanism appears to be that large models, during pretraining, learned to recognise task structures and extrapolate patterns — so ICL is more like pattern completion than learning in the traditional sense.
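The pattern-completion view above can be made concrete: an ICL prompt is nothing more than demonstrations concatenated ahead of a new input, which the model then completes. A minimal sketch (the function name and the pluralisation task are illustrative, not from any library):

```python
# ICL as prompt construction: the "learning" happens entirely through
# conditioning on examples — no weights change anywhere.

def build_few_shot_prompt(instruction, examples, query):
    """Assemble an ICL prompt from (input, output) pairs and a new input."""
    parts = [instruction, ""]
    for inp, out in examples:
        parts += [f"Input: {inp}", f"Output: {out}", ""]
    # The trailing "Output:" invites the model to complete the pattern.
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Convert each word to its plural form.",
    [("cat", "cats"), ("box", "boxes"), ("child", "children")],
    "mouse",
)
print(prompt)
```

The model never sees a gradient; it simply continues the text in a way consistent with the demonstrated mapping.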

Few-shot in-context learning — English to SQL translation with examples

from anthropic import Anthropic
client = Anthropic()

# In-context learning: show examples → model infers the pattern
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,
    messages=[{
        "role": "user",
        "content": """Translate English questions to SQL queries.

Example 1:
English: How many users signed up last month?
SQL: SELECT COUNT(*) FROM users WHERE created_at >= DATE_TRUNC('month', NOW() - INTERVAL '1 month')

Example 2:
English: What are the top 5 products by revenue?
SQL: SELECT product_name, SUM(price * quantity) as revenue FROM orders GROUP BY product_name ORDER BY revenue DESC LIMIT 5

Example 3:
English: Which users have never placed an order?
SQL: SELECT u.* FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE o.id IS NULL

Now translate:
English: What is the average order value for customers from India?
SQL:"""
    }]
)
# Without any fine-tuning, Claude infers the SQL translation pattern from 3 examples
print(response.content[0].text)
# → SELECT AVG(total_amount) FROM orders o JOIN users u ON o.user_id = u.id WHERE u.country = 'India'
| Prompting mode | Examples provided | When to use |
| --- | --- | --- |
| Zero-shot | 0 examples — direct instruction only | Well-defined tasks; common task types the model was pretrained on |
| One-shot | 1 example | Clarifying format when zero-shot produces the wrong structure |
| Few-shot (2–5) | 2–5 examples | Most ICL applications — best balance of context use vs. guidance |
| Many-shot (10+) | 10–100+ examples | Complex structured tasks; domain-specific patterns; when few-shot is inconsistent |
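The modes in the table differ only in how many demonstrations are included; the mechanism is identical. A small sketch that reuses the SQL examples from the code above — the helper name `make_prompt` is an assumption for illustration:

```python
# One prompt builder covers zero-shot through many-shot: only k changes.

def make_prompt(task, examples, k, query):
    """k = 0 → zero-shot, k = 1 → one-shot, 2–5 → few-shot, 10+ → many-shot."""
    lines = [task, ""]
    for eng, sql in examples[:k]:
        lines += [f"English: {eng}", f"SQL: {sql}", ""]
    lines += [f"English: {query}", "SQL:"]
    return "\n".join(lines)

examples = [
    ("How many users signed up last month?",
     "SELECT COUNT(*) FROM users WHERE created_at >= DATE_TRUNC('month', NOW() - INTERVAL '1 month')"),
    ("What are the top 5 products by revenue?",
     "SELECT product_name, SUM(price * quantity) as revenue FROM orders GROUP BY product_name ORDER BY revenue DESC LIMIT 5"),
    ("Which users have never placed an order?",
     "SELECT u.* FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE o.id IS NULL"),
]

task = "Translate English questions to SQL queries."
query = "What is the average order value for customers from India?"

zero_shot = make_prompt(task, examples, 0, query)  # instruction only
few_shot = make_prompt(task, examples, 3, query)   # three demonstrations
```

If zero-shot already produces correct SQL, the extra example tokens buy nothing; move up the table only when the output format or accuracy is inconsistent.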

Scaling laws for ICL — why big models are better at it

ICL capability scales sharply with model size. GPT-2 (1.5B parameters) shows minimal ICL ability — adding examples barely improves performance. GPT-3 (175B) showed surprising few-shot performance that sparked the ICL research field. Models above ~7B parameters generally show reliable ICL; models above ~70B show strong generalisation to novel task structures from minimal examples. This threshold behaviour — where ICL appears almost discontinuously at scale — is one of the canonical examples of emergent capabilities.

ICL vs fine-tuning: when to use each

ICL (via prompting) is faster, cheaper, and requires no training infrastructure. Fine-tuning updates model weights and produces more reliable, consistent behaviour for high-volume production tasks. The practical rule: start with ICL to validate that the task is learnable. If ICL achieves sufficient quality and volume doesn't justify fine-tuning costs, ship with ICL. If you need consistent quality across millions of daily requests or the task has high stakes (medical, legal, financial), fine-tune.
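A back-of-envelope sketch of that trade-off: few-shot examples add prompt tokens to every request, while fine-tuning pays a one-time cost to remove them. All figures below are placeholder assumptions, not real pricing, and the function name is illustrative:

```python
# Break-even point between ICL's per-request token overhead and
# fine-tuning's one-time cost. Purely a cost model sketch.

def breakeven_requests(finetune_cost_usd, extra_prompt_tokens, usd_per_token):
    """Number of requests after which fine-tuning's one-time cost
    beats paying for few-shot example tokens on every call."""
    per_request_overhead = extra_prompt_tokens * usd_per_token
    return finetune_cost_usd / per_request_overhead

# Hypothetical numbers: a $500 fine-tuning run, 600 tokens of few-shot
# examples per request, $3 per million input tokens.
n = breakeven_requests(500.0, 600, 3e-6)
print(f"Break-even after ~{n:,.0f} requests")
```

At millions of daily requests the overhead dominates quickly, which is why the rule of thumb above reserves fine-tuning for high-volume or high-stakes workloads.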

Practice questions

  1. Why do large models (≥7B) exhibit in-context learning while small models (≤1B) largely do not? (Answer: ICL requires the model to perform Bayesian inference within the forward pass — inferring the task distribution from examples and updating its implicit prior. This requires sufficient model capacity to form and update task representations in context. Small models lack the representational capacity to maintain and manipulate multiple task hypotheses across long contexts. The capability emerges around 7B parameters and improves significantly at 65B+.)
  2. What is the optimal number of in-context examples (k) for few-shot prompting? (Answer: Empirically, more examples generally help up to a point; 8–32 is often optimal. Too few (1–2) gives insufficient signal and high variance. Too many exceeds the context window and shows diminishing returns — the model's capability is bounded by what was learned during pretraining. For regression-style tasks or tasks with many classes, additional examples help more; for simple binary tasks, 4–8 usually suffice.)
  3. What does it mean that ICL is a 'meta-learning' phenomenon? (Answer: ICL is learning to learn. During pretraining, the model encountered many tasks in context (documents that implicitly demonstrate tasks). It learned the meta-skill of recognising task structure from examples and applying it to new inputs. At inference, it 'meta-learns' the specific task from the prompt examples without gradient updates. Xie et al. (2022) showed ICL is equivalent to implicit Bayesian inference over a prior of latent concepts learned during pretraining.)
  4. When does ICL fail and what are the warning signs? (Answer: ICL fails when: (1) Task complexity exceeds model capacity (deep multi-step reasoning). (2) Task distribution is far from pretraining distribution (highly domain-specific technical tasks). (3) Provided examples are inconsistent or misleading (model averages across contradictory patterns). Warning signs: high variance across different example orderings, performance worse than random for classification, model ignoring example labels and using prior beliefs instead.)
  5. ICL example order matters — why? (Answer: Recency bias: LLMs give higher weight to examples that appear later in the context (closer to the query). Also primacy effects: the first few examples establish the task format strongly. These biases mean that the same examples in different orders can produce 20–30% accuracy differences. Calibration strategies: use calibrated label priors, random ordering with ensembling, or place the most representative examples last.)
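The order sensitivity described in question 5 can be reduced by one of the calibration strategies it mentions: querying under several random orderings and majority-voting the answers. A minimal sketch — `toy_classify` is a stand-in for a real model call, not an API:

```python
import random
from collections import Counter

def ensemble_over_orderings(examples, query, classify, n_orders=5, seed=0):
    """Vote over predictions from shuffled example orderings
    to average out recency/primacy bias."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_orders):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        votes.append(classify(shuffled, query))
    return Counter(votes).most_common(1)[0][0]

# Toy stand-in classifier that always returns the label of the LAST
# example — an exaggerated model of recency bias, purely illustrative.
def toy_classify(examples, query):
    return examples[-1][1]

examples = [
    ("great film", "positive"),
    ("dull plot", "negative"),
    ("loved it", "positive"),
]
answer = ensemble_over_orderings(examples, "fun watch", toy_classify, n_orders=11)
print(answer)
```

Because two of the three labels are positive, shuffling pushes the biased classifier's vote toward the majority label, illustrating how ensembling dampens any single ordering's influence.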
