Zero-shot prompting is instructing a language model to perform a task without providing any examples of how to do it — relying entirely on the model's pretrained knowledge to understand and execute the request. The term 'zero-shot' comes from the zero-shot learning paradigm in machine learning, where a model generalises to unseen classes without task-specific training examples. In practice, large language models handle a remarkable range of zero-shot tasks, though few-shot prompting often improves reliability for tasks requiring specific output formats.
Zero-shot vs few-shot: when each wins
| Scenario | Zero-shot appropriate? | Why |
|---|---|---|
| Summarise a document | Yes — reliably zero-shot | Summarisation is a core pretraining objective; models excel at it |
| Translate English → French | Yes — reliably zero-shot | Multilingual pretraining; models are effectively bilingual |
| Classify sentiment (positive/negative/neutral) | Yes — reliably zero-shot | Sentiment is well represented in pretraining data |
| Convert text to a specific custom JSON schema | No — use few-shot | Custom schema structure needs examples to specify the exact format |
| Classify into company-specific categories | No — use few-shot | Model cannot know your internal taxonomy without seeing it |
| Write in a very specific proprietary style | No — use few-shot | Style requires examples; descriptions alone are imprecise |
Zero-shot vs few-shot prompting: when format matters, show examples
from anthropic import Anthropic

client = Anthropic()

# ── Zero-shot: works great for well-defined tasks ──────────────────────────
zero_shot_response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=100,
    messages=[{
        "role": "user",
        "content": (
            "Summarise this in one sentence: LumiChats is a pay-per-day AI "
            "platform that charges ₹69 only on days students actively use "
            "the service, giving access to 40+ models including Claude, "
            "GPT-5.4, and Gemini."
        ),
    }],
)
# → Works well — summarisation is a core language task

# ── Few-shot: necessary when output format is non-standard ─────────────────
few_shot_response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=100,
    messages=[{
        "role": "user",
        "content": """\
Extract entities in this format: TYPE: entity

Examples:
Text: "Apple released the iPhone 16 in September 2024"
COMPANY: Apple | PRODUCT: iPhone 16 | DATE: September 2024

Text: "Anthropic launched Claude Sonnet 4.6 in 2026"
COMPANY: Anthropic | PRODUCT: Claude Sonnet 4.6 | DATE: 2026

Now extract from:
Text: "LumiChats was founded in India by Aditya Kumar Jha"
""",
    }],
)
# → COMPANY: LumiChats | COUNTRY: India | PERSON: Aditya Kumar Jha
Zero-shot chain-of-thought — the most powerful single prompt technique
Zero-shot chain-of-thought (CoT) combines zero-shot prompting with the CoT technique by appending 'Let's think step by step' or 'Think carefully before answering' to the prompt without providing any worked examples. Kojima et al. (2022) showed that this short suffix raised GPT-3's accuracy on the MultiArith arithmetic benchmark from 17.7% to 78.7% (and on GSM8K from 10.4% to 40.7%). In 2026, frontier models (Claude Sonnet 4.6, GPT-5.4) benefit less from this explicit trigger — they tend to reason by default — but the technique remains essential for smaller models and for tasks requiring careful deliberation.
When to add "think step by step"
Add zero-shot CoT triggers to prompts involving: multi-step arithmetic, logical deductions with multiple conditions, problems where the answer requires eliminating wrong options, and any task where you notice the model giving confident but incorrect quick answers. For creative tasks, summarisation, and simple factual recall, CoT triggers add latency without quality benefit.
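Wiring the trigger in is a one-line change to the prompt. The sketch below uses a hypothetical `with_cot` helper (not part of any SDK); the resulting `messages` list is passed to `client.messages.create(...)` exactly as in the earlier examples.

```python
COT_TRIGGER = "Let's think step by step."

def with_cot(prompt: str) -> str:
    """Append a zero-shot CoT trigger; the rest of the prompt is unchanged."""
    return f"{prompt}\n\n{COT_TRIGGER}"

# A multi-step word problem — the kind that benefits from an explicit trigger
question = (
    "A cafe sells coffee at ₹120 and offers a 15% student discount. "
    "How much do 3 discounted coffees cost?"
)

messages = [{"role": "user", "content": with_cot(question)}]
# Pass `messages` to client.messages.create(...) as in the examples above,
# budgeting extra max_tokens for the reasoning trace, not just the answer.
```

Because the trigger lives at the end of the user turn, it is trivial to A/B test: run the same evaluation set with and without `with_cot` and keep the trigger only where it moves accuracy.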
Practice questions
- You need a model to classify customer emails into 10 categories. You have no labelled examples. Should you use zero-shot or fine-tuning? (Answer: Start with zero-shot prompting using a clear system prompt listing the 10 categories with brief descriptions. Measure accuracy on a sample. If accuracy exceeds 85%, deploy. If lower, collect 50–100 labelled examples for few-shot prompting or LoRA fine-tuning. Zero-shot with GPT-4 or Claude often achieves 80–90% on classification tasks, making fine-tuning unnecessary.)
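The "clear system prompt listing the categories" in the answer above can be generated from a small mapping. The taxonomy below is a hypothetical placeholder — substitute your own 10 categories and descriptions.

```python
# Hypothetical taxonomy — replace with your own categories and descriptions.
CATEGORIES = {
    "billing": "invoices, refunds, payment failures",
    "technical": "bugs, errors, outages",
    "account": "login, password, profile changes",
}

def build_classifier_prompt(categories: dict[str, str]) -> str:
    """Build a zero-shot system prompt listing each category with a description."""
    lines = [f"- {name}: {desc}" for name, desc in categories.items()]
    return (
        "Classify the customer email into exactly one of these categories. "
        "Reply with the category name only.\n" + "\n".join(lines)
    )
```

Keeping the taxonomy in data rather than hard-coded prose makes it easy to measure accuracy per category and to extend the dict as new email types appear.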
- What is the zero-shot chain-of-thought technique and why does 'Let's think step by step' work? (Answer: Appending 'Let's think step by step' to a zero-shot prompt triggers the model to generate intermediate reasoning before the final answer. This works because the model has seen the phrase in countless reasoning contexts during pretraining — it is a learned cue that elicits step-by-step reasoning. It improves accuracy by 20–40% on multi-step problems by letting the model decompose the problem rather than answer in one shot.)
- Zero-shot prompting with GPT-4 achieves 82% on a task. A fine-tuned 7B model achieves 91%. Which should you use? (Answer: It depends on cost, latency, and maintenance. GPT-4 zero-shot: higher cost per query, no training effort, easy to update prompts, latency from API calls. Fine-tuned 7B: lower inference cost (self-hosted), requires training infrastructure, harder to update as task evolves. For high-volume production: 7B fine-tuned likely preferred. For low-volume or rapidly evolving tasks: GPT-4 zero-shot.)
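The cost side of that trade-off can be made concrete with a back-of-the-envelope break-even calculation. All figures below are hypothetical placeholders, not real API or GPU pricing.

```python
# Hypothetical numbers — substitute real pricing and hosting costs.
api_cost_per_query = 0.01         # large-model API call, $ per query
hosting_cost_per_month = 400.0    # self-hosted 7B: fixed GPU rental, $ per month
selfhost_cost_per_query = 0.0005  # marginal compute per self-hosted query, $

def break_even_queries_per_month() -> float:
    """Monthly volume above which self-hosting the 7B model is cheaper."""
    return hosting_cost_per_month / (api_cost_per_query - selfhost_cost_per_query)
```

With these placeholder numbers the break-even point is roughly 42,000 queries per month; below that volume, the zero-shot API call wins on cost as well as convenience.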
- What is prompt sensitivity in zero-shot prompting and why is it a concern? (Answer: Prompt sensitivity: small changes in phrasing cause large changes in model output. 'Classify this as positive/negative' may give 85% accuracy; 'Determine the sentiment of this text' may give 78%. This makes zero-shot systems brittle — deployment may degrade if prompts are modified without testing. Mitigation: test multiple prompt variants, use prompt ensembling (majority vote across N phrasings), or fine-tune for consistent behaviour.)
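The prompt-ensembling mitigation from the answer above can be sketched in a few lines. Here `classify` is a stand-in for a real model call that returns a label string; the variant phrasings are illustrative.

```python
from collections import Counter
from typing import Callable

# Three phrasings of the same zero-shot classification request.
PROMPT_VARIANTS = [
    "Classify this review as positive or negative: {text}",
    "Determine the sentiment (positive/negative) of: {text}",
    "Is the following text positive or negative? {text}",
]

def ensemble_classify(text: str, classify: Callable[[str], str]) -> str:
    """Majority vote over N phrasings of the same zero-shot prompt.

    `classify` stands in for a model call returning a label string.
    """
    votes = [classify(p.format(text=text)) for p in PROMPT_VARIANTS]
    label, _ = Counter(votes).most_common(1)[0]
    return label
```

With three variants, a single phrasing-sensitive flip no longer changes the final label — at the cost of N model calls per input.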
- What is the difference between zero-shot and zero-shot-CoT prompting on a maths problem? (Answer: Zero-shot: ask the question directly ('What is 15% of 240?') → model answers immediately, often incorrectly for multi-step problems. Zero-shot-CoT: ask ('What is 15% of 240? Let's think step by step.') → model generates '10% of 240 = 24. 5% = 12. 15% = 24 + 12 = 36.' then answers 36. The reasoning trace lets the model use scratchpad computation, dramatically improving accuracy on problems requiring multiple steps.)