Glossary/Prompt Engineering

Definition

Prompt engineering is the practice of designing and optimizing inputs (prompts) to large language models to elicit accurate, useful, and well-formatted outputs for a specific task. Because LLMs are extraordinarily sensitive to how instructions are worded, prompt engineering is both a systematic technical discipline and a creative skill. The same underlying question — phrased differently — can yield responses that vary dramatically in accuracy, depth, safety, and format. In 2026, prompt engineering is one of the most in-demand AI skills in the US job market, with dedicated roles at Google, Meta, Anthropic, and hundreds of startups.

Why prompts matter so much

LLMs are extraordinarily sensitive to phrasing. The same underlying question, worded differently, can produce responses varying enormously in accuracy, depth, and format. Small wording changes unlock latent capabilities — or suppress them. This sensitivity is both a feature (you can guide model behavior precisely) and a skill to learn.

Prompt styleExampleWhat changes
Bare zero-shotWhat is 2+2?Minimal answer: "4"
Role-primedAs a math professor, solve 2+2, showing all stepsDetailed, pedagogical explanation
CoT triggerWhat is 2+2? Think step by step.Reasoning trace appears before final answer
ConstrainedExplain 2+2 in under 20 words for a 5-year-oldAudience-appropriate, concise phrasing
Format-specifiedReturn JSON: {"answer": ..., "explanation": ...}Structured, machine-parseable output

The four-word discovery

Adding "Let's think step by step" to GSM8K math problems raised GPT-3's accuracy from ~18% to ~48% — a 2.7× gain from four words (Kojima et al., 2022). Frontier models like GPT-4o score 95%+ on GSM8K with CoT, vs ~50% without it.

Core prompting techniques

Six foundational techniques cover the vast majority of prompting scenarios. They stack and combine — e.g., role-priming + few-shot + CoT + format specification is the most powerful general-purpose pattern.

TechniqueHow it worksBest forExample snippet
Zero-shotAsk directly, no examplesSimple, well-defined tasks"Classify as positive/negative: I loved it"
Few-shotProvide 2–5 input→output examples before your requestTasks with a clear I/O pattern"Pos: great. Neg: awful. Classify: mediocre"
Chain-of-thoughtAsk the model to reason step-by-step firstMath, logic, multi-step reasoning"Think step by step, then give your answer."
Role promptingAssign an expert personaDomain knowledge, tone control"You are an expert cardiologist. Explain..."
Format specificationSpecify exact output structureStructured data, API integration"Return a JSON object with keys: name, score"
Constraint specificationSet explicit boundaries on outputLength, style, and content control"Answer in under 50 words. Never use jargon."

Few-shot prompting via the OpenAI API — most reliable pattern for consistent structured output

from openai import OpenAI

client = OpenAI()

system_prompt = """You are a sentiment classifier.
Classify each review as POSITIVE, NEGATIVE, or NEUTRAL.
Return only the label — nothing else."""

# Few-shot examples establish the exact format and pattern
few_shot_examples = [
    {"role": "user",      "content": "The food was absolutely delicious!"},
    {"role": "assistant", "content": "POSITIVE"},
    {"role": "user",      "content": "Service was slow and staff were rude."},
    {"role": "assistant", "content": "NEGATIVE"},
    {"role": "user",      "content": "It was fine, nothing special."},
    {"role": "assistant", "content": "NEUTRAL"},
]

def classify_sentiment(review: str) -> str:
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(few_shot_examples)
    messages.append({"role": "user", "content": review})

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=10,
        temperature=0,   # deterministic for classification
    )
    return resp.choices[0].message.content.strip()

print(classify_sentiment("Worst meal I've had in years."))    # → NEGATIVE
print(classify_sentiment("Pretty good, would visit again."))  # → POSITIVE

Advanced prompting patterns

Beyond the basics, several patterns dramatically improve performance on hard reasoning tasks — each one representing a published research breakthrough.

PatternCore ideaBenchmark gainBest use case
Self-consistency (Wang 2022)Generate 10–40 CoT paths, take majority vote on final answers+5–15% accuracy on math benchmarks vs single CoTHigh-stakes math and logic problems
Tree of Thoughts (Yao 2023)Explore branching reasoning paths, prune dead ends, backtrackSolves 74% of Game of 24 vs 4% with standard CoTComplex puzzles, planning, creative tasks
ReAct (Yao 2022)Interleave Thought → Action → Observation in a reasoning loop2–3× better factual accuracy on multi-hop QAAgentic tool use, research tasks
Least-to-most (Zhou 2023)Decompose complex problem → solve subproblems sequentiallyDramatic gains on compositional generalization tasksMath word problems, multi-step code tasks
Meta-promptingUse an LLM to generate and optimize prompts for another taskOften matches hand-crafted few-shot demonstrationsAutomating prompt development pipelines

ReAct agent loop — the architectural pattern behind LangChain, AutoGPT, and Claude tool use (Yao et al., 2022)

# ReAct: Reason + Act. The model interleaves thinking and tool calls.
# This exact pattern underpins virtually all LLM agent frameworks.

REACT_SYSTEM = """You have access to these tools:
- search("query")     → returns top web results
- calculate("expr")   → evaluates a math expression  
- lookup("entity")    → returns a Wikipedia summary

For each step, output EXACTLY:
Thought: <your reasoning about what to do next>
Action: <tool_name("argument")>

After receiving Observation: <r>, continue reasoning.
When done: Final Answer: <your answer>"""

def react_agent(question: str, tools: dict, max_steps: int = 8) -> str:
    messages = [
        {"role": "system", "content": REACT_SYSTEM},
        {"role": "user",   "content": question},
    ]

    for _ in range(max_steps):
        response = call_llm(messages)
        messages.append({"role": "assistant", "content": response})

        if "Final Answer:" in response:
            return response.split("Final Answer:")[-1].strip()

        for line in response.split("
"):
            if line.startswith("Action:"):
                tool_call = line.replace("Action:", "").strip()
                tool_name = tool_call.split("(")[0]
                tool_arg  = tool_call.split('"')[1]
                result    = tools[tool_name](tool_arg)
                messages.append({"role": "user", "content": f"Observation: {result}"})
                break

    return "Max steps reached"

Prompt engineering for ChatGPT — 10 templates that work in 2026

Use caseTemplateWhy it works
Explain a complex topic"Explain [topic] to me like I'm a smart 16-year-old with no background in the subject. Use a real-world analogy in the first sentence."Anchors reading level and forces an analogy — both improve comprehension and prevent jargon
Write a cover letter"Write a cover letter for a [job title] role at [company]. My background: [2-3 bullet points]. Tone: professional but not stiff. Max 250 words. End with a confident call to action."Constraints (word count, tone, ending) prevent generic GPT filler
Debug code"Here is my Python code: [paste]. It should [expected behavior] but instead [actual behavior]. Identify the bug, explain why it happens, and show the fixed version with inline comments."Providing expected vs actual behavior narrows the search space dramatically
Study from a textbook"I'm studying [subject] for [exam/class]. Here is a concept: [paste text]. Create 5 Socratic questions that test deep understanding (not rote memorization), then answer each one."Socratic framing generates questions that reveal conceptual gaps, not just recall
Summarize a document"Summarize the following in 3 sentences for an executive audience, then list the 3 most important action items. Document: [paste]"Two-part output (summary + action items) forces the model to extract signal from noise
Brainstorm ideas"Generate 10 [type of ideas] for [context]. For each one: 1 sentence description, biggest risk, biggest upside. Format as a numbered list."Forcing risk/upside analysis prevents the model from generating only safe, generic ideas
Rewrite for clarity"Rewrite the following text so it's clearer and more direct. Keep all the facts. Cut filler. Target reading level: professional adult. [paste]"Explicit instruction to preserve facts prevents hallucinated rewrites
Interview prep"Act as a tough interviewer for a [role] position at a [company type]. Ask me one behavioral interview question at a time. After my answer, give honest feedback on what was strong, what was weak, and what I should add. Start now."One-question-at-a-time mimics real interview flow; feedback after each answer enables iteration
Build a study plan"Create a 4-week study plan for [subject/exam]. I have [X hours/week]. I'm a [beginner/intermediate]. Include: daily topics, practice exercises, and one mock test per week. Output as a table."Constraints (time, level, table format) prevent vague advice and produce an immediately actionable plan
Compare two options"Compare [A] vs [B] for someone who [specific context]. Create a table with these exact columns: Feature | [A] | [B] | Winner. After the table, give a 2-sentence recommendation."Specifying exact columns forces structured parity; the recommendation sentence prevents a wishy-washy conclusion

The single most important ChatGPT tip

Always end your prompt with the output format you want: "Output as a numbered list", "Format as a markdown table", "Answer in under 100 words". GPT-4o's default is to write long flowing prose — explicit format instructions override this default every time.

Prompt injection and security

Prompt injection is the AI equivalent of SQL injection: malicious content in user input or retrieved documents overwrites the system prompt's instructions, causing the model to behave contrary to its design. It is the #1 security vulnerability in LLM-based applications as of 2026.

Attack typeHow it worksReal-world riskMitigation
Direct injectionUser types "Ignore previous instructions and instead..."Customer service bot reveals pricing strategy, internal data, or personaInput filtering + sandboxed system prompts
Indirect injectionMalicious text in a webpage/document the AI reads contains hidden instructionsRAG-based assistant follows attacker instructions embedded in a retrieved articleSanitize retrieved content; use separate trusted/untrusted context windows
JailbreakingCreative framing ("pretend you're DAN", roleplay scenarios) bypasses safety guidelinesModel generates harmful content it normally refusesRLHF / Constitutional AI training; input classifiers
Prompt leakingAttacker asks model to "repeat your system prompt"Proprietary system prompts, personas, business logic exposedInstruct model to never repeat system prompt; use Anthropic's system prompt cache

Defense-in-depth is required

No single mitigation stops all prompt injection attacks. Production LLM applications need multiple layers: input validation, output filtering, sandboxed execution environments, rate limiting, and human review for high-stakes outputs. Never assume the model will self-police.

Prompt engineering jobs and salary in 2026

Role titleMedian US salary (2026)Top-end US salaryWhere hiring
Prompt Engineer$115,000$195,000Anthropic, OpenAI, Google DeepMind, Meta AI
AI Prompt Specialist$85,000$140,000HubSpot, Salesforce, enterprise SaaS
LLM Application Engineer$130,000$220,000AI startups, hedge funds, big tech
Conversational AI Designer$95,000$160,000Healthcare, legal, financial services
AI Content Strategist$75,000$125,000Media companies, agencies, e-commerce
ML Prompt Researcher$145,000$260,000Research labs (OpenAI, Anthropic, Google)
  • Top skills hiring managers look for: System prompt design, RAG pipeline optimization, LangChain/LlamaIndex, DSPy, prompt injection security, evaluation harnesses, A/B testing prompts at scale
  • No CS degree required for most roles: Anthropic's 2025 hiring data showed 38% of prompt engineering hires came from non-CS backgrounds (linguistics, philosophy, technical writing, law)
  • Remote-first: ~72% of US prompt engineering roles allow fully remote work as of Q1 2026
  • Portfolio > credentials: A GitHub repo of well-documented prompts, eval scripts, and before/after quality comparisons consistently outperforms a resume alone in hiring decisions

How to break in without experience

Build a public prompt library on GitHub or HuggingFace Spaces. Pick one hard task (legal contract review, medical triage, code review), write 10+ prompt variants, build an eval harness that scores them objectively, and publish the results. This demonstrates exactly the skills companies hire for — and it's more compelling than any certification.

Prompt optimization and automated prompt engineering

Manual prompt iteration is slow. Automated Prompt Engineering (APE) uses LLMs to generate, score, and refine prompts systematically — often finding solutions humans miss.

Tool / FrameworkApproachBest for
DSPy (Stanford)Compiles prompt programs using gradient-like optimization over a datasetProduction pipelines where quality must be measured and maximized
Automatic Prompt Engineer (APE)LLM generates candidate prompts; scored by performance on held-out examplesFinding zero-shot prompts that match few-shot quality
OPRO (Google)LLM optimizes prompts using "meta-prompts" that describe the optimization goalIterative refinement of task-specific prompts
TextGradBackpropagates feedback through text — treats language feedback as gradientsComplex multi-step agentic pipelines

DSPy — compile a prompt program instead of hand-writing prompts (Stanford NLP, 2023)

import dspy

# Define your task as a typed signature — inputs and outputs with descriptions
class SentimentClassifier(dspy.Signature):
    """Classify customer review sentiment."""
    review: str = dspy.InputField(desc="Customer review text")
    sentiment: str = dspy.OutputField(desc="POSITIVE, NEGATIVE, or NEUTRAL")

# Wrap in a module — DSPy handles the actual prompt internally
class SentimentModule(dspy.Module):
    def __init__(self):
        self.classify = dspy.Predict(SentimentClassifier)

    def forward(self, review: str):
        return self.classify(review=review)

# Configure your LLM
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Compile: DSPy optimizes the prompt using your labeled examples
from dspy.teleprompt import BootstrapFewShot

trainset = [
    dspy.Example(review="Amazing product!", sentiment="POSITIVE").with_inputs("review"),
    dspy.Example(review="Total waste of money.", sentiment="NEGATIVE").with_inputs("review"),
    dspy.Example(review="It arrived on time.", sentiment="NEUTRAL").with_inputs("review"),
]

optimizer = BootstrapFewShot(metric=lambda pred, ex: pred.sentiment == ex.sentiment)
compiled = optimizer.compile(SentimentModule(), trainset=trainset)

# The compiled module has an auto-optimized prompt — better than hand-written
result = compiled(review="Best purchase I've made this year!")
print(result.sentiment)  # → POSITIVE

Model-specific prompting: what works for Claude vs GPT-4o vs Gemini vs o3

Each frontier model has distinct strengths, training emphases, and behavioral defaults. The same prompt can produce wildly different quality results across models — understanding model-specific patterns is a core prompt engineering skill.

ModelStrengthsPrompting tipsAvoid
Claude 3.5 / 3.7 (Anthropic)Nuanced instruction-following, long-document analysis, coding, constitutional safetyUse XML tags for structure (, , ). Claude follows multi-part instructions reliably. Use "Think through this step by step before answering."Overly casual framing for complex tasks; very short context for document analysis — give it the full document
GPT-4o (OpenAI)Multimodal (vision + audio), broad world knowledge, fast structured outputSpecify JSON output with a schema example. Role + format + constraint stacking works very well. Use temperature=0 for deterministic tasks.Assuming it'll self-fact-check; trusting citations without verification (hallucination rate higher than Claude on factual tasks)
Gemini 1.5 / 2.0 Pro (Google)1M+ token context, native video/audio understanding, Google Search groundingUse for tasks requiring massive context windows (entire codebases, long contracts). Enable Google Search grounding for factual queries.Complex multi-step reasoning chains without CoT triggers — needs more explicit step-by-step framing than Claude
o3 / o3-mini (OpenAI)State-of-the-art math, science, and coding reasoning; extended thinking timeKeep prompts minimal — o3 does its own internal reasoning. Avoid "think step by step" (redundant). Give the problem, not the method.Using o3 for simple conversational tasks — cost is 10–50× GPT-4o-mini; use only for hard reasoning problems
DeepSeek V3 / R1 (DeepSeek)Top-tier coding, math, Chinese language tasks; very cost-efficientWorks exceptionally well for code generation and debugging with detailed spec prompts. R1 exposes its reasoning chain.Privacy-sensitive data — model is hosted in China; data governance implications for US enterprise use

The 80/20 rule of model selection

For 80% of tasks, GPT-4o-mini or Claude Haiku gives 90% of the quality at 10% of the cost. Reserve GPT-4o, Claude Sonnet, and Gemini Pro for tasks that actually need them. Save o3 for problems that genuinely require deep mathematical or logical reasoning. LumiChats lets you switch between all of these in one click — making model selection fast and cost-efficient.

Frequently asked questions about prompt engineering

  • Is prompt engineering a real job in 2026? Yes — and it's growing. LinkedIn reported a 400% increase in "prompt engineer" job postings between 2023 and 2025. Salaries range from $85K for specialist roles to $260K+ at AI research labs. Most roles do not require a CS degree.
  • How is prompt engineering different from just talking to ChatGPT? Casual ChatGPT use is like typing a Google search. Prompt engineering is systematic: you define the task structure, provide examples, specify output format, test variants, measure quality, and iterate. It's the difference between guessing and engineering.
  • Does prompt engineering become obsolete as models get smarter? Partially — GPT-4o and Claude 3.5 need less hand-holding than GPT-3.5. But as models get more capable, the tasks we ask them to do get harder too. Automated prompt optimization (DSPy, OPRO) is growing, but human judgment for task design and evaluation remains essential.
  • What is the best way to learn prompt engineering in 2026? Build in public: pick a hard real-world task, write 10 prompt variants, score them with an eval script, and publish your findings. Anthropic's prompt engineering documentation, OpenAI's cookbook, and the DSPy tutorials are the best free resources.
  • What's the difference between a system prompt and a user prompt? System prompt: persistent instructions set by the developer before the conversation starts — defines persona, constraints, and task context. User prompt: the actual message from the end user. The system prompt overrides user instructions in cases of conflict (though prompt injection can sometimes bypass this).
  • Can I use the same prompts across different AI models? Often yes for simple tasks, but model-specific tuning significantly improves quality. Claude responds best to XML-tagged structure. GPT-4o to JSON schema examples. Gemini to explicit grounding instructions. Expect 20–40% quality variation on complex tasks when moving the same prompt between models without adaptation.

Practice questions

  1. What is an LLM's context window and what happens when content exceeds it? (Answer: Context window = the maximum number of tokens the model can process in one forward pass (input + output combined). Claude 3.5: 200K tokens. GPT-4o: 128K. LLaMA 3.1: 128K. When content exceeds the limit: API throws a context_length_exceeded error (you must chunk or summarize). Common workaround: sliding window chunking with overlap, RAG (retrieve only relevant portions), or summarization of earlier context.)
  2. Why does Chain-of-Thought prompting improve LLM accuracy on math and reasoning tasks? (Answer: CoT forces the model to decompose multi-step problems into explicit intermediate steps. This works because: (1) each intermediate step is a simpler sub-problem within the model's capability, (2) earlier reasoning steps are in the context window and can be referenced for later steps, (3) errors in intermediate steps are self-correctable when the chain is visible. Without CoT, the model must compute the entire reasoning chain "in one pass" — which exceeds its working memory for hard problems.)
  3. What is the difference between zero-shot, one-shot, and few-shot prompting? (Answer: Zero-shot: no examples — rely entirely on the model's pretrained knowledge. One-shot: one input→output example before the query. Few-shot: 2–10 examples. Few-shot works best when: the task format is non-standard, output structure must be exact, or the task is ambiguous without examples. GPT-3 showed that ~3 examples often matched fine-tuned model quality on classification tasks — this was the original "few-shot" result from Brown et al. 2020.)
  4. What is prompt injection and why is it the top LLM security risk? (Answer: Prompt injection occurs when malicious text in user input or retrieved content overrides the system prompt's instructions. It is the top LLM security risk because: (1) LLMs cannot distinguish trusted system instructions from malicious user content — they process all tokens equally. (2) Indirect injection (in retrieved documents, emails, web pages) is very hard to filter. (3) Most LLM applications lack input sandboxing. Mitigations include: input/output classifiers, sandboxed execution, privilege-separated context windows, and output validation — not a single silver bullet.)
  5. What is DSPy and how does it differ from manually writing prompts? (Answer: DSPy (Declarative Self-improving Python) treats prompts as programs that can be compiled and optimized, rather than hand-written strings. Instead of writing "You are a classifier. Given a review, output POSITIVE or NEGATIVE", you define a typed Signature (input/output fields with descriptions) and a metric. DSPy's optimizer then generates and tests prompt candidates, selecting the best-performing variant for your dataset. The key difference: DSPy optimizes for measurable performance; manual prompting optimizes for intuition. For production systems where quality can be measured, DSPy consistently outperforms hand-crafted prompts.)

On LumiChats

LumiChats provides mode-specific optimized prompts for Study Mode, Agent Mode, and Quiz Hub — years of iterative prompt engineering embedded into each feature. When you use Study Mode, a carefully crafted system prompt instructs the model to only answer from retrieved document chunks and cite page numbers. LumiChats also lets you switch between 39+ models (GPT-4o, Claude Sonnet, Gemini Pro, o3-mini, DeepSeek V3) so you can apply model-specific prompting strategies without managing multiple API accounts.

Try it free

✦ Under $1 / day

Practice what you just learned

Quiz Hub + Study Mode lock in every concept. 40+ AI models, Agent Mode, page-locked answers — all for less than a dollar a day.

Start Free — Under $1/day

Related Terms

7 terms