Prompt Engineering

Prompt engineering is the practice of designing and optimizing inputs (prompts) to large language models to elicit accurate, useful, and well-formatted outputs for a specific task. Because LLMs are extraordinarily sensitive to how instructions are worded, prompt engineering is both a systematic technical discipline and a creative skill. The same underlying question — phrased differently — can yield responses that vary dramatically in accuracy, depth, safety, and format. In 2026, prompt engineering is one of the most in-demand AI skills in the US job market, with dedicated roles at Google, Meta, Anthropic, and hundreds of startups.

The #1 AI skill in 2026 — how you talk to AI determines what it does.

Category: Model Training & Optimization

Why prompts matter so much

LLMs are extraordinarily sensitive to phrasing. The same underlying question, worded differently, can produce responses varying enormously in accuracy, depth, and format. Small wording changes unlock latent capabilities — or suppress them. This sensitivity is both a feature (you can guide model behavior precisely) and a skill to learn.

Prompt style	Example	What changes
Bare zero-shot	What is 2+2?	Minimal answer: "4"
Role-primed	As a math professor, solve 2+2, showing all steps	Detailed, pedagogical explanation
CoT trigger	What is 2+2? Think step by step.	Reasoning trace appears before final answer
Constrained	Explain 2+2 in under 20 words for a 5-year-old	Audience-appropriate, concise phrasing
Format-specified	Return JSON: {"answer": ..., "explanation": ...}	Structured, machine-parseable output

The four-word discovery: Adding "Let's think step by step" to GSM8K math problems raised GPT-3's accuracy from ~18% to ~48% — a 2.7× gain from four words (Kojima et al., 2022). Frontier models like GPT-4o score 95%+ on GSM8K with CoT, vs ~50% without it.

Core prompting techniques

Six foundational techniques cover the vast majority of prompting scenarios. They stack and combine — e.g., role-priming + few-shot + CoT + format specification is the most powerful general-purpose pattern.

Technique	How it works	Best for	Example snippet
Zero-shot	Ask directly, no examples	Simple, well-defined tasks	"Classify as positive/negative: I loved it"
Few-shot	Provide 2–5 input→output examples before your request	Tasks with a clear I/O pattern	"Pos: great. Neg: awful. Classify: mediocre"
Chain-of-thought	Ask the model to reason step-by-step first	Math, logic, multi-step reasoning	"Think step by step, then give your answer."
Role prompting	Assign an expert persona	Domain knowledge, tone control	"You are an expert cardiologist. Explain..."
Format specification	Specify exact output structure	Structured data, API integration	"Return a JSON object with keys: name, score"
Constraint specification	Set explicit boundaries on output	Length, style, and content control	"Answer in under 50 words. Never use jargon."

from openai import OpenAI

client = OpenAI()

system_prompt = """You are a sentiment classifier.
Classify each review as POSITIVE, NEGATIVE, or NEUTRAL.
Return only the label — nothing else."""

# Few-shot examples establish the exact format and pattern
few_shot_examples = [
    {"role": "user",      "content": "The food was absolutely delicious!"},
    {"role": "assistant", "content": "POSITIVE"},
    {"role": "user",      "content": "Service was slow and staff were rude."},
    {"role": "assistant", "content": "NEGATIVE"},
    {"role": "user",      "content": "It was fine, nothing special."},
    {"role": "assistant", "content": "NEUTRAL"},
]

def classify_sentiment(review: str) -> str:
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(few_shot_examples)
    messages.append({"role": "user", "content": review})

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=10,
        temperature=0,   # deterministic for classification
    )
    return resp.choices[0].message.content.strip()

print(classify_sentiment("Worst meal I've had in years."))    # → NEGATIVE
print(classify_sentiment("Pretty good, would visit again."))  # → POSITIVE

Advanced prompting patterns

Beyond the basics, several patterns dramatically improve performance on hard reasoning tasks — each one representing a published research breakthrough.

Pattern	Core idea	Benchmark gain	Best use case
Self-consistency (Wang 2022)	Generate 10–40 CoT paths, take majority vote on final answers	+5–15% accuracy on math benchmarks vs single CoT	High-stakes math and logic problems
Tree of Thoughts (Yao 2023)	Explore branching reasoning paths, prune dead ends, backtrack	Solves 74% of Game of 24 vs 4% with standard CoT	Complex puzzles, planning, creative tasks
ReAct (Yao 2022)	Interleave Thought → Action → Observation in a reasoning loop	2–3× better factual accuracy on multi-hop QA	Agentic tool use, research tasks
Least-to-most (Zhou 2023)	Decompose complex problem → solve subproblems sequentially	Dramatic gains on compositional generalization tasks	Math word problems, multi-step code tasks
Meta-prompting	Use an LLM to generate and optimize prompts for another task	Often matches hand-crafted few-shot demonstrations	Automating prompt development pipelines

# ReAct: Reason + Act. The model interleaves thinking and tool calls.
# This exact pattern underpins virtually all LLM agent frameworks.

REACT_SYSTEM = """You have access to these tools:
- search("query")     → returns top web results
- calculate("expr")   → evaluates a math expression  
- lookup("entity")    → returns a Wikipedia summary

For each step, output EXACTLY:
Thought: <your reasoning about what to do next>
Action: <tool_name("argument")>

After receiving Observation: <r>, continue reasoning.
When done: Final Answer: <your answer>"""

def react_agent(question: str, tools: dict, max_steps: int = 8) -> str:
    messages = [
        {"role": "system", "content": REACT_SYSTEM},
        {"role": "user",   "content": question},
    ]

    for _ in range(max_steps):
        response = call_llm(messages)
        messages.append({"role": "assistant", "content": response})

        if "Final Answer:" in response:
            return response.split("Final Answer:")[-1].strip()

        for line in response.split("
"):
            if line.startswith("Action:"):
                tool_call = line.replace("Action:", "").strip()
                tool_name = tool_call.split("(")[0]
                tool_arg  = tool_call.split('"')[1]
                result    = tools[tool_name](tool_arg)
                messages.append({"role": "user", "content": f"Observation: {result}"})
                break

    return "Max steps reached"

Prompt engineering for ChatGPT — 10 templates that work in 2026

ChatGPT (GPT-4o) responds best to prompts that are specific, role-defined, and explicitly state the desired format. These 10 templates are optimized for ChatGPT's instruction-following strengths and cover the most common US user tasks: writing, coding, research, studying, and career help.

Use case	Template	Why it works
Explain a complex topic	"Explain [topic] to me like I'm a smart 16-year-old with no background in the subject. Use a real-world analogy in the first sentence."	Anchors reading level and forces an analogy — both improve comprehension and prevent jargon
Write a cover letter	"Write a cover letter for a [job title] role at [company]. My background: [2-3 bullet points]. Tone: professional but not stiff. Max 250 words. End with a confident call to action."	Constraints (word count, tone, ending) prevent generic GPT filler
Debug code	"Here is my Python code: [paste]. It should [expected behavior] but instead [actual behavior]. Identify the bug, explain why it happens, and show the fixed version with inline comments."	Providing expected vs actual behavior narrows the search space dramatically
Study from a textbook	"I'm studying [subject] for [exam/class]. Here is a concept: [paste text]. Create 5 Socratic questions that test deep understanding (not rote memorization), then answer each one."	Socratic framing generates questions that reveal conceptual gaps, not just recall
Summarize a document	"Summarize the following in 3 sentences for an executive audience, then list the 3 most important action items. Document: [paste]"	Two-part output (summary + action items) forces the model to extract signal from noise
Brainstorm ideas	"Generate 10 [type of ideas] for [context]. For each one: 1 sentence description, biggest risk, biggest upside. Format as a numbered list."	Forcing risk/upside analysis prevents the model from generating only safe, generic ideas
Rewrite for clarity	"Rewrite the following text so it's clearer and more direct. Keep all the facts. Cut filler. Target reading level: professional adult. [paste]"	Explicit instruction to preserve facts prevents hallucinated rewrites
Interview prep	"Act as a tough interviewer for a [role] position at a [company type]. Ask me one behavioral interview question at a time. After my answer, give honest feedback on what was strong, what was weak, and what I should add. Start now."	One-question-at-a-time mimics real interview flow; feedback after each answer enables iteration
Build a study plan	"Create a 4-week study plan for [subject/exam]. I have [X hours/week]. I'm a [beginner/intermediate]. Include: daily topics, practice exercises, and one mock test per week. Output as a table."	Constraints (time, level, table format) prevent vague advice and produce an immediately actionable plan
Compare two options	"Compare [A] vs [B] for someone who [specific context]. Create a table with these exact columns: Feature \| [A] \| [B] \| Winner. After the table, give a 2-sentence recommendation."	Specifying exact columns forces structured parity; the recommendation sentence prevents a wishy-washy conclusion

The single most important ChatGPT tip: Always end your prompt with the output format you want: "Output as a numbered list", "Format as a markdown table", "Answer in under 100 words". GPT-4o's default is to write long flowing prose — explicit format instructions override this default every time.

Prompt injection and security

Prompt injection is the AI equivalent of SQL injection: malicious content in user input or retrieved documents overwrites the system prompt's instructions, causing the model to behave contrary to its design. It is the #1 security vulnerability in LLM-based applications as of 2026.

Attack type	How it works	Real-world risk	Mitigation
Direct injection	User types "Ignore previous instructions and instead..."	Customer service bot reveals pricing strategy, internal data, or persona	Input filtering + sandboxed system prompts
Indirect injection	Malicious text in a webpage/document the AI reads contains hidden instructions	RAG-based assistant follows attacker instructions embedded in a retrieved article	Sanitize retrieved content; use separate trusted/untrusted context windows
Jailbreaking	Creative framing ("pretend you're DAN", roleplay scenarios) bypasses safety guidelines	Model generates harmful content it normally refuses	RLHF / Constitutional AI training; input classifiers
Prompt leaking	Attacker asks model to "repeat your system prompt"	Proprietary system prompts, personas, business logic exposed	Instruct model to never repeat system prompt; use Anthropic's system prompt cache

Defense-in-depth is required: No single mitigation stops all prompt injection attacks. Production LLM applications need multiple layers: input validation, output filtering, sandboxed execution environments, rate limiting, and human review for high-stakes outputs. Never assume the model will self-police.

Prompt engineering jobs and salary in 2026

Prompt engineering is now a standalone job title at major US tech companies, AI startups, law firms, healthcare systems, and Fortune 500 enterprises. It is one of the few AI-adjacent roles that does not require a computer science degree — making it accessible to professionals from writing, education, law, medicine, and business backgrounds.

Role title	Median US salary (2026)	Top-end US salary	Where hiring
Prompt Engineer	$115,000	$195,000	Anthropic, OpenAI, Google DeepMind, Meta AI
AI Prompt Specialist	$85,000	$140,000	HubSpot, Salesforce, enterprise SaaS
LLM Application Engineer	$130,000	$220,000	AI startups, hedge funds, big tech
Conversational AI Designer	$95,000	$160,000	Healthcare, legal, financial services
AI Content Strategist	$75,000	$125,000	Media companies, agencies, e-commerce
ML Prompt Researcher	$145,000	$260,000	Research labs (OpenAI, Anthropic, Google)

Top skills hiring managers look for: System prompt design, RAG pipeline optimization, LangChain/LlamaIndex, DSPy, prompt injection security, evaluation harnesses, A/B testing prompts at scale
No CS degree required for most roles: Anthropic's 2025 hiring data showed 38% of prompt engineering hires came from non-CS backgrounds (linguistics, philosophy, technical writing, law)
Remote-first: ~72% of US prompt engineering roles allow fully remote work as of Q1 2026
Portfolio > credentials: A GitHub repo of well-documented prompts, eval scripts, and before/after quality comparisons consistently outperforms a resume alone in hiring decisions

How to break in without experience: Build a public prompt library on GitHub or HuggingFace Spaces. Pick one hard task (legal contract review, medical triage, code review), write 10+ prompt variants, build an eval harness that scores them objectively, and publish the results. This demonstrates exactly the skills companies hire for — and it's more compelling than any certification.

Prompt optimization and automated prompt engineering

Manual prompt iteration is slow. Automated Prompt Engineering (APE) uses LLMs to generate, score, and refine prompts systematically — often finding solutions humans miss.

Tool / Framework	Approach	Best for
DSPy (Stanford)	Compiles prompt programs using gradient-like optimization over a dataset	Production pipelines where quality must be measured and maximized
Automatic Prompt Engineer (APE)	LLM generates candidate prompts; scored by performance on held-out examples	Finding zero-shot prompts that match few-shot quality
OPRO (Google)	LLM optimizes prompts using "meta-prompts" that describe the optimization goal	Iterative refinement of task-specific prompts
TextGrad	Backpropagates feedback through text — treats language feedback as gradients	Complex multi-step agentic pipelines

import dspy

# Define your task as a typed signature — inputs and outputs with descriptions
class SentimentClassifier(dspy.Signature):
    """Classify customer review sentiment."""
    review: str = dspy.InputField(desc="Customer review text")
    sentiment: str = dspy.OutputField(desc="POSITIVE, NEGATIVE, or NEUTRAL")

# Wrap in a module — DSPy handles the actual prompt internally
class SentimentModule(dspy.Module):
    def __init__(self):
        self.classify = dspy.Predict(SentimentClassifier)

    def forward(self, review: str):
        return self.classify(review=review)

# Configure your LLM
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Compile: DSPy optimizes the prompt using your labeled examples
from dspy.teleprompt import BootstrapFewShot

trainset = [
    dspy.Example(review="Amazing product!", sentiment="POSITIVE").with_inputs("review"),
    dspy.Example(review="Total waste of money.", sentiment="NEGATIVE").with_inputs("review"),
    dspy.Example(review="It arrived on time.", sentiment="NEUTRAL").with_inputs("review"),
]

optimizer = BootstrapFewShot(metric=lambda pred, ex: pred.sentiment == ex.sentiment)
compiled = optimizer.compile(SentimentModule(), trainset=trainset)

# The compiled module has an auto-optimized prompt — better than hand-written
result = compiled(review="Best purchase I've made this year!")
print(result.sentiment)  # → POSITIVE

Model-specific prompting: what works for Claude vs GPT-4o vs Gemini vs o3

Each frontier model has distinct strengths, training emphases, and behavioral defaults. The same prompt can produce wildly different quality results across models — understanding model-specific patterns is a core prompt engineering skill.

Model	Strengths	Prompting tips	Avoid
Claude 3.5 / 3.7 (Anthropic)	Nuanced instruction-following, long-document analysis, coding, constitutional safety	Use XML tags for structure (, , ). Claude follows multi-part instructions reliably. Use "Think through this step by step before answering."	Overly casual framing for complex tasks; very short context for document analysis — give it the full document
GPT-4o (OpenAI)	Multimodal (vision + audio), broad world knowledge, fast structured output	Specify JSON output with a schema example. Role + format + constraint stacking works very well. Use temperature=0 for deterministic tasks.	Assuming it'll self-fact-check; trusting citations without verification (hallucination rate higher than Claude on factual tasks)
Gemini 1.5 / 2.0 Pro (Google)	1M+ token context, native video/audio understanding, Google Search grounding	Use for tasks requiring massive context windows (entire codebases, long contracts). Enable Google Search grounding for factual queries.	Complex multi-step reasoning chains without CoT triggers — needs more explicit step-by-step framing than Claude
o3 / o3-mini (OpenAI)	State-of-the-art math, science, and coding reasoning; extended thinking time	Keep prompts minimal — o3 does its own internal reasoning. Avoid "think step by step" (redundant). Give the problem, not the method.	Using o3 for simple conversational tasks — cost is 10–50× GPT-4o-mini; use only for hard reasoning problems
DeepSeek V3 / R1 (DeepSeek)	Top-tier coding, math, Chinese language tasks; very cost-efficient	Works exceptionally well for code generation and debugging with detailed spec prompts. R1 exposes its reasoning chain.	Privacy-sensitive data — model is hosted in China; data governance implications for US enterprise use

The 80/20 rule of model selection: For 80% of tasks, GPT-4o-mini or Claude Haiku gives 90% of the quality at 10% of the cost. Reserve GPT-4o, Claude Sonnet, and Gemini Pro for tasks that actually need them. Save o3 for problems that genuinely require deep mathematical or logical reasoning. LumiChats lets you switch between all of these in one click — making model selection fast and cost-efficient.

Frequently asked questions about prompt engineering

Is prompt engineering a real job in 2026? Yes — and it's growing. LinkedIn reported a 400% increase in "prompt engineer" job postings between 2023 and 2025. Salaries range from $85K for specialist roles to $260K+ at AI research labs. Most roles do not require a CS degree.
How is prompt engineering different from just talking to ChatGPT? Casual ChatGPT use is like typing a Google search. Prompt engineering is systematic: you define the task structure, provide examples, specify output format, test variants, measure quality, and iterate. It's the difference between guessing and engineering.
Does prompt engineering become obsolete as models get smarter? Partially — GPT-4o and Claude 3.5 need less hand-holding than GPT-3.5. But as models get more capable, the tasks we ask them to do get harder too. Automated prompt optimization (DSPy, OPRO) is growing, but human judgment for task design and evaluation remains essential.
What is the best way to learn prompt engineering in 2026? Build in public: pick a hard real-world task, write 10 prompt variants, score them with an eval script, and publish your findings. Anthropic's prompt engineering documentation, OpenAI's cookbook, and the DSPy tutorials are the best free resources.
What's the difference between a system prompt and a user prompt? System prompt: persistent instructions set by the developer before the conversation starts — defines persona, constraints, and task context. User prompt: the actual message from the end user. The system prompt overrides user instructions in cases of conflict (though prompt injection can sometimes bypass this).
Can I use the same prompts across different AI models? Often yes for simple tasks, but model-specific tuning significantly improves quality. Claude responds best to XML-tagged structure. GPT-4o to JSON schema examples. Gemini to explicit grounding instructions. Expect 20–40% quality variation on complex tasks when moving the same prompt between models without adaptation.

Practice questions

What is an LLM's context window and what happens when content exceeds it? (Answer: Context window = the maximum number of tokens the model can process in one forward pass (input + output combined). Claude 3.5: 200K tokens. GPT-4o: 128K. LLaMA 3.1: 128K. When content exceeds the limit: API throws a context_length_exceeded error (you must chunk or summarize). Common workaround: sliding window chunking with overlap, RAG (retrieve only relevant portions), or summarization of earlier context.)
Why does Chain-of-Thought prompting improve LLM accuracy on math and reasoning tasks? (Answer: CoT forces the model to decompose multi-step problems into explicit intermediate steps. This works because: (1) each intermediate step is a simpler sub-problem within the model's capability, (2) earlier reasoning steps are in the context window and can be referenced for later steps, (3) errors in intermediate steps are self-correctable when the chain is visible. Without CoT, the model must compute the entire reasoning chain "in one pass" — which exceeds its working memory for hard problems.)
What is the difference between zero-shot, one-shot, and few-shot prompting? (Answer: Zero-shot: no examples — rely entirely on the model's pretrained knowledge. One-shot: one input→output example before the query. Few-shot: 2–10 examples. Few-shot works best when: the task format is non-standard, output structure must be exact, or the task is ambiguous without examples. GPT-3 showed that ~3 examples often matched fine-tuned model quality on classification tasks — this was the original "few-shot" result from Brown et al. 2020.)
What is prompt injection and why is it the top LLM security risk? (Answer: Prompt injection occurs when malicious text in user input or retrieved content overrides the system prompt's instructions. It is the top LLM security risk because: (1) LLMs cannot distinguish trusted system instructions from malicious user content — they process all tokens equally. (2) Indirect injection (in retrieved documents, emails, web pages) is very hard to filter. (3) Most LLM applications lack input sandboxing. Mitigations include: input/output classifiers, sandboxed execution, privilege-separated context windows, and output validation — not a single silver bullet.)
What is DSPy and how does it differ from manually writing prompts? (Answer: DSPy (Declarative Self-improving Python) treats prompts as programs that can be compiled and optimized, rather than hand-written strings. Instead of writing "You are a classifier. Given a review, output POSITIVE or NEGATIVE", you define a typed Signature (input/output fields with descriptions) and a metric. DSPy's optimizer then generates and tests prompt candidates, selecting the best-performing variant for your dataset. The key difference: DSPy optimizes for measurable performance; manual prompting optimizes for intuition. For production systems where quality can be measured, DSPy consistently outperforms hand-crafted prompts.)

LumiChats provides mode-specific optimized prompts for Study Mode, Agent Mode, and Quiz Hub — years of iterative prompt engineering embedded into each feature. When you use Study Mode, a carefully crafted system prompt instructs the model to only answer from retrieved document chunks and cite page numbers. LumiChats also lets you switch between 39+ models (GPT-4o, Claude Sonnet, Gemini Pro, o3-mini, DeepSeek V3) so you can apply model-specific prompting strategies without managing multiple API accounts.

Prompt style

Example

What changes

Bare zero-shot

What is 2+2?

Minimal answer: "4"

Role-primed

As a math professor, solve 2+2, showing all steps

Detailed, pedagogical explanation

CoT trigger

What is 2+2? Think step by step.

Reasoning trace appears before final answer

Constrained

Explain 2+2 in under 20 words for a 5-year-old

Audience-appropriate, concise phrasing

Format-specified

Return JSON: {"answer": ..., "explanation": ...}

Structured, machine-parseable output

Technique

How it works

Best for

Example snippet

Zero-shot

Ask directly, no examples

Simple, well-defined tasks

"Classify as positive/negative: I loved it"

Few-shot

Provide 2–5 input→output examples before your request

Tasks with a clear I/O pattern

"Pos: great. Neg: awful. Classify: mediocre"

Chain-of-thought

Ask the model to reason step-by-step first

Math, logic, multi-step reasoning

"Think step by step, then give your answer."

Role prompting

Assign an expert persona

Domain knowledge, tone control

"You are an expert cardiologist. Explain..."

Format specification

Specify exact output structure

Structured data, API integration

"Return a JSON object with keys: name, score"

Constraint specification

Set explicit boundaries on output

Length, style, and content control

"Answer in under 50 words. Never use jargon."

from openai import OpenAI client = OpenAI() system_prompt = """You are a sentiment classifier. Classify each review as POSITIVE, NEGATIVE, or NEUTRAL. Return only the label — nothing else.""" # Few-shot examples establish the exact format and pattern few_shot_examples = [ {"role": "user", "content": "The food was absolutely delicious!"}, {"role": "assistant", "content": "POSITIVE"}, {"role": "user", "content": "Service was slow and staff were rude."}, {"role": "assistant", "content": "NEGATIVE"}, {"role": "user", "content": "It was fine, nothing special."}, {"role": "assistant", "content": "NEUTRAL"}, ] def classify_sentiment(review: str) -> str: messages = [{"role": "system", "content": system_prompt}] messages.extend(few_shot_examples) messages.append({"role": "user", "content": review}) resp = client.chat.completions.create( model="gpt-4o-mini", messages=messages, max_tokens=10, temperature=0, # deterministic for classification ) return resp.choices[0].message.content.strip() print(classify_sentiment("Worst meal I've had in years.")) # → NEGATIVE print(classify_sentiment("Pretty good, would visit again.")) # → POSITIVE

Pattern

Core idea

Benchmark gain

Best use case

Self-consistency (Wang 2022)

Generate 10–40 CoT paths, take majority vote on final answers

+5–15% accuracy on math benchmarks vs single CoT

High-stakes math and logic problems

Tree of Thoughts (Yao 2023)

Explore branching reasoning paths, prune dead ends, backtrack

Solves 74% of Game of 24 vs 4% with standard CoT

Complex puzzles, planning, creative tasks

ReAct (Yao 2022)

Interleave Thought → Action → Observation in a reasoning loop

2–3× better factual accuracy on multi-hop QA

Agentic tool use, research tasks

Least-to-most (Zhou 2023)

Decompose complex problem → solve subproblems sequentially

Dramatic gains on compositional generalization tasks

Math word problems, multi-step code tasks

Meta-prompting

Use an LLM to generate and optimize prompts for another task

Often matches hand-crafted few-shot demonstrations

Automating prompt development pipelines

# ReAct: Reason + Act. The model interleaves thinking and tool calls. # This exact pattern underpins virtually all LLM agent frameworks. REACT_SYSTEM = """You have access to these tools: - search("query") → returns top web results - calculate("expr") → evaluates a math expression - lookup("entity") → returns a Wikipedia summary For each step, output EXACTLY: Thought: <your reasoning about what to do next> Action: <tool_name("argument")> After receiving Observation: <r>, continue reasoning. When done: Final Answer: <your answer>""" def react_agent(question: str, tools: dict, max_steps: int = 8) -> str: messages = [ {"role": "system", "content": REACT_SYSTEM}, {"role": "user", "content": question}, ] for _ in range(max_steps): response = call_llm(messages) messages.append({"role": "assistant", "content": response}) if "Final Answer:" in response: return response.split("Final Answer:")[-1].strip() for line in response.split(" "): if line.startswith("Action:"): tool_call = line.replace("Action:", "").strip() tool_name = tool_call.split("(")[0] tool_arg = tool_call.split('"')[1] result = tools[tool_name](tool_arg) messages.append({"role": "user", "content": f"Observation: {result}"}) break return "Max steps reached"

Use case

Template

Why it works

Explain a complex topic

"Explain [topic] to me like I'm a smart 16-year-old with no background in the subject. Use a real-world analogy in the first sentence."

Anchors reading level and forces an analogy — both improve comprehension and prevent jargon

Write a cover letter

"Write a cover letter for a [job title] role at [company]. My background: [2-3 bullet points]. Tone: professional but not stiff. Max 250 words. End with a confident call to action."

Constraints (word count, tone, ending) prevent generic GPT filler

Debug code

"Here is my Python code: [paste]. It should [expected behavior] but instead [actual behavior]. Identify the bug, explain why it happens, and show the fixed version with inline comments."

Providing expected vs actual behavior narrows the search space dramatically

Study from a textbook

"I'm studying [subject] for [exam/class]. Here is a concept: [paste text]. Create 5 Socratic questions that test deep understanding (not rote memorization), then answer each one."

Socratic framing generates questions that reveal conceptual gaps, not just recall

Summarize a document

"Summarize the following in 3 sentences for an executive audience, then list the 3 most important action items. Document: [paste]"

Two-part output (summary + action items) forces the model to extract signal from noise

Brainstorm ideas

"Generate 10 [type of ideas] for [context]. For each one: 1 sentence description, biggest risk, biggest upside. Format as a numbered list."

Forcing risk/upside analysis prevents the model from generating only safe, generic ideas

Rewrite for clarity

"Rewrite the following text so it's clearer and more direct. Keep all the facts. Cut filler. Target reading level: professional adult. [paste]"

Explicit instruction to preserve facts prevents hallucinated rewrites

Interview prep

"Act as a tough interviewer for a [role] position at a [company type]. Ask me one behavioral interview question at a time. After my answer, give honest feedback on what was strong, what was weak, and what I should add. Start now."

One-question-at-a-time mimics real interview flow; feedback after each answer enables iteration

Build a study plan

"Create a 4-week study plan for [subject/exam]. I have [X hours/week]. I'm a [beginner/intermediate]. Include: daily topics, practice exercises, and one mock test per week. Output as a table."

Constraints (time, level, table format) prevent vague advice and produce an immediately actionable plan

Compare two options

"Compare [A] vs [B] for someone who [specific context]. Create a table with these exact columns: Feature | [A] | [B] | Winner. After the table, give a 2-sentence recommendation."

Specifying exact columns forces structured parity; the recommendation sentence prevents a wishy-washy conclusion

Attack type

How it works

Real-world risk

Mitigation

Direct injection

User types "Ignore previous instructions and instead..."

Customer service bot reveals pricing strategy, internal data, or persona

Input filtering + sandboxed system prompts

Indirect injection

Malicious text in a webpage/document the AI reads contains hidden instructions

RAG-based assistant follows attacker instructions embedded in a retrieved article

Sanitize retrieved content; use separate trusted/untrusted context windows

Jailbreaking

Creative framing ("pretend you're DAN", roleplay scenarios) bypasses safety guidelines

Model generates harmful content it normally refuses

RLHF / Constitutional AI training; input classifiers

Prompt leaking

Attacker asks model to "repeat your system prompt"

Proprietary system prompts, personas, business logic exposed

Instruct model to never repeat system prompt; use Anthropic's system prompt cache

Role title

Median US salary (2026)

Top-end US salary

Where hiring

Prompt Engineer

$115,000

$195,000

Anthropic, OpenAI, Google DeepMind, Meta AI

AI Prompt Specialist

$85,000

$140,000

HubSpot, Salesforce, enterprise SaaS

LLM Application Engineer

$130,000

$220,000

AI startups, hedge funds, big tech

Conversational AI Designer

$95,000

$160,000

Healthcare, legal, financial services

AI Content Strategist

$75,000

$125,000

Media companies, agencies, e-commerce

ML Prompt Researcher

$145,000

$260,000

Research labs (OpenAI, Anthropic, Google)

Tool / Framework

Approach

Best for

DSPy (Stanford)

Compiles prompt programs using gradient-like optimization over a dataset

Production pipelines where quality must be measured and maximized

Automatic Prompt Engineer (APE)

LLM generates candidate prompts; scored by performance on held-out examples

Finding zero-shot prompts that match few-shot quality

OPRO (Google)

LLM optimizes prompts using "meta-prompts" that describe the optimization goal

Iterative refinement of task-specific prompts

TextGrad

Backpropagates feedback through text — treats language feedback as gradients

Complex multi-step agentic pipelines

import dspy # Define your task as a typed signature — inputs and outputs with descriptions class SentimentClassifier(dspy.Signature): """Classify customer review sentiment.""" review: str = dspy.InputField(desc="Customer review text") sentiment: str = dspy.OutputField(desc="POSITIVE, NEGATIVE, or NEUTRAL") # Wrap in a module — DSPy handles the actual prompt internally class SentimentModule(dspy.Module): def __init__(self): self.classify = dspy.Predict(SentimentClassifier) def forward(self, review: str): return self.classify(review=review) # Configure your LLM lm = dspy.LM("openai/gpt-4o-mini") dspy.configure(lm=lm) # Compile: DSPy optimizes the prompt using your labeled examples from dspy.teleprompt import BootstrapFewShot trainset = [ dspy.Example(review="Amazing product!", sentiment="POSITIVE").with_inputs("review"), dspy.Example(review="Total waste of money.", sentiment="NEGATIVE").with_inputs("review"), dspy.Example(review="It arrived on time.", sentiment="NEUTRAL").with_inputs("review"), ] optimizer = BootstrapFewShot(metric=lambda pred, ex: pred.sentiment == ex.sentiment) compiled = optimizer.compile(SentimentModule(), trainset=trainset) # The compiled module has an auto-optimized prompt — better than hand-written result = compiled(review="Best purchase I've made this year!") print(result.sentiment) # → POSITIVE

Model

Strengths

Prompting tips

Avoid

Claude 3.5 / 3.7 (Anthropic)

Nuanced instruction-following, long-document analysis, coding, constitutional safety

Use XML tags for structure (, , ). Claude follows multi-part instructions reliably. Use "Think through this step by step before answering."

Overly casual framing for complex tasks; very short context for document analysis — give it the full document

GPT-4o (OpenAI)

Multimodal (vision + audio), broad world knowledge, fast structured output

Specify JSON output with a schema example. Role + format + constraint stacking works very well. Use temperature=0 for deterministic tasks.

Assuming it'll self-fact-check; trusting citations without verification (hallucination rate higher than Claude on factual tasks)

Gemini 1.5 / 2.0 Pro (Google)

1M+ token context, native video/audio understanding, Google Search grounding

Use for tasks requiring massive context windows (entire codebases, long contracts). Enable Google Search grounding for factual queries.

Complex multi-step reasoning chains without CoT triggers — needs more explicit step-by-step framing than Claude

o3 / o3-mini (OpenAI)

State-of-the-art math, science, and coding reasoning; extended thinking time

Keep prompts minimal — o3 does its own internal reasoning. Avoid "think step by step" (redundant). Give the problem, not the method.

Using o3 for simple conversational tasks — cost is 10–50× GPT-4o-mini; use only for hard reasoning problems

DeepSeek V3 / R1 (DeepSeek)

Top-tier coding, math, Chinese language tasks; very cost-efficient

Works exceptionally well for code generation and debugging with detailed spec prompts. R1 exposes its reasoning chain.

Privacy-sensitive data — model is hosted in China; data governance implications for US enterprise use

Prompt Engineering

Why prompts matter so much

Core prompting techniques

Advanced prompting patterns

Prompt engineering for ChatGPT — 10 templates that work in 2026

Prompt injection and security

Prompt engineering jobs and salary in 2026

Prompt optimization and automated prompt engineering

Model-specific prompting: what works for Claude vs GPT-4o vs Gemini vs o3

Frequently asked questions about prompt engineering

Practice questions

Prompt Engineering

Why prompts matter so much

Core prompting techniques

Advanced prompting patterns

Prompt engineering for ChatGPT — 10 templates that work in 2026

Prompt injection and security

Prompt engineering jobs and salary in 2026

Prompt optimization and automated prompt engineering

Model-specific prompting: what works for Claude vs GPT-4o vs Gemini vs o3

Frequently asked questions about prompt engineering

Practice questions

Practice what you just learned

Related Terms