Reasoning Models (o1, o3, R1)

Reasoning models are a class of large language models trained to perform extended chain-of-thought reasoning before producing a final answer. OpenAI's o1 (previewed in September 2024) was the first widely deployed reasoning model: OpenAI reported that it solved 83% of the problems on a qualifying exam for the International Mathematics Olympiad, compared to 13% for GPT-4o. DeepSeek R1 (January 2025) reached comparable performance with openly released weights, setting off a wave of reasoning model development across the industry.

AI that thinks before it answers, and scores in the 99th percentile on math competitions.

Category: Flagship AI Models

How reasoning models are trained: GRPO and process reward models

Standard LLMs are trained to predict the next token. Reasoning models are additionally trained with reinforcement learning to maximize the correctness of final answers, so the model learns to use its context window as a scratchpad. OpenAI has not published the details of its training process; DeepSeek R1 uses Group Relative Policy Optimization (GRPO), which eliminates the need for a separate critic model by using the average reward within a group of generated responses as the baseline.

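J_{\text{GRPO}}(\theta) = \mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G} \min\left(r_i(\theta) \hat{A}_i,\ \text{clip}(r_i(\theta), 1-\varepsilon, 1+\varepsilon)\hat{A}_i\right)\right]

Here r_i(\theta) = \pi_\theta(o_i \mid q) / \pi_{\theta_{\text{old}}}(o_i \mid q) is the probability ratio between the current and previous policy for the i-th sampled response o_i to prompt q, \hat{A}_i = (R_i - \text{mean}(R_1, \dots, R_G)) / \text{std}(R_1, \dots, R_G) is the group-normalized advantage computed from the rewards R_1, \dots, R_G, G is the group size, and \varepsilon is the clipping range. The full objective in the DeepSeek papers also subtracts a KL penalty toward a reference policy, which is omitted here.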
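The objective above can be sketched in a few lines of Python. This is a minimal illustration, not DeepSeek's implementation: it works on whole-sequence log-probabilities rather than per-token terms, leaves out the KL penalty, and uses the population standard deviation; the function names, the `eps` stabilizer, the clip range of 0.2, and the toy numbers are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each response against its own group: (R_i - mean) / std.

    The group mean plays the role of the baseline, so no critic model is needed.
    eps is an assumed stabilizer that avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Average of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over the group."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # r_i(theta)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Toy group of G = 4 sampled answers to one prompt: reward 1.0 if correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Made-up sequence log-probabilities under the old and the updated policy.
logp_old = [-12.0, -15.0, -14.0, -11.0]
logp_new = [-11.5, -15.2, -14.0, -11.0]

objective = clipped_surrogate(logp_new, logp_old, advantages)
print(advantages, objective)
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, so a gradient step on this objective raises the probability of the former and lowers the latter. The clip stops any single response from moving the policy too far in one update.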

Model	Creator	AIME 2024	MATH-500	SWE-bench Verified	Open weights?
o1	OpenAI	74.4%	96.4%	48.9%	No
o3-mini	OpenAI	90.0%	97.9%	49.3%	No
DeepSeek R1	DeepSeek	79.8%	97.3%	49.2%	Yes
Claude 3.7 Sonnet (extended thinking)	Anthropic	~80%	~97%	70.3%	No
Gemini 2.5 Pro	Google	92.0%	97.9%	Unreported	No

When to use a reasoning model vs a standard model

Use reasoning models for: math problems, formal proofs, multi-step coding tasks, complex logic puzzles, scientific analysis
Use standard models for: writing, summarization, simple Q&A, translation, classification (tasks where extended thinking wastes time and money)
Reasoning models can be roughly 5–20x more expensive and 5–10x slower than comparable standard models, largely because of the extra tokens they generate while thinking
The 'thinking' tokens are often not shown to users but still count toward your token bill
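To see how hidden thinking tokens drive the bill, here is a rough cost sketch. The prices and token counts are hypothetical, the function name is made up for illustration, and it assumes thinking tokens are billed at the same rate as visible output tokens; check your provider's pricing page for the real rules.

```python
def estimate_cost_usd(input_tokens, visible_output_tokens, thinking_tokens,
                      usd_per_m_input, usd_per_m_output):
    """Estimate the cost of one request in US dollars.

    Assumption: thinking tokens are charged like output tokens,
    even though the user never sees them.
    """
    billed_output = visible_output_tokens + thinking_tokens
    return (input_tokens * usd_per_m_input
            + billed_output * usd_per_m_output) / 1_000_000


# Hypothetical prices (USD per million tokens), kept identical for both calls
# so that the only difference is the thinking tokens.
standard = estimate_cost_usd(1_000, 500, 0,
                             usd_per_m_input=1.0, usd_per_m_output=4.0)
reasoning = estimate_cost_usd(1_000, 500, 8_000,
                              usd_per_m_input=1.0, usd_per_m_output=4.0)

print(f"standard:  ${standard:.4f}")
print(f"reasoning: ${reasoning:.4f} ({reasoning / standard:.1f}x)")
```

With these made-up numbers, the same visible answer costs more than ten times as much once 8,000 thinking tokens are added, before any difference in per-token price is taken into account.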

Practical rule: If a task could be solved by a smart person in 30 seconds, use a standard model. If it would take a PhD student 30 minutes of focused work, use a reasoning model.
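The guidance above can be written down as a small routing helper. This is only a sketch: the function name, the category labels, and the "unclear" fallback are assumptions for illustration, and the two time thresholds are taken directly from the practical rule (30 seconds and 30 minutes), with everything in between left as a judgment call.

```python
# Task categories taken from the two lists above (lowercased labels are illustrative).
REASONING_TASKS = {"math", "formal proof", "multi-step coding",
                   "logic puzzle", "scientific analysis"}
STANDARD_TASKS = {"writing", "summarization", "simple q&a",
                  "translation", "classification"}


def pick_model_class(task_type=None, expert_minutes=None):
    """Return "reasoning", "standard", or "unclear" for a task.

    task_type: optional category label, matched case-insensitively.
    expert_minutes: optional estimate of how long a skilled person would need.
    """
    if task_type is not None:
        key = task_type.strip().lower()
        if key in REASONING_TASKS:
            return "reasoning"
        if key in STANDARD_TASKS:
            return "standard"
    if expert_minutes is not None:
        if expert_minutes <= 0.5:   # about 30 seconds for a smart person
            return "standard"
        if expert_minutes >= 30:    # about 30 minutes of focused work
            return "reasoning"
    return "unclear"  # the rule of thumb does not cover this case


print(pick_model_class("Translation"))
print(pick_model_class("math"))
print(pick_model_class(expert_minutes=45))
print(pick_model_class(expert_minutes=5))
```

Tasks that land in the "unclear" bucket are good candidates for trying the standard model first and escalating only if the answer is wrong.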


