World models are AI systems that learn internal representations of how the physical world works — predicting the next state of an environment given actions within it, rather than predicting the next token in a text sequence. While LLMs model the statistical patterns of language, world models model causality, physics, spatial relationships, and object permanence. In late 2025 and early 2026, world models emerged as the field's most hyped new frontier: Yann LeCun left Meta to launch AMI Labs (seeking €3B valuation), Fei-Fei Li's World Labs shipped Marble, Google DeepMind released Genie 3, and Nvidia's Cosmos platform surpassed 2 million downloads.
The fundamental difference: tokens vs. states
The core distinction between LLMs and world models is what they predict:
| Property | Large Language Model | World Model |
|---|---|---|
| What it predicts | The next token in a text sequence | The next state of an environment given an action |
| Learning signal | Statistical co-occurrence of words across text | Causal dynamics — what happens when you push this object |
| Representation space | Token embeddings in high-dimensional language space | Latent representations of physical state |
| Understanding of physics | None built in — can describe physics fluently without modeling it | Built-in — trained on video and sensor data of real physical interactions |
| Hallucinations | Common — predicts plausible-sounding text, not grounded truth | Rarer — grounded in physical observations, not statistical text patterns |
| Best analogy | Extremely well-read librarian who has read every physics textbook | A child who has played with blocks, water, and gravity for years |
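The distinction in the table can be made concrete as two function signatures. The sketch below uses random, untrained weights purely to show the interfaces (a toy bigram language model vs. a toy state-transition model); the dimensions and weight matrices are illustrative assumptions, not any real system.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 100          # token vocabulary size (toy)
STATE_DIM = 8        # dimensionality of the environment state (toy)
ACTION_DIM = 2       # dimensionality of an action (toy)

W_lm = rng.normal(size=(VOCAB, VOCAB))

def next_token_logits(token_history):
    """LLM-style interface: token history -> scores over the next token."""
    return W_lm[token_history[-1]]          # toy bigram: conditions on the last token only

W_dyn = rng.normal(size=(STATE_DIM + ACTION_DIM, STATE_DIM)) * 0.1

def next_state(state, action):
    """World-model-style interface: (state, action) -> predicted next state."""
    return np.concatenate([state, action]) @ W_dyn

tokens = [3, 17, 42]
logits = next_token_logits(tokens)          # shape (VOCAB,)

s = rng.normal(size=STATE_DIM)
a = np.array([1.0, 0.0])                    # e.g. "push right"
s_next = next_state(s, a)                   # shape (STATE_DIM,)
```

The point is the conditioning signal: the first function only ever sees text; the second is conditioned on an action, which is what lets a world model capture "what happens when you push this object."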
LeCun's critique of LLMs
Yann LeCun has argued publicly for years that LLMs will never achieve general intelligence: "They predict the next word based on statistics, not the next state of the world based on physics." When GPT-4 generates text about a ball rolling down a hill, it is not simulating physics — it is predicting which words typically follow other words. It has no internal model of gravity, friction, or momentum. World models are designed to close this gap.
The 2026 world models race
In the span of a few months bridging late 2025 and early 2026, world models went from a niche research topic to the industry's most-funded frontier:
| Player | Product / Project | Key milestone | Valuation / Investment |
|---|---|---|---|
| AMI Labs (Yann LeCun) | JEPA-based world models | LeCun left Meta (Dec 2025) to found AMI; builds on V-JEPA 2 trained on 1M+ hours of video | €3B valuation pre-product; offices in Paris, NYC, Montreal, Singapore |
| World Labs (Fei-Fei Li) | Marble | Ships Marble (Nov 2025) — generates navigable 3D worlds from text/images/video; users can move through and interact with generated environments | $5B valuation in talks; $230M seed raised in 2024 |
| Google DeepMind | Genie 3 / Project Genie | First real-time interactive world model; generates navigable 3D worlds at 24fps from text prompts; paired with SIMA 2 agent for in-world training | Part of DeepMind (Alphabet) |
| Nvidia | Cosmos platform | Trained on 20M hours of real-world data; 2M+ downloads; three model families (Predict, Transfer, Reason); key infrastructure for robotics AI | $100B+ market cap acceleration from AI adoption |
| Runway | GWM-1 World Model | First world model from a creative AI company; released Dec 2025; targets robotics and gaming beyond its traditional media/VFX market | Est. $4B valuation |
JEPA — LeCun's architecture
AMI Labs is built on Joint Embedding Predictive Architecture (JEPA), developed at Meta. Unlike LLMs that process tokens, JEPA-based models operate in abstract latent spaces and predict how the state of the world changes in response to actions. The key insight: predict in representation space, not pixel space — this avoids the exponential complexity of modeling every visual detail, focusing instead on the semantically meaningful changes.
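The "predict in representation space" idea can be sketched in a few lines. This is a heavily simplified, JEPA-flavoured toy with random untrained linear maps and made-up dimensions — not the actual V-JEPA architecture — but it shows where the loss lives: in a 32-dim latent space rather than the 4096-dim pixel space.

```python
import numpy as np

rng = np.random.default_rng(1)

OBS_DIM, LATENT_DIM, ACTION_DIM = 64 * 64, 32, 4   # toy sizes (assumptions)

W_enc = rng.normal(size=(OBS_DIM, LATENT_DIM)) * 0.01        # shared encoder (stand-in)
W_pred = rng.normal(size=(LATENT_DIM + ACTION_DIM, LATENT_DIM)) * 0.1  # latent predictor

def encode(obs):
    """Map a high-dimensional observation to a compact latent representation."""
    return obs @ W_enc

def predict(z, action):
    """Predict the *embedding* of the next observation, never its pixels."""
    return np.concatenate([z, action]) @ W_pred

obs_t = rng.normal(size=OBS_DIM)       # observation at time t (e.g. a video frame)
obs_t1 = rng.normal(size=OBS_DIM)      # observation at time t+1
action = rng.normal(size=ACTION_DIM)

z_t, z_t1 = encode(obs_t), encode(obs_t1)
z_pred = predict(z_t, action)

# The training signal is distance in latent space, not pixel reconstruction.
latent_loss = np.mean((z_pred - z_t1) ** 2)
```

Because the predictor never has to reproduce textures, lighting, or other pixel-level detail, it is free to spend its capacity on the semantically meaningful changes the text describes.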
Why world models matter — real applications
| Application | How world models help | Who is doing it |
|---|---|---|
| Robotics training | Generate infinite simulated environments for robot training without physical hardware; simulate rare or dangerous scenarios safely | Figure AI, Agility Robotics, 1X — all using Nvidia Cosmos |
| Autonomous vehicles | Simulate rare edge cases (ice, accidents, unusual pedestrian behavior) that are dangerous or rare in real-world data collection | Waymo, Wayve (GAIA-2 model), Uber, XPENG using Cosmos |
| Video game development | Generate reactive, physically consistent 3D game worlds from text; procedural generation with real physics | Google Project Genie demos, Iconic AI-native game engine |
| AR / VR / Spatial computing | Maintain coherent 4D (3D + time) models of the user's environment for stable AR overlays; predict object movement | Apple Vision Pro content pipelines, Meta Orion research |
| Scientific simulation | Simulate protein folding dynamics, fluid dynamics, material properties — with faster-than-physics-engine speed | DeepMind AlphaFold successors, Runway scientific models |
| Medical / surgical AI | Simulate surgical procedures; train surgical robots without human patients; predict treatment outcomes in 3D | AMI Labs / Nabla partnership focus area |
For students: where to start
World models are a frontier research area — most of the best work is in papers, not products. Start with: (1) DreamerV3 (Hafner et al., 2023) — the most complete open-source world model for RL tasks; (2) Nvidia Cosmos — download and experiment with the open models; (3) Genie 3 technical report from DeepMind; (4) LeCun's 2022 position paper "A Path Towards Autonomous Machine Intelligence" (available free) — the theoretical blueprint for everything AMI Labs is building.
Practice questions
- What is the difference between a model-free and model-based reinforcement learning agent? (Answer: Model-free: learns a policy (what to do) or value function (how good is each state) directly from experience, without modelling the environment dynamics. Simple but sample-inefficient — needs many environment interactions. Model-based: explicitly learns a transition model P(s' | s, a) (what happens when action a is taken in state s). Can plan by simulating future trajectories without real environment interaction. Sample-efficient but requires accurate world model. World models aim to give RL agents model-based efficiency.)
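The model-based recipe in that answer (learn P(s' | s, a), then plan by simulation) can be sketched on a toy 5-state chain. The environment, counts-based model, and policy here are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

N_STATES, N_ACTIONS = 5, 2
counts = np.ones((N_STATES, N_ACTIONS, N_STATES))  # Laplace-smoothed transition counts

def true_step(s, a):
    """Hidden environment: action 0 moves left, action 1 moves right (10% noise)."""
    move = -1 if a == 1 else 1
    move = -move  # action 1 -> +1, action 0 -> -1
    if rng.random() < 0.1:
        move = -move
    return int(np.clip(s + move, 0, N_STATES - 1))

# Learn the model from real experience (the only place the real env is touched).
for _ in range(2000):
    s, a = rng.integers(N_STATES), rng.integers(N_ACTIONS)
    counts[s, a, true_step(s, a)] += 1

P = counts / counts.sum(axis=2, keepdims=True)     # empirical P[s, a, s']

def imagine_rollout(s, policy, horizon=10):
    """Simulate a trajectory inside the learned model — no real interaction."""
    traj = [s]
    for _ in range(horizon):
        a = policy(s)
        s = rng.choice(N_STATES, p=P[s, a])
        traj.append(int(s))
    return traj

traj = imagine_rollout(0, policy=lambda s: 1)       # always "move right"
```

A model-free agent would need fresh environment steps for every policy it evaluates; here, once P is fitted, arbitrarily many trajectories can be imagined for free — the sample-efficiency argument in the answer above.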
- What is DreamerV3 and how does it use a world model? (Answer: DreamerV3 (Hafner 2023): learns a compact world model in latent space — a Recurrent State Space Model (RSSM) that predicts latent states from current latent state and action. The agent is trained ENTIRELY within imagined rollouts from this world model — never directly interacting with the real environment during policy training. Environment interaction only updates the world model. This enables DreamerV3 to master diverse tasks (Minecraft, robot locomotion, classic games) with orders of magnitude fewer real environment steps than model-free RL.)
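The "train entirely inside imagination" loop can be caricatured as follows. This is not the real RSSM — the dynamics, reward head, and random-search policy update are toy stand-ins (DreamerV3 backpropagates through the rollout instead) — but the control flow matches the answer: the policy only ever sees imagined latent rollouts.

```python
import numpy as np

rng = np.random.default_rng(3)

LATENT_DIM, ACTION_DIM, HORIZON = 8, 2, 15          # toy sizes (assumptions)

W_dyn = rng.normal(size=(LATENT_DIM + ACTION_DIM, LATENT_DIM)) * 0.1  # "learned" dynamics
W_rew = rng.normal(size=LATENT_DIM) * 0.1                             # "learned" reward head
W_pol = rng.normal(size=(LATENT_DIM, ACTION_DIM)) * 0.1               # policy being trained

def imagine(z):
    """Roll the policy forward inside the world model; sum imagined rewards."""
    ret = 0.0
    for _ in range(HORIZON):
        a = np.tanh(z @ W_pol)                       # policy acts on the latent state
        z = np.tanh(np.concatenate([z, a]) @ W_dyn)  # model predicts the next latent
        ret += z @ W_rew                             # model predicts the reward
    return ret

# Crude policy improvement by random search over imagined returns.
z0 = rng.normal(size=LATENT_DIM)
best = imagine(z0)
for _ in range(50):
    W_pol_old, W_pol = W_pol, W_pol + rng.normal(size=W_pol.shape) * 0.05
    r = imagine(z0)
    if r <= best:
        W_pol = W_pol_old      # revert: the imagined return did not improve
    else:
        best = r
```

Note what is absent: no call to a real environment anywhere in the improvement loop. In DreamerV3, real interaction happens only to refit W_dyn and W_rew.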
- Why are world models important for safety in autonomous systems? (Answer: An autonomous car without a world model must learn purely from real experience — including crashes. A car with a world model can simulate thousands of dangerous scenarios internally without real risk, test 'what if I miss the red light?' in simulation before ever encountering it, plan by rolling out multiple candidate trajectories and choosing the safest, and predict other agents' behaviours. Real-world failures are catastrophic; a world model lets safety-critical scenarios be explored in imagination.)
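The "roll out candidates and choose the safest" step can be illustrated with a toy 1-D braking problem. All numbers here are invented (a hand-written kinematics function stands in for a learned world model): three candidate braking strengths are rolled out in imagination and the one that stays clear of an obstacle at x = 10 is selected.

```python
import numpy as np

OBSTACLE_X, DANGER_MARGIN, DT, HORIZON = 10.0, 1.0, 0.1, 30   # toy parameters

def rollout(v0, brake):
    """Predict future positions under constant braking (world-model stand-in)."""
    x, v, xs = 0.0, v0, []
    for _ in range(HORIZON):
        v = max(v - brake * DT, 0.0)   # decelerate, never reverse
        x += v * DT
        xs.append(x)
    return np.array(xs)

candidates = [0.5, 2.0, 5.0]           # candidate braking strengths to imagine

def risk(brake):
    """1.0 if the imagined trajectory enters the danger zone, else 0.0."""
    xs = rollout(v0=8.0, brake=brake)
    return float(np.max(xs) > OBSTACLE_X - DANGER_MARGIN)

safest = min(candidates, key=risk)     # pick the plan that stays safe in imagination
```

The dangerous options are rejected without the car ever experiencing a collision — the whole comparison happens inside the model.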
- How does the concept of a 'mental model' in cognitive science relate to AI world models? (Answer: Cognitive science: humans maintain mental models of physics, social relationships, causality, and others' mental states. We plan actions by mentally simulating their consequences. Johnson-Laird (1983): mental models are the basis of reasoning and language understanding. AI world models operationalise this: a neural network that represents environment dynamics enables planning by simulation. The connection is deep — both biological and artificial agents that model their environment before acting are more adaptive and efficient than reactive systems.)
- What is a 'latent space world model' and why is it more efficient than pixel-space models? (Answer: Pixel-space world model: learns to predict future video frames at full pixel resolution — computationally expensive (high-dimensional output, each step generates thousands of pixels). Latent space world model: compress observation to compact latent representation via VAE/encoder, model dynamics in latent space (small vectors), decode only for visualisation. DreamerV3's RSSM models 32-dimensional latent states. Planning and policy learning happen in this compact space — 100-1000× fewer computations than pixel-space modelling.)
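The efficiency claim in that answer is easy to check with back-of-envelope arithmetic. Using a small 64x64 RGB frame (a hypothetical size) and the 32-dim latent mentioned above, counting only output dimensions per prediction step:

```python
# Back-of-envelope comparison of per-step prediction targets (toy frame size).
H, W, C = 64, 64, 3            # hypothetical small video frame
LATENT = 32                    # compact latent, per the DreamerV3 RSSM note above

pixel_dim = H * W * C          # 12,288 values to predict per frame in pixel space
ratio = pixel_dim / LATENT     # how many times more outputs pixel space demands

print(f"pixel-space prediction has {ratio:.0f}x more outputs per step")
```

That is 384x fewer prediction targets per step before accounting for the cost of modelling pixel-level detail, consistent with the 100-1000x range quoted in the answer.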