AI Model Library

42 models.
One platform.

The most comprehensive collection of frontier and open-source AI models — from GPT-5 and Claude 4 to DeepSeek R1, Qwen3, Gemma 3, and beyond. Researched, documented, and accessible to everyone.

9 Premium models · 33 Free models · 12 Reasoning models · 9 Multimodal models · 3 Coding specialists


LumiChats
Best Model (Auto-routing)
Featured · Auto
THINK · ctx: Varies
Released 2025

LumiChats' intelligent auto-routing layer automatically selects the best available model for each specific query. Powered by OpenRouter's routing infrastructure, it analyses your request type — coding, reasoning, creative writing, or general conversation — and dispatches it to the model…

OpenAI
GPT-5.2
Premium
THINK · ctx: 128K
Released 2025

OpenAI's GPT-5 generation model offering advanced reasoning, strong structured output, and reliable instruction following. Part of the GPT-5 family designed for complex, multi-step tasks requiring deep contextual understanding. Excels at nuanced analysis, systematic problem-solving, and synthesising information from long documents.

Anthropic
Claude Sonnet 4.5
Premium
THINK · ctx: 200K
Released Sep 2025

Anthropic's Claude Sonnet 4.5 is a high-capability model in the Claude 4 family, balancing intelligence and speed. It features extended thinking (chain-of-thought reasoning) and excels at long-document analysis with its 200K-token context window. Anthropic trained Claude with a strong emphasis…

Anthropic
Claude Sonnet 4.6
Featured · Premium
THINK · ctx: 200K
Released 2025

Claude Sonnet 4.6 is Anthropic's latest smart, efficient model designed for everyday professional use. It inherits the Claude 4 family's 200K-token context window and extended thinking capability, making it ideal for handling complex documents and multi-step reasoning chains. Sonnet 4.6…

Anthropic
Claude Haiku 4.5
Premium
ctx: 200K
Released Oct 2025

Claude Haiku 4.5 is Anthropic's fastest and most compact Claude 4-family model. Despite being lightweight, it still features the 200K-token context window inherited from the Claude 4 architecture and is optimised for low-latency applications. It's the right choice when you…

OpenAI
GPT-5.3-Codex
Premium
THINK · ctx: 128K
Released 2025

OpenAI's GPT-5.3-Codex is a coding-specialised variant of the GPT-5.3 family, built for agentic software engineering tasks. Following in the tradition of the original Codex that powered GitHub Copilot, GPT-5.3-Codex is optimised for multi-file code generation, repository-level understanding, automated debugging…

Google
Gemini 2.5 Pro
Featured · Premium
THINK · ctx: 1M
Released 2025

Google's Gemini 2.5 Pro is one of the world's most capable multimodal reasoning models, featuring a 1-million-token context window that can process entire books, long video transcripts, or massive codebases in a single pass. It achieves frontier performance on AIME…

Google
Gemini 3 Flash Preview
Premium
THINK · ctx: 1M
Released 2025

Google's Gemini 3 Flash Preview is an early-access version of the Gemini 3 Flash model — a smaller, faster sibling to Gemini 3 Pro designed for high-throughput applications. It retains Gemini's signature 1M-token context window and multimodal capabilities, while…

xAI
Grok 4.1 Fast
Premium
THINK · ctx: 128K
Released Nov 2025

xAI's Grok 4.1 Fast is a speed-optimised variant of Grok 4, the flagship large language model from Elon Musk's AI lab. Grok is designed with minimal content restrictions and a direct, unfiltered personality — making it popular for candid conversations and tasks…

NVIDIA
Nemotron Nano 12B V2 VL
Multimodal
OPEN · ctx: 128K
12B params · Released Oct 2025

NVIDIA's Nemotron Nano 12B V2 VL is a compact but highly capable open-source vision-language model built on a hybrid Transformer-Mamba architecture. Trained on over 39M high-quality multimodal samples, it leads benchmarks in OCR (OCRBench v2), document intelligence, chart reasoning, and…

Mistral
Mistral Small 3.1 24B
Multimodal
OPEN · ctx: 128K
24B params · Released Mar 2025

Mistral Small 3.1 24B is Mistral AI's most capable small multimodal model, handling both text and image inputs with a 128K-token context window. It's designed to deliver top-tier performance at the 24B scale — beating larger models on several benchmarks…

Google
Gemma 3 4B
Multimodal
OPEN · ctx: 128K
4B params · Released Mar 2025

Google's Gemma 3 4B is the entry-level vision-language model in the Gemma 3 family, supporting both text and image inputs with a 128K-token context window. Built on the same research foundation as Gemini 2.0, it's designed to run efficiently on…

Google
Gemma 3 12B
Multimodal
OPEN · ctx: 128K
12B params · Released Mar 2025

Google's Gemma 3 12B strikes a strong balance between capability and deployment practicality. Part of the Gemma 3 multimodal family, it handles text and image inputs with a 128K context window and was trained on 12 trillion tokens. It offers…

Google
Gemma 3 27B
Featured · Multimodal
OPEN · ctx: 128K
27B params · Released Mar 2025

Google's Gemma 3 27B is the flagship of the Gemma 3 family and one of the best open-source models globally at its size. It ranked in the top 10 of the LMSYS Chatbot Arena with an Elo score of 1338–1339…

Google
Gemini 2.0 Flash Experimental
Multimodal
ctx: 1M
Released Dec 2024

Google's Gemini 2.0 Flash Experimental is a free experimental release showcasing capabilities from the Gemini 2.0 generation — a model designed to be natively multimodal and agentic. It processes text, images, audio, and video, with a 1M-token context window for…

Qwen
Qwen2.5-VL 7B Instruct
Multimodal
OPEN · ctx: 32K
7B params · Released Sep 2024

Alibaba's Qwen2.5-VL 7B Instruct is a strong open-source vision-language model at the 7B scale. It supports image and text inputs with a native dynamic-resolution mechanism, allowing it to process images at their original resolution rather than downscaling. Qwen2.5-VL 7B…

Xiaomi
MiMo-V2-Flash
Multimodal
OPEN · ctx: 32K
~7B active params · Released 2025

Xiaomi's MiMo-V2-Flash is a fast, lightweight multimodal model developed by the Xiaomi AI team. MiMo (Mixture of Modalities) is designed for efficient on-device and cloud inference, combining text and image understanding in a compact architecture. It's optimised for speed-sensitive scenarios…

NVIDIA
Nemotron 3 Nano 30B A3B
Multimodal
OPEN · ctx: 128K
30B / 3B active params · Released 2025

NVIDIA's Nemotron 3 Nano 30B A3B is a Mixture-of-Experts model with 30 billion total parameters but only 3.3 billion active parameters per forward pass — enabling very fast inference at low cost. Built on NVIDIA's Nemotron architecture, it's part of…

Mistral
Devstral 2 2512
Featured · Coding
OPEN · ctx: 256K
123B params · Released Dec 2025

Devstral 2 is Mistral AI's state-of-the-art open-source agentic coding model, achieving 72.2% on SWE-bench Verified — one of the highest scores for any open-weight model on this benchmark for real-world GitHub issue resolution. With 123B parameters, a 256K-token context window…

Qwen
Qwen3 Coder 480B A35B
Featured · Coding
OPEN · ctx: 262K
480B / 35B active params · Released Jul 2025

Qwen3-Coder-480B-A35B-Instruct is Alibaba Cloud's most powerful open agentic coding model. It's a Mixture-of-Experts model with 480 billion total parameters and only 35 billion active per inference pass (8 of 160 experts), making large-scale deployment economically viable. The model natively supports…

Kwaipilot
KAT-Coder-Pro V1
Coding
ctx: 128K
Released 2025

KAT-Coder-Pro V1 is a proprietary coding model from Kwaipilot, the AI coding arm of Kuaishou Technology, the company behind the Kwai short-video platform. Built for production software development, it targets real-world developer workflows with…

Nex AGI
DeepSeek V3.1 Nex N1
Reasoning
OPEN · THINK · ctx: 128K
671B / 37B active params · Released 2025

DeepSeek V3.1 Nex N1 is Nex AGI's enhanced fine-tune of DeepSeek V3.1, optimised for agentic reasoning tasks. The N1 variant applies additional alignment and instruction-following improvements on top of DeepSeek's frontier-class 671B MoE base, with 37B active parameters per forward pass…

TNG Tech
TNG R1T Chimera
Reasoning
OPEN · THINK · ctx: 130K
671B / 37B active params · Released Apr 2025

The original DeepSeek R1T Chimera from TNG Technology Consulting GmbH (Munich) is an expert-assembly model merging DeepSeek V3-0324 and R1 at the MoE expert-tensor level — no fine-tuning or distillation required. The result is a model that achieves approximately…

AllenAI
Olmo 3.1 32B Think
Reasoning
OPEN · THINK · ctx: 65K
32B params · Released Dec 2025

AllenAI's OLMo 3.1 32B Think is the world's most transparent large-scale reasoning model — every piece of training data, code, intermediate checkpoint, and reasoning trace is publicly available under Apache 2.0. The 3.1 variant extends the original OLMo 3 32B…

Alibaba
Tongyi DeepResearch 30B A3B
Reasoning
OPEN · THINK · ctx: 128K
30B / 3B active params · Released 2025

Alibaba's Tongyi DeepResearch 30B A3B is a research-oriented reasoning model from the Tongyi (通义) family, trained specifically for in-depth analytical and research tasks. As a 30B MoE model with only 3 billion active parameters, it provides strong reasoning output at…

TNG Tech
DeepSeek R1T2 Chimera
Featured · Reasoning
OPEN · THINK · ctx: 130K
671B / 37B active params · Released Jul 2025

DeepSeek-TNG R1T2 Chimera is TNG Technology Consulting's second-generation Assembly-of-Experts model, merging three DeepSeek parents (R1-0528, R1, and V3-0324) at the weight-tensor level — no fine-tuning required. The tri-parent design achieves a new sweet spot: approximately 20% faster than standard…

TNG Tech
DeepSeek R1T Chimera
Reasoning
OPEN · THINK · ctx: 60K
671B / 37B active params · Released Apr 2025

The original DeepSeek R1T Chimera (April 2025) was TNG's first successful Assembly-of-Experts merge at 671B scale — the first publicly demonstrated merge of models at this size. By combining DeepSeek V3-0324's shared experts with R1's routed expert tensors, it…

Arcee AI
Trinity Mini
General
OPEN · ctx: 128K
Released 2025

Arcee AI's Trinity Mini is a compact general-purpose model from Arcee's Trinity family, which specialises in efficient AI for enterprise applications. Arcee AI is known for its model-merging and specialisation techniques — the Trinity series uses a mixture-of-models…

NVIDIA
Nemotron Nano 9B V2
General
OPEN · ctx: 128K
9B params · Released 2025

NVIDIA's Nemotron Nano 9B V2 is a compact, highly optimised language model using NVIDIA's hybrid Transformer-Mamba architecture. This design delivers higher throughput and lower latency than standard attention-only transformers while maintaining competitive reasoning quality. Nemotron Nano 9B V2 achieves…

Z.AI
GLM 4.5 Air
General
OPEN · THINK · ctx: 128K
106B / 12B active params · Released Jul 2025

Z.AI's GLM-4.5-Air is the lightweight variant of Zhipu AI's flagship GLM-4.5 family — an agent-native model that unifies reasoning, coding, and tool use in a single architecture. With 106 billion total parameters and only 12 billion active (MoE)…

Google
Gemma 3n 2B
General
OPEN · ctx: 32K
2B effective params · Released Jun 2025

Google's Gemma 3n E2B is the smallest model in the Gemma 3n (nano) family, designed specifically for mobile, IoT, and on-device AI deployment. Using Google's MatFormer (Matryoshka Transformer) architecture, Gemma 3n E2B has a total parameter count of ~5B…

Google
Gemma 3n 4B
General
OPEN · ctx: 32K
4B effective params · Released Jun 2025

Google's Gemma 3n E4B is the larger model in the Gemma 3n family, targeting high-end mobile devices, laptops, and edge servers. With an effective ~4B memory footprint despite containing more total parameters (MatFormer architecture), it handles text, images, and audio…

Qwen
Qwen3 4B
General
OPEN · ctx: 128K
4B params · Released Apr 2025

Alibaba's Qwen3 4B is the compact member of the Qwen3 family, offering both thinking (chain-of-thought) and non-thinking modes in a tiny 4B-parameter footprint. Despite its small size, Qwen3 4B is one of the most capable models in its class…

Qwen
Qwen3 235B A22B
Featured · General
OPEN · THINK · ctx: 128K
235B / 22B active params · Released Apr 2025

Qwen3-235B-A22B is Alibaba's flagship open-source model — a massive 235B MoE model with 22B active parameters per forward pass. It ranks among the best open-weight models globally, achieving top performance on AIME 2025, LiveCodeBench, and multilingual benchmarks. In non-thinking mode…

Meta
Llama 3.3 70B Instruct
Featured · General
OPEN · ctx: 128K
70B params · Released Dec 2024

Meta's Llama 3.3 70B Instruct is one of the most widely used open-source LLMs globally, delivering performance comparable to Llama 3.1 405B at a fraction of the compute cost. Trained on 15 trillion tokens using 39.3M GPU-hours on NVIDIA…

Meta
Llama 3.2 3B Instruct
General
OPEN · ctx: 128K
3B params · Released Sep 2024

Meta's Llama 3.2 3B Instruct is a tiny but capable model from Meta's open-source AI programme, pretrained on 9 trillion tokens and refined via SFT, rejection sampling, and DPO. Using knowledge distillation from larger Llama 3.1 models, it punches above…

Nous Research
Hermes 3 405B Instruct
General
OPEN · ctx: 128K
405B params · Released Aug 2024

Hermes 3 405B Instruct is Nous Research's flagship fine-tune of Meta's Llama 3.1 405B foundation model — a full-parameter fine-tune designed to maximise user alignment, creative flexibility, and agentic capability. Hermes 3 builds on the Hermes 2 series with dramatically improved…

DeepSeek
DeepSeek R1 0528
Featured · General
OPEN · THINK · ctx: 128K
671B / 37B active params · Released May 2025

DeepSeek R1-0528 is a significant upgrade to the original DeepSeek R1, achieved through increased training compute and post-training algorithmic improvements — not architectural changes. It doubled the average thinking-token depth (12K → 23K), boosting AIME 2025 accuracy from 70% to 87.5%…

Sourceful
Riverflow V2 Fast Preview
General
ctx: 128K
Released 2025

Riverflow V2 Fast Preview is a speed-optimised general-purpose language model from Sourceful, designed for applications where response time is critical. As a preview model, it offers a glimpse into Sourceful's approach to efficient AI — balancing quality and throughput for…

OpenAI
GPT-OSS 20B
Agentic
OPEN · ctx: 128K
20B params · Released 2025

GPT-OSS 20B is OpenAI's first open-weight model release in years — a 20-billion-parameter model made freely available through Hugging Face and compatible APIs. It marks a significant moment for OpenAI: a return to open-weight AI after years of closed-source flagship development.

MoonshotAI
Kimi K2 0711
Featured · Creative
OPEN · ctx: 128K
1T / 32B active params · Released Jul 2025

Moonshot AI's Kimi K2 (July 11, 2025 release) is a landmark 1-trillion-parameter MoE model with only 32 billion active parameters per inference pass, built using the novel MuonClip optimiser for stable large-scale training. Kimi K2 achieves state-of-the-art performance on coding…

Cognitive Computations
Venice Uncensored
Creative
OPEN · ctx: 32K
24B params · Released 2025

Venice Uncensored (Dolphin Mistral 24B Venice Edition) is Cognitive Computations' 'uncensored' fine-tune of the Mistral 24B model, built for creative writing, roleplay, and use cases where standard safety filtering would prevent valuable outputs. The 'Venice Edition' designation reflects training optimised…

Reasoning Models

Models like DeepSeek R1, OLMo 3.1 Think, and TNG R1T2 Chimera use extended chain-of-thought (CoT) reasoning — they "think before answering," showing intermediate steps. Best for maths, logic, and complex problem-solving. Expect slower, longer responses but significantly higher accuracy on hard tasks.

Multimodal Models

Vision-language models (VLMs) like Gemma 3, NVIDIA Nemotron VL, and Qwen2.5-VL can understand images, charts, documents, and (in some cases) video alongside text. Use these when you need to analyse screenshots, diagrams, receipts, or any visual content.

Coding Models

Specialised models like Devstral 2, Qwen3 Coder 480B, and KAT-Coder-Pro are fine-tuned on large programming datasets and evaluated on SWE-bench (real GitHub issue resolution). They excel at agentic coding tasks, multi-file edits, bug detection, and software engineering agent workflows.

Frequently Asked Questions

What is the difference between a reasoning model and a standard LLM?

Reasoning models (marked with a 'THINK' badge on LumiChats) use chain-of-thought (CoT) techniques — they generate internal reasoning traces before producing a final answer. This makes them much more accurate on maths, logic, and multi-step problems, but slower and more token-intensive than standard models.

Which model should I use for coding tasks?

For agentic coding (autonomous bug fixing, multi-file refactoring): Devstral 2 (72.2% SWE-bench) or Qwen3 Coder 480B. For code completion in a chat interface: Llama 3.3 70B, DeepSeek R1 0528, or GLM 4.5 Air. For absolute top quality: Claude Sonnet 4.6 (Premium) or GPT-5.3-Codex (Premium).

Which models can analyse images and documents?

Models tagged 'Multimodal' accept image inputs. Top picks: Gemma 3 27B (best open-source VLM), NVIDIA Nemotron Nano 12B V2 VL (best for OCR/documents), Gemini 2.0 Flash Experimental (free, 1M context), and Qwen2.5-VL 7B (best for Chinese-language docs).

What are free vs premium models on LumiChats?

Free models (marked with a green badge) are accessible to all users with no subscription. Premium models (marked with a crown icon) — GPT-5.2, Claude Sonnet 4.5/4.6, Claude Haiku 4.5, GPT-5.3-Codex, Gemini 2.5 Pro, Gemini 3 Flash Preview, and Grok 4.1 Fast — require a LumiChats Premium subscription.

What is a Mixture of Experts (MoE) model?

An MoE model has many 'expert' sub-networks but activates only a small subset per token during inference. For example, Qwen3 235B A22B has 235B total parameters but only 22B active per forward pass — making it far cheaper to run than a 235B dense model while retaining much of the capacity of the full parameter count. Models like DeepSeek R1, Kimi K2, and GLM 4.5 all use MoE.
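The routing idea can be sketched in a few lines of Python. This is a toy illustration only, not any real model's code: the expert counts are invented, each 'expert' is a trivial function standing in for a full feed-forward network, and the router scores are passed in rather than learned.

```python
import math

NUM_EXPERTS = 8   # total experts (a stand-in for, say, Qwen3 Coder's 160)
TOP_K = 2         # experts that actually run per token (a stand-in for 8 of 160)

def moe_forward(token, router_scores):
    """Run only the TOP_K highest-scoring experts on `token` and mix their
    outputs, weighted by a softmax over the selected router scores."""
    # Each 'expert' here just multiplies by a constant; in a real MoE layer
    # each one is a full feed-forward sub-network with its own weights.
    experts = [lambda x, w=w: x * w for w in range(1, NUM_EXPERTS + 1)]

    # Pick the TOP_K experts with the highest router scores for this token.
    top = sorted(range(NUM_EXPERTS), key=lambda i: router_scores[i], reverse=True)[:TOP_K]

    # Softmax over only the selected scores gives the mixing weights.
    exps = [math.exp(router_scores[i]) for i in top]
    total = sum(exps)

    # Only TOP_K experts execute, so compute scales with *active* parameters,
    # not total parameters -- the key economics of MoE inference.
    output = sum((e / total) * experts[i](token) for e, i in zip(exps, top))
    return output, top

out, active = moe_forward(1.0, [0.1, 0.9, 0.2, 0.8, 0.0, 0.3, 0.0, 0.5])
print(f"{len(active)} of {NUM_EXPERTS} experts ran; output = {out:.3f}")
```

With 8 experts and top-2 routing, only a quarter of the expert weights are touched per token — the same ratio logic behind "235B total / 22B active".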

Which models are best for creative writing?

For creative fiction and storytelling: Kimi K2 0711 (praised for creative quality), Hermes 3 405B (strong roleplaying, uncensored), Venice Uncensored (adult creative content), and Claude Sonnet 4.6 (nuanced, high-quality writing). Gemma 3 27B ranked #2 on the EQ-Bench creative writing leaderboard among open models.

42 models. One platform.

Switch between every model above mid-session. No extra subscriptions for free models. Upgrade to unlock all 42.

Start Free Today