Timeline of frontier AI model releases

Every frontier model from 2020 to present

Chronological timeline of 67 frontier AI models from 12 providers.

67 models · 12 providers · 18 open source · Latest: DeepSeek V4 Pro

JUST RELEASED

Preliminary Data

These models were released within the last few days. Third-party benchmark coverage is incomplete, so the Singularity Index is withheld until Artificial Analysis (AA), vals.ai, and ARC Prize finish their eval passes.

GPT-5.5 Pro

OpenAI·Apr 23, 2026

Top-tier GPT-5.5 variant. 98% on ARC-AGI-1, 90.4% on ARC-AGI-2. Awaiting full third-party eval coverage.

MiMo-V2.5-Pro

Xiaomi·Apr 22, 2026·1T-A42B (MoE)

Open-weight 1T MoE (42B active) matching frontier intelligence at half the cost. Strong agentic and reasoning performance; awaiting coding eval coverage.

2026

17 MODELS

DeepSeek V4 Pro

DeepSeek·Apr 24, 2026

Flagship successor to V3 (Max Effort reasoning). Open-weight, competitive with closed frontier models at a fraction of the cost. Awaiting multimodal eval coverage.

GPT-5.5

OpenAI·Apr 23, 2026

Frontier reasoning model. #1 on AA Intelligence Index (60.24), tops Arena Elo at 1781, crosses 90% on ARC-AGI-2, and leads Terminal-Bench 2.0 at 82.7%.

Kimi K2.6

Moonshot·Apr 20, 2026

Successor to K2.5 with stronger reasoning and Arena Elo; trades some agentic score for benchmark gains

Qwen 3.6 Max Preview

Alibaba·Apr 20, 2026

Top-tier Qwen variant above Plus, highest Arena Elo in the family at 1511

Claude Opus 4.7

Anthropic·Apr 16, 2026

1M context, stronger coding and vision. Scores 64.3% on SWE-bench Pro, ahead of GPT-5.4 and Gemini 3.1 Pro.

Muse Spark

Meta·Apr 8, 2026

Meta's first proprietary (not open-weight) model, from Superintelligence Labs

Qwen 3.6-Plus

Alibaba·Apr 2, 2026

Agentic coding focus, 1M context, strong multimodal performance

GPT-5.4

OpenAI·Mar 5, 2026

First mainline model incorporating GPT-5.3-Codex coding capabilities. Native computer use, 1M context, surpasses human baseline on OSWorld. Codex branch ends here.

Gemini 3.1 Pro

Google·Feb 19, 2026

Highest GPQA Diamond score ever at 94.3%, doubled ARC-AGI-2 to 77.1%

Claude Sonnet 4.6

Anthropic·Feb 17, 2026

Opus-level performance at one-fifth the cost, default model on claude.ai

Grok 4.20

xAI·Feb 17, 2026

Four specialized agents debate before answering, 65% hallucination reduction

Qwen 3.5

Alibaba·Feb 16, 2026

Open-weight 397B MoE with visual agentic capabilities, 201 languages

GPT-5.3 Codex

OpenAI·Feb 5, 2026

Final specialized Codex release before coding was folded into mainline GPT-5.4. Peak on LiveCodeBench and Terminal-Bench.

Claude Opus 4.6

Anthropic·Feb 5, 2026

1M context with parallel agent teams, #1 on Arena Elo at 1504

GLM 5.1 Thinking

Zhipu AI·Feb 1, 2026

Top-performing Chinese open model, strong coding and reasoning

Kimi K2.5

Moonshot·Jan 27, 2026

1T MoE with 100-agent swarm coordination, 99% HumanEval

MiniMax M2.1

MiniMax·Jan 15, 2026

Reasoning-focused flagship, competitive with frontier Western models

2025

27 MODELS

Gemini 3 Flash

Google·Dec 17, 2025

Flash-tier model outperforming previous-gen Pro on most benchmarks

GPT-5.2

OpenAI·Dec 11, 2025

Scored 100% on AIME 2025 and 80% on SWE-bench

Devstral 2

Mistral·Dec 9, 2025·123B

Open-weight coding model, 72.2% on SWE-bench Verified at 7x lower cost than Claude Sonnet

Mistral Large 3

Mistral·Dec 2, 2025

Major leap for Mistral, closed the gap with US frontier labs

DeepSeek V3.2

DeepSeek·Dec 1, 2025

Matched GPT-5 class at a fraction of the cost, open-weight MoE

Claude Opus 4.5

Anthropic·Nov 24, 2025

First model to break 80% on SWE-bench Verified

Gemini 3 Pro

Google·Nov 18, 2025

First model to break 1500 Arena Elo, 100% on AIME with code execution

Grok 4.1

xAI·Nov 17, 2025

#1 on Arena thinking mode, 1M context, strong agentic coding

GPT-5.1

OpenAI·Nov 12, 2025

Incremental upgrade with improved reliability and instruction following

Claude Haiku 4.5

Anthropic·Oct 15, 2025

Haiku-tier model matching Sonnet 4 on coding at one-third the cost

Claude Sonnet 4.5

Anthropic·Sep 29, 2025

First Sonnet to score 100% on AIME, closed the Opus gap entirely

GPT-5

OpenAI·Aug 7, 2025

Unified reasoning and chat, 400K context, 94.6% on AIME 2025

Claude Opus 4.1

Anthropic·Aug 5, 2025

Improved agentic reliability with better tool use and planning

Kimi K2

Moonshot·Jul 11, 2025

Open-weight MoE that matched frontier closed models on coding

Grok 4

xAI·Jul 9, 2025

Deep reasoning model, strongest on abstract math at launch

Gemini 2.5 Flash

Google·Jun 17, 2025

First Flash model with thinking, near-Pro performance at one-eighth the price

Claude Opus 4

Anthropic·May 22, 2025

Strongest agentic model at launch, sustained multi-hour autonomous coding

Claude Sonnet 4

Anthropic·May 22, 2025

Matched Opus 4 on coding at one-fifth the price, 1M context window

Qwen 3 235B

Alibaba·Apr 28, 2025·235B-A22B (MoE)

Hybrid reasoning MoE trained on 36T tokens across 119 languages

o3

OpenAI·Apr 16, 2025

Full reasoning model with tool use, first to break 80% on GPQA Diamond

Llama 4 Scout

Meta·Apr 5, 2025·109B-A17B (MoE)

First 10M token context window, 17B active params from 109B MoE

Llama 4 Maverick

Meta·Apr 5, 2025

First open MoE from Meta, natively multimodal with 1M context

Gemini 2.5 Pro

Google·Mar 25, 2025

Built-in thinking mode, native audio and video understanding

Claude 3.7 Sonnet

Anthropic·Feb 24, 2025

First hybrid reasoning model, extended thinking mode for complex problems

Grok 3

xAI·Feb 17, 2025

Matched frontier labs on reasoning, trained on 200K H100 cluster

o3-mini

OpenAI·Jan 31, 2025

Reasoning at 75% lower cost than o1, made chain-of-thought economically viable

DeepSeek R1

DeepSeek·Jan 20, 2025

Open-weight reasoning model that triggered the DeepSeek market shock

2024

12 MODELS

DeepSeek V3

DeepSeek·Dec 26, 2024

Reported training cost of about $5.5M, proved frontier performance was possible at low cost

o1

OpenAI·Dec 17, 2024

First reasoning model, uses chain-of-thought at inference time to solve hard problems

Gemini 2.0 Flash

Google·Dec 11, 2024

Near-frontier performance at flash pricing, native tool use and code execution

Llama 3.3 70B

Meta·Dec 6, 2024·70B

405B-class performance distilled into 70B parameters

Claude 3.5 Sonnet

Anthropic·Oct 22, 2024

Best coding model of 2024, dominated SWE-bench for months

Grok 2

xAI·Aug 13, 2024

Trained on Colossus, xAI's 100K GPU cluster, first competitive Grok

Mistral Large 2

Mistral·Jul 24, 2024

European frontier model with strong multilingual and code performance

Llama 3.1 405B

Meta·Jul 23, 2024·405B

Largest open-weight model at 405B parameters, GPT-4 class performance

GPT-4o

OpenAI·May 13, 2024

Natively multimodal with voice, 2x faster and 50% cheaper than GPT-4 Turbo

Llama 3 70B

Meta·Apr 18, 2024·70B

Open-weight model competitive with GPT-4 class, massive fine-tuning ecosystem

Claude 3 Opus

Anthropic·Mar 4, 2024

First Claude to match GPT-4, introduced the Opus/Sonnet/Haiku tier system

Gemini 1.5 Pro

Google·Feb 15, 2024

First 1M token context window, processed entire codebases in one pass

2023

5 MODELS

Gemini 1.0 Pro

Google·Dec 6, 2023

Google's first natively multimodal model, launched the Gemini brand

Llama 2 70B

Meta·Jul 18, 2023·70B

Open-weight model that kickstarted the open-source LLM ecosystem

Claude 2

Anthropic·Jul 11, 2023

Shipped with a 100K context window, established Anthropic as a frontier lab

PaLM 2

Google·May 10, 2023

Powered Bard and Google Workspace AI, strong multilingual performance

GPT-4

OpenAI·Mar 14, 2023

First multimodal GPT, passed the bar exam, defined the frontier for a year

2022

3 MODELS

GPT-3.5 Turbo

OpenAI·Nov 30, 2022

Launched ChatGPT, at the time the fastest consumer product to reach 100M users

PaLM 540B

Google·Apr 4, 2022·540B

Largest dense model at launch, first to show chain-of-thought reasoning at scale

Chinchilla 70B

Google·Mar 29, 2022·70B

Proved most LLMs were undertrained, reshaped scaling strategy industry-wide

2020

1 MODEL

GPT-3 175B

OpenAI·Jun 11, 2020·175B

First large language model to demonstrate emergent few-shot learning
