Reasoning-focused flagship, competitive with frontier Western models
| Benchmark | Source | Description | Score | Rank |
|---|---|---|---|---|
| MMLU-Pro | vals.ai | Harder 10-option successor to MMLU; more reasoning-focused | 87% | #13 / 30 |
| LiveCodeBench | vals.ai | Contamination-free competitive programming (filtered by cutoff date) | 81.8% | #14 / 31 |
| GPQA | Artificial Analysis | PhD-level science questions even experts struggle with | 83% | #25 / 54 |
| Terminal | Artificial Analysis | Agentic terminal coding tasks requiring multi-step execution | 28.8% | #25 / 37 |