Two-version jump from M2.1 (skipping the intermediate M2.5). Artificial Analysis (AA) Intelligence Index: 49.6, top-tier among Chinese open-weight reasoning models at half the cost of Western frontier models.

| Benchmark | Source | Description | Score | Rank |
|---|---|---|---|---|
| Arena Elo | Artificial Analysis | Human preference ranking via blind comparisons | 1507 | #9 / 52 |
| Terminal | Artificial Analysis | Agentic terminal coding tasks requiring multi-step execution | 39.4% | #24 / 48 |
| LiveCodeBench | vals.ai | Contamination-free competitive programming (filtered by cutoff date) | 79.9% | #25 / 40 |
| GPQA | Artificial Analysis | PhD-level science questions even experts struggle with | 87.4% | #26 / 64 |
| MMLU-Pro | vals.ai | Harder 10-option successor to MMLU; more reasoning-focused | 80.4% | #32 / 38 |