Frontier-tier Qwen flagship. Artificial Analysis Intelligence Index 56.6, GPQA 92.3%, Terminal-Bench 50.8%. Closes the gap to Western frontier models at lower cost.
| Benchmark | Source | Description | Score | Rank |
|---|---|---|---|---|
| MMLU-Pro | vals.ai | Harder 10-option successor to MMLU; more reasoning-focused | 89.3% | #6 / 38 |
| Arena Elo | Artificial Analysis | Human preference ranking via blind comparisons | 1545 | #6 / 52 |
| GPQA | Artificial Analysis | PhD-level science questions even experts struggle with | 92.3% | #7 / 64 |
| LiveCodeBench | vals.ai | Contamination-free competitive programming (filtered by cutoff date) | 87.1% | #7 / 40 |
| Terminal | Artificial Analysis | Agentic terminal coding tasks requiring multi-step execution | 50.8% | #13 / 48 |