Top-performing Chinese open model, strong coding + reasoning
| Benchmark | Source | Description | Score | Rank |
|---|---|---|---|---|
| MMLU-Pro | vals.ai | Harder 10-option successor to MMLU; more reasoning-focused | 86.9% | #20 / 38 |
| Terminal | Artificial Analysis | Agentic terminal coding tasks requiring multi-step execution | 43.2% | #20 / 48 |
| LiveCodeBench | vals.ai | Contamination-free competitive programming (filtered by cutoff date) | 81.4% | #23 / 40 |
| GPQA | Artificial Analysis | PhD-level science questions even experts struggle with | 86.8% | #28 / 64 |