Top-performing Chinese open model, strong coding + reasoning
| Benchmark | Source | Description | Score | Rank |
|---|---|---|---|---|
| Terminal | Artificial Analysis | Agentic terminal coding tasks requiring multi-step execution | 43.2% | #13 / 37 |
| MMLU-Pro | vals.ai | Harder 10-option successor to MMLU; more reasoning-focused | 86.9% | #14 / 30 |
| LiveCodeBench | vals.ai | Contamination-free competitive programming (filtered by cutoff date) | 81.4% | #15 / 31 |
| GPQA | Artificial Analysis | PhD-level science questions even experts struggle with | 86.8% | #19 / 54 |