Open-weight successor to GLM 5.1, with reasoning gains over 5.1: Artificial Analysis (AA) Intelligence Index 50.7, GPQA 89.5%, and Terminal-Bench 50.8%.
| Benchmark | Source | Description | Score | Rank (of models ranked) |
|---|---|---|---|---|
| Terminal-Bench | Artificial Analysis | Agentic terminal coding tasks requiring multi-step execution | 50.8% | #14 / 50 |
| GPQA | Artificial Analysis | PhD-level science questions that even experts struggle with | 89.5% | #17 / 67 |
| MMLU-Pro | vals.ai | Harder, 10-option successor to MMLU; more reasoning-focused | 86.7% | #22 / 41 |
| LiveCodeBench | vals.ai | Contamination-free competitive programming (problems filtered by cutoff date) | 69.5% | #36 / 43 |