SuperGrok Heavy beta with native video understanding. Artificial Analysis (AA) now tracks it with full per-benchmark coverage; it ranked #2 globally on AA's initial pass.
| Benchmark | Source | Description | Score | Rank |
|---|---|---|---|---|
| MMMU | vals.ai | College-level multimodal reasoning across 30+ disciplines | 83.1% | #12 / 39 |
| Arena Elo | Artificial Analysis | Human preference ranking via blind comparisons | 1498 | #12 / 52 |
| GPQA | Artificial Analysis | PhD-level science questions even experts struggle with | 90.1% | #14 / 64 |
| LiveCodeBench | vals.ai | Contamination-free competitive programming (filtered by cutoff date) | 84.5% | #18 / 40 |
| MMLU-Pro | vals.ai | Harder 10-option successor to MMLU; more reasoning-focused | 85.8% | #22 / 38 |
| Terminal | Artificial Analysis | Agentic terminal coding tasks requiring multi-step execution | 37.9% | #29 / 48 |