Trained on Colossus, xAI's 100K-GPU cluster; the first competitive Grok model.
| Benchmark | Description | Score | Rank |
|---|---|---|---|
| ARC-C | Grade-school science questions requiring reasoning | 96.4% | #26 / 40 |
| HumanEval | Coding ability: generating correct Python functions | 88.4% | #35 / 50 |
| MMMU (vals.ai) | College-level multimodal reasoning across 30+ disciplines | 57.3% | #35 / 39 |
| MMLU | Tests knowledge across 57 subjects from STEM to humanities | 87.5% | #36 / 54 |
| MATH | Competition-level mathematics problems | 76.1% | #37 / 50 |
| Arena Elo | Human preference ranking via blind comparisons | 1256 | #46 / 52 |
| GPQA | PhD-level science questions even experts struggle with | 56% | #55 / 64 |