Trained on Colossus, xAI's 100K-GPU cluster; the first competitive Grok model.

| Benchmark | Score | Rank |
|---|---|---|
| ARC-C: Grade-school science questions requiring reasoning | 96.4% | #26 / 40 |
| MMMU (vals.ai): College-level multimodal reasoning across 30+ disciplines | 57.3% | #29 / 33 |
| HumanEval: Coding ability, generating correct Python functions | 88.4% | #34 / 49 |
| MMLU: Knowledge across 57 subjects from STEM to humanities | 87.5% | #35 / 53 |
| MATH: Competition-level mathematics problems | 76.1% | #36 / 49 |
| Arena Elo: Human preference ranking via blind comparisons | 1256 | #36 / 41 |
| GPQA: PhD-level science questions even experts struggle with | 56% | #45 / 54 |