First public Mythos-class model, a tier above Opus. Highest Artificial Analysis (AA) Intelligence Index score to date (64.9), Arena Elo 1932, SWE-bench Pro 80.3. Stripe compressed 2+ months of work into 1 day on a 50M-line codebase.
| Benchmark | Source | Description | Score | Rank |
|---|---|---|---|---|
| MMLU-Pro | vals.ai | Harder 10-option successor to MMLU; more reasoning-focused | 91.5% | #1 / 40 |
| LiveCodeBench | vals.ai | Contamination-free competitive programming (filtered by cutoff date) | 89.8% | #1 / 42 |
| OSWorld | | Computer use in real desktop environments | 85% | #1 / 10 |
| MMMU | vals.ai | College-level multimodal reasoning across 30+ disciplines | 89.3% | #1 / 41 |
| Arena Elo | | Human preference ranking via blind comparisons | 1932 | #1 / 54 |
| GPQA | Artificial Analysis | PhD-level science questions even experts struggle with | 92.6% | #8 / 66 |