First 10M-token context window; 17B active parameters out of 109B total (MoE)

| Benchmark | Description | Score | Rank |
|---|---|---|---|
| MMMU (Artificial Analysis) | College-level multimodal reasoning across 30+ disciplines | 52.9% | #38 / 39 |
| MATH | Competition-level mathematics problems | 50.3% | #45 / 50 |
| MMLU | Tests knowledge across 57 subjects from STEM to humanities | 79.6% | #48 / 54 |
| Terminal (Artificial Analysis) | Agentic terminal coding tasks requiring multi-step execution | 1.5% | #48 / 48 |
| GPQA | PhD-level science questions even experts struggle with | 57.2% | #54 / 64 |