First 10M-token context window; 17B active parameters out of a 109B-parameter MoE
| Benchmark | Description | Score | Rank |
|---|---|---|---|
| MMMU (Artificial Analysis) | College-level multimodal reasoning across 30+ disciplines | 52.9% | #32 / 33 |
| Terminal (Artificial Analysis) | Agentic terminal coding tasks requiring multi-step execution | 1.5% | #37 / 37 |
| MATH | Competition-level mathematics problems | 50.3% | #44 / 49 |
| GPQA | PhD-level science questions even experts struggle with | 57.2% | #44 / 54 |
| MMLU | Tests knowledge across 57 subjects from STEM to humanities | 79.6% | #47 / 53 |