Anthropic · March 4, 2024 · Multimodal

Claude 3 Opus

Name: Claude 3 Opus
Author: Anthropic

The first Claude model to match GPT-4. It also introduced the Opus/Sonnet/Haiku tier system.

BENCHMARKS

Benchmark	Score	Rank
HellaSwag: Common-sense reasoning about everyday situations	95.4%	#11 / 36
ARC-C: Grade-school science questions requiring reasoning	96.4%	#25 / 40
MMLU: Knowledge across 57 subjects, from STEM to humanities	86.8%	#37 / 54
HumanEval: Coding ability, measured by generating correct Python functions	84.9%	#38 / 50
MATH: Competition-level mathematics problems	60.1%	#41 / 50
Arena Elo: Human preference ranking via blind comparisons	1248	#46 / 51
GPQA: PhD-level science questions that even experts struggle with	59.4%	#61 / 73
HLE (Artificial Analysis)	3.1%	#61 / 61