First large language model to demonstrate emergent few-shot learning
| Benchmark | Description | Score | Rank |
|---|---|---|---|
| HellaSwag | Common sense reasoning about everyday situations | 78.9% | #36 / 36 |
| ARC-C | Grade-school science questions requiring reasoning | 51.4% | #40 / 40 |
| HumanEval | Coding ability: generating correct Python functions | 19.1% | #49 / 49 |
| MMLU | Tests knowledge across 57 subjects from STEM to humanities | 43.9% | #53 / 53 |