Key Points
- Model performance improves predictably with more compute, data, and parameters
- Discovered by OpenAI researchers in 2020, refined by DeepMind's Chinchilla work
- Power-law relationships enable forecasting of future capabilities
- Suggests a path to more capable AI through scaling alone
- Debate: will scaling continue to yield gains, or will it hit diminishing returns?
The Discovery
In 2020, researchers at OpenAI published a landmark paper showing that the performance of language models follows predictable mathematical relationships with scale. As you increase compute, data, or model parameters, loss decreases according to power laws.
This wasn't obvious. It could have been that returns diminished sharply, or that progress was irregular. Instead, the relationships are smooth and predictable across many orders of magnitude.
The Key Relationships
The scaling laws describe how model performance (measured as loss) relates to:
Compute: More training FLOPs generally yield better models. The relationship follows a power law: each 10x increase in compute reduces loss by a roughly constant factor, so relative gains stay steady even as absolute gains shrink (a short sketch below illustrates this).
Parameters: Larger models tend to perform better, though the compute and data available determine the optimal size.
Data: More training data improves performance, with similar power-law relationships.
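To make the functional form concrete, here is a minimal Python sketch assuming the single-variable compute law L(C) = (C_c / C)^α described in the 2020 paper. The constant and exponent below are placeholder values chosen only to show the shape of the curve, not fitted results.

```python
# A toy power law: loss falls by a constant factor for every 10x in compute.
# c_c and alpha are illustrative placeholders, not values from the paper.
def loss_from_compute(compute_flops: float, c_c: float = 1e18, alpha: float = 0.05) -> float:
    """Toy loss as a function of training compute, L(C) = (C_c / C) ** alpha."""
    return (c_c / compute_flops) ** alpha

for exponent in range(20, 26):            # budgets from 1e20 to 1e25 FLOPs
    compute = 10.0 ** exponent
    print(f"{compute:.0e} FLOPs -> loss {loss_from_compute(compute):.3f}")
```

Each step of the loop multiplies compute by 10, and the printed loss shrinks by the same proportional amount every time, which is the signature of a power law.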
DeepMind's Chinchilla research refined these findings, showing that optimal performance comes from balancing model size and training data—you shouldn't just make models bigger without proportionally more data.
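A rough sketch of that Chinchilla-style balance, assuming two widely used rules of thumb rather than exact results: training compute is about 6·N·D FLOPs for N parameters and D tokens, and the compute-optimal ratio is on the order of 20 tokens per parameter.

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a FLOP budget into (parameters, tokens), assuming C ~ 6 * N * D."""
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = chinchilla_optimal(1e24)  # a hypothetical 1e24-FLOP training budget
print(f"~{n / 1e9:.0f}B parameters trained on ~{d / 1e12:.1f}T tokens")
```

The point is the balance: doubling the budget should grow both the model and the dataset, not just the parameter count.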
Why This Matters
Scaling laws transformed AI development in several ways:
Predictability: Labs can estimate future capabilities before training, guiding investment decisions; a small extrapolation sketch follows this list.
Roadmap: If scaling continues working, the path to more capable AI is clear—build bigger clusters, train larger models.
Emergence: Larger models exhibit capabilities that smaller versions don't—suggesting that crossing certain scale thresholds may unlock qualitatively new abilities.
Investment thesis: The apparent reliability of scaling laws has driven billions in AI infrastructure investment.
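As an illustration of the predictability point above, here is a sketch of the extrapolation workflow, assuming loss measurements from a few cheap pilot runs. The data points are invented for the example; because a power law is a straight line in log-log space, an ordinary least-squares fit is enough.

```python
import numpy as np

# Hypothetical losses from small pilot runs (invented numbers for illustration).
compute = np.array([1e19, 1e20, 1e21, 1e22])   # training FLOPs
loss = np.array([3.10, 2.76, 2.46, 2.19])      # measured validation loss

# Fit log10(loss) = intercept + slope * log10(compute).
slope, intercept = np.polyfit(np.log10(compute), np.log10(loss), 1)

def predicted_loss(c: float) -> float:
    """Extrapolate the fitted power law to a larger compute budget."""
    return 10 ** (intercept + slope * np.log10(c))

print(f"fitted exponent: {-slope:.3f}")
print(f"predicted loss at 1e24 FLOPs: {predicted_loss(1e24):.2f}")
```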
The Second Axis: Inference-Time Scaling
In 2024-2025, a major new scaling axis emerged: test-time compute, or inference-time scaling. OpenAI's o1 and o3 models and Anthropic's Claude with extended thinking demonstrated that spending more compute at inference (letting the model "think longer") dramatically improves performance on complex reasoning tasks.
This means scaling isn't just about training bigger models on more data. You can also scale intelligence by giving models more time to reason through problems. This second axis has substantially changed the scaling picture, opening new frontiers of capability improvement even if pre-training scaling alone were to slow.
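One simple way to picture inference-time scaling is best-of-N sampling with a majority vote (self-consistency): draw several candidate answers and keep the most common one. The sketch below uses a hypothetical `sample_answer` stand-in for a real model call; it is only an illustration of trading inference compute for reliability, not how o1-style models actually work.

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for a model call: right 60% of the time."""
    return "42" if random.random() < 0.6 else random.choice(["41", "43"])

def answer_with_votes(question: str, n_samples: int) -> str:
    """Spend more inference compute by sampling n times and majority-voting."""
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

for n in (1, 5, 25):
    print(f"{n:>2} samples -> answer {answer_with_votes('hard question', n)}")
```

With more samples, the majority vote converges on the answer the model gets right most often, so accuracy rises with inference compute even though the underlying model is unchanged.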
The Current State
Scaling has proven remarkably durable across multiple generations of frontier models—from GPT-2 to GPT-4 to Claude Opus to Gemini Ultra and beyond, spanning 8+ orders of magnitude. Combined with inference-time scaling, the toolkit for improving AI capabilities continues to expand.
The remaining questions are about rate and efficiency, not direction. Data scarcity is being addressed through synthetic data generation and improved data quality. Algorithmic breakthroughs continue to deliver more capability per compute dollar.
Beyond Language Models
Scaling laws have been observed in many domains:
- Vision models
- Multi-modal systems
- Reinforcement learning
- Code generation
- Scientific applications
This generality suggests something fundamental about how neural networks learn, not just a quirk of language modeling.
Implications for AGI
If scaling continues working, AGI may be primarily an engineering and investment challenge rather than requiring fundamental breakthroughs. This is both encouraging (the path exists) and concerning (it may arrive faster than alignment research can prepare for).

