Claim: Scaling RL (not just pretraining) will be the primary driver of AI capability gains by 2030

Source

Scaling-law papers (Kaplan et al., 2020; Hoffmann et al., 2022)
OpenAI o1/o3 system cards
DeepSeek-R1 paper
Discussions in AI alignment and capabilities research communities

Evidence For

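OpenAI’s o-series models show substantial gains from RL-trained chain-of-thought reasoning; OpenAI reported that o1’s performance improves with both more train-time RL compute and more test-time compute
DeepSeek-R1-Zero shows that RL alone, without a supervised fine-tuning stage, can induce reasoning behaviors such as self-verification and reflection in a base model
Pretraining may hit a data wall (finite high-quality internet data, legal and licensing constraints)
RL enables self-play and synthetic data generation, which could break the data bottleneck
RL optimizes directly for outcomes (e.g., a verifiably correct final answer) rather than for next-token prediction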
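A minimal sketch of the contrast in the last point, assuming per-token log-probabilities are already available as plain floats (the function names are hypothetical, and this is vanilla REINFORCE, not the exact objective of any model listed above). The next-token loss needs a target for every token; the outcome loss only needs one scalar reward for the finished sample, minus a baseline.

```python
import math

def next_token_loss(ref_token_logprobs):
    """Pretraining-style objective: mean negative log-likelihood of the
    reference tokens. Every token is supervised individually."""
    return -sum(ref_token_logprobs) / len(ref_token_logprobs)

def reinforce_loss(sampled_token_logprobs, reward, baseline=0.0):
    """Outcome-based RL objective (vanilla REINFORCE, hypothetical helper):
    the whole sampled sequence is scored by a single reward. Minimizing this
    raises the log-probability of samples whose reward beats the baseline
    and lowers it for samples that fall below it."""
    advantage = reward - baseline
    return -advantage * sum(sampled_token_logprobs)

logprobs = [math.log(0.5), math.log(0.25)]
print(next_token_loss(logprobs))
print(reinforce_loss(logprobs, reward=1.0, baseline=0.5))
```

With a real model these log-probabilities would carry gradients; the point here is only that the RL signal attaches to the outcome, not to each token.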

Evidence Against

Pretraining is the foundation; RL only works on top of strong pretrained models
RL is notoriously unstable and hard to scale reliably
Current RL methods (RLHF, GRPO) are still brittle and prone to reward hacking
Gains from RL may saturate faster than pretraining gains
The most impressive capabilities still correlate strongly with pretraining compute
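One concrete source of the brittleness mentioned above: GRPO scores each completion against the other completions sampled for the same prompt. The sketch below covers only that group-relative advantage step (no clipped ratio, no KL penalty), uses the population standard deviation (implementations differ), and `grpo_advantages` is a hypothetical name. When every sample in a group gets the same reward, for example all wrong on a hard prompt, every advantage is zero and the prompt contributes no learning signal.

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages in the style of GRPO: each completion's
    reward minus the group mean, divided by the group's standard deviation.
    eps avoids division by zero when all rewards are equal."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# One correct answer out of four samples: strong positive signal for it.
print(grpo_advantages([1.0, 0.0, 0.0, 0.0]))  # approximately [1.73, -0.58, -0.58, -0.58]

# All four samples wrong (sparse reward): no gradient signal at all.
print(grpo_advantages([0.0, 0.0, 0.0, 0.0]))  # [0.0, 0.0, 0.0, 0.0]
```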

Assumptions

That pretraining data scarcity will become a binding constraint
That RL stability issues can be solved at scale
That reward specification for general capabilities is tractable
That no fundamentally new paradigm emerges (e.g., active inference, new architectures)
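A rough way to test the data-scarcity assumption, using two common approximations from the Hoffmann et al. line of work: training compute C ≈ 6·N·D, and a compute-optimal ratio of roughly 20 tokens per parameter. Substituting D = 20N gives C ≈ 120·N², so N = sqrt(C/120). The stock of usable tokens is left as a hypothetical input because this note does not commit to a number.

```python
import math

TOKENS_PER_PARAM = 20      # Chinchilla-style rule of thumb (approximate)
FLOPS_PER_PARAM_TOKEN = 6  # training compute C ~= 6 * N * D (approximate)

def chinchilla_optimal(compute_flops):
    """Return (params N, tokens D) satisfying D = 20 * N and C = 6 * N * D."""
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

def is_data_bound(compute_flops, available_tokens):
    """True if the compute-optimal token count exceeds a (hypothetical)
    stock of usable pretraining tokens."""
    return chinchilla_optimal(compute_flops)[1] > available_tokens

n, d = chinchilla_optimal(1e24)
print(f"{n:.2e} params, {d:.2e} tokens")
```

At 1e24 FLOPs this gives roughly 9e10 parameters and 1.8e12 tokens. Because D grows with sqrt(C), 100x more compute needs only about 10x more tokens, which bears on how quickly the wall would actually bind.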

Counterarguments

RL and pretraining are complementary, not competing; framing RL as the “primary driver” may be a false dichotomy
The real driver might be architecture innovations, not training methodology
Synthetic data from RL may degrade in quality over successive generations of training on model outputs (model collapse)
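The model-collapse worry can be illustrated with the kind of single-Gaussian toy often used in that literature: each generation fits a Gaussian to a small sample drawn from the previous generation's fit, with no fresh real data. The maximum-likelihood variance estimate is biased low and noisy, so over many generations the variance tends to shrink toward zero and the tails disappear. This is a toy assumption, not a model of verifier-filtered RL data; `recursive_gaussian_fit` and its parameters are hypothetical.

```python
import random
import statistics

def recursive_gaussian_fit(generations=200, n_samples=10, seed=0):
    """Repeatedly refit a Gaussian to samples from the previous fit.
    Returns the fitted variance at each generation (index 0 = real data)."""
    rng = random.Random(seed)
    mu, var = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [var]
    for _ in range(generations):
        samples = [rng.gauss(mu, var ** 0.5) for _ in range(n_samples)]
        mu = statistics.fmean(samples)
        var = statistics.pvariance(samples, mu)  # MLE variance (divides by n)
        history.append(var)
    return history

history = recursive_gaussian_fit()
print(history[0], history[50], history[-1])
```

Mixing some real data back into each generation, or filtering samples with an external verifier, would change these dynamics; the toy only shows why purely self-generated data is a plausible risk.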

My Current Confidence

55%

Confidence Log

Date	Confidence	Reason for Change
2025-01-01	55%	Initial — RL is clearly important, but “primary driver” is a strong claim