Claim: Diffusion models will replace VAEs

Source

Paper X: “The Future of Generative Models”
Discussions at NeurIPS 2024

Evidence For

Diffusion models achieve higher sample quality on image benchmarks
Scalability to high resolutions with fewer artifacts
Rapid adoption in industry (Stable Diffusion, DALL-E, Midjourney)
Training stability advantages over GANs

Evidence Against

VAEs provide a principled latent space and likelihood-based training
VAEs enable efficient encoding/decoding (useful for compression, representation learning)
Hybrid approaches (e.g., latent diffusion) already borrow VAE-like components
Diffusion is slow at inference time; distillation techniques are still maturing

Assumptions

That compute will remain cheap enough for iterative denoising at scale
That latent space structure is less important than sample quality for most applications
That no new architecture will emerge that combines the best of both

Counterarguments

“Replace” is too strong — more likely a complementary toolkit where each excels at different tasks
Diffusion models may themselves evolve into something VAE-like (flow matching, consistency models)
The history of ML shows that no single architecture dominates forever

My Current Confidence

65%

Confidence Log

Date	Confidence	Reason for Change
2025-01-01	65%	Initial assessment — strong evidence but “replace” is a high bar