Claim: Diffusion models will replace VAEs

Source

  • Paper X: “The Future of Generative Models”
  • Discussions at NeurIPS 2024

Evidence For

  • Diffusion models achieve higher sample quality than VAEs on image benchmarks (e.g., lower FID)
  • Diffusion models scale to high resolutions with fewer artifacts, such as the blurriness typical of VAE samples
  • Rapid adoption in industry (Stable Diffusion, DALL-E 2, Midjourney)
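  • Training is more stable than for GANs (though VAEs are also stable to train, so this point does not by itself separate diffusion from VAEs)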

Evidence Against

  • VAEs provide a principled latent space and likelihood-based training
  • VAEs enable efficient encoding/decoding (useful for compression, representation learning)
  • Hybrid approaches already borrow VAE-like components (e.g., latent diffusion runs the denoising process in the latent space of a pretrained VAE-style autoencoder)
  • Diffusion sampling is slow at inference time because it needs many sequential denoising steps; distillation techniques that cut the step count are still maturing
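
The inference-cost point can be made concrete with a toy count of network forward passes. This is only a sketch of the loop structure, not a real sampler: CountingNet, vae_sample, diffusion_sample, and the choice of 50 steps are all hypothetical names and values picked for illustration, and the "network" is a placeholder that just scales its input.

```python
import random


class CountingNet:
    """Hypothetical stand-in for a neural network; it only counts forward passes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        # Placeholder computation, not a real decoder or denoiser.
        return [0.9 * v for v in x]


def vae_sample(decoder, dim, rng):
    """Sample a latent from the prior and decode it with one forward pass."""
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return decoder(z)


def diffusion_sample(denoiser, dim, num_steps, rng):
    """Start from noise and refine it with one forward pass per step."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(num_steps):
        # A real sampler would apply a noise-schedule-dependent update here.
        x = denoiser(x)
    return x


rng = random.Random(0)
decoder = CountingNet()
denoiser = CountingNet()

vae_out = vae_sample(decoder, dim=8, rng=rng)
diff_out = diffusion_sample(denoiser, dim=8, num_steps=50, rng=rng)

print("VAE forward passes:", decoder.calls)
print("Diffusion forward passes:", denoiser.calls)
```

Under these assumptions the sampling cost of diffusion grows linearly with the number of denoising steps, which is the quantity distillation methods try to reduce; the VAE decoder is called once per sample regardless.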

Assumptions

  • That compute will remain cheap enough for iterative denoising at scale
  • That latent space structure is less important than sample quality for most applications
  • That no new architecture will emerge that combines the best of both

Counterarguments

  • “Replace” is too strong; more likely the two form a complementary toolkit in which each excels at different tasks
  • Diffusion models may themselves evolve into something VAE-like (flow matching, consistency models)
  • The history of ML shows that no single architecture dominates forever

My Current Confidence

65%

Confidence Log

Date         Confidence   Reason for Change
2025-01-01   65%          Initial assessment: strong evidence, but “replace” is a high bar