The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764

March 26, 2026 · 1h 3m
Episode Description from the Publisher

Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs, to discuss diffusion language models. We dig into how diffusion approaches—traditionally used for images—are being adapted for text and code generation, the technical challenges of applying continuous methods to discrete token spaces, and how diffusion models compare to traditional autoregressive LLMs. Stefano introduces Mercury 2, a commercial-scale diffusion LLM that generates multiple tokens simultaneously and achieves inference speeds 5-10x faster than small frontier models, paving the way for latency-sensitive applications like voice interactions and fast agentic loops. We also cover the open research challenges in diffusion LLM training, serving infrastructure requirements, and post-training for diffusion-based systems. Finally, Stefano shares his perspective on whether diffusion models can rival or surpass autoregressive LLMs at scale, their advantages for highly controllable generation, and what the future of multimodal diffusion models might look like. The complete show notes for this episode can be found at https://twimlai.com/go/764.
