
This episode is brought to you by the MLflow team. Check out more information at MLflow.org.

What does it actually take to build voice AI at a billion-interaction scale? This episode features an ex-Amazon voice AI engineer who built customer support systems handling 2 billion+ interactions, now working on next-gen voice agent platforms. Anurag digs deep into the real engineering tradeoffs, design patterns, and use cases that separate production-grade voice agents from demos.

Voice Agent Use Cases // MLOps Podcast #374 with Anurag Beniwal, Member of the Technical Staff at ElevenLabs

🎙️ Topics covered:
🔹 Cascaded vs. speech-to-speech: Why cascaded systems still win in production, and how to make them feel natural without sacrificing control
🔹 Latency masking: Foreground/background model architecture and how to buy yourself time while deep retrieval runs
🔹 Constellation of models: Using Haiku for tool calling, fine-tuned smaller models for response generation, and why "one model for everything" breaks at scale
🔹 Turn-taking & ASR challenges: Why voice is harder than chat: accents, noise, silence detection, and domain-specific fine-tuning
🔹 Level 1 vs. Level 2 customer support: Why today's agents max out at Level 1 and what it takes to capture Level 2 expert judgment
🔹 Inbound vs. outbound sales agents: Where voice agents are already winning, and why inbound lead qualification beats cold outbound
🔹 Booking, reservations & concierge: The clearest near-term wins for voice agents across hospitality, home services, and SMBs
🔹 Continual learning from natural language feedback: How to build agents that improve from real operator feedback without ML expertise
🔹 Conversational TTS: Why passing full conversation history to your TTS model changes everything for tone consistency
🔹 User tiers for voice platforms: Non-technical business owners vs. developers vs. enterprise, and why one interface doesn't fit all
If you're building production voice agents, evaluating voice AI vendors, or scaling AI-first customer support, this episode is packed with hard-won lessons from someone who's done it at Amazon scale.

🔗 Links & Resources:
MLOps.community: https://mlops.community
Google Scholar: https://scholar.google.com/citations?user=g_QB5WgAAAAJ&hl=en&o
Amazon science page: https://www.amazon.science/author/anurag-beniwal
Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter
MLOps GPU Guide: https://go.mlops.community/gpuguide

⏱️ Timestamps
[00:00] Cascaded Systems Control Challenge
[05:35] Voice vs Chat Complexity
[14:16] MLflow's open source platform
[15:03] AI Model Constellations
[23:00] Model Constellations Use Cases
[31:40] Voice vs Text Context
[33:54] Voice as Thought Capture
[42:11] Cascaded vs Speech-to-Speech Debate
[50:02] Wrap up
