
Summary In this episode of the AI Engineering Podcast, Carter Huffman, co-founder and CTO of Modulate, discusses the engineering behind low-latency, high-accuracy Voice AI. He explains why voice is a uniquely challenging modality due to its rich non-textual signals like tone, emotion, and context, and how simple speech-to-text-to-speech pipelines can't capture the necessary nuance. Carter introduces Modulate's Ensemble Listening Model (ELM) architecture, which uses dynamic routing and cost-based optimization to achieve scalability and precision in various audio environments. He covera topics such as reliability under distributed systems constraints, watchdogging with periodic model checks, structured long-horizon memory for conversations, and the trade-offs that make ensemble approaches compelling for repeated tasks at scale. Carter also shares insights on how ELMs generalize beyond voice, draws parallels to database query planners and mixture-of-experts, and discusses strategies for observability and evaluation in complex processing pipelines. Announcements Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsUnlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Carter Huffman about his work building an ensemble approach to low latency voice AIInterview IntroductionHow did you get involved in machine learning?Can you describe the "Ensemble Listening" approach and the story behind why Modulate moved away from monolithic architectures?When designing a real-time voice system, how do you handle the routing logic between specialized models without blowing your latency budget?What does the "gatekeeper" or routing layer actually look like in code?You’ve mentioned "evals that don’t lie." How do you build a validation pipeline for noisy, adversarial voice data that catches regressions that a simple word-error-rate (WER) might miss?In an ensemble of models, a failure in one specialized node might not crash the system, but it can degrade the output quality. How do you monitor for these "silent failures" in real-time without introducing massive overhead?For many teams, the default is to call an API for a frontier model. At what point in the scaling or latency curve does it become technically (or economically) necessary to swap a general LLM for a suite of specialized, smaller models?How do you track the real-world costs associated with the technical and human overhead of this more complex system?What are the most interesting, innovative, or unexpected ways that you have seen orchestrated ensembles used in live conversation environments?What are the most interesting, unexpected, or challenging lessons that you have learned while managing the lifecycle of multiple specialized models simultaneously?When is an ensemble approach the wrong choice? (e.g., At what level of complexity or throughput is the overhead of orchestration more trouble than it’s worth?)What do you have planned for the future of Ensemble Listening Models?Are we looking at self-optimizing routers, or perhaps moving these ensembles closer to the edge?Contact Info LinkedInParting Question From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the late
Podzilla Summary coming soon
Sign up to get notified when the full AI-powered summary is ready.
Free forever for up to 3 podcasts. No credit card required.

Kubernetes, Compliance, and Control: The Operational Backbone of AI Sovereignty

From Blind Spots to Observability: Operationalizing LLM Apps with OpenLit

GPU Clouds, Aggregators, and the New Economics of AI Compute

The Future of Dev Experience: Spotify’s Playbook for Organization‑Scale AI
Free AI-powered recaps of AI Engineering Podcast and your other favorite podcasts, delivered to your inbox.
Free forever for up to 3 podcasts. No credit card required.