
In this episode, Philip Kiely, head of AI education at Baseten, joins us to unpack the fast-evolving discipline of inference engineering. We explore why inference has become the stickiest and most critical workload in AI, how it blends GPU programming, applied research, and large-scale distributed systems, and where the line sits between inference and model serving. Philip shares how research-to-production can move in hours, not months, and why understanding “the knobs” of inference (batching, quantization, speculative decoding, and KV cache reuse) lets teams design better products and SLAs. We trace the inference maturity journey from closed APIs to dedicated deployments and in-house platforms, discuss GPU lifecycles, and survey today’s runtime landscape, including vLLM, SGLang, and TensorRT-LLM. Finally, we look ahead to agents and multimodality, making the case for specialized, workload-specific runtimes when performance and efficiency matter most. The complete show notes for this episode can be found at https://twimlai.com/go/766.
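For listeners who want to see what these knobs look like in practice, here is a minimal sketch of how batching limits, quantization, and KV cache reuse surface as configuration in an open-source runtime such as vLLM, one of the runtimes discussed in the episode. The model name is a hypothetical placeholder, quantization="awq" assumes an AWQ-quantized checkpoint, and speculative-decoding flags vary across vLLM versions, so treat this as illustrative rather than authoritative.

```python
# Illustrative sketch: inference "knobs" as vLLM engine configuration.
# The model name is a placeholder; quantization="awq" requires an
# AWQ-quantized checkpoint, and exact flags vary by vLLM version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="org/model-awq",          # hypothetical AWQ-quantized checkpoint
    quantization="awq",             # knob: weight quantization (smaller, faster)
    max_num_seqs=64,                # knob: continuous-batching concurrency cap
    enable_prefix_caching=True,     # knob: KV cache reuse across shared prompt prefixes
    gpu_memory_utilization=0.90,    # fraction of VRAM for weights + KV cache
)
# Speculative decoding (e.g., via a draft model) is configured through
# separate, version-specific options and is omitted here.

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain KV cache reuse in one paragraph."], params)
print(outputs[0].outputs[0].text)
```

Runtimes expose dozens of parameters like these, and tuning them against a workload's latency and throughput targets is much of what the episode frames as inference engineering.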
