Why Tejal Patwardhan stopped underestimating the models - Episode 21

June 16, 2026·44 min

Episode Description from the Publisher

The old tests are getting too easy. Tejal Patwardhan leads OpenAI’s frontier evals team, which is finding new ways to measure and forecast progress as models become more capable. She and host Andrew Mayne discuss why evals matter for research, how benchmarks can break or get gamed, and what models need to be judged on next.

Chapters
00:00:24 Growing up at OpenAI
00:03:10 Why reasoning changed everything
00:06:28 What made o1 surprising
00:11:20 Why old benchmarks stopped working
00:14:45 What makes a good benchmark
00:17:35 Why evals are getting harder
00:22:09 Measuring voice and vision models
00:24:48 Testing models on real science
00:33:23 How OpenAI tracks frontier progress
00:40:47 What AI means for work

Hosted on Acast. See acast.com/privacy for more information.
