
The old tests are getting too easy. Tejal Patwardhan leads OpenAI’s frontier evals team, which is finding new ways to measure and forecast progress as models become more capable. She and host Andrew Mayne discuss why evals matter for research, how benchmarks can break or get gamed, and what models need to be judged on next.

Chapters
00:00:24 Growing up at OpenAI
00:03:10 Why reasoning changed everything
00:06:28 What made o1 surprising
00:11:20 Why old benchmarks stopped working
00:14:45 What makes a good benchmark
00:17:35 Why evals are getting harder
00:22:09 Measuring voice and vision models
00:24:48 Testing models on real science
00:33:23 How OpenAI tracks frontier progress
00:40:47 What AI means for work

Hosted on Acast. See acast.com/privacy for more information.

How a reasoning model cracked an 80-year-old math problem - Episode 20

Episode 19 - Inside image generation’s Renaissance moment

Episode 18 - Why AI needs a new kind of supercomputer network

Episode 17 - What happens now that AI is good at math?