"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

The RL Fine-Tuning Playbook: CoreWeave's Kyle Corbitt on GRPO, Rubrics, Environments, Reward Hacking

May 1, 2026·1h 46m

Episode Description from the Publisher

Kyle Corbitt, founder of OpenPipe, breaks down reinforcement learning and custom fine-tuning for modern AI models. He explains how RL differs from supervised fine-tuning, why GRPO and LLM-as-judge post-training matter, and how these techniques can improve performance, latency, and cost on open source models. The conversation also covers reward hacking, evaluation design, LoRA adapters, and how Chinese labs are using distillation to fast-follow frontier models.

Sponsors:

Sequence: Sequence handles the full revenue workflow for complex pricing, from quoting and metering to invoicing, revenue recognition, and collections. Book a public demo at https://sequencehq.com and use code Cognizant in the source field to save 20% off year one.

AvePoint: AvePoint is building the control layer for AI agents so you can securely govern, audit, and recover every action at scale. Design trusted agentic outcomes from day one at https://avpt.co/tcr

VCX: VCX, by Fundrise, is the public ticker for private tech, giving everyday investors access to high-growth private companies in AI, space, defense tech, and more. Learn how to invest at https://getvcx.com

Claude: Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr

More from "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

AI in the AM: 99% off search, GPT-5.5 is "clean", model welfare analysis, & efficient analog compute

April 26, 2026·2h 38m

Does Learning Require Feeling? Cameron Berg on the latest AI Consciousness & Welfare Research

April 23, 2026·3h 33m

Vibe-Coding an Attention Firewall, w/ Steve Newman, creator of The Curve

April 19, 2026·2h 9m

Welcome to AI in the AM: RL for EE, Oversight w/out Nationalization, & the first AI-Run Retail Store

April 15, 2026·2h 30m
