
The Claude Fable 5/Mythos 5 System Card has a section in which they talk about illegible reasoning, and provide an "extreme" example thereof. Models developing their own uninterpretable, unmonitorable internal language has been a major theoretical concern for a while, and when o3 was released last year with its disclaim overshadow disclaim vantage style word salad CoT, it seemed like the problem had become real and immediate. And yet, since o3, other model families have not appeared to hav...
Podzilla Summary coming soon
Sign up to get notified when the full AI-powered summary is ready.
Free forever for up to 3 podcasts. No credit card required.

"Sympathy for both sides of the egregious misalignment debate" by Steven Byrnes

"PSA: Almost nobody is working on alignment" by Chi Nguyen, peterbarnett

"Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models" by Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask

"Sequent: scale and automation for higher confidence in alignment" by Geoffrey Irving, Alex HT, Jesse Hoogland, Daniel Murfet, Jacob Pfau, Marco Cozzi, Stan van Wingerden
Free AI-powered recaps of LessWrong (Curated & Popular) and your other favorite podcasts, delivered to your inbox.
Free forever for up to 3 podcasts. No credit card required.