"Even “illegible” Mythos reasoning traces seem pretty legible" by faul_sname

June 11, 2026·7 min

Episode Description from the Publisher

The Claude Fable 5/Mythos 5 System Card has a section in which they talk about illegible reasoning, and provide an "extreme" example thereof. Models developing their own uninterpretable, unmonitorable internal language has been a major theoretical concern for a while, and when o3 was released last year with its disclaim overshadow disclaim vantage style word salad CoT, it seemed like the problem had become real and immediate. And yet, since o3, other model families have not appeared to hav...

Podzilla Summary coming soon

Get Free Summaries →

Free forever for up to 3 podcasts. No credit card required.

Listen to This Episode

Apple Podcasts

More from LessWrong (Curated & Popular)

"Sympathy for both sides of the egregious misalignment debate" by Steven Byrnes

June 13, 2026·8 min

"PSA: Almost nobody is working on alignment" by Chi Nguyen, peterbarnett

June 12, 2026·1 min

"Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models" by Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask

June 11, 2026·10 min

"Sequent: scale and automation for higher confidence in alignment" by Geoffrey Irving, Alex HT, Jesse Hoogland, Daniel Murfet, Jacob Pfau, Marco Cozzi, Stan van Wingerden

June 10, 2026·23 min

View all episodes →

Get summaries like this every morning.

Free AI-powered recaps of LessWrong (Curated & Popular) and your other favorite podcasts, delivered to your inbox.

Get Free Summaries →

Free forever for up to 3 podcasts. No credit card required.