The Co-opting of Safety

August 21, 2025 · 1h 24m

Episode Description from the Publisher

We dig into how the concept of AI "safety" has been co-opted and weaponized by tech companies. Starting with examples like Mecha-Hitler Grok, we explore how real safety engineering differs from AI "alignment," the myth of the alignment tax, and why this semantic confusion matters for actual safety.

(00:00) - Intro
(00:21) - Mecha-Hitler Grok
(10:07) - "Safety"
(19:40) - Under-specification
(53:56) - This time isn't different
(01:01:46) - Alignment Tax myth
(01:17:37) - Actually making AI safer

Links
JMLR article - Underspecification Presents Challenges for Credibility in Modern Machine Learning
Trail of Bits paper - Towards Comprehensive Risk Assessments and Assurance of AI-Based Systems
SSRN paper - Uniqueness Bias: Why It Matters, How to Curb It

Additional Referenced Papers
NeurIPS paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
ICML paper - AI Control: Improving Safety Despite Intentional Subversion
ICML paper - DarkBench: Benchmarking Dark Patterns in Large Language Models
OSF preprint - Current Real-World Use of Large Language Models for Mental Health
Anthropic preprint - Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Inciting Examples
Ars Technica article - US government agency drops Grok after MechaHitler backlash, report says
The Guardian article - Musk’s AI Grok bot rants about ‘white genocide’ in South Africa in unrelated chats
BBC article - Update that made ChatGPT 'dangerously' sycophantic pulled

Other Sources
London Daily article - UK AI Safety Institute Rebrands as AI Security Institute to Focus on Crime and National Security
Vice article - Prominent AI Philosopher and ‘Father’ of Longtermism Sent Very Racist Email to a 90s Philosophy Listserv
LessWrong blogpost - "notkilleveryoneism" sounds dumb (see comments)
EA Forum blogpost - An Overview of the AI Safety Funding Situation
Book by Dmitry Chernov and Didier Sornette - Man-made Catastrophes and Risk Information Concealment
Euronews article - OpenAI adds mental health safeguards to ChatGPT, saying chatbot has fed into users’ ‘delusions’
Pleias website
Wikipedia page on Jaywalking
