muckrAIkers

The Co-opting of Safety

August 21, 2025·1h 24m
Episode Description from the Publisher

We dig into how the concept of AI "safety" has been co-opted and weaponized by tech companies. Starting with examples like Mecha-Hitler Grok, we explore how real safety engineering differs from AI "alignment," the myth of the alignment tax, and why this semantic confusion matters for actual safety.

(00:00) - Intro
(00:21) - Mecha-Hitler Grok
(10:07) - "Safety"
(19:40) - Under-specification
(53:56) - This time isn't different
(01:01:46) - Alignment Tax myth
(01:17:37) - Actually making AI safer

Links
JMLR article - Underspecification Presents Challenges for Credibility in Modern Machine Learning
Trail of Bits paper - Towards Comprehensive Risk Assessments and Assurance of AI-Based Systems
SSRN paper - Uniqueness Bias: Why It Matters, How to Curb It

Additional Referenced Papers
NeurIPS paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
ICML paper - AI Control: Improving Safety Despite Intentional Subversion
ICML paper - DarkBench: Benchmarking Dark Patterns in Large Language Models
OSF preprint - Current Real-World Use of Large Language Models for Mental Health
Anthropic preprint - Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Inciting Examples
Ars Technica article - US government agency drops Grok after MechaHitler backlash, report says
The Guardian article - Musk’s AI Grok bot rants about ‘white genocide’ in South Africa in unrelated chats
BBC article - Update that made ChatGPT 'dangerously' sycophantic pulled

Other Sources
London Daily article - UK AI Safety Institute Rebrands as AI Security Institute to Focus on Crime and National Security
Vice article - Prominent AI Philosopher and ‘Father’ of Longtermism Sent Very Racist Email to a 90s Philosophy Listserv
LessWrong blogpost - "notkilleveryoneism" sounds dumb (see comments)
EA Forum blogpost - An Overview of the AI Safety Funding Situation
Book by Dmitry Chernov and Didier Sornette - Man-made Catastrophes and Risk Information Concealment
Euronews article - OpenAI adds mental health safeguards to ChatGPT, saying chatbot has fed into users’ ‘delusions’
Pleias website
Wikipedia page on Jaywalking
