
Conducted during a busy release weekend, the failover test exposed gaps not in the technology itself but in coordination and communication. Although production ultimately stayed unaffected, the situation escalated quickly because subcontractors weren't aligned, assumptions didn't match reality, and information didn't flow when it mattered most. We unpack how a well-intentioned test turned into a coordination challenge, where timing, dependencies, and unclear responsibilities created confusion across teams. It's a story about how resilience isn't just about systems and infrastructure, but also about people, processes, and making sure everyone is on the same page, especially when things are supposed to "just be a test."

Want even more info? Read our show notes and the blog post for this episode: https://blog.ithorrorstories.eu/episode-12-the-failover-that-failed-successfully/
All other links and socials: https://links.ithorrorstories.eu/

00:00 Welcome & Setup
01:34 Corporate Environments
03:30 Failover Planning
07:19 Double Disaster
09:08 Critical Failure
13:20 Realization Moment
15:28 Split Brain
17:34 The Recovery
21:13 Lessons Learned
31:32 Conclusion

Sleep Mode in Production - How a Laptop Took Down a Warehouse

Ransomware Lockdown - What a Real-World Ransomware Attack Looks Like: Incident Response and Recovery

Jack's Rants - Floppy Disks at 35000 Feet - Why Boeing 747s Still Use Floppy Disks for Flight Management Systems

Jack's Rants - IT Hiring Chaos – The Strange State of the Tech Job Market