The evolution of cloud-native technologies and the need to scale engineering to support fast-growing businesses have led organizations to restructure their teams and embrace new architectural patterns and paradigms. As systems become richer in features, they also become more complex and more critical than ever to business success: that’s why a growing number of companies are turning to Chaos Engineering.
What is Chaos Engineering?
Chaos Engineering is the practice of injecting experimental and potentially destructive failures into any part of the stack (compute, storage, network or application infrastructure) to uncover vulnerabilities and reduce uncertainty when building complex distributed systems.
But Chaos Engineering is actually far from chaotic: it is a disciplined, data-driven approach to running experiments that use chaotic behavior to stress systems and identify failures (or demonstrate their resilience when the experiments pass) before they become outages. The main benefits of Chaos Engineering include exposing technical debt, building trust in the systems deployed, and improving the reliability and resilience of systems to reduce downtime. For public-facing products, these benefits in turn help improve customer experience, satisfaction, retention and acquisition.
How does it work?
Chaos Engineering acts like preventive medicine: much like a vaccine trains the immune system through a controlled injection of a weakened virus, Chaos Engineering trains an organization to deal with bugs and system failures. It shifts the focus of testing toward how a system might fail gracefully, or even remain useful, under various levels of impact.
The proactive nature of Chaos Engineering enables organizations to manage and mitigate the risk of system downtime and disruption, while reducing the focus on reactive processes that emphasize incident management and service restoration.
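To make this more concrete, here is a minimal sketch in Python of what a single chaos experiment could look like. It is an illustration under stated assumptions, not a reference implementation: the dependency is simulated in memory, and every name in it (FlakyDependency, resilient_service, fragile_service, run_experiment) as well as the 30% failure rate and the 99% threshold are hypothetical choices made for the example. The sketch measures a steady state, injects a fault, observes the system, rolls the fault back, and checks whether the steady-state hypothesis still holds.

```python
import random
from dataclasses import dataclass


class FlakyDependency:
    """Simulated downstream dependency (hypothetical, in-memory only)."""

    def __init__(self, rng, failure_rate=0.0):
        self.rng = rng
        self.failure_rate = failure_rate

    def call(self):
        # Fail with probability `failure_rate`; this is the injected fault.
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return "fresh"


def resilient_service(dependency, retries=2):
    """Retries the dependency, then degrades gracefully to a cached value."""
    for _ in range(retries + 1):
        try:
            return dependency.call()
        except ConnectionError:
            continue
    return "cached"


def fragile_service(dependency):
    """No retry and no fallback: any dependency failure reaches the caller."""
    return dependency.call()


def success_rate(service, dependency, requests=1000):
    """Fraction of requests that the service answers without raising."""
    ok = 0
    for _ in range(requests):
        try:
            service(dependency)
            ok += 1
        except ConnectionError:
            pass
    return ok / requests


@dataclass
class ExperimentResult:
    baseline: float
    under_fault: float
    hypothesis_held: bool


def run_experiment(service, failure_rate=0.3, threshold=0.99, seed=42):
    """Run one experiment against `service` and report what was observed."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    dependency = FlakyDependency(rng)

    # 1. Measure the steady state before touching anything.
    baseline = success_rate(service, dependency)

    # 2. Inject the fault, 3. observe, 4. always roll back afterwards.
    dependency.failure_rate = failure_rate
    try:
        under_fault = success_rate(service, dependency)
    finally:
        dependency.failure_rate = 0.0

    # 5. The hypothesis: the success rate stays at or above the threshold.
    return ExperimentResult(baseline, under_fault, under_fault >= threshold)


if __name__ == "__main__":
    for service in (resilient_service, fragile_service):
        print(service.__name__, run_experiment(service))
```

In this sketch the service with a fallback keeps answering every request while the fault is active, whereas the fragile one drops well below the threshold: exactly the kind of weakness an experiment is meant to surface before it becomes an outage. A real experiment would target actual infrastructure, rely on production-grade monitoring, and limit the blast radius of the fault.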
So, what’s next?
While you might see this as a daunting practice at first, only suitable for high-tech companies such as Netflix, Google or Amazon, the payoff of adopting Chaos Engineering can be significant.
For example, the Infrastructure & Security Team of SNCF Connect has been applying this discipline since 2015 so that a huge number of customers can browse and book their travel in the best conditions every day, especially when French people book 40 tickets every second, as they did during the recent opening of Christmas ticket sales (see this article)!
“Chaos was the law of nature; Order was the dream of man.” Henry Adams
To conclude, Chaos Engineering could be a game changer for your IT organization in the next 5 years, especially if avoiding downtime during peak season and building your customers’ confidence in your systems are your top priorities!