Exit « Data Science »… Chaos Engineering is the new sexy!

What is Chaos Engineering?

Chaos Engineering is the practice of injecting experimental, potentially destructive failures into any part of the stack (compute, storage, network or application infrastructure) to uncover vulnerabilities and reduce uncertainty when building complex distributed systems.

But Chaos Engineering is actually far from chaotic: it is a disciplined, data-driven approach to running experiments that use chaotic behavior to stress systems and identify failures (or prove their resilience when the tests pass) before they become outages. The main benefits of Chaos Engineering include exposing technical debt, building trust in the deployed systems, and improving their reliability and resilience to reduce downtime. For public-facing products, these benefits in turn help improve customer experience, satisfaction, retention and acquisition.
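To make the "disciplined, data-driven" part concrete, here is a minimal sketch of what such an experiment could look like in Python. Everything in it is an assumption made for illustration: `call_service` is a hypothetical stand-in for a real request, the 95% success threshold is arbitrary, and the retry logic is just one possible resilience mechanism. The idea is simply to measure a steady state, inject failures, and check whether the hypothesis still holds.

```python
import random
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    baseline_success_rate: float  # measured without injected failures
    chaos_success_rate: float     # measured while failures are injected
    hypothesis_holds: bool        # True if the system stayed above the threshold


def call_service(failure_probability: float, rng: random.Random) -> bool:
    """Hypothetical stand-in for one request; True means the call succeeded."""
    return rng.random() >= failure_probability


def call_with_retries(failure_probability: float, attempts: int, rng: random.Random) -> bool:
    """The resilience mechanism under test: retry until one attempt succeeds."""
    return any(call_service(failure_probability, rng) for _ in range(attempts))


def success_rate(failure_probability: float, attempts: int, requests: int, rng: random.Random) -> float:
    """Fraction of simulated requests that succeeded."""
    successes = sum(
        call_with_retries(failure_probability, attempts, rng) for _ in range(requests)
    )
    return successes / requests


def run_experiment(
    injected_failure_probability: float,
    attempts: int = 3,
    min_success_rate: float = 0.95,  # steady-state hypothesis (assumed threshold)
    requests: int = 1000,
    seed: int = 42,
) -> ExperimentResult:
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    baseline = success_rate(0.0, attempts, requests, rng)
    chaos = success_rate(injected_failure_probability, attempts, requests, rng)
    return ExperimentResult(baseline, chaos, chaos >= min_success_rate)


if __name__ == "__main__":
    print(run_experiment(injected_failure_probability=0.2))
```

In a real setup the two measurements would come from production-like metrics (error rates, latency) rather than a simulation, but the structure of the experiment stays the same.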

Example of a Chaos Engineering system based on AWS services

How does it work?

Chaos Engineering acts like preventive medicine: much like a vaccine trains the immune system with a controlled injection of a weakened virus, Chaos Engineering trains an organization to deal with bugs and system failures. It shifts the focus of testing to how a system might fail gracefully, or even remain useful, under various levels of impact.
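The "controlled injection" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: `FaultInjector`, `personalised_backend`, `get_recommendations` and the fallback list are hypothetical names invented for the example, not part of any real tool. The sketch shows a dependency being made to fail on purpose, and a caller that degrades gracefully instead of crashing.

```python
from typing import Callable, List

# Hypothetical static content served when the dependency is unavailable.
FALLBACK_ITEMS: List[str] = ["bestseller-1", "bestseller-2"]


class FaultInjector:
    """Wraps a dependency call and, when enabled, raises instead of calling it."""

    def __init__(self) -> None:
        self.enabled = False

    def wrap(self, func: Callable[..., List[str]]) -> Callable[..., List[str]]:
        def wrapper(*args, **kwargs):
            if self.enabled:
                # Simulated outage: the real dependency is never reached.
                raise ConnectionError("injected fault: dependency unreachable")
            return func(*args, **kwargs)

        return wrapper


def personalised_backend(user_id: str) -> List[str]:
    """Hypothetical stand-in for a remote recommendation service."""
    return [f"pick-for-{user_id}"]


def get_recommendations(user_id: str, backend: Callable[[str], List[str]]) -> List[str]:
    """Degrade gracefully: serve generic content if the backend is down."""
    try:
        return backend(user_id)
    except ConnectionError:
        return FALLBACK_ITEMS


if __name__ == "__main__":
    injector = FaultInjector()
    backend = injector.wrap(personalised_backend)

    print(get_recommendations("alice", backend))  # normal path
    injector.enabled = True                       # start the controlled "attack"
    print(get_recommendations("alice", backend))  # degraded but still useful
```

The point of the experiment is the second call: the user still gets an answer, just a less personalised one, which is what "continuing to be useful under impact" looks like in practice.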

Example of a fault injection attack on an AWS network

The proactive nature of Chaos Engineering enables organizations to manage and mitigate the risk of system downtime and disruption, while reducing the focus on reactive processes that emphasize incident management and service restoration.

So, what’s next?

While this might look like a daunting practice at first, suitable only for high-tech companies such as Netflix, Google or Amazon, the payoff from adopting Chaos Engineering can be significant.

For example, the Infrastructure & Security Team of SNCF Connect has been applying this discipline since 2015 so that a huge number of customers can browse and book their travel every day in the best conditions, especially when French people book 40 tickets every second, as during the recent opening of Christmas ticket sales (see this article)!

Chaos was the law of nature; Order is the dream of man

Henry Adams

To conclude, Chaos Engineering could be a game changer for your IT organization in the next 5 years, especially if avoiding downtime during peak season and building your customers' confidence in your systems are your top priorities!
