Right now, you might be saying, “Okay, I understand that chaos engineering is about learning from destruction, but what does that really mean?” Let me give you a few use cases for chaos engineering. I cannot go through every permutation of what we can do, but a few examples should help.
We can validate what happens when fallback settings are improper and a service is unavailable.
What happens when a service is not accessible, one way or another?
What happens if an app is retrying indefinitely to reach a service without having properly tuned timeouts?
What is the outcome when an application or a downstream dependency receives too much traffic or is not available?
Will we experience cascading errors when a single point of failure crashes an app?
What happens when our application goes down?
What happens when there is something wrong with networking?
What happens when a node is not available?
Those are just a few of the questions we will explore through practical exercises.
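To make the questions about retries and fallbacks a bit more concrete, here is a minimal Python sketch of the kind of behavior a chaos experiment would put under test. It is an illustration under assumptions, not a recommended implementation: `call_with_retries` and `UnavailableService` are hypothetical names invented for this example, and the attempt count and delays are arbitrary. It also does not enforce a per-call timeout, which a real client would need to set on each request.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    fallback: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` at most `max_attempts` times, then use `fallback`.

    Hypothetical helper: bounded retries with exponential backoff,
    as opposed to retrying indefinitely.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # Back off before the next attempt, but not after the last one.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed, so degrade gracefully instead of looping forever.
    return fallback()


class UnavailableService:
    """Hypothetical dependency that always fails, standing in for an outage."""

    def __init__(self) -> None:
        self.calls = 0

    def fetch(self) -> str:
        self.calls += 1
        raise ConnectionError("service unavailable")


if __name__ == "__main__":
    service = UnavailableService()
    value = call_with_retries(service.fetch, fallback=lambda: "cached response")
    print(value, "after", service.calls, "attempts")
```

If the loop had no `max_attempts` limit and no fallback, the caller would hang for as long as the dependency stayed down. That is exactly the kind of weakness we want an experiment to reveal before a real outage does.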