Now that we have defined chaos engineering and the principles behind it, we can turn our attention to the process. The process is repetitive: every experiment follows the same steps.
To begin, we define a steady-state hypothesis: a description of how the system looks when it behaves normally, which we expect to hold both before and after some actions. We confirm the steady-state, and then produce or simulate some real-world events. After the events, we confirm the steady-state again. Throughout, we collect metrics, observe dashboards, and rely on alerts that notify us when the system misbehaves. Ultimately, we’re trying very hard to disrupt the steady-state, and the less damage we’re able to do, the more confidence we will have in our system.
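The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used later in the course: `run_experiment`, `ExperimentResult`, and `ToyService` are hypothetical names, the "system" is an in-memory replica counter, and a plain log list stands in for real metrics, dashboards, and alerts.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ExperimentResult:
    """Outcome of one experiment (hypothetical structure for illustration)."""
    steady_before: bool
    steady_after: Optional[bool] = None  # None means the action was never run
    log: List[str] = field(default_factory=list)  # stand-in for metrics/alerts

    @property
    def hypothesis_held(self) -> bool:
        # The hypothesis holds only if the system was steady both times.
        return self.steady_before and self.steady_after is True


def run_experiment(
    probe: Callable[[], bool],
    action: Callable[[], None],
) -> ExperimentResult:
    """Confirm the steady-state, inject an event, confirm the steady-state again."""
    result = ExperimentResult(steady_before=probe())
    result.log.append(f"steady-state before: {result.steady_before}")
    if not result.steady_before:
        # No point in disrupting a system that is already unhealthy.
        result.log.append("aborted: system was not steady to begin with")
        return result

    action()  # produce or simulate a "real world" event
    result.log.append("action executed")

    result.steady_after = probe()
    result.log.append(f"steady-state after: {result.steady_after}")
    return result


class ToyService:
    """Assumed toy system: steady while at least two replicas are alive."""

    def __init__(self, replicas: int) -> None:
        self.replicas = replicas

    def is_steady(self) -> bool:
        return self.replicas >= 2

    def kill_replica(self) -> None:
        self.replicas = max(0, self.replicas - 1)


resilient = ToyService(replicas=3)
fragile = ToyService(replicas=2)

resilient_result = run_experiment(resilient.is_steady, resilient.kill_replica)
fragile_result = run_experiment(fragile.is_steady, fragile.kill_replica)

print(resilient_result.hypothesis_held)  # True: losing one replica did no damage
print(fragile_result.hypothesis_held)    # False: the steady-state was disrupted
```

The second service fails the experiment, and that is the useful outcome: the experiment found a weakness before a real incident did. In a real setup, the probe would query the running system and the action would be an actual disruption rather than a counter decrement.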
Summary
The summary of the process we discussed is as follows.
Define the steady-state hypothesis
Confirm the steady-state
Produce or simulate “real world” events
Confirm the steady-state again
Use metrics, dashboards, and alerts to confirm that the system as a whole is behaving correctly
In the next lesson, we will go over a checklist of the chaos experiments that we will carry out.