We’ve talked a lot about our Simian Army, a collection of monkeys we use to break our system so that we can validate that our many services are resilient to different types of failures and learn how to make our system anti-fragile. Chaos Monkey, probably the best-known member of the Simian Army, runs in both our Test and Production environments, and it recently added Cassandra clusters to its hit list.
To validate that our architecture was resilient to larger-scale outages, we unleashed bigger Simians: