[Notes] Chaos Engineering: Building Confidence in System Behavior through Experiments
Part I Introduction
Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
-- Principles of Chaos
Using Chaos Engineering may be as simple as manually running kill -9 on a box inside of your staging environment to simulate failure of a service. Or, it can be as sophisticated as automatically designing and carrying out experiments in a production environment against a small but statistically significant fraction of live traffic.
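The simplest case above, kill -9 against a service, can be sketched locally in Python. This is a minimal illustration, not a real chaos tool: the "service" is a hypothetical stand-in (a child process that just sleeps), and the helper names start_fake_service, kill_minus_nine and is_running are made up for this note. It assumes a POSIX system, because signal.SIGKILL (the signal kill -9 sends) does not exist on Windows. In a real staging experiment you would target the actual service's PID on the box and then watch whether the rest of the system keeps behaving.

```python
import os
import signal
import subprocess
import sys
import time


def start_fake_service() -> subprocess.Popen:
    """Launch a hypothetical stand-in 'service': a Python process that only sleeps."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


def is_running(proc: subprocess.Popen) -> bool:
    """poll() returns None while the child process is still alive."""
    return proc.poll() is None


def kill_minus_nine(proc: subprocess.Popen) -> int:
    """Equivalent of `kill -9 <pid>`: send SIGKILL, which cannot be caught or ignored."""
    os.kill(proc.pid, signal.SIGKILL)
    # Reap the child so its exit status can be inspected.
    return proc.wait(timeout=5)


if __name__ == "__main__":
    service = start_fake_service()
    time.sleep(0.2)  # give the child a moment to start
    print("running before:", is_running(service))
    code = kill_minus_nine(service)
    # On POSIX, a negative return code -N means "terminated by signal N".
    print("running after:", is_running(service), "returncode:", code)
```

Running it should report the process as alive before the kill and gone afterwards, with return code -9. The interesting part of a real experiment is everything this sketch leaves out: observing how callers, retries and failover react once the process disappears.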
The History of Chaos Engineering at Netflix: started in 2008
- Chaos Monkey: got the ball rolling, gaining notoriety for turning off services in the production environment
- Chaos Kong: transferred those benefits from the small scale to the very large
- Failure Injection Testing (FIT): the foundation for tackling the space in between