Chaos Faults for Linux
Introduction
Linux faults disrupt the resources running on a Linux machine. This deteriorates the performance of the application for the duration of the chaos experiment.
Resource consumption
The infrastructure consumes minimal system resources when idle, that is, when no experiment is being executed. For example, on a GCP e2-micro VM instance with 2 vCPUs and 1 GB of memory running Ubuntu 22.04, the average resource consumption was measured as follows:
- CPU usage: 0.05%
- Memory usage: 1.5%
- Disk storage consumption: 25 MB
- Bandwidth consumption: 0.15 KB/s
Fault compatibility matrix
The faults have been tested for compatibility on the following Linux distributions:
Distribution | Stress faults (CPU, memory, disk I/O) | Network faults (loss, latency, corruption, duplication) | DNS faults (error, spoof) | Process faults (process kill, service restart) | Time chaos | Disk fill |
---|---|---|---|---|---|---|
Ubuntu 16+ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Debian 10+ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
CentOS 7+ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
RHEL 7+ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Fedora 30+ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
openSUSE LEAP 15.4+ / SUSE Linux Enterprise 15+ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Linux CPU stress
Linux CPU stress applies stress on the CPU of the target Linux machines for a certain duration.
Use cases
- Induces CPU stress on the target Linux machines.
- Simulates a lack of CPU for the processes running on the machine, which degrades their performance.
- Simulates slow application traffic or exhaustion of the resources, leading to degradation in the performance of processes on the machine.
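The effect described above can be approximated with a minimal sketch that keeps a number of cores busy for a fixed duration. This is not the fault's actual implementation, which the text does not describe; the function name `stress_cpu` and its parameters are hypothetical, chosen only for illustration.

```python
import subprocess
import sys
import time

# Source of a tiny worker program: busy-wait until the deadline passes.
BUSY_LOOP = (
    "import sys, time\n"
    "deadline = time.monotonic() + float(sys.argv[1])\n"
    "while time.monotonic() < deadline:\n"
    "    pass\n"
)


def stress_cpu(workers: int, duration_s: float) -> float:
    """Run `workers` busy-loop processes in parallel; return elapsed wall time.

    Hypothetical helper: one process per worker keeps roughly one core busy.
    """
    start = time.monotonic()
    procs = [
        subprocess.Popen([sys.executable, "-c", BUSY_LOOP, str(duration_s)])
        for _ in range(workers)
    ]
    for proc in procs:
        proc.wait()
    return time.monotonic() - start


if __name__ == "__main__":
    elapsed = stress_cpu(workers=2, duration_s=1.0)
    print(f"kept 2 workers busy for {elapsed:.2f}s")
```

Running it alongside an application and watching that application's latency gives a rough feel for how it behaves when CPU is scarce.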
Linux memory stress
Linux memory stress causes memory consumption of the target Linux machines for a specific duration.
Use cases
- Induces memory consumption and exhaustion on the target Linux machines.
- Simulates a lack of memory for the processes running on the machine, which degrades their performance.
- Simulates application slowness due to memory starvation, and noisy neighbour problems due to excessive consumption of memory.
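As a rough illustration of memory consumption for a fixed duration, the sketch below allocates a block, touches every page so that it is actually resident, holds it, and releases it. The helper `hold_memory` and the 4 KiB page size are assumptions made for this example; the fault itself may work differently.

```python
import time

PAGE_SIZE = 4096  # assumed page size; common on Linux but not guaranteed


def hold_memory(megabytes: int, duration_s: float) -> int:
    """Allocate `megabytes` MiB, hold it for `duration_s`, then release it.

    Returns the number of bytes that were held.
    """
    block = bytearray(megabytes * 1024 * 1024)
    # Touch one byte per page so the kernel backs the allocation with real memory.
    for offset in range(0, len(block), PAGE_SIZE):
        block[offset] = 1
    held = len(block)
    time.sleep(duration_s)
    del block  # drop the reference so the memory can be freed
    return held


if __name__ == "__main__":
    held = hold_memory(megabytes=64, duration_s=1.0)
    print(f"held {held // (1024 * 1024)} MiB")
```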
Linux disk IO stress
Linux disk I/O stress applies stress on the disk of the target Linux machines over I/O operations for a specific duration.
Use cases
- Simulates slower disk operations for the applications.
- Simulates noisy neighbour problems by exhausting the disk bandwidth.
- Verifies the disk performance on increasing I/O threads and varying I/O block sizes.
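To make the point about varying I/O block sizes concrete, here is a small sketch that writes a fixed amount of data in chunks of a given size and reports the throughput. The function `write_throughput` is hypothetical and only measures sequential writes to a temporary file; it is not how the fault generates load.

```python
import os
import tempfile
import time


def write_throughput(block_size: int, total_bytes: int, directory=None) -> float:
    """Write at least `total_bytes` in `block_size` chunks; return MiB/s."""
    payload = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        written = 0
        with os.fdopen(fd, "wb") as handle:
            while written < total_bytes:
                handle.write(payload)
                written += block_size
            handle.flush()
            os.fsync(handle.fileno())  # force the data to the device
        elapsed = time.monotonic() - start
    finally:
        os.remove(path)
    return (written / (1024 * 1024)) / max(elapsed, 1e-9)


if __name__ == "__main__":
    for size in (4 * 1024, 64 * 1024, 1024 * 1024):
        rate = write_throughput(size, total_bytes=4 * 1024 * 1024)
        print(f"block size {size:>8} B: {rate:.1f} MiB/s")
```

Comparing the numbers with and without a disk I/O stress experiment running shows how much disk bandwidth the application loses.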
Linux network loss
Linux network loss injects chaos to disrupt network connectivity on the Linux machine by blocking the network requests.
Use cases
- Induces network loss on the target Linux machines.
- Simulates loss of connectivity by blocking the network requests on the machine.
Linux network latency
Linux network latency injects chaos to disrupt network connectivity on a Linux machine by adding delay to the network requests.
Use cases
- Induces network latency on the target Linux machines.
- Simulates slow connectivity by delaying the network requests of the machine.
Linux network duplication
Linux network duplication injects chaos to disrupt network connectivity on a Linux machine by duplicating network packets.
Use cases
- Induces network duplication on the target Linux machines.
- Simulates packet duplication in the network.
Linux network corruption
Linux network corruption injects chaos to disrupt network connectivity on a Linux machine by corrupting the network requests.
Use cases
- Induces network corruption on the target Linux machines.
- Simulates network corruption by corrupting requests of the machine.
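The four network faults (loss, latency, duplication, and corruption) map naturally onto the Linux `tc` traffic-control tool with the `netem` queueing discipline. The text does not say which mechanism the faults use, so the sketch below is only an assumption-based illustration: it builds the `tc` command lines without running them, and the helper names are hypothetical. Actually applying the commands requires root privileges.

```python
# Assumption: tc/netem is used here purely for illustration; the faults
# described in this document may be implemented differently.
NETEM_OPTIONS = {
    "loss": "loss",              # value such as "30%"
    "latency": "delay",          # value such as "200ms"
    "duplication": "duplicate",  # value such as "10%"
    "corruption": "corrupt",     # value such as "5%"
}


def netem_command(interface: str, fault: str, value: str) -> list[str]:
    """Build the tc command that would apply `fault` on `interface`."""
    if fault not in NETEM_OPTIONS:
        raise ValueError(f"unknown network fault: {fault}")
    return [
        "tc", "qdisc", "add", "dev", interface,
        "root", "netem", NETEM_OPTIONS[fault], value,
    ]


def netem_cleanup(interface: str) -> list[str]:
    """Build the tc command that removes the root qdisc again."""
    return ["tc", "qdisc", "del", "dev", interface, "root"]


if __name__ == "__main__":
    print(" ".join(netem_command("eth0", "latency", "200ms")))
    print(" ".join(netem_cleanup("eth0")))
```

Building the commands as lists keeps them ready for `subprocess.run` and avoids shell quoting issues if the sketch is extended to execute them.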
Linux DNS error
Linux DNS error injects chaos to disrupt the DNS resolution on a Linux machine.
Use cases
- Induces DNS error on the target Linux machines.
- Simulates loss of access to a host by blocking the DNS resolution of its host name.
Linux DNS spoof
Linux DNS spoof injects chaos to spoof DNS resolution on a Linux machine, so that target host names resolve to IP addresses other than their real ones.
Use cases
- Induces DNS spoof on the target Linux machines.
- Resolves DNS target host names (or domains) to other IPs provided as user input.
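The difference between the DNS error and DNS spoof behaviours can be sketched in-process with a resolver wrapper: blocked host names fail to resolve, spoofed host names resolve to a user-supplied IP, and everything else falls through to the normal resolver. The function `make_resolver` is a hypothetical test helper, not part of any fault tooling, and it simulates the behaviour inside one Python process rather than on the whole machine.

```python
import socket


def make_resolver(spoof_map=None, blocked=None, fallback=socket.gethostbyname):
    """Return a resolve(host) function that simulates DNS error and DNS spoof."""
    spoof_map = dict(spoof_map or {})
    blocked = set(blocked or ())

    def resolve(host: str) -> str:
        if host in blocked:
            # DNS error: the name cannot be resolved at all.
            raise socket.gaierror(socket.EAI_NONAME, f"simulated DNS error for {host}")
        if host in spoof_map:
            # DNS spoof: the name resolves to the IP supplied by the user.
            return spoof_map[host]
        return fallback(host)

    return resolve


if __name__ == "__main__":
    resolve = make_resolver(
        spoof_map={"api.example.com": "203.0.113.10"},  # documentation-range IP
        blocked={"db.example.com"},
    )
    print(resolve("api.example.com"))
    try:
        resolve("db.example.com")
    except socket.gaierror as exc:
        print(f"resolution failed: {exc}")
```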
Linux process kill
Linux process kill fault kills the target processes running on the Linux machines. It checks the performance of the application or process running on the Linux machine.
Use cases
- Induces process kill on the target Linux machines.
- Disrupts application-critical processes, such as databases or message queues, by killing their underlying processes or threads.
- Determines the resilience of applications when processes on a Linux machine are unexpectedly killed (or disrupted).
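What an unexpected kill looks like from the parent's point of view can be shown with a short sketch that starts a throwaway child process and sends it a signal. The helper `spawn_and_kill` is hypothetical; choosing `SIGKILL` as the default signal is an assumption, since the text does not state which signal the fault sends.

```python
import os
import signal
import subprocess
import sys


def spawn_and_kill(sig: int = signal.SIGKILL) -> int:
    """Start a sleeping child process, signal it, and return its exit code.

    On POSIX systems, a child terminated by signal N reports return code -N.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    os.kill(child.pid, sig)
    return child.wait(timeout=10)


if __name__ == "__main__":
    code = spawn_and_kill()
    print(f"child exit code: {code}")
```

A supervisor or application that inspects this return code can tell a signal-induced death apart from a normal exit and decide whether to restart the process.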
Linux service restart
Linux service restart stops the target system services running on a Linux machine. It determines the performance and resilience of the application (or services) running on Linux machines.
Use cases
- Determines the resilience of an application when its services halt at random.
- Determines how efficiently an application recovers and restarts the services.
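One way to quantify how efficiently a service recovers is to poll its state and record how long it takes to become active again. The sketch below assumes a systemd-based machine and uses `systemctl is-active --quiet`, which exits with status 0 when the unit is active; the helper names and the polling interval are hypothetical.

```python
import subprocess
import time


def is_active(service: str) -> bool:
    """Return True if systemd reports the unit as active (assumes systemd)."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", service], check=False
    )
    return result.returncode == 0


def wait_until_active(check, timeout_s: float, interval_s: float = 0.5):
    """Poll `check()` until it returns True; return seconds waited or None."""
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        if check():
            return time.monotonic() - start
        time.sleep(interval_s)
    return None  # the service did not recover within the timeout
```

For example, `wait_until_active(lambda: is_active("nginx"), timeout_s=60)` would return the recovery time in seconds for a unit named `nginx`, a name used here purely as an illustration. Passing the check as a callable keeps the polling loop testable without systemd.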
Linux time chaos
Linux time chaos injects chaos to change the time of the Linux machine.
Use cases
- Induces time chaos to change the system time on the target Linux machines.
- Determines the resiliency of the underlying application components when subjected to a change in the system time.
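A typical component affected by a change in system time is any logic that compares wall-clock timestamps, such as token or session expiry. The sketch below simulates a time shift with an adjustable clock offset instead of changing the real system time; `ShiftableClock` and `is_expired` are hypothetical names introduced for this example.

```python
import time


class ShiftableClock:
    """Wall clock with an adjustable offset, standing in for a changed system time."""

    def __init__(self, base=time.time):
        self._base = base
        self.offset_s = 0.0

    def now(self) -> float:
        return self._base() + self.offset_s


def is_expired(issued_at: float, ttl_s: float, now: float) -> bool:
    """Return True once `ttl_s` seconds have passed since `issued_at`."""
    return now - issued_at >= ttl_s


if __name__ == "__main__":
    clock = ShiftableClock()
    issued_at = clock.now()
    print("expired before shift:", is_expired(issued_at, 300, clock.now()))
    clock.offset_s = 3600  # simulate the system time jumping one hour ahead
    print("expired after shift:", is_expired(issued_at, 300, clock.now()))
```

A forward jump expires the token early, while a backward jump would keep it valid for longer than intended, which is the kind of behaviour a time chaos experiment is meant to surface.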
Redis cache limit
Redis cache limit fault limits the amount of memory used by a Redis cache. The original limit is restored after the chaos duration.
Use cases
- Determines the resilience of Redis-dependent applications to the frequent cache misses that occur due to a low cache size.
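Why a lower cache size leads to frequent cache misses can be seen in a small in-memory simulation. The sketch below uses a least-recently-used cache as a stand-in; it does not talk to Redis, and LRU eviction is an assumption, since the eviction behaviour of a real Redis instance depends on its configured policy. The class and function names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache with a fixed capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def hit_rate(capacity: int, keys) -> float:
    """Replay `keys` against a cache of `capacity`; return the fraction of hits."""
    cache = LRUCache(capacity)
    hits = 0
    for key in keys:
        if cache.get(key) is not None:
            hits += 1
        else:
            cache.put(key, f"value-{key}")
    return hits / len(keys) if keys else 0.0


if __name__ == "__main__":
    workload = list(range(10)) * 10  # 10 distinct keys, read in a cycle
    print("capacity 10:", hit_rate(10, workload))
    print("capacity 5: ", hit_rate(5, workload))
```

With this cyclic workload, a cache that fits all ten keys only misses on the first pass, while a cache limited to five keys evicts each key before it is read again.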
Redis cache expire
Redis cache expire fault expires a given key (or all keys) for a specific duration. As a result, you can't access the expired key(s) in the cache during chaos.
Use cases
- Determines the resilience of Redis-dependent applications when a key expires on a Linux machine.
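An application that tolerates expired keys usually falls back to the source of truth and repopulates the cache. The following sketch shows that pattern with a plain dictionary standing in for the cache; `get_with_fallback` and the loader function are hypothetical, and no Redis client is involved.

```python
def get_with_fallback(cache: dict, key: str, load_from_source):
    """Return the cached value, or reload it from the source on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_source(key)  # slower path, e.g. a database query
        cache[key] = value             # repopulate for later reads
    return value


if __name__ == "__main__":
    cache = {"user:1": "cached-profile"}
    loader = lambda key: f"profile-from-db-for-{key}"

    print(get_with_fallback(cache, "user:1", loader))  # served from the cache
    del cache["user:1"]                                # simulate the key expiring
    print(get_with_fallback(cache, "user:1", loader))  # reloaded from the source
```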
Redis Sentinel stop
Redis Sentinel stop fault stops the Redis Sentinel server for a specific chaos duration and then starts it again.
Use cases
- Determines the resilience of Redis-dependent applications when the Redis Sentinel server is unavailable.