Generating load across a fleet of servers

Posted by ProgrammingAce on Mon 29 August 2016

I maintain an open source project called Hardwarespec that allows you to collect the hardware specifications of your servers across an entire fleet at once. Originally it was designed to collect the static metrics (CPU type, amount of installed memory, SELinux status, etc.) and report if those metrics change on any of the hosts. Recently I added extra functionality to collect variable data, like network speeds, memory utilization, and number of missing security patches.

It’s easy enough to test Hardwarespec by spinning up a dozen or so EC2 instances and running the application, but that doesn’t give meaningful statistics since the systems aren’t doing anything. Running Hardwarespec across production servers is perfectly safe, but that’s not a good way to test unstable builds while I’m writing the code. I need a way to generate random artificial load across a dynamic fleet of EC2 instances. This proved to be more difficult than I expected…

An autoscale group is the clear way to go for building out a dynamic fleet of test instances. AWS gives us the ability to run a shell script at creation time for each server; we can use that to generate some load on our instances. This feature is called ‘user data’ in the AWS documentation, and it allows us to use bash scripts to configure our AWS instances.

There’s an old trick to max out the CPU on a Linux machine by redirecting the output of the ‘yes’ command to /dev/null. The OS generates as many ‘yeses’ as possible and immediately trashes them, pinning a CPU core in the process (a single ‘yes’ process saturates one core). By alternating bursts of ‘yes’ with sleep commands, we can generate a specific amount of load. If the script sleeps for 3/10ths of every second and generates ‘yes’ output for the remaining 7/10ths, you’ll have roughly 70% load on that core. By picking a random number of tenths at startup, each system can be locked into a different CPU load percentage.
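The arithmetic behind that split can be sketched in a few lines of shell. This is only an illustration: `duty_cycle` is a hypothetical helper name, not part of Hardwarespec or of the user data script, and it assumes a one-second period divided into tenths.

```shell
#!/bin/bash
# Hypothetical helper: split a one-second period into a busy part and an
# idle part, given the number of busy tenths (1 to 9).
duty_cycle() {
    local tenths=$1
    local busy="0.${tenths}"            # seconds spent running 'yes'
    local idle="0.$((10 - tenths))"     # seconds spent sleeping
    echo "${busy} ${idle}"
}

# 7 busy tenths correspond to roughly 70% load on one core.
duty_cycle 7
```

Running it prints `0.7 0.3`, the busy and idle durations in seconds for a 70% target.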

The only remaining difficulty is that user data scripts must return a success or failure value within a given timeframe; this means that we can’t put the user data script itself in an infinite loop to generate load. The shell has a feature called a here document (heredoc) that lets you write out a file from within a shell script. I use that here to write out our load generating script, mark it as executable, then launch it as a background process that keeps running after the user data script returns.
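As a rough illustration of the heredoc pattern on its own, the sketch below writes a tiny script to a temporary directory, marks it executable, and runs it in the background. The file names (`hello.sh`, `out.log`) are made up for the example. It also quotes the delimiter (`'EOF'`), which is an alternative to backslash-escaping: with a quoted delimiter the shell copies the body verbatim, so `$$` is only expanded when the generated script runs.

```shell
#!/bin/bash
# Work in a throwaway directory; both file names are hypothetical.
workdir=$(mktemp -d)
script="$workdir/hello.sh"

# Quoted delimiter: nothing in the body is expanded while writing the file.
cat << 'EOF' > "$script"
#!/bin/sh
echo "started with pid $$"
EOF

chmod +x "$script"

# Launch in the background, as the user data script does with its generator.
"$script" > "$workdir/out.log" 2>&1 &

# Only for this demo: wait for the child so its output can be shown.
wait $!
cat "$workdir/out.log"
```

With an unquoted delimiter, the same body would need `\$\$` to survive until run time.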

The end result of this process is that I have an autoscaling group that produces EC2 instances that generate a random amount of CPU load. Below is the user data script in its entirety:

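#! /bin/bash
# Write the load generator to disk. The heredoc delimiter is unquoted, so each
# dollar sign is escaped to defer expansion until the generated script runs.
cat << EOF > /usr/local/bin/
#! /bin/bash
# Busy fraction of each second, 0.1 to 0.9 (a value of 10 would read as 0.10).
export LOAD=0.\$((1 + RANDOM % 9))
# Idle time for the remainder of the second.
export SLEEP=\$(echo "1 - \$LOAD" | bc)s
while true
    do yes > /dev/null &
    sleep \$LOAD
    killall yes
    sleep \$SLEEP
done
EOF
chmod +x /usr/local/bin/
# Run detached so the user data script can return.
nohup /usr/local/bin/ &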