In previous posts we’ve talked about Goose and Gaggles, and how they work. Running Goose for an actual client load test is the true test of its usefulness and, like any software, one of the best ways to find out where additional improvements can be made. Here, we’ll walk through how we set up and ran a load test with a Tag1 client, what we found, and which Goose improvements are being added or considered as a result.
Writing the test
The first part of creating a load test suite for a client is to write the tests themselves. We were able to take advantage of some existing profiles and templates to speed up the test creation process.
The website we load tested was running Drupal 8, and as such the recently added Umami example was an excellent starting point. This Goose example is designed specifically for the Drupal 9 Umami installation profile.
We were able to cut and paste some of the functions directly out of common.rs, such as logic for extracting forms and static elements out of loaded pages, using them with little or no change in the customer's load test. The Umami example that ships with Goose provides a nice blueprint for adding a logged-in user, and makes the username and password configurable through environment variables. We were able to mostly copy this code and use it in the client load test.
Debugging and Testing
The --requests-file option made debugging and fixing issues while writing the test much faster than with other load testing tools we've used. Simple commands like grep -v 200 requests.log highlighted bad paths or incorrect assumptions, and having a debug log containing the request, headers, and body of any failed requests made tracking down problems quick and easy.
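To illustrate this workflow, here is a minimal sketch using a hypothetical, simplified requests log (the field layout below is illustrative only; Goose's actual request log contains more detail):

```shell
# Fabricated sample log: "timestamp method path status" per line.
printf '%s\n' \
  '1607104000 GET / 200' \
  '1607104001 GET /missing-page 404' \
  '1607104002 POST /node/1 200' > requests.log

# Show every request that did not return a 200 response:
grep -v 200 requests.log
```

With the sample data above, only the 404 line is printed, immediately surfacing the bad path.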
This client needed multiple similar websites load tested, so we also copied the structure of the Umami load test: quite a few functions are shared between the load tests in a common.rs file, further speeding up development and ensuring that fixes to one test fixed the same issues in the other tests.
Running the test
Goose has a lot of options, and it can be confusing to know which to set. While it's possible to define defaults in the load test itself so you don't have to set the options on the command line, changing those defaults either requires updating the code or knowing which flags to set. We created an extremely simple shell script to launch the test with our preferred options:
```shell
#!/bin/sh
USERNAME=foo PASSWORD=bar cargo run --release -- \
  --log-file goose.log -g -H https://example.com/ \
  --debug-file debug.log --debug-format raw --requests-file requests.log \
  --status-codes -u250 --hatch-rate .016666666 -v > metrics.log
```
We ran the above script in a screen session so multiple people could start and stop the test without worrying about accidentally launching multiple parallel tests.
The included options cause Goose to generate four distinct log files, each of which can be tailed or viewed independently while analyzing a load test:
goose.log: Due to the --log-file option and -g flag, this includes all INFO-level and higher logs generated by Goose during the load test. It includes details about when each Goose User starts and which Task Set it's running, as well as any messages generated by the load test itself, such as an expected form element missing, an error when POSTing a form, or the page title being wrong.
requests.log: This is a log of every single request made during the load test, including whether it was redirected (and to where) and the response code returned by the server.
metrics.log: This log file contains the running metrics Goose generates as the load test runs. It provides a huge amount of detail, including things like how many requests have been made, how long requests are taking, and which requests are failing.
By also adding the --status-codes flag as we do, this log also includes a table that helps to quickly show whether the load test is generally working or the server has become unresponsive. It aggregates the response codes returned by all requests made during the test; a large percentage of non-200 status codes, for example, can indicate an invalid test.
debug.log: This captures detailed information about each request that was flagged as a failure in your load test. Using the above examples, if an expected form element was missing on a loaded page, that request will be logged, including the request made, all headers returned by the server (if any), and the body returned by the server (if any).
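The kind of aggregation the --status-codes table provides can also be approximated by hand from a requests log. This sketch reuses the same hypothetical, simplified log format as earlier (real Goose logs contain more fields):

```shell
# Fabricated sample log: "timestamp method path status" per line.
printf '%s\n' \
  '1607104000 GET / 200' \
  '1607104001 GET /missing-page 404' \
  '1607104002 POST /node/1 200' > requests.log

# Count how many requests returned each status code,
# similar to the summary table --status-codes produces:
awk '{count[$4]++} END {for (s in count) print s, count[s]}' requests.log | sort
```

On the sample data this prints `200 2` and `404 1`, making a spike of non-200 responses easy to spot at a glance.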
Slow Ramp Up
The --hatch-rate .016666666 option may look odd, but it causes Goose to launch one new user every 60 seconds. This is useful for increasing traffic in a very controlled manner, in sync with sampling services like NewRelic and Fastly. We edit this value to be much larger when we're testing a quicker ramp up or want to generate a considerable amount of sustained load.
Debugging Application Bugs
- We were able to very quickly tell Goose to write a debug log entry for all GET requests made, not just those returning a non-200 response code; this helped us track down a bug in the application code we were load testing.
- Goose reported timeout errors that we weren't seeing in the NewRelic reports. Thanks to the debug log, we were able to quickly see that the Fastly CDN layer was timing out, and ultimately tracked this down to a Varnish VCL configuration error.
The more we use Goose in real-life situations, the more we discover ways it can be made even better. We filed a number of feature requests (some of which have already been implemented):
(Implemented in 0.10.6) We used a very slow ramp up time during part of the load test, and it would have been useful to see Goose's metrics during that ramp up instead of just after all users are started or once the test was canceled;
Goose proved to be extremely efficient and powerful, and in the first version of our load test it was generating far too much load far too quickly. While Goose can add a delay between Tasks, on this website individual Tasks require a lot of complexity and make a number of requests. We had to manually add a delay after each request inside the Task, something we'd prefer that Goose do automatically for us to make writing tests even faster.
Goose breaks out metrics in two major ways: per Task, and per Request. It's helpful if each type of metric has a name to quickly understand the resulting metrics tables. In the beginning Goose didn't break out Tasks versus Requests in the metrics and instead only provided Request metrics, and so naming a Task was intended as a shortcut for naming all requests in the Task. Now that they're split out, the naming of Tasks needs to be made more flexible.
(Implemented in 0.10.7) The debug log is a fantastic tool for troubleshooting failures caused by the load test. The debug log tries to capture as much information as possible, but large pages were getting truncated and the debug log was losing its line feeds. We tracked this down to Tokio's default BufWriter being only 8K, smaller than the web pages being returned during our tests, resulting in the truncated logs. We fixed this by writing the line feed first and by increasing the write buffer to 8M. Finally, we also made it possible to optionally disable the logging of response bodies.
Math can be hard. We figured out that we can set the hatch rate to 0.016666666 to launch one user per minute, useful for one of the tests we performed for this client. It would be easier if there were an alternative option to simply specify a ramp-up time, for example "--ramp-up 60", which would calculate the hatch rate for us.
(Implemented in 0.10.6) It is often important that each time you run the load test, it does exactly the same thing. For this reason, Goose was assigning GooseTasks to newly launching GooseUsers in a serial order exactly as defined in the load test. However, as we were simulating lots of anonymous users and far fewer authenticated users (a common requirement), this meant that during our slow ramp up we could have to wait over 15 minutes for enough anonymous user threads to start before authenticated user threads started launching.
This feature request has since been implemented, and changes how GooseTasks are assigned to launching GooseUsers, now defaulting to a round robin style. This means, in our example, one anonymous user starts, then one authenticated user, then another anonymous user, and another logged in user. The extra, anonymous GooseUsers are started at the end of the ramp up process instead of the beginning. This is configurable, so the old serial order can be restored, and a new random method was also introduced.
(Implemented in 0.10.5) Goose originally used an integer to define the hatch rate, meaning the slowest possible launch rate was one user per second. With this change, it became possible to use a floating point number instead, allowing the slower ramp up described earlier.
Goose in the wild
In all, this set of load tests was highly beneficial both for Tag1’s client and for the Goose project itself. The client was able to judge the reliability and responsiveness of their website, correcting problems and optimizing their online storefront for another successful holiday season. The Tag1 team identified several useful additions to Goose, some of which have already been implemented. This set of tests was a success for Tag1 and our client.