ACLU Partners with Tag1 to Raise Most-Ever $120M in Donations at Mission-Critical Moments
The American Civil Liberties Union (ACLU) is one of the most widely recognized civil rights organizations not only in the United States but around the world. Founded in 1920, the non-profit organization has almost three million supporters committed to its mission to defend individual rights and liberties in the United States. The ACLU led the fight against Japanese-American internment camps during World War II, has advocated for marriage equality since the 1970s, and fought the 1996 Communications Decency Act, which would have resulted in internet censorship.
As it evolves in the digital age, the ACLU now has forty Drupal sites, including the ACLU Action website (action.aclu.org), through which users can sign petitions, send messages to elected officials, request legal aid, donate to the ACLU, and sign up to volunteer.
Third-party API scaling (payment gateways, CRM, etc.)
Legacy systems support and optimization
Advanced system monitoring and predictive warnings
Infrastructure and application migration
Tier 3 infrastructure and architecture support
News events led to dramatically increased traffic, causing the ACLU’s donation platform to go down under load, impacting revenue and supporter engagement at a critical time for the organization. Performance tuning under normal circumstances is difficult, but even more so while under extreme load and experiencing downtime with millions of dollars being lost by the hour.
The ACLU called on Tag1’s Technical Architecture and Leadership to perform emergency support and rescue work to get the ACLU Action website back online as quickly as possible and to help it withstand even bigger traffic spikes in the future. Tag1’s efforts resulted in a 3,000% increase in donations, from a yearly average of $4 million to $120 million; $24 million in donations on a single weekend; 57% faster database response times; and a 900% increase in throughput, measured in requests per minute. In addition, systems now stay online and perform quickly under extreme loads.
Situation: New Headwinds Caused by News Headlines
Since the election of Donald Trump to the American presidency, the ACLU has faced massive and unpredictable traffic spikes that consistently prevented the organization from collecting donations and interacting with American citizens. In short, the ACLU was prevented by its traffic spikes from serving its most mission-critical functions.
Over the course of 2016 and 2017, the ACLU’s online presence was buffeted by headwinds caused by headlines like the election of Donald Trump, the ACLU Executive Director’s appearance on The Rachel Maddow Show, the Muslim travel ban, and the reversal of net neutrality rulings by the Federal Communications Commission (FCC). For instance, in the five days after the previous presidential election in November 2012, the ACLU processed $25,000 in donations; in 2016, that amount surged 28,700% to $7.2 million in the first five days after that year’s election.
Challenge: Surviving and Learning from an Even Larger Traffic Spike
Shortly after election day, on November 16, 2016, Anthony D. Romero, executive director of the ACLU, appeared on The Rachel Maddow Show, and donations began pouring in. As tens of thousands of users attempted to sign up for mailing lists, send letters to elected officials, and donate to the ACLU (at a rate of 500 new members per minute), database queries that would normally need milliseconds under typical traffic loads began to take upwards of 10 seconds or even more than a minute to complete. Many users faced ‘503 Service Unavailable’ and other 5xx errors as the website struggled to stay online. It was a significant missed opportunity to capture new members, volunteers, and donors. Recognizing the potential ramifications for the organization’s financial goals, the ACLU called in Tag1’s Technical Architecture and Leadership, a service Tag1 provides to help clients solve difficult problems, to perform emergency support and rescue work to get the ACLU Action website back online as quickly as possible.
Supporting an 85-fold increase in donations and user traffic required immediate and expert insights to remedy adverse financial impacts caused by site downtime. Within 48 hours, Tag1 rapidly audited and scaled the full technology stack and infrastructure to bring the ACLU Action platform back online and lay the foundation for long-term scalability, enabling the ACLU to continue supporting civil liberties at critical junctures in American history.
Solution: Performance and Scalability Consulting
To keep a high-performing, mission-critical website online, not only for staunchly loyal supporters but also for those newly discovering the organization’s mission, the ACLU tapped Tag1 Consulting to scale new heights with its online fundraising. Thanks to Tag1’s Technical Architecture and Leadership (TAL) program, the ACLU Donation platform (action.aclu.org) scaled to handle 85x growth from 44,000 pageviews per day to up to 4 million pageviews per day. In 2017 alone, online donations grew 30-fold from an average of $4 million annually to over $120 million, with massive bursts including $24 million in donations on a single weekend.
Under the best circumstances, performance tuning a database-bound web application—like a content management system or donor e-commerce system—is difficult. Even more so when under extreme load with servers down, and millions of dollars potentially lost by the hour. Patching a system under duress requires methodical planning and decisive action as well as a deep understanding of infrastructure, database, and application architectures. Performance tuning optimizations need to be identified, tested, and implemented all while maintaining uptime.
To ensure ACLU Action stayed online, Tag1’s Infrastructure Architects and Technical Architects spearheaded collaboration between hosting providers, payment gateways, CDN providers, and internal departments at the ACLU.
A Simple Static Solution for Immediate Mitigation
We were able to mitigate the massive traffic spike originating from Anthony Romero’s interview thanks to a creative yet simple solution. Together with the ACLU, we set up a static HTML donation page and delivered it through a content delivery network (CDN), temporarily bypassing the ACLU’s servers. This helped alleviate the initial financial impact as donations were able to continue unhindered, allowing the team to shift focus to remedying the underlying foundational problems that caused the platform to fall over in the first place.
With the help of available data from New Relic, Pantheon, database logs, and other systems, Tag1 developed a strategy of quick query optimizations and indexing changes on the live database, adding in improved caching mechanisms, to reduce server load and bring the site back online. Once these optimizations were online, the initial page response time decreased by 57% from 1400 milliseconds to 650 milliseconds.
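As a rough illustration of why an indexing change can relieve a database under load, the sketch below compares the query plan for a lookup before and after adding an index. It is not the ACLU’s schema or Tag1’s actual change: the `donations` table, its columns, and the index name are hypothetical, and SQLite (from Python’s standard library) stands in for the production database purely so the example is self-contained.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query-plan detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return " | ".join(row[-1] for row in rows)


# Hypothetical table standing in for a donor-facing data store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE donations (id INTEGER PRIMARY KEY, email TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO donations (email, amount) VALUES (?, ?)",
    [(f"donor{i}@example.org", 25.0) for i in range(10_000)],
)

lookup = "SELECT id, amount FROM donations WHERE email = ?"
params = ("donor42@example.org",)

# Without an index, the lookup has to scan the whole table.
plan_before = query_plan(conn, lookup, params)

# With an index on the filtered column, it becomes an index search.
conn.execute("CREATE INDEX idx_donations_email ON donations (email)")
plan_after = query_plan(conn, lookup, params)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, but the second plan should name the new index, which is the kind of signal to look for when checking that an indexing change is actually being used.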
Iterative Optimization with Load Testing
The ACLU and Tag1 teams collaborated on load testing and iterative performance optimizations through fast feedback loops to uncover the problem areas that led ACLU Action to fail, correct these issues, and test improvements incrementally with ever larger loads to diagnose new issues. In addition, to get ahead of potential problems proactively and avoid additional downtime as teams continued their scaling work, Tag1 enabled New Relic’s advanced notifications to track key functionality, with predictive early-warning alerts to notify Tag1 and the ACLU before problems occurred, and to track SLAs for transactions according to the ACLU’s criteria for maintaining critical business operations. By the end of Tag1’s performance tuning, the ACLU Action platform was able to withstand almost ten times the traffic it had before.
Tag1 also began working on accurate load testing that would replicate the extreme levels of traffic the ACLU Action platform faced during the spikes of 2016 and 2017. Load testing that approximates an 85-fold increase in daily visitors is challenging, but it is necessary to diagnose issues that only arise in those high-traffic conditions, and the tests need to replicate the production server configuration. With Pantheon’s support, Tag1 provisioned a production-replica test environment as a base on which to perform load tests with Locust and Amazon EC2.
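Testing with ever larger loads implies some schedule of load steps between normal and peak traffic. The sketch below shows one way such a schedule could be derived from the figures in this case study (44,000 to 4 million pageviews per day). The function name, the geometric spacing, and the number of steps are assumptions made for illustration, not Tag1’s actual test plan, and the per-second figures are daily averages rather than the burst rates a real spike would produce.

```python
def ramp_plan(baseline_per_day, target_per_day, steps):
    """Build geometric load steps from baseline to target.

    Returns a list of (pageviews_per_day, average_requests_per_second)
    tuples. Hypothetical helper for illustration only.
    """
    if steps < 2:
        raise ValueError("need at least two steps (baseline and target)")
    # Each step multiplies the previous load by the same ratio.
    ratio = (target_per_day / baseline_per_day) ** (1 / (steps - 1))
    plan = []
    for i in range(steps):
        per_day = baseline_per_day * ratio ** i
        plan.append((round(per_day), round(per_day / 86_400, 2)))
    return plan


# Five steps from typical traffic to the peak described in the text.
plan = ramp_plan(44_000, 4_000_000, steps=5)
for per_day, per_second in plan:
    print(f"{per_day:>9,} pageviews/day  ~{per_second} req/s on average")
```

Each step could then be used to parameterize a load-testing run, fixing whatever breaks at one level before moving on to the next.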
Watch our DrupalCon talk, “FOREO - Scaling Applications and Mission Critical Events” to learn more about our load testing approach.
Payment gateway optimizations
On January 27, 2017, the Trump administration implemented the Muslim travel ban, kicking off another massive wave of new supporters as the ACLU filed the first injunction against the measure. During the subsequent weekend, the ACLU Action platform witnessed an 85-fold increase in traffic from 44,000 to 4 million daily page views. Fortunately, with the aforementioned site improvements in place, the ACLU benefited from a record-setting weekend with $24.1 million, over quadruple what the ACLU had raised during the same period before.
However, while the ACLU servers withstood the dramatic increase in load, third-party vendors responsible for payment processing were unable to keep up with the traffic originating from the ACLU. In fact, because of the volume of donation traffic flooding in, the ACLU’s payment gateway presumed it was facing a distributed denial-of-service (DDoS) attack and proceeded to throttle requests. As a result, donations began to process slowly or fail entirely, with little to no opportunity for the payment gateway to recover, and the gateway was unable to resolve the problem in a timely manner. To mitigate the issue, Tag1 created and later open-sourced the cURL Log module, which shows users which cURL requests fail and at what time. Tag1’s team also sanitized results as they came in, in order to enforce the ACLU’s strict privacy standards and prevent personally identifiable information from traveling down the wire.
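To make the idea of sanitizing request logs concrete, here is a minimal sketch of redacting personally identifiable fields before a failed request is recorded. It does not reproduce the cURL Log module: the field names, the `[REDACTED]` marker, and the card-number pattern are all hypothetical choices for illustration.

```python
import re

# Hypothetical set of field names treated as personally identifiable.
SENSITIVE_KEYS = {"email", "name", "card_number", "cvv", "address"}

# Runs of 13 to 19 digits are treated as possible card numbers.
CARD_PATTERN = re.compile(r"\b\d{13,19}\b")


def sanitize(entry):
    """Return a copy of a log entry that is safer to store or transmit."""
    clean = {}
    for key, value in entry.items():
        if key.lower() in SENSITIVE_KEYS:
            # Drop the value entirely for known sensitive fields.
            clean[key] = "[REDACTED]"
        elif isinstance(value, str):
            # Mask card-like digit runs embedded in free-text values.
            clean[key] = CARD_PATTERN.sub("[REDACTED]", value)
        else:
            clean[key] = value
    return clean


failed_request = {
    "url": "https://gateway.example/charge",
    "status": 503,
    "email": "donor@example.org",
    "error": "timeout while charging card 4111111111111111",
}
print(sanitize(failed_request))
```

The entry keeps what is needed to diagnose a failure (the URL, status, and error text) while the donor’s identity and card details are removed before anything is logged.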
Then, Tag1 load-balanced the payment gateway’s API and developed an on-the-fly failover mechanism by creating a cURL Load Balance module to ensure high-performance throughput in PHP even when limited by third-party policies. The module accepts requests, in this case user payments, and monitors third-party API responses. If the module records a set number of failures, it reroutes the requests to another endpoint, which in this case was an alternative endpoint made available by the payment processor. As one endpoint began to deny or throttle requests, we were able to reroute and keep transactions processing smoothly during donation spikes.
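The failover behavior described above can be sketched in a few lines. This is an illustration of the general technique, not the cURL Load Balance module itself (which is written in PHP): the class name, the `transport` callable, the endpoint URLs, and the choice to count consecutive failures are all assumptions.

```python
class FailoverBalancer:
    """Route requests to one endpoint and switch after repeated failures."""

    def __init__(self, endpoints, max_failures=3):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self.endpoints = list(endpoints)
        self.max_failures = max_failures
        self.active = 0    # index of the endpoint currently in use
        self.failures = 0  # consecutive failures on the active endpoint

    @property
    def current(self):
        return self.endpoints[self.active]

    def send(self, payload, transport):
        """Send via transport(endpoint, payload) -> bool and record the result."""
        ok = transport(self.current, payload)
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                # Too many failures in a row: move to the next endpoint.
                self.active = (self.active + 1) % len(self.endpoints)
                self.failures = 0
        return ok


# Hypothetical transport: the primary endpoint is throttling every request.
def fake_transport(endpoint, payload):
    return endpoint != "https://primary.example/api"


balancer = FailoverBalancer(
    ["https://primary.example/api", "https://secondary.example/api"],
    max_failures=3,
)
results = [balancer.send({"amount": 25}, fake_transport) for _ in range(5)]
print(results, balancer.current)
```

In this sketch the three failed requests are simply reported as failures. A production version would also need to decide whether to retry them on the new endpoint, which for payments has to be done carefully to avoid charging a donor twice.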
In the end, we sustained 356,306 online donations totaling roughly $24 million in the weekend after the January 27th announcement of the Muslim travel ban. During the same weekend, thanks to Tag1’s efforts, the ACLU was able to handle over 85 times the traffic it had during the previous year, and database response times were more than halved (57% faster) from the year before. Even better, in one single weekend, the ACLU received more than six times the donation revenue of the entire year before.
Results: Successful Emergency Rescue and Support
After Tag1’s iterative improvements over the course of the ACLU’s record year, yet another test of the site came with the repeal of net neutrality by the FCC in December of 2017. Given all the work that had been put into tuning, testing, and optimizing ACLU Action (and how it communicated with payment gateways), it was our true trial by fire. This time, as traffic and donations flooded the site, nothing went wrong: the site stayed online and donations kept flowing in.
Over the course of that night, Tag1’s watchful support for the ACLU ensured that the site served a peak of 1,900 submissions per minute for donations and support actions, well more than triple the load it was able to handle before Tag1’s Technical Architecture and Leadership resources were brought in. The ACLU and Tag1 continue to monitor, test, optimize, refine, and iterate performance tuning, always ready for the next news headline and the next traffic spike, wherever it may come from.
Statistics at a glance
Average requests per minute increased from 500 to 5,000
Site throughput increased from 10.27 to 8,266 requests per second
A record-setting 2017 with online donations totaling over $120 million, including 356,306 online donations totaling $24,164,691 in a single weekend (January 27th).
280% improvement in donations and form submissions per minute
57% improvement to average database response times
705% increase in requests served per minute
1000% increase in sustainable average requests per minute, with site throughput improving from 616 to an average of nearly 500,000 requests per minute
Timeline of Events
November 8, 2016
Donald Trump elected president
Donations spiked 28,700% from $25,000 in the five days after the 2012 election to $7.2 million in the five days after the 2016 election
November 16, 2016
Executive Director Romero appears on Rachel Maddow
Anthony Romero’s appearance on The Rachel Maddow Show causes ‘503 Service Unavailable’ and other errors, with substantial missed revenue
January 27, 2017
Donald Trump signs the Muslim travel ban executive order
After Tag1’s performance optimizations, traffic grew to 85x normal levels (from 44,000 pageviews per day to over 4 million), and the site was able to handle 900 submissions per minute, raising $24.1 million in a single weekend with no downtime, although third-party payment gateways failed
December 14, 2017
The FCC repeals net neutrality
As a result of Tag1’s performance tuning, the ACLU Action platform peaks at an all-time high of 1,900 submissions per minute without any downtime for the ACLU or their payment gateway