One of the annoying things about web hosting is managing certificates: nobody wants to spend time creating Certificate Signing Requests and checking email for expiry notices. Certificates expire, and domains change and become invalid, leaving a system administrator to work with a Certificate Authority (CA) to get new certificates and install them on the servers that need them. This manual process is tedious and can cause downtime for your services.

A solution that has emerged in the last few years is the Automated Certificate Management Environment (ACME) protocol. It was developed by LetsEncrypt to fully automate certificate management. It is a client-server protocol: the client is a component of your infrastructure, and the server is run by the CA. The most common ACME CA is LetsEncrypt, but the software that runs LetsEncrypt's ACME services is open source, so anyone can run their own ACME CA.

We at Tag1 don't like wasting hours on menial tasks, so we created an Ansible role to automate certificate management by leveraging the LetsEncrypt service and their ACME CA software.

Before going into the role, it helps to have an understanding of how LetsEncrypt works under the hood:

The ACME protocol uses a few types of 'challenges' which, if met by your server, allow it to obtain a valid, trusted certificate. Each challenge is designed to let the client prove that it controls the domain. The challenge is always initiated by the ACME client.

The first and most common type of challenge, HTTP-01, has the ACME server contact your webserver under the path /.well-known/acme-challenge/ using the domain that you want a certificate for. The ACME server sends your client a token. The client then writes a file named after the token under .well-known/acme-challenge/ in the webroot, containing the token along with a fingerprint of its ACME account key (the ACME client creates the account if it does not already exist). The ACME client then tells the ACME server that the file exists, which initiates a check by the ACME server. The ACME server does a DNS lookup for the domain and sends an HTTP request for http://<YOUR_DOMAIN>/.well-known/acme-challenge/<TOKEN>. If the ACME server finds the token and account fingerprint, it has verified that the client controls your domain, at which point a new certificate is issued for the domain.
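To make the HTTP-01 exchange concrete, here is a minimal Python sketch of how the challenge file's path and contents are derived. It follows RFC 8555 and RFC 7638 as I understand them: the "fingerprint" is a SHA-256 thumbprint of the account's public key in JWK form, and the file body is the token and thumbprint joined by a dot. The function names, the token, and the key values are placeholders of mine for illustration; none of this is taken from the role or from Certbot.

```python
import base64
import hashlib
import json
from typing import Tuple


def b64url(data: bytes) -> str:
    """Base64url-encode without trailing '=' padding, as ACME uses."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """SHA-256 thumbprint of an RSA or EC public JWK (RFC 7638)."""
    required = {"RSA": ("e", "kty", "n"), "EC": ("crv", "kty", "x", "y")}[jwk["kty"]]
    # Only the required members are hashed, sorted by name, with no whitespace.
    canonical = json.dumps(
        {name: jwk[name] for name in required},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def http01_response(token: str, account_jwk: dict) -> Tuple[str, str]:
    """Return (URL path, file body) that the ACME server will look for."""
    key_authorization = f"{token}.{jwk_thumbprint(account_jwk)}"
    return f"/.well-known/acme-challenge/{token}", key_authorization


# Placeholder values for illustration only; not a real token or account key.
example_jwk = {"kty": "RSA", "e": "AQAB", "n": "placeholder-modulus"}
path, body = http01_response("example-token", example_jwk)
print(path)  # where the file must be reachable over plain HTTP
print(body)  # what the file must contain
```

In practice the ACME client does all of this for you; the sketch only shows why the webserver has to serve that one directory unmodified.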

The other commonly used type is the DNS-01 challenge. This challenge proves control of the domain by having the client add a TXT DNS record named _acme-challenge.<YOUR_DOMAIN> under the domain you wish to get the certificate for. This record needs to contain a value derived from a token sent by the ACME server and the ACME client's account fingerprint. This is very similar to the HTTP-01 challenge, but it requires a DNS setup that can be updated programmatically, and the client needs to know how to add and remove DNS records for your specific provider. This extra complexity makes it less popular than the HTTP-01 challenge.
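For comparison, a DNS-01 record can be sketched the same way. Per RFC 8555, the record is named _acme-challenge.<YOUR_DOMAIN> (with a hyphen), and its TXT value is the base64url-encoded SHA-256 digest of the same "token.fingerprint" string used by HTTP-01. The function name, the domain example.com, and the input string below are assumptions for illustration only.

```python
import base64
import hashlib
from typing import Tuple


def dns01_record(domain: str, key_authorization: str) -> Tuple[str, str]:
    """Return (record name, TXT value) for a DNS-01 challenge."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without '=' padding, the encoding ACME uses throughout.
    txt_value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"_acme-challenge.{domain}", txt_value


# Placeholder input; a real key authorization is "<token>.<account thumbprint>".
name, value = dns01_record("example.com", "example-token.example-thumbprint")
print(name)
print(value)
```

Computing the value is the easy part. Publishing and later removing the record through your DNS provider's API is the part that makes DNS-01 harder to automate.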

Tag1 Consulting does its fair share of infrastructure work, and the more work we can offload from system administrators onto the machine, the better. I looked at the current solutions for handling ACME/LetsEncrypt certificates, and none of them seemed to fit our use case. The geerlingguy.certbot role only manages renewal of ACME certificates, but does not allow adding certificates. The other roles that provide this functionality aren't well maintained and don't provide self-signed certificates, making them difficult to test. They also require Ansible to be run at regular intervals, much like the default Ansible modules (acme_account and acme_certificate).

Due to the nature of our work, we don't typically run Ansible at regular intervals. The only time we run Ansible against our own or our clients' machines is when the configuration changes. We were looking for a solution that we could run against a new webserver (and generate temporary self-signed certificates), and when the time came to get a trusted certificate, we could log in and flip a switch. Furthermore, none of the existing roles had a built-in way to test the server's configuration to ensure the challenge would pass, so there was risk involved. Thus, our new role was born.

The tag1consulting.letsencrypt role handles all certificate-related configuration for a webserver or load-balancer. Loaded with sane defaults, the most basic webservers only require two variables to be set to work. On top of that, the role includes a built-in ACME server that fakes DNS queries that you can use to test your server's configuration. Additionally, the role contains optional tests to ensure your webserver accepts the HTTP-01 challenge's path and is configured to serve the expected domain. This role even configures self-signed certificates, so your webserver can start up and you can easily start testing your webapp over TLS.
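To give a feel for what a pre-flight check of the HTTP-01 path involves, here is a minimal Python sketch. It is not the role's implementation: the function name and parameters are hypothetical. It drops a random probe file into the challenge directory under the webroot, fetches it over HTTP, and reports whether the webserver returned it unchanged.

```python
import secrets
import urllib.request
from pathlib import Path


def challenge_path_is_served(webroot: str, base_url: str, timeout: float = 5.0) -> bool:
    """Write a random probe file under the challenge path and fetch it over HTTP."""
    challenge_dir = Path(webroot) / ".well-known" / "acme-challenge"
    challenge_dir.mkdir(parents=True, exist_ok=True)
    name = secrets.token_urlsafe(16)
    content = secrets.token_urlsafe(32)
    probe = challenge_dir / name
    probe.write_text(content)
    try:
        url = f"{base_url.rstrip('/')}/.well-known/acme-challenge/{name}"
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().strip() == content.encode("ascii")
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this covers
        # refused connections, timeouts, and 4xx/5xx responses.
        return False
    finally:
        # Always clean up the probe file, whether or not the fetch worked.
        probe.unlink()
```

Calling it with the webroot and the site's base URL (for example challenge_path_is_served("/var/www/", "http://localhost")) returns False when a rewrite rule, an authentication requirement, or a wrong DocumentRoot would make the real challenge fail. It cannot tell you whether public DNS already points at the machine.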

The role works fairly simply. Let's say we have a simple Apache2 vhost on a new server that we are going to migrate our main site to (DNS is still pointing at the production machine, so the HTTP-01 challenge will fail). The vhost in our example will respond to the domain, with the webroot at /var/www/, and we want to have LetsEncrypt certificates when it goes live.

Our simplified example vhost looks like so:

<VirtualHost *:443>
  DocumentRoot /var/www/

  SSLEngine On
  SSLCertificateFile /etc/letsencrypt/live/
  SSLCertificateKeyFile /etc/letsencrypt/live/
  SSLCertificateChainFile /etc/letsencrypt/live/
</VirtualHost>

This vhost definition does not need to change when using our LetsEncrypt role. The role generates the files at the paths the SSLCertificate* directives reference, exactly as they are written.

The following minimal configuration could be used to set up the server for LetsEncrypt:

# Define which domains this machine should receive TLS certificates for
  - domain:
    directory: /var/www/

# Mandatory option. This must be a publicly routable email address. Mail is
# sent from LetsEncrypt's servers, not locally, so addresses such as 'root'
# will not work. All expiry notices and protocol change notices will be
# sent to this email address.

When our LetsEncrypt role is run with the above configuration, it will do several things:

  • Generate untrusted cert, chain, and private key files at /etc/letsencrypt/live/
  • Create a cron job to renew certificates with Certbot
  • Create a migration script at /root/

The Apache server should now run as configured without error, and it should use the untrusted TLS files generated by the role. At this point, we could deploy the site code. Once everything is verified to work as expected, you could either proxy the /.well-known path to the new webserver or update DNS to point to it. Next, all that is needed to get your trusted certificate and automate renewals is to run /root/

This migration script template is extremely verbose and very careful not to do anything that could break the webserver. It backs up the files under /etc/letsencrypt/live (at this point, the untrusted cert, chain, and key files), checks that the webserver configuration is valid by attempting to reload the webserver service, requests and installs the LetsEncrypt certificate (from a local testing server if one is detected), then reloads the webserver service. If something goes wrong at any point, such as a failed ACME challenge, the script restores the server to the state it was in before the script was run: it restores the backup of /etc/letsencrypt/live and reloads the webserver.
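The back-up, attempt, restore-on-failure pattern described here can be sketched in a few lines of Python. This is an illustration of the idea only, not the role's migration script. The function name is hypothetical, and the certificate request and webserver reloads are left to a caller-supplied action.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_with_rollback(live_dir: str, action: Callable[[], None]) -> bool:
    """Back up live_dir, run action, and restore the backup if action raises."""
    live = Path(live_dir)
    with tempfile.TemporaryDirectory() as scratch:
        backup = Path(scratch) / "backup"
        # symlinks=True copies links as links instead of following them.
        shutil.copytree(live, backup, symlinks=True)
        try:
            action()
        except Exception:
            # Discard whatever the failed attempt left behind, then restore.
            shutil.rmtree(live, ignore_errors=True)
            shutil.copytree(backup, live, symlinks=True)
            return False
    return True
```

A caller would pass the risky step as the action, for example a function that requests the certificate and raises if the challenge fails, and would reload the webserver after either outcome so that it picks up whichever files are now in place.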

At this point, the server is now running a valid certificate for, and all it took was two variable definitions!

This role is tested and supported on CentOS 7 and Ubuntu 18.04, and is likely to work on other RHEL and Debian derivatives. We welcome contributors with open arms.

Image by D1_TheOne from Pixabay