Fast, cheap and automated
Deploying static websites to AWS
For the last year I relied on the Jekyll static site generator for my blog and hosted it on Github Pages. This was a super convenient experience; it’s probably the easiest way to kick off a self-hosted website. However, I sat down this weekend, migrated all the content to the Hugo static site generator, and set up the hosting in the AWS cloud (Amazon Web Services).
Short background story: why Hugo? It is noticeably faster; compile times with Hugo are nearly instantaneous. I also like that it is less opinionated and allows more flexibility in the project structure.
Anyway, the topic of this blog post is how to set up an automated and reliable deployment to AWS, regardless of whether the static files were generated by Hugo, Jekyll or any other tool.
The ups and downs of AWS
AWS is a powerful cloud provider. It is made up of numerous small services that are configured independently and can be combined with each other like a construction kit. Moreover, the hosting of static content on AWS is ridiculously cheap. (Considering just S3, my monthly bill is usually around a few cents.)
However, the advanced features of AWS are a downside at the same time: the initial hurdle to get started is high and there is plenty of room for error. Keep this in mind:
- Amazon has your credit card number: If you accidentally leak your credentials or activate an expensive service, the costs can quickly go through the roof.[1]
- Permissions and access rights are a complex thing. If you make a mistake, someone can use your S3 bucket to share illegal files.
If you are new to AWS, you are best advised to take the time and properly understand everything you do, even when this is time-consuming and frustrating in the beginning.
Basic setup
For building and deploying our website, we use the following chain:
- Github is the place where all source files of the static website are at home.
- Travis CI builds and pushes the static website upon every change.
- AWS S3 holds the generated static files and serves them to the world.
- AWS CloudFront (optional) is the CDN service that speeds up load times even more. It also provides an SSL certificate for HTTPS connections.
So, let’s roll up our sleeves!
Github
I won’t go into detail about setting up a Github repo here. But let me point out once again the importance of not committing any credentials whatsoever to your repo. Neither for AWS, nor for any other service that you use. If it happens after all, you must immediately revoke the affected credentials. (Just erasing them from the commit history is not sufficient!)
Travis CI
Within the Travis account we connect our Github repo. Every time we push something to the repo, Travis will run a build: it takes care of generating the static sites, builds the assets and deploys the public folder to S3. It executes every build in a clean environment, thus making sure that no artifacts from previous builds or other temporary files make their way into production. In order to tell Travis what to do, we must create a .travis.yml file in the project root:
language: go
sudo: required       # we need a sudo-enabled build environment for the Docker service below
services:
  - docker
install:
  - pip install --user awscli   # the AWS CLI is not part of the Travis default environment
script:
  - docker run --rm -v $(pwd):/app -w /app jojomi/hugo:0.31 hugo   # build the site with a pinned Hugo version
after_success:
  - aws s3 sync public/ s3://YOUR_BUCKET_NAME/ --delete   # upload the generated files and delete orphaned ones
Our file consists of several blocks:
- sudo: required is necessary because we want to use Docker within the build step. (Which we declare in the services section.)
- install: Since the AWS CLI binary is not part of the Travis default environment, we must install it first.
- script: This is where the actual build happens. Here we also specify the Hugo version we want to use for performing the build.
- after_success: Only if the previous step was successful do we upload the resulting artifacts to AWS. Note that we use the AWS CLI rather than the out-of-the-box Travis S3 deployment, because the latter doesn’t take care of deleting orphaned files, which is a major annoyance. Replace the constant with your bucket name.
In order for the AWS CLI to work, we must provide three environment variables in the settings of your Travis project: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION. Make sure to obfuscate them in the build log! More on these keys later.
We can add additional steps to this configuration as we like. For instance, we could compile our SASS/LESS files by installing some CLI tool (e.g. npm install node-sass) and then invoking it in the script step.
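Before handing things over to Travis, it can be reassuring to run the same steps locally. Here is a minimal sketch, assuming the bucket name and region below are placeholders and the credentials belong to the deployment user we create in the next section:

docker run --rm -v $(pwd):/app -w /app jojomi/hugo:0.31 hugo   # build the site into public/
export AWS_ACCESS_KEY_ID=...             # placeholder: access key of the deployment user
export AWS_SECRET_ACCESS_KEY=...         # placeholder: secret of the deployment user
export AWS_DEFAULT_REGION=eu-central-1   # placeholder: your bucket's region
aws s3 sync public/ s3://YOUR_BUCKET_NAME/ --delete --dryrun   # --dryrun only prints what would be uploaded or deleted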
AWS basic setup
IAM (identity and access management)
AWS has powerful mechanisms that allow us to set up fine-grained permissions. First, we go to IAM and create a new user for programmatic access. It’s fine to not assign a group to this user, in which case it won’t have any default permissions unless we explicitly set them.[2] Next, we go over to Travis and set the two environment variables for access key and access secret. (See above.)
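If you prefer the command line to the console, the same IAM user can be created with the AWS CLI. This is only a sketch: the user name travis-deploy is an arbitrary example, and the commands must be run with credentials that already have IAM permissions.

aws iam create-user --user-name travis-deploy        # user for programmatic access only
aws iam create-access-key --user-name travis-deploy  # prints the access key id and secret once; store them in Travis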
S3 Bucket
S3 is short for Simple Storage Service. It can be used to store all kinds of files. S3 has an HTTP-based interface, so all file operations are performed with regular HTTP requests. This is the reason why the content of S3 buckets can be exposed to the world wide web so easily.
We create a new bucket by giving it a (unique) name and choosing a region. Next, we enable static website hosting in the bucket properties, where we can also configure the index and error documents. Finally, we go over to Travis and enter the region identifier as an environment variable. (See above.)
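For reference, the same bucket setup can be done with the AWS CLI. The bucket name and region are placeholders, and the index and error documents are just examples:

aws s3 mb s3://YOUR_BUCKET_NAME --region eu-central-1   # create the bucket in the chosen region
aws s3 website s3://YOUR_BUCKET_NAME/ --index-document index.html --error-document 404.html   # enable static website hosting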
Policies
Unless we specify a policy, Travis cannot upload anything and nobody can view our content in a web browser. Policies can be attached to all kinds of AWS entities, so we can configure them directly in the properties of our S3 bucket. (“Properties” → “Permissions” → “Edit bucket policy”)
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "TravisCI",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::YOUR_AWS_ACCOUNT_ID:user/TRAVIS_IAM_USER_NAME"
},
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::BUCKET_NAME",
"arn:aws:s3:::BUCKET_NAME/*"
]
},
{
"Sid": "PublicWebsite",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": ["arn:aws:s3:::BUCKET_NAME/*"]
}
]
}
What’s happening here?
- With the first statement block, we grant full read and write access for the specific Travis user that we have created in the IAM in the previous step. This is important, because otherwise Travis cannot upload files to our bucket.
- With the second statement block, we grant read permissions for the entire bucket content to everyone. This is required for public website hosting to work; it would not be recommended otherwise.
Of course you must replace the uppercase parts with your specific information. And again: be sure to fully understand what’s happening here and how AWS policies work.
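If you keep the policy above in a file called policy.json (the file name is arbitrary), you can also attach it from the command line instead of the bucket properties dialog:

aws s3api put-bucket-policy --bucket YOUR_BUCKET_NAME --policy file://policy.json   # attach the policy to the bucket
aws s3api get-bucket-policy --bucket YOUR_BUCKET_NAME                               # print the currently attached policy to verify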
DNS settings
Ensure that everything is set up and running by triggering a build. When the build was successful, you can access your website content via the S3 endpoint, which looks something like this: BUCKET_NAME.s3-website-us-east-1.amazonaws.com. Of course, this isn’t a very nice URL, so you might want to configure a CNAME record for your own domain that points to this address. (Consult your DNS provider or domain registrar on how to do that.)
Sidenote: It’s not recommended to set CNAME records for root level domains, although this is technically possible. You can set up a CNAME record for www.example.org, but you shouldn’t do so for example.org.[3] However, most DNS providers offer the option to redirect the root domain to a subdomain (like www.example.org).
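Once the DNS change has propagated, a quick check from the command line shows whether the record points where it should; www.example.org stands in for your own domain:

dig +short CNAME www.example.org   # should print the S3 website endpoint, e.g. BUCKET_NAME.s3-website-us-east-1.amazonaws.com.
curl -I http://www.example.org/    # should return the response headers of your index document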
Advanced setup (optional)
AWS CloudFront and HTTPS
CloudFront is the CDN service of AWS that can be put in front of S3. This means that our content is no longer delivered directly out of the bucket, but served by CDN servers (so-called edges) all around the globe.
The biggest benefit for a smaller website like this blog is not performance. (S3 is usually pretty quick already.) Instead, CloudFront gives us the ability to set up an SSL certificate for a custom domain, which would not be possible with S3 alone. Note that even though the CloudFront default certificates are free, CloudFront itself introduces some additional costs (depending on how you use it).[4]
In order to set up CloudFront, create a Distribution in the AWS Console. It’s up to you whether you want to use the caching mechanism or not. Caching can be annoying sometimes, because file changes take much longer to be rolled out. If you set a custom TTL of 0, CloudFront will check the origin for modifications on each request.[5] Don’t worry too much about the performance implications; they are most likely negligible.
Here are some further remarks on the CloudFront settings:
- Price class: Beware that the pricing depends on the number of edge locations you want to deploy to.
- TTL: CloudFront respects the original caching headers, but you can override the expiration times with the TTL values, even with a value of 0. (Which would disable the cache effect altogether.)
- Origin domain name: If you want CloudFront to look up index.html files everywhere, you have to connect to the endpoint of the S3 static website instead of the regular S3 endpoint (so, say, mybucket.s3-website-us-east-1.amazonaws.com instead of mybucket.s3.amazonaws.com). Since requests to the static webserver are not authenticated internally, bucket access must be public in this case.
- Bucket Permissions: Basically, permissions can be restricted to the CloudFront distribution. However, if you want the index-file lookup as described above, read access must be granted to everyone (*).
- HTTPS: Activate HTTPS with a default CloudFront certificate. It’s free and AWS will take care of managing the certificate for you.
- CNAME: You must list all the CNAMEs that you have set up, otherwise the corresponding requests via your own domain will fail. (A quick way to check this follows after the list.)
- Query String Forwarding: It can be handy to activate this if query strings get processed via JavaScript, or if you want to bypass caching for particular assets (by appending an arbitrary query value to a URL).
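Once the distribution is deployed and your DNS points to it, a header check reveals whether requests actually go through CloudFront; again, the domain is a placeholder:

curl -I https://www.example.org/
# Look for a header along the lines of "x-cache: Hit from cloudfront" (or "Miss from cloudfront"),
# which indicates that the response was served via the CDN.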
If you decide to use CloudFront, you might consider using s3cmd instead of the AWS CLI in the deploy step, because it provides an option to automatically trigger cache invalidation for uploaded files (via the --cf-invalidate option).
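Alternatively, you can stay with the AWS CLI and trigger the invalidation yourself right after the sync. A sketch, with the distribution ID as a placeholder:

aws s3 sync public/ s3://YOUR_BUCKET_NAME/ --delete
aws cloudfront create-invalidation --distribution-id YOUR_DISTRIBUTION_ID --paths "/*"   # invalidate all cached paths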
Provisioning tools
Nowadays, infrastructure can be set up in a modern DevOps fashion with tools like Terraform. This is not just cool, but it also brings several benefits such as predictability and reproducibility. On the other hand, it would add another layer of complexity and is probably overkill for a simple setup like ours. The infrastructure described here can easily be maintained via the AWS web interface (aka the AWS Management Console).
1. Leaked credentials are usually abused to mine bitcoins in the compromised accounts.
2. It’s a common good practice to only grant the exact necessary access rights, even though it is more complicated.
3. Setting up a CNAME record for the root domain can break email delivery on this domain.
4. You can estimate your AWS expenses with this handy calculator.
5. See the AWS docs. This StackOverflow post also provides useful information.