AWS Website Hosting with Private S3 Bucket

In my first post of this series, I delved into the classic approach to hosting an S3 website with HTTPS and a custom domain. This follow-up post covers the modern approach. There are only a few differences, and I will focus on these. I recommend reading the first post before continuing.

Recap

Let's begin with a short recap. Amazon S3 has a feature called static website hosting. If enabled, it provides an endpoint like http://<bucket>.s3-website.<region>.amazonaws.com/<key> that's accessible only via HTTP and not HTTPS. For HTTPS access, a CloudFront distribution must be created that provides an endpoint such as https://<distribution>.cloudfront.net. A custom domain can also be assigned to this CloudFront distribution. Most importantly, however, the bucket and all its content must be publicly available, meaning the Block Public Access setting must be disabled.

S3

The modern approach to S3 hosting has a few changes compared to the classic approach. These affect the bucket website configuration, permission and public access, and the endpoint. Let's go through them.

Website Configuration

The static website hosting feature is no longer necessary to host a website on S3, provided that CloudFront sits in front of the bucket and serves the content.

Permission and Public Access

Disabling Block Public Access in the bucket permissions poses a security concern. It's essential to ensure no sensitive information is stored in your bucket, as everything in it will be accessible from the internet. AWS repeats this warning throughout its documentation:

Warning
Before you complete this step, review Blocking public access to your Amazon S3 storage to ensure that you understand and accept the risks involved with allowing public access. When you turn off block public access settings to make your bucket public, anyone on the internet can access your bucket. We recommend that you block all public access to your buckets.
Edit Block Public Access settings

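Fortunately, we no longer need to disable this setting, because a bucket policy can restrict bucket access to CloudFront alone. The mechanism is called Origin Access Control (OAC): the OAC is configured on the CloudFront distribution, and the bucket policy grants access to the CloudFront service principal: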

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudfront.amazonaws.com"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<S3 bucket name>/*",
      "Condition": {
        "StringEquals": {
          "AWS:SourceArn": "arn:aws:cloudfront::<AWS account ID>:distribution/<CloudFront distribution ID>"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudfront.amazonaws.com"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<S3 bucket name>",
      "Condition": {
        "StringEquals": {
          "AWS:SourceArn": "arn:aws:cloudfront::<AWS account ID>:distribution/<CloudFront distribution ID>"
        }
      }
    }
  ]
}

Giving the origin access control permission to access the S3 bucket

This policy grants CloudFront read-only access (s3:GetObject and s3:ListBucket) to the bucket. The Condition further restricts that access to the single CloudFront distribution named in AWS:SourceArn.
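Since the policy differs only in three placeholders, it can be generated rather than edited by hand. The following is a minimal sketch, not part of the original templates: the function name build_oac_bucket_policy and the sample bucket name, account ID, and distribution ID are assumptions made up for illustration.

```python
import json


def build_oac_bucket_policy(bucket: str, account_id: str, distribution_id: str) -> dict:
    """Build the bucket policy shown above for one bucket and one distribution."""
    distribution_arn = f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    principal = {"Service": "cloudfront.amazonaws.com"}
    # Only requests made on behalf of this distribution satisfy the condition.
    condition = {"StringEquals": {"AWS:SourceArn": distribution_arn}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                # Object-level action: the resource is every key in the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": condition,
            },
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                # Bucket-level action: the resource is the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": condition,
            },
        ],
    }


if __name__ == "__main__":
    # All three values are placeholders, not real resources.
    policy = build_oac_bucket_policy("example-bucket", "111122223333", "EDFDVBD6EXAMPLE")
    print(json.dumps(policy, indent=2))
```

Note the two different Resource values: s3:GetObject applies to objects (bucket ARN plus /*), while s3:ListBucket applies to the bucket ARN itself.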

403 Access Denied vs. 404 Not Found

It's important to note that Amazon S3 responds to requests for nonexistent objects with either 404 or 403, depending on the caller's permissions.

If the object that you request doesn’t exist, the error that Amazon S3 returns depends on whether you also have the s3:ListBucket permission.
If you have the s3:ListBucket permission on the bucket, Amazon S3 returns an HTTP status code 404 (Not Found) error.
If you don’t have the s3:ListBucket permission, Amazon S3 returns an HTTP status code 403 (Access Denied) error.
Amazon S3 GetObject Permissions

We added the s3:ListBucket permission to the bucket policy above so that requests for missing objects return a genuine 404 Not Found instead of a misleading 403 Access Denied. This shouldn't be a security concern, since the bucket is not public and is accessible only by CloudFront.
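The rule quoted above can be written down as a tiny decision function. This is only a simplified model of the documented behavior, assuming the caller already holds s3:GetObject; the function name is made up for illustration and nothing here calls AWS.

```python
def s3_get_object_status(object_exists: bool, can_list_bucket: bool) -> int:
    """Model the status S3 returns for GetObject, assuming s3:GetObject is granted."""
    if object_exists:
        return 200
    # Without s3:ListBucket, S3 does not reveal whether the key exists.
    return 404 if can_list_bucket else 403


if __name__ == "__main__":
    for exists in (True, False):
        for can_list in (True, False):
            status = s3_get_object_status(exists, can_list)
            print(f"exists={exists} list={can_list} -> {status}")
```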

Endpoints

Since we are no longer using the static website hosting feature of S3, we only have the REST endpoint in this format:

https://<bucket>.s3.<region>.amazonaws.com/<key>

The old website endpoint is no longer available in this setup:

http://<bucket>.s3-website.<region>.amazonaws.com/<key>
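To make the difference between the two formats concrete, here is a small sketch that builds both URLs from the same inputs. The helper names and the sample bucket and region are hypothetical; the website format follows the dot-separated form used in this post, and some regions use a dash before the region name instead.

```python
def rest_endpoint(bucket: str, region: str, key: str) -> str:
    """REST endpoint: HTTPS, used in the modern setup."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


def website_endpoint(bucket: str, region: str, key: str) -> str:
    """Website endpoint: HTTP only, exists only with static website hosting enabled."""
    return f"http://{bucket}.s3-website.{region}.amazonaws.com/{key}"


def origin_domain_name(bucket: str, region: str) -> str:
    """Host part of the REST endpoint, as it would be entered for a CloudFront origin."""
    return f"{bucket}.s3.{region}.amazonaws.com"


if __name__ == "__main__":
    print(rest_endpoint("example-bucket", "eu-central-1", "index.html"))
    print(website_endpoint("example-bucket", "eu-central-1", "index.html"))
    print(origin_domain_name("example-bucket", "eu-central-1"))
```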

CloudFront

There are only a few changes required for the Amazon CloudFront setup.

Origin

The origin is still Amazon S3, but we now use a standard S3 bucket instead of a bucket that's configured as a website endpoint. That means we have to specify the REST endpoint of the bucket. The origin dropdown in the CloudFront console will only show standard S3 buckets anyway, so you can pick the bucket right from there.

Origin Access

Origin access grants the CloudFront distribution access to the S3 bucket. There are three settings: public, origin access control, and legacy access identities. We use the recommended origin access control (OAC) setting. The OAC can be created directly in the console, or an existing one can be selected. After the distribution is created, the S3 bucket policy must be updated to grant CloudFront access. This is the same policy shown earlier.
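For those who script the setup instead of using the console, an OAC is described by a small configuration object. The sketch below only builds that object; the function name and the description text are assumptions, and the field names follow the CloudFront OriginAccessControlConfig structure as I understand it, so verify them against the API reference before relying on them.

```python
def build_oac_config(name: str) -> dict:
    """Build an Origin Access Control configuration for an S3 origin."""
    return {
        "Name": name,
        "Description": "OAC for a private S3 website bucket",  # free text, assumed
        "SigningProtocol": "sigv4",
        # "always" tells CloudFront to sign every request it sends to the origin.
        "SigningBehavior": "always",
        "OriginAccessControlOriginType": "s3",
    }


if __name__ == "__main__":
    config = build_oac_config("example-oac")
    print(config)
    # With boto3 installed and credentials configured, this could be passed on as:
    #   boto3.client("cloudfront").create_origin_access_control(
    #       OriginAccessControlConfig=config
    #   )
```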

Default Root Object

The default root object is the file that is returned when a request to the root URL of your distribution is made.

When you define a default root object, an end-user request that calls the root of your distribution returns the default root object. For example, if you designate the file index.html as your default root object, a request for:
https://d111111abcdef8.cloudfront.net/
Returns:
https://d111111abcdef8.cloudfront.net/index.html
How default root object works

This is similar to the index document of Amazon S3 static website hosting, but with a subtle difference. Amazon S3 returns the index document for requests to subdirectories, while CloudFront will only return the default root object for root-level requests.

For example, a subdirectory request to https://d111111abcdef8.cloudfront.net/photos/ will not return the default root object index.html, even if it exists in the subdirectory as photos/index.html.
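The difference is easiest to see side by side. The following is a simplified model of the two behaviors, not AWS code: both function names are hypothetical, and the S3 side ignores details such as the redirect that website hosting issues for folder paths without a trailing slash.

```python
def cloudfront_object_key(path: str, default_root_object: str = "index.html") -> str:
    """Key CloudFront asks the origin for: only the root path is rewritten."""
    key = path.lstrip("/")
    return default_root_object if key == "" else key


def s3_website_object_key(path: str, index_document: str = "index.html") -> str:
    """Key S3 website hosting serves: every folder path gets the index document."""
    key = path.lstrip("/")
    if key == "" or key.endswith("/"):
        return key + index_document
    return key


if __name__ == "__main__":
    for path in ("/", "/photos/", "/photos/cat.jpg"):
        print(path, cloudfront_object_key(path), s3_website_object_key(path))
```

In this model a request for /photos/ makes CloudFront look up the key photos/, which does not exist, while S3 website hosting would serve photos/index.html.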

Custom Error Response

If the requested object is not available for whatever reason, CloudFront will normally return the appropriate HTTP status code to the user. For example, if the object at https://d111111abcdef8.cloudfront.net/photos/ doesn't exist, CloudFront will respond with status code 404 Not Found. This behavior can be configured with custom error responses.

Again, this is similar to the error document of Amazon S3 static website hosting, but more powerful. Amazon S3 returns the error document only for 404 Not Found errors. CloudFront, on the other hand, lets us customize what to return, and with which status code, for various HTTP 4xx and 5xx errors.

Revisiting our previous example, this behavior enables us to return the default root object index.html for all subdirectory requests like https://d111111abcdef8.cloudfront.net/photos/ that would usually trigger 404 Not Found errors. In addition, we can override the response status code and return 200 OK. This is a significant difference from Amazon S3 static website hosting, which returns the error document with a 404 Not Found response code. Some browsers will ignore the content returned by such a response and display their own error page instead.
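As a rough sketch, such a rule could be expressed as the CustomErrorResponses part of a distribution configuration. The function name, the default page path, and the caching TTL of 10 seconds are assumptions chosen for illustration, and the field names follow the CloudFront API shape as I understand it, so check them against the API reference.

```python
def build_custom_error_responses(
    page_path: str = "/index.html",
    error_codes: tuple = (404,),
    caching_min_ttl: int = 10,
) -> dict:
    """Map the given origin error codes to one page, returned with 200 OK."""
    items = [
        {
            "ErrorCode": code,              # status received from the origin
            "ResponsePagePath": page_path,  # object returned to the viewer instead
            "ResponseCode": "200",          # overridden status, a string in the API
            "ErrorCachingMinTTL": caching_min_ttl,
        }
        for code in error_codes
    ]
    return {"Quantity": len(items), "Items": items}


if __name__ == "__main__":
    print(build_custom_error_responses())
```

If the bucket policy did not grant s3:ListBucket, missing objects would surface as 403 rather than 404, so 403 would have to be listed in error_codes as well.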

On a side note: this feature is especially useful for Single Page Applications (SPAs), which are usually served through a single index.html file. Navigation between pages is mostly handled on the client side without initiating new HTTP requests. Find out more in my previous post on this topic.

CloudFormation Template

The following repository contains two templates which can be deployed via the CloudFormation console or the AWS CLI.

  • templates/classic.yml: uses a public S3 bucket with static website hosting. See my first post for more information.

  • templates/modern.yml: uses a private S3 bucket without static website hosting.