Format and Parse Amazon S3 URL


Amazon S3 URLs come in different flavors. Some start with s3:, others with http: or https:. Then there are the ones with s3.amazonaws.com, s3.us-east-1.amazonaws.com, or even s3-us-west-2.amazonaws.com (note the dash instead of the dot between s3 and the region code). And where does the bucket go: is it <bucket>.s3.us-east-1.amazonaws.com/<key> or s3.us-east-1.amazonaws.com/<bucket>/<key>? For static website hosting, of course, there are also <bucket>.s3-website-us-east-1.amazonaws.com and <bucket>.s3-website.eu-central-1.amazonaws.com (again, note the dash and the dot).

There are even more when you include the dual-stack, FIPS, access point, and S3 control endpoints. Here's the full list of Amazon S3 endpoints. But for this post, I will focus on the more common URLs that I mentioned before.

Global

The global URL has the simplest format: s3://<bucket>/<key>. The AWS Management Console also displays URLs in this format.
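To make the structure concrete, here is a minimal TypeScript sketch of formatting and parsing the global form. The helper names formatGlobalS3Url and parseGlobalS3Url are hypothetical and not part of any library. The sketch assumes the key is used verbatim, without percent-encoding.

```typescript
// Hypothetical helpers (not from any library) for the global s3://<bucket>/<key> form.
interface S3Location {
  bucket: string;
  key: string;
}

// Build s3://<bucket>/<key>; the key is kept verbatim (no percent-encoding).
function formatGlobalS3Url({ bucket, key }: S3Location): string {
  return `s3://${bucket}/${key}`;
}

// Split s3://<bucket>/<key> at the first slash after the bucket name.
function parseGlobalS3Url(s3Uri: string): S3Location {
  const match = /^s3:\/\/([^/]+)\/(.*)$/.exec(s3Uri);
  if (!match) {
    throw new Error(`Not a global S3 URL: ${s3Uri}`);
  }
  return { bucket: match[1], key: match[2] };
}

const globalUrl = formatGlobalS3Url({ bucket: "my-bucket", key: "photos/2024/cat.jpg" });
console.log(globalUrl); // s3://my-bucket/photos/2024/cat.jpg
console.log(parseGlobalS3Url(globalUrl).key); // photos/2024/cat.jpg
```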


Path-style vs. Virtual-hosted-style

The difference between path-style and virtual-hosted-style URLs is how the bucket name is included in the URL. Path-style URLs have the bucket name in the pathname of the URL:

https://s3.<region>.amazonaws.com/<bucket>/<key>

On the other hand, virtual-hosted-style URLs have the bucket name in the hostname of the URL:

https://<bucket>.s3.<region>.amazonaws.com/<key>
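Here is a small sketch of where the bucket ends up in each style. The function names are made up for illustration. Keys are assumed to contain only URL-safe characters, because no percent-encoding is applied.

```typescript
// Hypothetical helpers: the bucket goes into the path or into the hostname.
function pathStyleUrl(bucket: string, key: string, region: string): string {
  // Path-style: the bucket is the first path segment.
  return `https://s3.${region}.amazonaws.com/${bucket}/${key}`;
}

function virtualHostedUrl(bucket: string, key: string, region: string): string {
  // Virtual-hosted-style: the bucket is the leftmost part of the hostname.
  return `https://${bucket}.s3.${region}.amazonaws.com/${key}`;
}

// Recover the bucket from either style by checking the hostname first.
function bucketFromUrl(s3Url: string): string {
  const { hostname, pathname } = new URL(s3Url);
  const virtualHost = /^(.+)\.s3\.[a-z0-9-]+\.amazonaws\.com$/.exec(hostname);
  if (virtualHost) {
    return virtualHost[1];
  }
  // Otherwise assume path-style: "/<bucket>/<key>" -> "<bucket>".
  return pathname.split("/")[1];
}

console.log(pathStyleUrl("my-bucket", "docs/readme.txt", "us-west-2"));
// https://s3.us-west-2.amazonaws.com/my-bucket/docs/readme.txt
console.log(virtualHostedUrl("my-bucket", "docs/readme.txt", "us-west-2"));
// https://my-bucket.s3.us-west-2.amazonaws.com/docs/readme.txt
```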

Having the bucket name in the hostname lets DNS route requests for different buckets to different IP addresses. If the bucket name is in the path, all requests go to the same hostname, and therefore to the same IP address, even for different buckets. That is why path-style URLs are deprecated. Support for them was supposed to end in 2020, but AWS changed its plan and continues to support path-style URLs for buckets created on or before September 30, 2020. There's an interesting blog post about the background: Amazon S3 Path Deprecation Plan – The Rest of the Story.
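You can see the DNS argument by comparing the hostname a client would resolve for two different buckets. This is only an illustration. resolvedHostname is a hypothetical helper, and the bucket names and region are placeholders.

```typescript
type UrlStyle = "path" | "virtual-hosted";

// Hypothetical helper: the hostname a client would look up via DNS for a bucket.
function resolvedHostname(bucket: string, region: string, style: UrlStyle): string {
  const s3Url =
    style === "path"
      ? `https://s3.${region}.amazonaws.com/${bucket}/index.html`
      : `https://${bucket}.s3.${region}.amazonaws.com/index.html`;
  return new URL(s3Url).hostname;
}

for (const bucket of ["logs-bucket", "media-bucket"]) {
  // Path-style hostnames are identical for both buckets; virtual-hosted ones differ.
  console.log(bucket, resolvedHostname(bucket, "eu-west-1", "path"), resolvedHostname(bucket, "eu-west-1", "virtual-hosted"));
}
// logs-bucket s3.eu-west-1.amazonaws.com logs-bucket.s3.eu-west-1.amazonaws.com
// media-bucket s3.eu-west-1.amazonaws.com media-bucket.s3.eu-west-1.amazonaws.com
```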

Legacy vs. Regional

Some regions, such as US East (N. Virginia) us-east-1, have a legacy global endpoint that doesn't need a region code in the hostname:

# Legacy hostname with path-style
https://s3.amazonaws.com/<bucket>/<key>

# Legacy hostname with virtual-hosted-style
https://<bucket>.s3.amazonaws.com/<key>

If you use this type of URL for a bucket in a region that doesn't support it, you might get an HTTP 307 Temporary Redirect or, in the worst case, an HTTP 400 Bad Request error, depending on when the bucket was created.

AWS recommends always using the regional endpoints with the region code in the hostname:

# Regional hostname with path-style
https://s3.<region>.amazonaws.com/<bucket>/<key>

# Regional hostname with virtual-hosted-style
https://<bucket>.s3.<region>.amazonaws.com/<key>
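To put the legacy and regional variants side by side, a hypothetical formatRestUrl helper could pick the hostname based on whether a region is given. This is a sketch, not an official rule. It assumes the caller only omits the region for buckets that the legacy endpoint actually serves.

```typescript
// Hypothetical options type; "region" is omitted only for legacy-endpoint buckets.
interface RestUrlOptions {
  bucket: string;
  key: string;
  region?: string;
  virtualHosted: boolean;
}

function formatRestUrl({ bucket, key, region, virtualHosted }: RestUrlOptions): string {
  // Legacy: s3.amazonaws.com; regional: s3.<region>.amazonaws.com
  const s3Host = region ? `s3.${region}.amazonaws.com` : "s3.amazonaws.com";
  return virtualHosted
    ? `https://${bucket}.${s3Host}/${key}`
    : `https://${s3Host}/${bucket}/${key}`;
}

console.log(formatRestUrl({ bucket: "my-bucket", key: "a.txt", virtualHosted: false }));
// https://s3.amazonaws.com/my-bucket/a.txt
console.log(formatRestUrl({ bucket: "my-bucket", key: "a.txt", region: "eu-west-1", virtualHosted: true }));
// https://my-bucket.s3.eu-west-1.amazonaws.com/a.txt
```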

Dot-style vs. Dash-style

There is another caveat here: some regions used to have a dash - instead of a dot . between s3 and <region>:

# Dot-style
https://s3.<region>.amazonaws.com/<bucket>/<key>

# Dash-style
https://s3-<region>.amazonaws.com/<bucket>/<key>

For example, the US West (Oregon) us-west-2 region supports the legacy dash-style URL https://s3-us-west-2.amazonaws.com/<bucket>/<key>. However, the standard format https://s3.us-west-2.amazonaws.com/<bucket>/<key> is also available for these outliers.
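If you receive dash-style URLs but want to work with the standard dot-style format, a rough normalization can rewrite the hostname. The helper toDotStyle is hypothetical. Its regular expression assumes region codes shaped like us-west-2 or ap-southeast-1, and it leaves any other hostname unchanged, including website hostnames.

```typescript
// Hypothetical helper: s3-<region>.amazonaws.com -> s3.<region>.amazonaws.com.
// Assumption: region codes look like "us-west-2" or "us-gov-west-1".
function toDotStyle(s3Url: string): string {
  const parsed = new URL(s3Url);
  parsed.hostname = parsed.hostname.replace(
    /(^|\.)s3-([a-z]{2}(?:-[a-z]+)+-\d+)\.amazonaws\.com$/,
    "$1s3.$2.amazonaws.com",
  );
  return parsed.toString();
}

console.log(toDotStyle("https://s3-us-west-2.amazonaws.com/my-bucket/a.txt"));
// https://s3.us-west-2.amazonaws.com/my-bucket/a.txt
```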

REST vs. Website

All the URL formats we have seen so far, except the global S3 URL, are called REST endpoints. They are hosted on either the s3.amazonaws.com or s3.<region>.amazonaws.com hostname, but more importantly, they support secure HTTPS connections. That means all these URLs work with https:// as the protocol.

Amazon S3 also has a website endpoint for static website hosting. The website endpoint does not support HTTPS, only HTTP. These URLs have the following formats:

# Website hostname with dot-style
http://<bucket>.s3-website.<region>.amazonaws.com/<key>

# Website hostname with dash-style
http://<bucket>.s3-website-<region>.amazonaws.com/<key>

Again, depending on the region, there is a dash - or a dot . separating s3-website and <region>. To see which one is right for your region, you have to check the list of Amazon S3 website endpoints.
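Because the separator depends on the region, a sketch for formatting and parsing website URLs can take it as an explicit parameter instead of guessing. Both helpers are hypothetical. You still need to check the separator passed for each region in the usage lines against the list of Amazon S3 website endpoints.

```typescript
// "-" for s3-website-<region>, "." for s3-website.<region>.
type WebsiteSeparator = "-" | ".";

// Hypothetical helper; website endpoints are HTTP only, hence http://.
function formatWebsiteUrl(bucket: string, key: string, region: string, separator: WebsiteSeparator): string {
  return `http://${bucket}.s3-website${separator}${region}.amazonaws.com/${key}`;
}

// Hypothetical helper that accepts either separator when parsing a hostname.
function parseWebsiteHostname(hostname: string): { bucket: string; region: string } | undefined {
  const match = /^(.+)\.s3-website[-.]([a-z0-9-]+)\.amazonaws\.com$/.exec(hostname);
  return match ? { bucket: match[1], region: match[2] } : undefined;
}

console.log(formatWebsiteUrl("my-site", "index.html", "us-east-1", "-"));
// http://my-site.s3-website-us-east-1.amazonaws.com/index.html
console.log(formatWebsiteUrl("my-site", "index.html", "eu-central-1", "."));
// http://my-site.s3-website.eu-central-1.amazonaws.com/index.html
```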

Format and Parse S3 URLs

Depending on how you interact with Amazon S3, you might use one of the previous URLs. For example, the AWS CLI for S3 expects the S3 URL in the global format s3://<bucket>/<key>. Other clients and SDKs probably use the regional REST endpoint with the bucket name either in the hostname or pathname.
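Once you know the bucket's region, converting between these formats is mostly string manipulation. Here is a minimal sketch that assumes the region is known up front, since s3:// URLs don't carry it. globalToRegionalUrl is a hypothetical helper. It turns an s3:// URL into a regional virtual-hosted-style HTTPS URL and percent-encodes the key segments.

```typescript
// Hypothetical helper: s3://<bucket>/<key> -> https://<bucket>.s3.<region>.amazonaws.com/<key>
function globalToRegionalUrl(s3Uri: string, region: string): string {
  const match = /^s3:\/\/([^/]+)\/(.*)$/.exec(s3Uri);
  if (!match) {
    throw new Error(`Expected s3://<bucket>/<key>, got: ${s3Uri}`);
  }
  const [, bucket, key] = match;
  // Percent-encode each key segment but keep the "/" separators.
  const encodedKey = key.split("/").map(encodeURIComponent).join("/");
  return `https://${bucket}.s3.${region}.amazonaws.com/${encodedKey}`;
}

console.log(globalToRegionalUrl("s3://my-bucket/reports/Q1 summary.pdf", "eu-west-1"));
// https://my-bucket.s3.eu-west-1.amazonaws.com/reports/Q1%20summary.pdf
```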

If you're using the wrong format or endpoint, you might get an error like this:

com.amazonaws.services.s3.model.AmazonS3Exception: The bucket is in this region: eu-west-1. Please use this region to retry the request (Service: Amazon S3; Status Code: 301; Error Code: PermanentRedirect;)

The right URL really depends on the individual client and how it sends requests to S3. To lift some of this burden, I created a tiny JavaScript library, amazon-s3-url, to check, format, and parse S3 URLs in the various formats I described earlier.

At the moment, the library exports only three functions: formatS3Url, parseS3Url, and isS3Url.

import { formatS3Url, parseS3Url, isS3Url, S3Object } from 'amazon-s3-url';

/* Types */
type S3UrlFormat =
  | "s3-global-path"
  | "s3-legacy-path"
  | "s3-legacy-virtual-host"
  | "https-legacy-path"
  | "https-legacy-virtual-host"
  | "s3-region-path"
  | "s3-region-virtual-host"
  | "https-region-path"
  | "https-region-virtual-host";

type S3Object = {
  bucket: string;
  key: string;
  region?: string;
};

/* Signatures */
function formatS3Url(s3Object: S3Object, format?: S3UrlFormat): string;
function parseS3Url(s3Url: string, format?: S3UrlFormat): S3Object;
function isS3Url(s3Url: string, format?: S3UrlFormat): boolean;

/* Examples */
// Global path
// Without format param (defaults to s3-global-path)
formatS3Url({ bucket: 'bucket', key: 'key' });
parseS3Url('s3://bucket/key');
isS3Url('s3://bucket/key');

// Legacy path-style
// With format param for explicit formatting and parsing
formatS3Url({ bucket: 'bucket', key: 'key' }, 'https-legacy-path');
parseS3Url('https://s3.amazonaws.com/bucket/key', 'https-legacy-path');
isS3Url('https://s3.amazonaws.com/bucket/key', 'https-legacy-path');

// Regional virtual-hosted-style
// With region property for regional endpoints
formatS3Url({ region: 'us-west-1', bucket: 'bucket', key: 'key' }, 'https-region-virtual-host');
parseS3Url('https://bucket.s3.us-west-1.amazonaws.com/key', 'https-region-virtual-host');
isS3Url('https://bucket.s3.us-west-1.amazonaws.com/key', 'https-region-virtual-host');
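Here is a rough sketch of what parsing involves: detecting which HTTPS REST format a URL uses. It is not the amazon-s3-url implementation. It only reuses the library's format names for readability, covers just the four https-* formats, and does not recognize dash-style hostnames. detectHttpsS3Url and DetectedS3Url are hypothetical names.

```typescript
type HttpsFormat =
  | "https-legacy-path"
  | "https-legacy-virtual-host"
  | "https-region-path"
  | "https-region-virtual-host";

interface DetectedS3Url {
  format: HttpsFormat;
  bucket: string;
  key: string;
  region?: string;
}

function detectHttpsS3Url(s3Url: string): DetectedS3Url | undefined {
  const { protocol, hostname, pathname } = new URL(s3Url);
  if (protocol !== "https:") return undefined;
  const rawPath = pathname.slice(1); // drop the leading "/"

  // Path-style: the bucket is the first path segment, the rest is the key.
  const splitPath = (): { bucket: string; key: string } => {
    const slash = rawPath.indexOf("/");
    return slash < 0
      ? { bucket: rawPath, key: "" }
      : { bucket: rawPath.slice(0, slash), key: decodeURIComponent(rawPath.slice(slash + 1)) };
  };

  if (hostname === "s3.amazonaws.com") {
    return { format: "https-legacy-path", ...splitPath() };
  }
  let match = /^s3\.([a-z0-9-]+)\.amazonaws\.com$/.exec(hostname);
  if (match) {
    return { format: "https-region-path", region: match[1], ...splitPath() };
  }

  // Virtual-hosted-style: the bucket is the part of the hostname before ".s3.".
  const key = decodeURIComponent(rawPath);
  match = /^(.+)\.s3\.amazonaws\.com$/.exec(hostname);
  if (match) {
    return { format: "https-legacy-virtual-host", bucket: match[1], key };
  }
  match = /^(.+)\.s3\.([a-z0-9-]+)\.amazonaws\.com$/.exec(hostname);
  if (match) {
    return { format: "https-region-virtual-host", bucket: match[1], region: match[2], key };
  }
  return undefined; // not one of the four supported formats
}

console.log(detectHttpsS3Url("https://bucket.s3.us-west-1.amazonaws.com/key")?.format);
// https-region-virtual-host
```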

Limitations

The library only performs rudimentary validation of the URL structure; it doesn't validate bucket names, object keys, or regions. It also doesn't support the dual-stack, FIPS, access point, control, and website endpoints yet. That said, I'm happy to welcome external contributions.