Backing Up Data with AWS Glacier Deep Archive

After recently surpassing the capacity of my external backup disk, I had a decision to make: Should I buy a bigger one or try something else? An 8TB external hard drive currently costs about $140, which isn’t too bad, but comes with a number of downsides:

  • Upfront cost
  • Needing to remember to plug it in and back up my data
  • All of my data is in the same place (no offsite storage)

Overall, these downsides are manageable, but I figured I’d try something new to get around them. After looking for cheap cloud storage solutions from reputable companies, it seemed that Amazon Web Services’ Glacier Deep Archive was a viable option. Here are some of the upsides:

  • Very cheap storage ($0.00099/GB per month in the Ohio region)
  • My data is stored offsite
  • I can write a script to automatically sync from my computer to AWS as often as I want

I have about 3TB of data to store, which means I should be charged about $3 per month. Not bad.

Configuring the Upload

Depending on how much data you have and how willing you are to use a terminal, you can upload content either through a web browser or from the command line interface (CLI). Because I wanted to automatically sync new content, I chose the latter. Setting it up isn’t too bad, but it certainly takes more work and understanding of AWS than using the browser interface. Here are some things I configured (a rough sketch of the equivalent CLI commands follows the list):

  • Created an S3 bucket
  • Created a new IAM user with limited permissions (S3 bucket read & write, but not delete): This should theoretically prevent data from getting accidentally deleted
  • Created an access key: This is required to be able to interact with the S3 bucket via the CLI
  • Installed AWS CLI version 2 on my computer (on Windows, you can use either the Windows version or the Linux version via WSL, and they work equally well): Version 2 is required because version 1 does not support setting the “Deep Archive” storage class flag
  • Wrote a sync script to sync a folder of content to AWS S3: Data only flows one way and is meant to be an archive, so the script never deletes anything, even though the sync command can delete objects in S3 (via its --delete flag) if asked to
  • Created a task in Task Scheduler to start a daily sync of my content: This can be set to sync data as frequently as desired
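
For reference, here is roughly what that setup looks like from the CLI, using the same aws2 command name as the sync script below. This is only a sketch: the bucket name, user name, policy name, and policy file are placeholders, and you can just as easily do all of it in the AWS web console.

REM Sketch only: bucket, user, and policy names below are placeholders.
REM Create the bucket in the Ohio region (us-east-2)
aws2 s3 mb s3://bucket-name --region us-east-2

REM Create a dedicated IAM user for the sync job
aws2 iam create-user --user-name backup-sync

REM Attach an inline policy that allows reads and writes but not deletes
REM (policy.json would list s3:PutObject, s3:GetObject, and s3:ListBucket,
REM but not s3:DeleteObject)
aws2 iam put-user-policy --user-name backup-sync --policy-name s3-backup-readwrite --policy-document file://policy.json

REM Create an access key for the user, then store it locally with "aws2 configure"
aws2 iam create-access-key --user-name backup-sync
aws2 configure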

With this setup, I was able to upload my data every day automatically. Here’s what the very simple script looks like:

echo Started %date% %time% >> H:\Logs\log-aws-video-sync.txt

aws2 s3 sync H:\Data\ s3://bucket-name/data/ --storage-class DEEP_ARCHIVE --no-progress >> H:\Logs\log-aws-video-sync.txt

echo Ended %date% %time% >> H:\Logs\log-aws-video-sync.txt

Note that the --storage-class flag is set to DEEP_ARCHIVE. This is how you specify that you want the data sent straight to Deep Archive; if you don’t, it will sit in the standard S3 storage class, where it costs much more to store. Now I can sync my data to AWS S3 Deep Archive.
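
As for the scheduled task, I set mine up in Task Scheduler, but the same thing can also be done from the command line with schtasks. The task name, script path, and start time below are just placeholders:

REM Sketch only: task name, script path, and start time are placeholders.
REM Register a daily task that runs the sync script at 2:00 AM
schtasks /create /tn "AWS S3 Deep Archive Sync" /tr "H:\Scripts\aws-sync.bat" /sc daily /st 02:00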

Downsides

There are many downsides to this approach. Here are a few.

Upload Speeds

In much of the United States and many other parts of the world, upload speeds are terrible. At my previous house, just a city away, the maximum upload speed I could get was 15Mbps, which would have made the initial 3TB upload take on the order of two to three weeks of continuous uploading, not to mention the 30 to 90GB I add at a time going forward. Luckily, my new house has a fiber connection with 300 to 400Mbps upload speeds. For anyone without a fast, unlimited Internet connection, this is not an approach I would recommend.

Online Backups

If your computer is compromised by some sort of awful malware, it can technically connect directly to your S3 bucket, meaning whatever is in there could be compromised as well. While that may be very unlikely, it is something to consider. From that perspective, an external hard drive that stays disconnected from your computer and network is a safer choice for very important data. Using both would be even better.

Retrieval Time

AWS Glacier Deep Archive data takes a while to retrieve, so don’t expect to have instant access to it. Depending on which retrieval tier you pay for, a restore takes anywhere from 12 to 48 hours before you’re able to download the data. See this page for more details:

https://docs.aws.amazon.com/AmazonS3/latest/dev/restoring-objects.html#restoring-objects-retrieval-options
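
Retrieving archived data is a two-step process: each object has to be restored back into S3 before it can be downloaded. A restore request looks roughly like the following; the bucket name and key are placeholders, and the JSON quoting may need adjusting for your shell:

REM Sketch only: bucket name and key are placeholders.
REM Ask S3 to restore one archived object for 7 days using the Bulk tier
REM (the cheapest option, up to about 48 hours; the Standard tier takes about 12 hours)
aws2 s3api restore-object --bucket bucket-name --key data/some-file.zip --restore-request "{\"Days\":7,\"GlacierJobParameters\":{\"Tier\":\"Bulk\"}}"

REM Check whether the restore has finished (look for the Restore field in the output)
aws2 s3api head-object --bucket bucket-name --key data/some-file.zip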

Costs

The monthly fee for data storage is very low compared to many cloud solutions and even to more traditional ones, such as external hard drives. However, there is more to pay for than just storage. Perhaps I’m just bad at reading and interpreting things, but it’s hard to fully understand how AWS’s pricing breakdown works:

https://aws.amazon.com/s3/pricing/

I contacted their support staff and got a few answers though. The biggest things to watch out for are the following:

  • Retrieval costs
  • Requests

Retrieval costs come in a couple of forms. First, there is the cost of restoring your Glacier Deep Archive content to S3. This takes it out of deep storage and places a temporary copy in S3 for you to retrieve. Second, there is the cost of downloading, and this is where they really get you: data transfer out of AWS costs $0.09/GB in the Ohio region (the cheapest one for me). That’s about $270 for 3TB of data! Take a look for yourself:

https://aws.amazon.com/s3/pricing/

This means Deep Archive is meant for just that: an archive. This is not meant to be something you ever touch unless you really, really need it.

More Costs

Beyond the costs I’ve already mentioned, there are more costs associated with AWS, to the point where your phone bill might be easier to understand. One such cost is “PUT, COPY, POST, LIST requests (per 1,000 requests).” As best I can tell, this comes into play when uploading content to S3: every file uploaded is at least one PUT request, and the sync command also issues LIST requests to compare the local folder against the bucket. Once I figure out more about the total costs associated with uploading to AWS Glacier Deep Archive, I can write an update to this post.
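
One way to get a rough sense of how many requests the initial upload involved is to count the objects in the bucket; each uploaded file is at least one PUT request, and large files uploaded in multiple parts count as more. The bucket name and prefix here are placeholders:

REM Sketch only: bucket name and prefix are placeholders.
REM The "Total Objects" line at the end gives a rough lower bound on the PUT requests used
aws2 s3 ls s3://bucket-name/data/ --recursive --summarize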

Initial Thoughts

As long as you know something about how AWS works, AWS Glacier Deep Archive seems like a viable option for storing data you hope never to need to retrieve. If it’s just there as a safeguard against catastrophe, it might be a good option. However, I’m going to save my final judgment until I get the costs sorted out. For the first 11 days of this month, I’m already at $9, which is significantly higher than the $3-4 I was expecting.

Even though I uploaded to S3 with the “Deep Archive” storage class specified, I’m seeing the majority of my charges from S3 rather than Deep Archive.
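
To double-check that objects actually landed in the Deep Archive storage class rather than standard S3, you can list them along with their storage class. Again, this is just a sketch with placeholder names:

REM Sketch only: bucket name and prefix are placeholders.
REM Lists each object's key and storage class; archived objects should show DEEP_ARCHIVE
aws2 s3api list-objects-v2 --bucket bucket-name --prefix data/ --query "Contents[].{Key:Key,Class:StorageClass}"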

If the costs continue this way, it’s probably going to be too expensive to justify. I’m going to wait on AWS support to explain what’s going on before I make that final call.
