Last month, I wrote about using AWS Glacier Deep Archive for archiving data I never expected to need to recover but wanted stored in a safe place just in case. At the end of the article I mentioned that I was already overrunning my expected costs, which should have been something like $3.50 per month for 3.5TB of data.
Within a few weeks my costs for the month were reaching around $20. Considering I specifically clarified the cost structure with AWS support staff before I even started this project and they confirmed that my estimates were accurate, this did not make me happy.
I contacted them again to try to determine what was costing so much money, since their Billing Management Console is often quite hard to decipher. Getting no satisfactory response, I did some more digging. After a few days and several hours of work, I finally found the answer: partial uploads that were being stored in S3 but were invisible in the web portal!
Viewing S3 Partial Multipart Uploads
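Using the s3api subcommand of the AWS CLI, you can list the partial multipart uploads with the following command (aws2 was the name of the AWS CLI v2 preview executable; on current installs it is simply aws):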
aws2 s3api list-multipart-uploads --bucket your-bucket-name
My upload script ran as a Windows scheduled task that started at night and was automatically stopped during the day to avoid saturating my network. As a result, several large files never finished uploading before the process was killed. That left them in a strange limbo: they were being stored on S3, but they were visible neither in a standard listing from the AWS CLI nor in the web interface. The only way to see them was by using “s3api,” which I hadn’t been using at all up until this point.
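To get a quick overview of what is stuck, the JSON printed by the list command can be summarized per file. This is only a minimal sketch: it assumes the output has been captured as text and has the usual shape (an "Uploads" list whose entries carry a "Key"), and the function name summarize_uploads is my own invention.

```python
import json
from collections import Counter


def summarize_uploads(cli_output: str) -> dict:
    """Count in-progress multipart uploads per object key.

    cli_output is the text printed by `s3api list-multipart-uploads`.
    Assumption: when nothing is pending the output may be empty or may
    lack the "Uploads" field, so both cases are treated as "no uploads".
    """
    data = json.loads(cli_output) if cli_output.strip() else {}
    uploads = data.get("Uploads", [])
    return dict(Counter(upload["Key"] for upload in uploads))


if __name__ == "__main__":
    import sys

    # Example: aws s3api list-multipart-uploads --bucket your-bucket-name | python summarize.py
    for key, count in summarize_uploads(sys.stdin.read()).items():
        print(f"{count} pending upload(s) for {key}")
```

A key that shows up more than once has been started and interrupted several times, and each attempt keeps its own set of stored parts.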
Removing S3 Partial Multipart Uploads
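Now the question was how to remove them. After a lot of digging, it appeared the most practical way to remove these partial multipart uploads I was paying for was to add a lifecycle rule to the bucket. (Individual uploads can also be aborted with s3api abort-multipart-upload, but that needs the key and upload ID of each one.) The rule tells S3 to remove all partial multipart uploads one day after they were started.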
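For anyone who would rather script the rule than click through the console, here is a rough sketch of what such a lifecycle configuration looks like. The rule ID and the function names are hypothetical, and the boto3 part assumes that library is installed and credentials are configured; it is not the exact rule I created.

```python
import json


def build_abort_rule(days: int = 1, rule_id: str = "abort-incomplete-multipart-uploads") -> dict:
    """Build a lifecycle configuration that aborts incomplete multipart uploads.

    The empty prefix filter makes the rule apply to the whole bucket.
    """
    return {
        "Rules": [
            {
                "ID": rule_id,  # hypothetical name, pick your own
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


def apply_rule(bucket: str, days: int = 1) -> None:
    """Apply the rule with boto3 (third-party; imported here so the rest runs without it).

    Caution: this call replaces the bucket's existing lifecycle configuration,
    so merge in any rules you already have before using it.
    """
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_abort_rule(days),
    )


if __name__ == "__main__":
    # Print the JSON so it can be reviewed before applying it anywhere.
    print(json.dumps(build_abort_rule(), indent=2))
```

The printed JSON can also be saved to a file and passed to s3api put-bucket-lifecycle-configuration, which takes the same structure.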
After putting this rule in place, I re-ran the s3api command above and confirmed that the partial multipart uploads were cleaned up after a couple of days. Luckily, AWS support refunded the costs for these expensive multipart uploads I was unknowingly storing, since they weren’t able to resolve my problem.
As expected from the outset, there were unexpected costs to using AWS S3 Glacier Deep Archive, and this is probably the number one reason to avoid the service if cost is of great concern to you, as it is to me. I don’t think there’s been a single AWS service I’ve used that hasn’t cost more than I first expected, because it’s very hard to predict every charge you’ll run into by the end of a project. As you can see in the following bill, even though the costs are now much more reasonable, I didn’t know the “UploadPart requests in US East (Ohio)” were going to cost $2.49. Who can predict how many requests are going to be made during the upload process? I uploaded about 200 large files in total, yet that resulted in 498,884 requests…okay, whatever, Amazon.
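In hindsight the request count is less mysterious: a multipart upload sends each file in parts, and every part is one UploadPart request. Here is a back-of-the-envelope sketch. It assumes the AWS CLI's default 8 MiB part size (the CLI may use larger parts for very large files), treats my 3.5TB as 3.5 TiB, and uses $0.005 per 1,000 requests, which is simply the rate implied by the $2.49 on the bill rather than a number taken from a price list. The function names are made up for illustration.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def estimate_upload_part_requests(file_sizes_bytes, part_size_bytes=8 * MIB):
    """Estimate UploadPart requests: one per part, rounding each file up."""
    return sum((size + part_size_bytes - 1) // part_size_bytes for size in file_sizes_bytes)


def request_cost_usd(request_count, usd_per_1000_requests=0.005):
    """Cost of a number of requests at an assumed per-1,000 rate."""
    return request_count * usd_per_1000_requests / 1000


if __name__ == "__main__":
    # Treating the whole archive as one blob gives the minimum part count.
    total_bytes = int(3.5 * TIB)
    print(estimate_upload_part_requests([total_bytes]))  # 458752
    print(round(request_cost_usd(498_884), 2))           # 2.49
```

Under those assumptions the archive needs at least 458,752 parts, which is in the same ballpark as the 498,884 requests I was billed for. The interrupted uploads that had to start over would plausibly account for some of the difference.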
The second reason to avoid AWS S3 Glacier Deep Archive is the cost of recovery. If I ever need to recover my data from the service, I shudder to think how large that bill will be by the end.
Despite some downsides, Deep Archive has been working very well for me. My script pushes new data to the service each night, and the cost of storage has leveled out to what I originally expected now that the partial multipart upload issue is fixed.
However, if you want a super-simple solution with unlimited storage and a fixed price (currently $6/month) including retrieval costs, you might want to check out Backblaze. I recently tried that and I’ll share my experiences with it soon.