Archiving Amazon S3 Data to Amazon Glacier

AWS provides you with a number of data storage options. I’ve been using S3 with Arq for personal backups, and with the AWS CLI tools for all my servers. I have a rotation policy that keeps daily backups for 7 days, weekly backups for 4 weeks, and monthly backups for 3 months. It all works perfectly, but in reality I rarely use any of these old backups, so why not archive them? It’s more than 10x cheaper to do so!

STANDARD - 99.999999999% durability. S3’s default storage option, starting at $0.125 per GB
RRS - 99.99% durability. S3’s Reduced Redundancy Storage option, starting at $0.093 per GB
GLACIER - 99.999999999% durability, object archived in Glacier option, starting at $0.010 per GB
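
To put those prices side by side, here is a quick back-of-the-envelope sketch. It assumes the "starting at" rates are USD per GB per month and ignores request fees, retrieval fees, and volume tiers; the 100 GB backup size is a made-up example.

```python
from decimal import Decimal

# "Starting at" prices quoted above, assumed to be USD per GB per month.
PRICE_PER_GB = {
    "STANDARD": Decimal("0.125"),
    "RRS": Decimal("0.093"),
    "GLACIER": Decimal("0.010"),
}


def monthly_cost(size_gb, storage_class):
    """Storage-only cost in USD for one month (no request or retrieval fees)."""
    return Decimal(size_gb) * PRICE_PER_GB[storage_class]


def savings_factor(from_class, to_class):
    """How many times cheaper `to_class` is than `from_class`."""
    return PRICE_PER_GB[from_class] / PRICE_PER_GB[to_class]


if __name__ == "__main__":
    size_gb = 100  # hypothetical backup size
    for name in PRICE_PER_GB:
        print(name, monthly_cost(size_gb, name))
    print("STANDARD vs GLACIER:", savings_factor("STANDARD", "GLACIER"))
```

At these rates, moving data from STANDARD to GLACIER cuts the storage bill by a factor of 12.5.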

Today I would like to focus on Amazon S3 and Amazon Glacier and a new and powerful way for you to use both of them together.

Both services offer dependable and highly durable storage for the Internet. Amazon S3 was designed for rapid retrieval. Glacier, in contrast, trades off retrieval time for cost, providing storage for as little as $0.01 per gigabyte per month while retrieving data within three to five hours.

How would you like to have the best of both worlds? How about rapid retrieval of fresh data stored in S3, with automatic, policy-driven archiving to lower cost Glacier storage as your data ages, along with easy, API-driven or console-powered retrieval?

Sound good? Awesome, because that’s what we have! You can now use Amazon Glacier as a storage option for Amazon S3.

There are four aspects to this feature – storage, archiving, listing, and retrieval. Let’s look at each one in turn.

Storage
First, you need to tell S3 which objects are to be archived to the new Glacier storage option, and under what conditions. You do this by setting up a lifecycle rule using the following elements:

  • A key prefix to specify which objects in the bucket are subject to the policy.
  • A relative or absolute time specifier and a time period for transitioning objects to Glacier. The time periods are interpreted with respect to the object’s creation date. They can be relative (migrate items that are older than a certain number of days) or absolute (migrate items on a specific date).
  • A time period for expiring objects from Glacier.

You can create a lifecycle rule in the AWS Management Console.
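
A rule with the same three elements can also be written out programmatically. The sketch below builds one in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The use of boto3, the helper names, the rule ID, the `backups/` prefix, the bucket name, and the day counts are all assumptions for illustration.

```python
def glacier_rule(rule_id, prefix, transition_days, expiration_days=None):
    """Build one lifecycle rule that archives objects under `prefix` to Glacier.

    Hypothetical helper: the returned dict follows the shape expected by
    boto3's put_bucket_lifecycle_configuration (an assumption about tooling).
    """
    if transition_days < 0:
        raise ValueError("transition_days must not be negative")
    rule = {
        "ID": rule_id,
        "Status": "Enabled",
        # Which objects the rule applies to, selected by key prefix.
        "Filter": {"Prefix": prefix},
        # Relative time: days counted from the object's creation date.
        "Transitions": [{"Days": transition_days, "StorageClass": "GLACIER"}],
    }
    if expiration_days is not None:
        if expiration_days <= transition_days:
            raise ValueError("expiration must come after the transition")
        # Optional: expire the object this many days after creation.
        rule["Expiration"] = {"Days": expiration_days}
    return rule


def lifecycle_configuration(*rules):
    """Wrap rules in the top-level structure for the whole bucket."""
    return {"Rules": list(rules)}


config = lifecycle_configuration(
    glacier_rule("archive-backups", "backups/", transition_days=30,
                 expiration_days=365)
)

# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=config,
#   )
```

Keeping the rule as plain data makes it easy to review or store in version control before it is applied to a bucket.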
     

Archiving
Every day, S3 will evaluate the lifecycle policies for each of your buckets and will archive objects in Glacier as appropriate. After the object has been successfully archived using the Glacier storage option, the object’s data will be removed from S3 but its index entry will remain as-is. The S3 storage class of an object that has been archived in Glacier will be set to GLACIER.

Listing
As with Amazon S3’s other storage options, all S3 objects that are stored using the Glacier option have an associated user-defined name. You can get a real-time list of all of your S3 object names, including those stored using the Glacier option, by using S3’s LIST API. If you list a bucket that contains objects that have been archived in Glacier, each archived object still appears under its usual name, with its storage class reported as GLACIER.
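
Here is a small sketch of reading the storage class from a listing. It assumes one page of results shaped like the response from boto3's `list_objects_v2` (a `Contents` list whose entries carry `Key` and `StorageClass`); the helper name and the sample data are made up.

```python
def archived_keys(list_response):
    """Return keys whose storage class is GLACIER in one page of a listing."""
    return [
        entry["Key"]
        for entry in list_response.get("Contents", [])
        if entry.get("StorageClass") == "GLACIER"
    ]


# Made-up listing page for illustration.
sample_page = {
    "Contents": [
        {"Key": "backups/2013-01-01.tar.gz", "StorageClass": "GLACIER"},
        {"Key": "backups/2013-03-01.tar.gz", "StorageClass": "STANDARD"},
        {"Key": "backups/2013-03-02.tar.gz", "StorageClass": "REDUCED_REDUNDANCY"},
    ]
}

print(archived_keys(sample_page))  # ['backups/2013-01-01.tar.gz']
```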

If you archive objects using the Glacier storage option, you must inspect the storage class of an object before you attempt to retrieve it. The customary GET request will work as expected if the object is stored in S3 Standard or Reduced Redundancy (RRS) storage. It will fail (with a 403 error) if the object is archived in Glacier. In this case, you must use the RESTORE operation (described below) to make your data available in S3.
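
That check can be sketched as a small helper. It assumes a metadata dictionary shaped like boto3's `head_object` response, where `StorageClass` is typically absent for STANDARD objects and `Restore` mirrors the `x-amz-restore` header. The function name and the return labels are hypothetical.

```python
def retrieval_state(head_response):
    """Classify an object by whether a plain GET can be expected to work.

    Returns one of: "readable", "restoring", "restored", "needs-restore".
    """
    storage_class = head_response.get("StorageClass", "STANDARD")
    if storage_class != "GLACIER":
        return "readable"  # STANDARD or RRS: GET works as usual

    restore = head_response.get("Restore")
    if restore is None:
        return "needs-restore"  # archived, no restore currently in effect
    if 'ongoing-request="true"' in restore:
        return "restoring"  # RESTORE accepted, data not yet available
    return "restored"  # temporary copy is available until it expires
```

Only the "readable" and "restored" states are safe to follow with a GET; the other two call for a RESTORE request or more waiting.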

Retrieval
You use S3’s new RESTORE operation to access an object archived in Glacier. As part of the request, you need to specify a retention period in days. Restoring an object will generally take 3 to 5 hours. Your restored object will remain in both Glacier and S3’s Reduced Redundancy Storage (RRS) for the duration of the retention period. At the end of the retention period the object’s data will be removed from S3; the object will remain in Glacier.
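
A minimal sketch of issuing a RESTORE request follows. It assumes a boto3 S3 client, whose `restore_object` method takes the retention period as `RestoreRequest={"Days": n}`. The client is passed in as a parameter so the helper can be exercised without network access; the helper name, bucket, and key are placeholders.

```python
def request_restore(s3_client, bucket, key, retention_days):
    """Ask S3 to restore an archived object for `retention_days` days.

    `s3_client` is assumed to behave like boto3.client("s3").
    The call only starts the restore; the data becomes readable later.
    """
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return s3_client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": retention_days},
    )


# Typical use, assuming boto3 is installed and credentials are configured:
#   import boto3
#   request_restore(boto3.client("s3"), "my-bucket",
#                   "backups/2013-01-01.tar.gz", retention_days=7)
```

Because the restore takes hours, a script would normally issue the request, exit, and check the object's state again later rather than block while waiting.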

Although the objects are archived in Glacier, you can’t get to them via the Glacier APIs. Objects stored directly in Amazon Glacier using the Amazon Glacier API cannot be listed in real time, and have a system-generated identifier rather than a user-defined name. Because Amazon S3 maintains the mapping between your user-defined object name and the Amazon Glacier system-defined identifier, Amazon S3 objects that are stored using the Amazon Glacier option are only accessible through the Amazon S3 API or the Amazon S3 Management Console.