AWS DynamoDb backup recovery with CloudFormation - Not without fault

Published:
June 12, 2020
April 23, 2021

Most developers using Amazon Web Services have at least heard of DynamoDb, its fully managed NoSQL database. It offers excellent speed and scalability, and boasts an impressive feature set.

Those who use it are likely to create their DynamoDb Tables using CloudFormation, where the mantra of infrastructure as code comes to life.

When running these beasts in production, anyone can conclude that even for a fully managed service, backups are probably a good thing. Even a managed database can be erased by a single mistake.

Our Goal

What we want to achieve is simple:

  • Restore a DynamoDb table with a specific name to its previous state.
  • Have a happy In-Sync CloudFormation stack after we are done.

Should this maneuver be straightforward? Yes. Is it? You'll find out.

The Limitations

Fortunately, DynamoDb comes with manual backups, on-delete backups and Point-In-Time-Recovery options.

Unfortunately, when you scratch the surface you find that a restored backup is not really identical to the original. All of the database items are there, but the Table is missing the following:

  • Point-In-Time-Recovery
  • Auto scaling policies
  • AWS IAM policies
  • CloudWatch metrics & alarms
  • Tags
  • Stream settings
  • Time to Live
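Since a restore leaves these settings behind, it can help to record them while the original Table still exists. The following is a minimal sketch, not a complete inventory: it assumes you pass in a boto3 DynamoDb client, the function name `snapshot_table_settings` is made up for this example, and it only reads the first page of tags. Auto scaling policies, IAM policies and CloudWatch alarms live in other services and are not covered here.

```python
def snapshot_table_settings(client, table_name):
    """Collect some of the Table settings that a restored backup will lack.

    `client` is assumed to be a boto3 DynamoDb client, for example
    boto3.client("dynamodb"). Only the first page of tags is read.
    """
    table = client.describe_table(TableName=table_name)["Table"]
    backups = client.describe_continuous_backups(TableName=table_name)
    ttl = client.describe_time_to_live(TableName=table_name)
    tags = client.list_tags_of_resource(ResourceArn=table["TableArn"])

    pitr = backups["ContinuousBackupsDescription"].get(
        "PointInTimeRecoveryDescription", {}
    )
    return {
        "table_arn": table["TableArn"],
        "point_in_time_recovery": pitr.get("PointInTimeRecoveryStatus", "DISABLED"),
        # Tables without a stream have no StreamSpecification key at all.
        "stream": table.get("StreamSpecification", {"StreamEnabled": False}),
        "time_to_live": ttl.get("TimeToLiveDescription", {}),
        "tags": tags.get("Tags", []),
    }
```

You would call it as `snapshot_table_settings(boto3.client("dynamodb"), "Table-A")` and store the result somewhere safe, for instance as JSON next to your CloudFormation template.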

You'll also discover that an existing DynamoDb Table can't be restored to its previous state. Instead, you will always create a new, separate Table instance from your backup. Furthermore, Table instances can't be renamed, so you can't move the restored table after it's created.

This is especially a problem for any naive CloudFormation setup that creates a Table with a specific name and has your other resources refer to the Table by that name.

[Figure: A DynamoDb table created with an explicit name by CloudFormation, with a Lambda resource referring to the Table name]

Backup alternatives

We can always restore a backup to a new Table with any new specific name. This means that any classic backups work great.

For Point-In-Time recovery, you don't really have backups. You use your original Table to create a new, fully restored Table from an earlier point in time. The workaround for making a backup is easy enough: you make a backup of the restored Table, a backup of the backup, so to speak.

[Figure: DynamoDb offers a few backup options. All will be missing much configuration]

What you must be aware of is that all of these backups are missing the extra configuration, such as Tags and Alarms. If you want to completely restore everything, then you need to carefully document all this configuration manually so that you can re-configure it yourself.
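The two backup routes can be sketched with the DynamoDb API roughly as follows. This is an illustration under assumptions: `client` is a boto3 DynamoDb client that you pass in, and the helper names are invented for this example.

```python
def create_named_backup(client, table_name, backup_name):
    """Take a classic on-demand backup and return its ARN."""
    response = client.create_backup(TableName=table_name, BackupName=backup_name)
    return response["BackupDetails"]["BackupArn"]


def restore_to_point_in_time(client, source_table, target_table, restore_time=None):
    """Create a NEW Table from the source Table's Point-In-Time history.

    `restore_time` is a datetime; when omitted, the latest restorable
    time is used. The target name must not belong to an existing Table.
    """
    params = {"SourceTableName": source_table, "TargetTableName": target_table}
    if restore_time is None:
        params["UseLatestRestorableTime"] = True
    else:
        params["RestoreDateTime"] = restore_time
    response = client.restore_table_to_point_in_time(**params)
    return response["TableDescription"]["TableName"]
```

To get a "backup of the backup", you would first call `restore_to_point_in_time`, wait for the new Table to become active, and then call `create_named_backup` on it.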

Restoring the backup, aka the Ugly Part

Now we get to replace the DynamoDb table that is managed by CloudFormation. Since AWS has no specific functionality for this, and restoring a backup is not something you can express as a change to your Infrastructure as Code, this gets ugly.

You begin with:

  • A happy In-Sync CloudFormation stack with a faulty Table
  • A backup to restore for said Table

In order to restore a table with the name Table-A, we must first manually delete Table-A. In other words, we deliberately make a disastrous change that the CloudFormation stack is not aware of, making it Drift. This is the only way to free the name Table-A that all your resources point to.

Warning: This will also completely erase:

  • The original configuration that the backup doesn't restore
  • Any Point-In-Time-Recovery of the Table

After this is done, you can easily restore the backup to a Table named Table-A. The documentation claims it takes an hour or two; in my case I was lucky and had it running in 9 minutes.

[Figure: A Delete followed by a quick restore will get your data restored to the same Table name]
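The delete-then-restore step could look something like the sketch below. It assumes a classic backup already exists and that `client` is a boto3 DynamoDb client passed in by the caller; the function name and the default wait settings are choices made for this example. Keep in mind that it is destructive: once `delete_table` has run, the original Table and its Point-In-Time history are gone.

```python
def replace_table_from_backup(client, table_name, backup_arn,
                              delay_seconds=30, max_attempts=240):
    """Delete `table_name`, then restore `backup_arn` under the same name.

    DESTRUCTIVE: the original Table is removed before the restore starts.
    The defaults wait up to two hours per step, since a restore may take
    far longer than the waiters' built-in timeout.
    """
    waiter_config = {"Delay": delay_seconds, "MaxAttempts": max_attempts}

    # Free the name that the rest of the stack points to.
    client.delete_table(TableName=table_name)
    client.get_waiter("table_not_exists").wait(
        TableName=table_name, WaiterConfig=waiter_config
    )

    # Restore the backup into a new Table that reuses the freed name.
    client.restore_table_from_backup(
        TargetTableName=table_name, BackupArn=backup_arn
    )
    client.get_waiter("table_exists").wait(
        TableName=table_name, WaiterConfig=waiter_config
    )
```

Both waiters are used because the name only becomes available once the delete has fully completed, and the restored Table is only usable once it is active.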

Fixing the Drift

As you recall, the backups don't restore all the extra "stuff" that's not actual contents. CloudFormation cares about this: it notices that the table it's managing has a different configuration than the one it created, and keeps complaining about the Drift we introduced.

To remedy this issue, all we can resort to is the manual re-configuration of all the settings of the Table. If done correctly, you will find that your CloudFormation stack returns to being In-Sync. After this the System lives happily until the next inevitable mistake is made.
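Part of that re-configuration can be scripted. The sketch below re-applies Point-In-Time-Recovery, Time to Live and Tags, and then asks CloudFormation to run drift detection. It assumes boto3 DynamoDb and CloudFormation clients are passed in; the function names and parameters are hypothetical, and stream settings, auto scaling, IAM policies and alarms are left out and would need the same treatment.

```python
import time


def reapply_table_settings(dynamodb, table_name, table_arn,
                           enable_pitr=True, ttl_attribute=None, tags=None):
    """Re-apply a few settings that the restored Table is missing."""
    if enable_pitr:
        dynamodb.update_continuous_backups(
            TableName=table_name,
            PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
        )
    if ttl_attribute:
        dynamodb.update_time_to_live(
            TableName=table_name,
            TimeToLiveSpecification={"Enabled": True, "AttributeName": ttl_attribute},
        )
    if tags:
        # Tags use the DynamoDb shape: [{"Key": "...", "Value": "..."}]
        dynamodb.tag_resource(ResourceArn=table_arn, Tags=tags)


def check_stack_drift(cloudformation, stack_name, poll_seconds=5, max_polls=60):
    """Run drift detection and return the status, e.g. "IN_SYNC" or "DRIFTED"."""
    detection_id = cloudformation.detect_stack_drift(
        StackName=stack_name
    )["StackDriftDetectionId"]
    for _ in range(max_polls):
        status = cloudformation.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] == "DETECTION_COMPLETE":
            return status["StackDriftStatus"]
        if status["DetectionStatus"] == "DETECTION_FAILED":
            raise RuntimeError(status.get("DetectionStatusReason", "drift detection failed"))
        time.sleep(poll_seconds)
    raise TimeoutError("drift detection did not finish in time")
```

If `check_stack_drift` still reports `DRIFTED` after the settings are re-applied, the drift details in the CloudFormation console show which properties remain different.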

What we did

AWS has poor support for restoring an existing DynamoDb Table to an earlier state. All recovery options will leave you with a misconfigured copy at a new location.

By manually deleting the Table, we freed the name of the resource managed by CloudFormation. By doing this, we could restore a backup to our wanted location, so that our resources can use the backup without needing support for dynamically changing which Table they use.

This approach is, as stated, not without issues. You introduce downtime if your service wasn't already down, and you delete all historical Point-In-Time recovery options.

The reason you would want this approach is that you didn't implement support for a database hot-swap in your CloudFormation stacks and your Lambda resources. If that describes your service, then this workaround is your best option.

Written by:
Devies