GCP – How to automatically export backups (MySQL and instances) to other places (AWS, Offline, etc…)

GCP has a pretty good backup system which is really easy to configure.
For instances, you can schedule automatic snapshots and also convert these snapshots to images whenever you want.
For MySQL databases (not sure about the other types) you can schedule backups, which are stored in a neat “full backup + incremental backups” scheme to save space, and you can also turn on binary logging to get a full timeline of your database.
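For reference, both of these can be turned on with a single command. Here is a rough sketch, assuming a Cloud SQL for MySQL instance; the instance name and backup window are placeholders:

# Sketch: enable daily automated backups (03:00 window) and binary logging
# on a Cloud SQL MySQL instance. The instance name and time are placeholders.
gcloud sql instances patch <your instance> \
  --backup-start-time=03:00 \
  --enable-bin-log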

However, these measures do not protect against the deletion of the instance or the database itself. Deleting a database instance also destroys all of its backups, so they’re gone forever. Deleting a VM instance keeps its snapshots, but a malicious person could simply delete those afterwards too.

Also, the famous “3-2-1 rule” says that we should keep at least 3 copies of our data, on 2 different types of media, with 1 of them stored off-site.

With this in mind, I started searching for ways to export these backups to safer places in case someone goes psycho on a GCP account/project with critical data. Unfortunately, there isn’t anything native to GCP that makes this easy, but piecing a few things together gets the job done.

Exporting MySQL Backups

Just follow this article:

https://medium.com/@kennethteh90/how-to-schedule-daily-cloud-sql-export-to-google-cloud-storage-4c1bd360af06
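The article wires this up as a scheduled Cloud Function, but if you just want to test the export itself first, the equivalent manual command is roughly the following (instance, bucket and database names are placeholders):

# Sketch: one-off export of a Cloud SQL database to a bucket as a gzipped SQL dump.
# The .gz suffix makes Cloud SQL compress the output.
gcloud sql export sql <your instance> \
  gs://<YOUR GCP BACKUPS BUCKET>/backup-$(date +%Y%m%d).sql.gz \
  --database=<your database>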

There are a few caveats, however:

  • I decided to store the backups in a completely separate project. That’s fine, but you have to make sure that the database’s service account is allowed to write to the bucket you created.
  • You should apply a bucket-lock to the bucket. This ensures that the backups can’t be deleted for at least X days (see the sketch after this list).
  • Compare your retention period against the minimum storage durations of GCP’s storage classes. E.g.: keeping backups for more than 30 days? Nearline is a good fit.
  • You should also apply a lifecycle rule to automatically delete backups older than X days.
  • While the backup/export operation is running you won’t be able to change your DB instance’s configuration, so schedule it for when the database is least used.
  • The article tells you to use an HTTP trigger for the backup function. I tried this but couldn’t get it to work because of permission issues. I recommend using a Pub/Sub trigger instead, which is also more secure.
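Here is a rough sketch of the bucket hardening mentioned above. The 30-day retention period and the bucket name are placeholders; adjust them to your own policy:

# Sketch: retain backups for 30 days and auto-delete them afterwards.
gsutil retention set 30d gs://<YOUR GCP BACKUPS BUCKET>

# Optional: lock the retention policy permanently (this cannot be undone!)
# gsutil retention lock gs://<YOUR GCP BACKUPS BUCKET>

# Lifecycle rule: delete objects once they are older than 30 days.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 30}}
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://<YOUR GCP BACKUPS BUCKET>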

I also wanted to send the backups to an AWS bucket, completely separate from the GCP account. You can do this with the following steps:

1 – Create an instance on GCP to run the commands. It can be very small.
2 – Create a bucket on AWS to receive the backups.
3 – Create an IAM user on AWS (the closest thing to a service account there) with access keys that allow it to write to the bucket.
4 – Create a service account on GCP that is allowed to read the backups.
5 – Log into both accounts on the instance using the “gcloud init” and “aws configure” commands.
6 – Schedule the following command on the machine using “crontab -e”:
gsutil -m rsync -rd gs://<YOUR GCP BACKUPS BUCKET> s3://<YOUR AWS BUCKET>
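For example, a crontab entry that runs the sync every day at 03:00 could look roughly like this (the schedule, gsutil path and log file are just placeholders; check the real path with “which gsutil”):

# m h dom mon dow  command
0 3 * * * /usr/bin/gsutil -m rsync -rd gs://<YOUR GCP BACKUPS BUCKET> s3://<YOUR AWS BUCKET> >> /var/log/gcs-to-s3-sync.log 2>&1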

NOTE: This won’t work if a backup file is larger than 5 GB. AWS requires multipart uploads for objects above that size, and gsutil doesn’t support multipart uploads to S3.

If you want to export the backups to an offline location, just do the same steps on the machine that will receive the backups and change the destination from s3://<something> to a directory on the machine.
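For example, something like this would mirror the bucket to a local disk (the mount point is just a placeholder):

# Sketch: same rsync, but the destination is a local/offline directory instead of S3.
gsutil -m rsync -rd gs://<YOUR GCP BACKUPS BUCKET> /mnt/offline-backups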

Exporting Instance Backups

This one was a little harder. The following solution assumes that you want to store the backups in a different project than where the instances are located.

Here is what you need to do:

  • Create a snapshot schedule for the instance that you want to back up. A daily one is fine.
  • Create two buckets on your secondary project: one to receive temporary exports, with a lifecycle rule that deletes everything older than 1 day, and another one with bucket-lock enabled that will actually store the backups for X days (the reason for the two buckets is explained later).
  • Create an instance to run the script below. It can be the same one you used for exporting the MySQL backups.
  • Ensure that the service account you configured on the instance via “gcloud init” has write permission on the 2 buckets you just created.
  • Ensure that the same service account can list snapshots and create, export, and delete images on the main project (the script creates a temporary image there and deletes it afterwards). A sketch of this setup follows after this list.
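Here is a rough sketch of what that setup could look like from the command line. Every name below (schedule, buckets, service account, projects, region/zone) is a placeholder, and note that “gcloud compute images export” runs through Cloud Build, so gcloud may also ask you to enable that API and grant a few extra roles the first time you run it:

# 1. Daily snapshot schedule on the main project, attached to the disk you want
#    to back up (the 7-day retention is just an example).
gcloud compute resource-policies create snapshot-schedule daily-backup \
  --project=<your main project> --region=<region> \
  --daily-schedule --start-time=02:00 --max-retention-days=7
gcloud compute disks add-resource-policies <disk name> \
  --project=<your main project> --zone=<zone> --resource-policies=daily-backup

# 2. Let the export instance's service account write to both buckets...
gsutil iam ch serviceAccount:<service account email>:objectAdmin gs://<temporary bucket>
gsutil iam ch serviceAccount:<service account email>:objectAdmin gs://<bucket with bucket-lock>

# 3. ...and manage snapshots/images on the main project (the script creates,
#    exports and then deletes a temporary image there).
gcloud projects add-iam-policy-binding <your main project> \
  --member=serviceAccount:<service account email> \
  --role=roles/compute.storageAdmin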

After that, you just need to schedule the following script to run AFTER the daily snapshot:

#!/bin/bash

# Find the newest snapshot of the disk we care about.
# Filtering on sourceDisk and sorting by creation time (newest first) avoids
# depending on how the automatic snapshot names are generated.
echo "Getting newest snapshot..."
newestSnapshot=$(gcloud compute snapshots list \
  --project=<your main project> \
  --filter="sourceDisk~<disk name>" \
  --sort-by=~creationTimestamp \
  --limit=1 \
  --format="value(name)")
echo "Newest snapshot found: $newestSnapshot"

# Create an image from that snapshot (the image reuses the snapshot's name)
echo "Converting snapshot to image..."
gcloud compute images create "$newestSnapshot" --source-snapshot="$newestSnapshot" --project=<your main project>
echo "Image created (I guess)"

# Export the image to the temporary bucket as a .tar.gz full backup
echo "Exporting image to bucket"
gcloud compute images export \
  --destination-uri "gs://<bucket that will receive image>/$newestSnapshot.tar.gz" \
  --image "$newestSnapshot" \
  --image-project <your main project>
echo "Export completed (I guess)"

# Delete the temporary image from the main project
echo "Deleting image on main account"
gcloud compute images delete "$newestSnapshot" --project <your main project> --quiet
echo "Deleted (I guess)"

# Copy the full backup from the temporary bucket to the bucket-locked one
gsutil cp "gs://<bucket that will receive image>/$newestSnapshot.tar.gz" gs://<bucket with bucket-lock>/
echo "All done!"

What this script does is find the newest snapshot, convert it to an image, export the image to the first (temporary) bucket as a full-backup .tar.gz, delete the temporary image, and finally copy the exported file to the correct bucket (the one with bucket-lock).

The reason we need two buckets is that if the bucket receiving the exported image file has bucket-lock active, the export process fails (I never found out exactly why). So we send it to a normal bucket first and then copy it to the bucket-locked bucket. The lifecycle rule on the “temporary” bucket ensures that nothing stays there for more than 1 day, so you don’t waste money on duplicated backups.

You can also use the commands from the previous section to copy the backup to AWS (if it’s smaller than 5 GB) or to an offline location.

NOTE: The script posted above is crappy and doesn’t check the outputs of the commands. If something goes wrong in the middle, too bad. Try executing each command individually to make sure your permissions are correct.
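If you want at least a minimal safety net, one simple option (my suggestion, not something the setup above relies on) is to make bash abort on the first failing command by adding this right below the shebang:

# Abort on the first failed command, on unset variables, and on failures inside pipes.
set -euo pipefail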
