# Helm Chart for GCP Cloud SQL Archive

## Overview
This chart archives specified GCP Cloud SQL databases into a GCP Cloud Storage bucket.
GCP Cloud SQL has facilities for backups, but these serve principally for disaster recovery. This chart is intended for archiving. The idea is that you use this chart to back up your databases every so often to a Cloud Storage bucket with versioning enabled. For example, you might back up once a month with your bucket configured to delete a version only if the version is at least a year old and there are at least 12 other versions. This way you have monthly archives going back a year. See also Cloud Storage "Object Lifecycle Management".
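As a sketch of that bucket setup, assuming a bucket named `sql-dump1` (the bucket used in the example below), the `gsutil` commands might look like this:

```sh
# Keep noncurrent versions of each archive object, and delete a version only
# once it is a year old and at least 12 newer versions exist.
gsutil versioning set on gs://sql-dump1

cat > lifecycle.json <<'EOF'
{
  "rule": [{
    "action": {"type": "Delete"},
    "condition": {"age": 365, "isLive": false, "numNewerVersions": 12}
  }]
}
EOF
gsutil lifecycle set lifecycle.json gs://sql-dump1
```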
This Helm chart is a thin wrapper around the gcloud SQL export command `gcloud sql export sql`. The wrapper has the advantage that it can dump multiple databases into individual files, whereas calling `gcloud sql export sql` directly puts all the databases into a single file.
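For reference, the kind of command the wrapper issues for each configured database looks roughly like this (names are taken from the example below; the exact flags are determined by the script):

```sh
# One export per database, each to its own object in the bucket
# (illustrative invocation only).
gcloud sql export sql mysql-1 gs://sql-dump1/mysql-1--addresses.gz \
  --database=addresses --project=gcp-proj-id
```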
The chart consists of a single Kubernetes CronJob. The assumption is that the CronJob will run in the context of Workload Identity, which gives the CronJob the access it needs to both read the database and write to the Cloud Storage bucket. It is up to the user of this chart to ensure that the Kubernetes service account (specified by the `kubernetesServiceAccountName` configuration directive) has been set up with Workload Identity and links to a GCP service account with the necessary permissions; see the section "GCP service account permissions" below for more detail on what permissions are required.
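As a sketch, the Workload Identity wiring typically involves two steps (all names below are hypothetical):

```sh
# Allow the Kubernetes service account (namespace/name) to impersonate the
# GCP service account via Workload Identity.
gcloud iam service-accounts add-iam-policy-binding \
  sql-archiver@gcp-proj-id.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:gcp-proj-id.svc.id.goog[my-namespace/sql-archiver]"

# Annotate the Kubernetes service account so GKE knows which GCP service
# account it maps to.
kubectl annotate serviceaccount sql-archiver --namespace my-namespace \
  iam.gke.io/gcp-service-account=sql-archiver@gcp-proj-id.iam.gserviceaccount.com
```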
The archiving wrapper script is copied via a `git clone` into the Pod during initialization, so this repository must remain publicly viewable.
Note: The Cloud SQL database and the destination bucket must be in the same GCP project.
## Example of use
To dump the database `addresses` from the Cloud SQL instance `mysql-1` in the GCP project `gcp-proj-id` to the bucket `sql-dump1`, set these values in your `values.yaml` file:
```yaml
backup:
  project_id: gcp-proj-id
  instance: mysql-1
  bucket: gs://sql-dump1
  databases:
    - addresses
```
When the CronJob is done running, the archive file will be at `gs://<bucket>/<instance>--<database>.gz`, which, in this case, is `gs://sql-dump1/mysql-1--addresses.gz`.
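To confirm the export, and later to watch the monthly versions accumulate, you can list the object with all of its generations (`-a` includes noncurrent versions):

```sh
gsutil ls -la gs://sql-dump1/mysql-1--addresses.gz
```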
## Design
The chart is a single CronJob. The CronJob has an InitContainer that copies the Python 3 script `gcp-cloudsql-archive.py` from this Git repository to the `/root` directory. The main container is Google's `cloud-sdk` image, which contains the `gcloud` executable. The main container runs `gcp-cloudsql-archive.py` with parameters and options from `values.yaml` (see "Configuration" below). The program `gcp-cloudsql-archive.py` in turn calls `gcloud`.

See also the man page for `gcp-cloudsql-archive.py`.
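Schematically, the init step amounts to a fetch-and-copy along these lines (the clone URL and in-repo path are placeholders, not the literal commands the chart runs):

```sh
# What the InitContainer effectively does: fetch this repository and place
# the wrapper script where the main container expects it.
git clone <this-repository-url> /tmp/src
cp /tmp/src/gcp-cloudsql-archive.py /root/
```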
## Configuration
All configuration is made in `values.yaml`. Values are required unless marked otherwise.
- `namespace`: The Kubernetes namespace.
- `kubernetesServiceAccountName`: The name of the Kubernetes service account to run the CronJob under. It needs to be bound to a GCP service account using Workload Identity. The GCP service account must have permission to dump the SQL data and write to the bucket; see "GCP service account permissions" below for more details on the permissions needed.
- `backup:project_id`: The GCP project ID where the Cloud SQL instance and bucket reside.
- `backup:instance`: The name of the Cloud SQL instance.
- `backup:bucket`: The destination bucket. Should be of the form `gs://mybucket`.
- `backup:databases`: A list of databases to back up. Each will be backed up to a separate object in the bucket. Example:

  ```yaml
  backup:
    databases:
      - abc
      - def
      - xyz
  ```

- `backup:verbosity`: (optional) How verbose we want the `gcloud sql export sql` command to be. See the `gcloud sql export` documentation for the allowed values. Default: `info`.
- `backup:verbose`: (optional) Have the wrapper script be more verbose. Note that this is different from `backup:verbosity`. Default: `false`.
- `backup:dry_run`: (optional) Run in "dry-run" mode, that is, show what will happen but do not actually run the backups. Default: `false`.
- `crontab`: A standard five-field crontab time string. Default: `"30 6 1 * *"` (run on the first of the month at 06:30).
- `sleepmode`: (optional) Run the CronJob but run `sleep infinity` rather than the gcloud backup. This is useful for debugging. Default: `false`.
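With `values.yaml` filled in, deploying the chart is a routine Helm install; for instance (the release name and chart path here are placeholders):

```sh
# Install or upgrade the release; the target namespace for the CronJob is
# taken from the `namespace` value in values.yaml.
helm upgrade --install cloudsql-archive ./gcp-cloudsql-archive -f values.yaml
```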
## Important Notes
- Trying to export a non-existent database will cause the application to hang, so be careful when configuring.
- The Cloud SQL database and the destination bucket must be in the same GCP project.
- This repository must be publicly readable so that the InitContainer can copy the script `gcp-cloudsql-archive.py` into the Pod.
## GCP service account permissions
The assumption is that users of this Helm chart will link the CronJob's Kubernetes service account to a GCP service account using Workload Identity. The GCP service account must have the permissions to read the data from the Cloud SQL instance and write to the Cloud Storage bucket. The permissions needed are described on the "Exporting and importing using SQL dump files" page.
You will also need to grant the service account associated with the Cloud SQL instance the right to create storage objects. For more details on this, as well as Terraform code to enable these permissions, see this StackOverflow answer.
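As a sketch, you can look up the instance's service account and grant it write access to the bucket like this (instance and bucket names come from the example above; `roles/storage.objectAdmin` is one workable choice of role):

```sh
# Find the service account the Cloud SQL instance runs as...
SA=$(gcloud sql instances describe mysql-1 --project gcp-proj-id \
  --format 'value(serviceAccountEmailAddress)')

# ...and allow it to create objects in the destination bucket.
gsutil iam ch "serviceAccount:${SA}:roles/storage.objectAdmin" gs://sql-dump1
```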