[[_TOC_]]

# Helm Chart for GCP Cloud SQL Archive

## Overview

This chart archives specified GCP Cloud SQL databases into a GCP Cloud
Storage bucket.

GCP Cloud SQL has facilities for backups, but these serve principally for
disaster recovery. This chart is intended for archiving. The idea is
that you use this chart to back up your databases every so often to a
Cloud Storage bucket with versioning enabled. For example, you might
back up once a month with your bucket configured to delete a version
only if the version is at least a year old and there are at least 12
newer versions. This way you have monthly archives going back a year.
See also [Cloud Storage "Object Lifecycle Management"][7].
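
A minimal sketch of such a retention policy, assuming the bucket
`sql-dump1` used in the example below (adjust the numbers to your
retention needs):

```
# Sketch: enable versioning, then add a lifecycle rule that deletes a
# version only when it is at least a year old AND at least 12 newer
# versions exist. The bucket name is an assumption from the example below.
gsutil versioning set on gs://sql-dump1

cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 365, "numNewerVersions": 12}
    }
  ]
}
EOF

gsutil lifecycle set lifecycle.json gs://sql-dump1
```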

This Helm chart is a thin wrapper around the [`gcloud` SQL export
command][1] `gcloud sql export sql`. The wrapper has the advantage that it
can dump multiple databases into individual files whereas calling `gcloud
sql export sql` directly puts all the databases into a single file.
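
Per database, the wrapper effectively runs something like the following
(an illustrative sketch using the values from the example below; the
exact invocation lives in
[`gcp-cloudsql-archive.py`](gcp-cloudsql-archive.py)):

```
# Rough per-database equivalent of what the wrapper runs. A destination
# URI ending in .gz makes gcloud compress the dump.
gcloud sql export sql mysql-1 gs://sql-dump1/mysql-1--addresses.gz \
    --database=addresses \
    --project=gcp-proj-id
```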

The chart consists of a single Kubernetes CronJob. The assumption is that
the CronJob will run in the context of [Workload Identity][2] which gives
the CronJob the access it needs to both read the database and write to the
Cloud Storage bucket. It is up to the user of this chart to ensure that
the Kubernetes Service Account (specified by the
`kubernetesServiceAccountName` configuration directive) has been set up
with Workload Identity and links to a GCP service account with the
necessary permissions; see [section GCP service account
permissions](#gcp-service-account-permissions) below for more detail on
what permissions are required.
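
Wiring that up is outside the chart itself, but a minimal sketch of the
standard Workload Identity setup looks like this (the service-account
and namespace names here are placeholders):

```
# Sketch of the Workload Identity wiring (all names are placeholders).
# 1. Allow the Kubernetes service account to impersonate the GCP
#    service account.
gcloud iam service-accounts add-iam-policy-binding \
    archiver@gcp-proj-id.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:gcp-proj-id.svc.id.goog[my-namespace/my-ksa]"

# 2. Annotate the Kubernetes service account with the GCP service account.
kubectl annotate serviceaccount my-ksa --namespace my-namespace \
    iam.gke.io/gcp-service-account=archiver@gcp-proj-id.iam.gserviceaccount.com
```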

The archiving wrapper script is copied via a `git clone` into the Pod
during initialization, so this repository must remain *publicly*
viewable.

Note: The Cloud SQL database and the destination bucket **must** be in the same
GCP project.

## Example of use

To dump the database `addresses` from the Cloud SQL instance `mysql-1` in
the GCP project `gcp-proj-id` to the bucket `sql-dump1`, set these values
in your `values.yaml` file:

```
backup:
  project_id: gcp-proj-id
  instance:   mysql-1
  bucket:     gs://sql-dump1
  databases:
    - addresses
```

When the CronJob is done running, the archive file will be in
`gs://<bucket>/<instance>--<database>.gz` which, in this case, is
`gs://sql-dump1/mysql-1--addresses.gz`.
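
To deploy the chart with these values, a minimal sketch, assuming the
chart is installed from a local checkout of this repository (the
release name `cloudsql-archive` and the chart path are assumptions):

```
# Sketch: install or upgrade the chart from a local checkout. Release
# name and chart path are assumptions, not fixed by this README.
helm upgrade --install cloudsql-archive ./common/helm -f values.yaml
```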

## Design

The chart is a single CronJob. The CronJob has an InitContainer that
copies the Python 3 script
[`gcp-cloudsql-archive.py`](gcp-cloudsql-archive.py) from this Git
repository to the `/root` directory. The main container is Google's
[`cloud-sdk` image][3], which contains the [`gcloud` executable][4];
it runs `gcp-cloudsql-archive.py` with parameters and options from
[`values.yaml`](common/helm/values.yaml) (see "Configuration" below).
The program `gcp-cloudsql-archive.py` in turn calls `gcloud`.

See also the [man page for
`gcp-cloudsql-archive.py`](gcp-cloudsql-archive.md).

## Configuration

All configuration is done in
[`values.yaml`](common/helm/values.yaml). Values are required unless
marked otherwise.

* `namespace`: The Kubernetes namespace

* `kubernetesServiceAccountName`: The name of the Kubernetes service
account to run the CronJob under. It needs to be bound to a GCP service
account using Workload Identity. The GCP service account must have
permission to dump the SQL data and write to the bucket; see [GCP
service account permissions](#gcp-service-account-permissions) below
for more details on the permissions needed.

* `backup:project_id`: The GCP project ID where the Cloud SQL instance
and the bucket reside.

* `backup:instance`: The name of the Cloud SQL instance.

* `backup:bucket`: The destination bucket. Should be of the form `gs://mybucket`.

* `backup:databases`: A list of databases to back up. Each
will be backed up to a separate object in the bucket. Example:

        backup:
          databases:
            - abc
            - def
            - xyz

* `backup:verbosity`: (optional) How verbose we want the `gcloud sql export sql`
command to be. See the [`gcloud sql export` documentation][1] for the allowed values.
Default: `info`.

* `backup:verbose`: (optional) Make the wrapper script itself more verbose.
Note that this is different from `backup:verbosity`.
Default: `false`.

* `backup:dry_run`: (optional) Run in "dry-run" mode, that is, show what
will happen but do not actually run the backups. Default: `false`.

* `crontab`: A standard five-field [crontab time string][5].
Default: `"30 6 1 * *"` (run on the first of the month at 06:30).

* `sleepmode`: (optional) Run the CronJob but execute `sleep infinity` rather
than the gcloud backup. This is useful for debugging; see the sketch
after this list.
Default: `false`.
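
As an example of the two debug-oriented values, a sketch that deploys a
release which makes no changes (release name and chart path are
assumptions, as in the install sketch above):

```
# Sketch: temporarily enable dry-run and sleep mode for debugging.
helm upgrade --install cloudsql-archive ./common/helm -f values.yaml \
    --set backup.dry_run=true \
    --set sleepmode=true
```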

## Important Notes

* Trying to export a non-existent database will cause the application
to hang, so be careful when configuring; see the check after this list
for one way to verify database names.

* The Cloud SQL database and the destination bucket **must** be in the same
GCP project.

* This repository must be _publicly_ readable so that the InitContainer can copy the
script [`gcp-cloudsql-archive.py`](gcp-cloudsql-archive.py) to the main Pod.
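
To avoid the hang mentioned in the first note above, you can check the
database names in your `values.yaml` against what actually exists on
the instance (using the example values from earlier):

```
# List the databases that actually exist on the instance, so the names
# in values.yaml can be verified before the CronJob runs.
gcloud sql databases list --instance=mysql-1 --project=gcp-proj-id
```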

## GCP service account permissions

The assumption is that users of this Helm chart will link the CronJob's
Kubernetes service account to a GCP service account using [Workload
Identity][2]. The GCP service account must have the permissions to read
the data from the Cloud SQL instance and write to the Cloud Storage
bucket. The permissions needed are described on the ["Exporting and
importing using SQL dump files" page][1].

You will also need to grant the service account associated with the Cloud
SQL instance the right to create storage objects. For more details on
this, as well as Terraform code to enable these permissions, see [this
StackOverflow answer][6].
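
A minimal sketch of that grant, again with the example names (the
StackOverflow answer above shows a Terraform equivalent):

```
# Sketch: grant the Cloud SQL instance's own service account write
# access to the bucket. Instance and bucket names are from the example;
# tighten the role if objectAdmin is broader than you want.
SA=$(gcloud sql instances describe mysql-1 --project=gcp-proj-id \
    --format='value(serviceAccountEmailAddress)')
gsutil iam ch "serviceAccount:${SA}:roles/storage.objectAdmin" gs://sql-dump1
```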

[1]: https://cloud.google.com/sdk/gcloud/reference/sql/export/sql

[2]: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity

[3]: https://gcr.io/google.com/cloudsdktool/cloud-sdk

[4]: https://cloud.google.com/sdk/gcloud

[5]: https://en.wikipedia.org/wiki/Cron

[6]: https://stackoverflow.com/a/69561246/268847

[7]: https://cloud.google.com/storage/docs/lifecycle