# Helm Chart for GCP Cloud SQL Archive

## Overview
This chart archives specified GCP Cloud SQL databases into a GCP Cloud Storage bucket.
GCP Cloud SQL has facilities for backups, but these serve principally for disaster recovery. This chart is intended for archiving. The idea is that you use this chart to back up your databases every so often to a Cloud Storage bucket with versioning enabled. For example, you might back up once a month with your bucket configured to delete a version only if the version is at least a year old and there are at least 12 other versions. This way you have monthly archives going back a year. See also Cloud Storage "Object Lifecycle Management".
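As a sketch of that bucket setup, assuming a bucket named `sql-dump1` (the bucket used in the example below), the `gsutil` commands might look like this:

```sh
# Keep noncurrent versions of each archive object, and delete a version only
# once it is a year old and at least 12 newer versions exist.
gsutil versioning set on gs://sql-dump1

cat > lifecycle.json <<'EOF'
{
  "rule": [{
    "action": {"type": "Delete"},
    "condition": {"age": 365, "isLive": false, "numNewerVersions": 12}
  }]
}
EOF
gsutil lifecycle set lifecycle.json gs://sql-dump1
```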
This Helm chart is a thin wrapper around the gcloud SQL export command `gcloud sql export sql`. The wrapper has the advantage that it can dump multiple databases into individual files, whereas calling `gcloud sql export sql` directly puts all the databases into a single file.
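For reference, the kind of command the wrapper issues for each configured database looks roughly like this (names are taken from the example below; the exact flags are determined by the script):

```sh
# One export per database, each to its own object in the bucket
# (illustrative invocation only).
gcloud sql export sql mysql-1 gs://sql-dump1/mysql-1--addresses.gz \
  --database=addresses --project=gcp-proj-id
```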
The chart consists of a single Kubernetes CronJob. The assumption is that the CronJob will run in the context of Workload Identity, which gives the CronJob the access it needs to both read the database and write to the Cloud Storage bucket. It is up to the user of this chart to ensure that the Kubernetes service account (specified by the `kubernetesServiceAccountName` configuration directive) has been set up with Workload Identity and links to a GCP service account with the necessary permissions; see the section "GCP service account permissions" below for more detail on what permissions are required.
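As a sketch, the Workload Identity wiring typically involves two steps (all names below are hypothetical):

```sh
# Allow the Kubernetes service account (namespace/name) to impersonate the
# GCP service account via Workload Identity.
gcloud iam service-accounts add-iam-policy-binding \
  sql-archiver@gcp-proj-id.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:gcp-proj-id.svc.id.goog[my-namespace/sql-archiver]"

# Annotate the Kubernetes service account so GKE knows which GCP service
# account it maps to.
kubectl annotate serviceaccount sql-archiver --namespace my-namespace \
  iam.gke.io/gcp-service-account=sql-archiver@gcp-proj-id.iam.gserviceaccount.com
```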
The archiving wrapper script is copied via a `git clone` into the Pod during initialization, so this repository must remain publicly viewable.
Note: The Cloud SQL database and the destination bucket must be in the same GCP project.
## Example of use
To dump the database `addresses` from the Cloud SQL instance `mysql-1` in the GCP project `gcp-proj-id` to the bucket `sql-dump1`, set these values in your `values.yaml` file:
```yaml
backup:
  project_id: gcp-proj-id
  instance: mysql-1
  bucket: gs://sql-dump1
  databases:
    - addresses
```
When the CronJob is done running, the archive file will be at `gs://<bucket>/<instance>--<database>.gz`, which, in this case, is `gs://sql-dump1/mysql-1--addresses.gz`.
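To confirm the export, and later to watch the monthly versions accumulate, you can list the object with all of its generations (`-a` includes noncurrent versions):

```sh
gsutil ls -la gs://sql-dump1/mysql-1--addresses.gz
```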
## Design
The chart is a single CronJob. The CronJob has an InitContainer that copies the Python 3 script `gcp-cloudsql-archive.py` from this Git repository to the `/root` directory. The main container is Google's `cloud-sdk` image, which contains the `gcloud` executable. The main container runs `gcp-cloudsql-archive.py` with parameters and options from `values.yaml` (see "Configuration" below). The program `gcp-cloudsql-archive.py` in turn calls `gcloud`.

See also the man page for `gcp-cloudsql-archive.py`.
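Schematically, the init step amounts to a fetch-and-copy along these lines (the clone URL and in-repo path are placeholders, not the literal commands the chart runs):

```sh
# What the InitContainer effectively does: fetch this repository and place
# the wrapper script where the main container expects it.
git clone <this-repository-url> /tmp/src
cp /tmp/src/gcp-cloudsql-archive.py /root/
```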
## Configuration
All configuration is made in `values.yaml`. Values are required unless marked otherwise.
- `namespace`: The Kubernetes namespace.
- `kubernetesServiceAccountName`: The name of the Kubernetes service account to run the CronJob under. It needs to be bound to a GCP service account using Workload Identity. The GCP service account must have permission to dump the SQL data and write to the bucket; see "GCP service account permissions" below for more details on the permissions needed.
- `backup:project_id`: The GCP project ID where the Cloud SQL instance and bucket reside.
- `backup:instance`: The name of the Cloud SQL instance.
- `backup:bucket`: The destination bucket. Should be of the form `gs://mybucket`.
- `backup:databases`: A list of databases to back up. Each will be backed up to a separate object in the bucket. Example:

  ```yaml
  backup:
    databases:
      - abc
      - def
      - xyz
  ```

- `backup:verbosity`: (optional) How verbose we want the `gcloud sql export sql` command to be. See the `gcloud sql export` documentation for the allowed values. Default: `info`.
- `backup:verbose`: (optional) Have the wrapper script be more verbose. Note that this is different from `backup:verbosity`. Default: `false`.
- `backup:dry_run`: (optional) Run in "dry-run" mode, that is, show what will happen but do not actually run the backups. Default: `false`.
- `crontab`: A standard five-field crontab time string. Default: `"30 6 1 * *"` (run on the first of the month at 06:30).
- `sleepmode`: (optional) Run the CronJob but run `sleep infinity` rather than the gcloud backup. This is useful for debugging. Default: `false`.
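With `values.yaml` filled in, deploying the chart is a routine Helm install; for instance (the release name and chart path here are placeholders):

```sh
# Install or upgrade the release; the target namespace for the CronJob is
# taken from the `namespace` value in values.yaml.
helm upgrade --install cloudsql-archive ./gcp-cloudsql-archive -f values.yaml
```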
## Important Notes
- Trying to export a non-existent database will cause the application to hang, so be careful when configuring.
- The Cloud SQL database and the destination bucket must be in the same GCP project.
- This repository must be publicly readable so that the InitContainer can copy the script `gcp-cloudsql-archive.py` into the Pod.
## GCP service account permissions
The assumption is that users of this Helm chart will link the CronJob's Kubernetes service account to a GCP service account using Workload Identity. The GCP service account must have the permissions to read the data from the Cloud SQL instance and write to the Cloud Storage bucket. The permissions needed are described on the "Exporting and importing using SQL dump files" page.
You will also need to grant the service account associated with the Cloud SQL instance the right to create storage objects. For more details on this, as well as Terraform code to enable these permissions, see this StackOverflow answer.
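As a sketch, you can look up the instance's service account and grant it write access to the bucket like this (instance and bucket names come from the example above; `roles/storage.objectAdmin` is one workable choice of role):

```sh
# Find the service account the Cloud SQL instance runs as...
SA=$(gcloud sql instances describe mysql-1 --project gcp-proj-id \
  --format 'value(serviceAccountEmailAddress)')

# ...and allow it to create objects in the destination bucket.
gsutil iam ch "serviceAccount:${SA}:roles/storage.objectAdmin" gs://sql-dump1
```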