[[_TOC_]]

# Helm Chart for GCP Cloud SQL Archive

## Overview

This chart archives specified GCP Cloud SQL databases into a GCP Cloud
Storage bucket.

GCP Cloud SQL has built-in backup facilities, but these serve principally
for disaster recovery. This chart is intended for archiving. The idea is
that you use this chart to back up your databases every so often to a
Cloud Storage bucket with versioning enabled. For example, you might back
up once a month with your bucket configured to delete a version only if
the version is at least a year old and there are at least 12 other
versions. This way you have monthly archives going back a year. See also
[Cloud Storage "Object Lifecycle Management"][7].
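
The retention policy in the example above can be expressed as a Cloud Storage lifecycle rule. The following is a minimal sketch, not part of this chart, that builds such a configuration with the Python standard library. All conditions in a lifecycle rule must hold for the action to fire, so `age` (days since the object version was created) combined with `numNewerVersions` (how many newer versions of the same object exist, meaningful only on buckets with versioning enabled) approximates "at least a year old and at least 12 other versions". Check the rule against the lifecycle documentation before relying on it.

```python
import json


def archive_lifecycle_config(max_age_days: int = 365, keep_newer: int = 12) -> dict:
    """Build a lifecycle configuration that deletes an archived object version
    only when it is at least `max_age_days` old AND at least `keep_newer`
    newer versions of the same object exist."""
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {
                    # Days since this object version was created.
                    "age": max_age_days,
                    # Newer versions of the same object (versioned buckets only).
                    "numNewerVersions": keep_newer,
                },
            }
        ]
    }


if __name__ == "__main__":
    # Print the JSON so it can be saved to a file and applied with gcloud.
    print(json.dumps(archive_lifecycle_config(), indent=2))
```

You could save the printed JSON as, say, `lifecycle.json` (a hypothetical file name) and apply it with `gcloud storage buckets update gs://sql-dump1 --lifecycle-file=lifecycle.json`.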

This Helm chart is a thin wrapper around the [`gcloud` SQL export
command][1] `gcloud sql export sql`. The wrapper has the advantage that it
can dump multiple databases into individual files, whereas calling `gcloud
sql export sql` directly puts all the databases into a single file.
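
The one-file-per-database idea can be sketched as a loop that issues one `gcloud sql export sql INSTANCE URI --database=DB` call per database. This is not the actual `gcp-cloudsql-archive.py`, and the function name `export_commands` is hypothetical. The sketch only builds the command lines without running them. Each database is written to `<bucket>/<database>.gz`, which matches the object name shown in the "Example of use" section; a `.gz` suffix on the destination URI asks Cloud SQL to compress the dump.

```python
import shlex


def export_commands(project_id: str, instance: str, bucket: str,
                    databases: list[str], verbosity: str = "info") -> list[list[str]]:
    """Return one `gcloud sql export sql` argv per database, so that each
    database lands in its own object instead of all sharing a single dump."""
    bucket = bucket.rstrip("/")
    commands = []
    for db in databases:
        commands.append([
            "gcloud", "sql", "export", "sql", instance,
            f"{bucket}/{db}.gz",          # per-database, compressed destination
            f"--database={db}",           # export only this database
            f"--project={project_id}",
            f"--verbosity={verbosity}",   # gcloud's own log verbosity
        ])
    return commands


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for argv in export_commands("gcp-proj-id", "mysql-1", "gs://sql-dump1", ["addresses"]):
        print(shlex.join(argv))
```

To actually run the exports, each `argv` could be passed to `subprocess.run(argv, check=True)` in turn. Running them sequentially keeps one export in flight at a time on the instance.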

The chart consists of a single Kubernetes CronJob. The assumption is that
the CronJob will run in the context of [Workload Identity][2], which gives
the CronJob the access it needs to both read the database and write to the
Cloud Storage bucket. It is up to the user of this chart to ensure that
the Kubernetes Service Account (specified by the
`kubernetesServiceAccountName` configuration directive) has been set up
with Workload Identity and links to a GCP service account with the
necessary permissions; see [section GCP service account
permissions](#gcp-service-account-permissions) below for more detail on
what permissions are required.
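
On GKE, a Kubernetes service account is linked to a GCP service account for Workload Identity by annotating it with `iam.gke.io/gcp-service-account`. The GCP service account must also grant `roles/iam.workloadIdentityUser` to the member `serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]`. The following minimal sketch emits such a ServiceAccount manifest as JSON, which `kubectl apply -f` accepts, and builds that member string. The names `sql-archive`, `archive-ns`, and `sql-archiver@gcp-proj-id.iam.gserviceaccount.com` are hypothetical placeholders. See the [Workload Identity documentation][2] for the authoritative steps.

```python
import json


def workload_identity_ksa(name: str, namespace: str, gsa_email: str) -> dict:
    """Kubernetes ServiceAccount annotated for GKE Workload Identity."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,            # use this as kubernetesServiceAccountName
            "namespace": namespace,  # should match the chart's namespace value
            "annotations": {"iam.gke.io/gcp-service-account": gsa_email},
        },
    }


def workload_identity_member(project_id: str, namespace: str, name: str) -> str:
    """IAM member to grant roles/iam.workloadIdentityUser on the GCP service
    account; project_id is the project that hosts the GKE cluster."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{name}]"


if __name__ == "__main__":
    ksa = workload_identity_ksa("sql-archive", "archive-ns",
                                "sql-archiver@gcp-proj-id.iam.gserviceaccount.com")
    print(json.dumps(ksa, indent=2))
    print(workload_identity_member("gcp-proj-id", "archive-ns", "sql-archive"))
```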

The archiving wrapper script is copied via a `git clone` into the Pod
during initialization, so this repository must remain *publicly*
viewable.

Note: The Cloud SQL database and the destination bucket **must** be in the same
GCP project.

## Example of use

To dump the database `addresses` from the Cloud SQL instance `mysql-1` in
the GCP project `gcp-proj-id` to the bucket `sql-dump1`, set these values
in your `values.yaml` file:

```yaml
backup:
  project_id: gcp-proj-id
  instance:   mysql-1
  bucket:     gs://sql-dump1
  databases:
    - addresses
```

When the CronJob has finished running, the archive file will be at
`gs://sql-dump1/addresses.gz`.

## Design

The chart is a single CronJob. The CronJob has an InitContainer that
copies the Python 3 script
[`gcp-cloudsql-archive.py`](gcp-cloudsql-archive.py) from this Git
repository to the `/root` directory. The main container is Google's
[`cloud-sdk` image][3] which contains the [`gcloud` executable][4]. The main container runs
`gcp-cloudsql-archive.py` with parameters and options from
[`values.yaml`](common/helm/values.yaml) (see "Configuration" below).
The program `gcp-cloudsql-archive.py` in turn calls `gcloud`.

See also the [man page for
`gcp-cloudsql-archive.py`](gcp-cloudsql-archive.md).

## Configuration

All configuration is made in
[`values.yaml`](common/helm/values.yaml). Values are required unless
marked otherwise.

* `namespace`: The Kubernetes namespace in which the CronJob runs.

* `kubernetesServiceAccountName`: The name of the Kubernetes service
account that the CronJob runs under. It must be bound to a GCP service
account using Workload Identity, and that GCP service account must have
permission to dump the SQL data and write to the bucket; see [GCP service
account permissions](#gcp-service-account-permissions) below for more
details on the permissions needed.

* `backup:project_id`: The ID of the GCP project in which the Cloud SQL instance and the bucket reside.

* `backup:instance`: The name of the Cloud SQL instance.

* `backup:bucket`: The destination bucket. Should be of the form `gs://mybucket`.

* `backup:databases`: A list of databases to back up. Each
will be backed up to a separate object in the bucket. Example:

        backup:
          databases:
            - abc
            - def
            - xyz

* `backup:verbosity`: (optional) The verbosity level passed to the `gcloud sql export sql`
command. See the [`gcloud sql export` documentation][1] for the allowed values.
Default: `info`.

* `backup:verbose`: (optional) Make the wrapper script itself more verbose. Note that this
is different from `backup:verbosity`.
Default: `false`.

* `backup:dry_run`: (optional) Run in "dry-run" mode, that is, show what
will happen but do not actually run the backups. Default: `false`.

* `crontab`: A standard five-field [crontab time string][5].
Default: `"30 6 1 * *"` (run on the first of each month at 06:30).

* `sleepmode`: (optional) Run the CronJob, but have the container execute `sleep infinity`
instead of the gcloud backup. This is useful for debugging.
Default: `false`.
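
The rules above (which values are required, the `gs://` form of the bucket, the five-field crontab, and the documented defaults) can be summarized in a small validator. This is a hypothetical helper that the chart does not ship. It assumes the values have already been parsed into a Python dict (for example by a YAML loader), fills in the defaults listed above, and raises `ValueError` on problems.

```python
import copy

# Defaults documented above for the optional values.
TOP_DEFAULTS = {"crontab": "30 6 1 * *", "sleepmode": False}
BACKUP_DEFAULTS = {"verbosity": "info", "verbose": False, "dry_run": False}
TOP_REQUIRED = ("namespace", "kubernetesServiceAccountName", "backup")
BACKUP_REQUIRED = ("project_id", "instance", "bucket", "databases")


def check_values(values: dict) -> dict:
    """Return a copy of `values` with defaults filled in; raise ValueError if invalid."""
    merged = {**TOP_DEFAULTS, **copy.deepcopy(values)}
    missing = [k for k in TOP_REQUIRED if k not in merged]
    if missing:
        raise ValueError(f"missing required values: {missing}")
    # An empty `backup:` key in YAML parses as None, so fall back to {}.
    backup = {**BACKUP_DEFAULTS, **(merged["backup"] or {})}
    missing = [f"backup:{k}" for k in BACKUP_REQUIRED if k not in backup]
    if missing:
        raise ValueError(f"missing required values: {missing}")
    if not str(backup["bucket"]).startswith("gs://"):
        raise ValueError("backup:bucket should look like gs://mybucket")
    if not isinstance(backup["databases"], list) or not backup["databases"]:
        raise ValueError("backup:databases must be a non-empty list")
    if len(str(merged["crontab"]).split()) != 5:
        raise ValueError("crontab must have five space-separated fields")
    merged["backup"] = backup
    return merged
```

The input dict is deep-copied, so the caller's values are never modified. Only the returned dict carries the filled-in defaults.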

## Important Notes

* Trying to export a non-existent database causes the application to
hang, so double-check the names listed in `backup:databases`.

* The Cloud SQL database and the destination bucket **must** be in the same
GCP project.

* This repository must be _publicly_ readable so that the InitContainer can copy the
script [`gcp-cloudsql-archive.py`](gcp-cloudsql-archive.py) into the Pod.
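
Because exporting a non-existent database hangs, one possible mitigation is to compare `backup:databases` with the databases that actually exist on the instance before starting any export. The following sketch is not part of the chart, and its helper names are hypothetical. It relies on `gcloud sql databases list --instance=INSTANCE --project=PROJECT --format="value(name)"`, which prints one database name per line.

```python
import subprocess


def existing_databases(project_id: str, instance: str) -> set[str]:
    """Ask gcloud for the names of the databases on a Cloud SQL instance."""
    result = subprocess.run(
        ["gcloud", "sql", "databases", "list",
         f"--instance={instance}", f"--project={project_id}",
         "--format=value(name)"],
        check=True, capture_output=True, text=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def missing_databases(requested: list[str], existing: set[str]) -> list[str]:
    """Requested databases that do not exist, in the order they were requested."""
    return [db for db in requested if db not in existing]
```

A caller could run `missing_databases(["addresses"], existing_databases("gcp-proj-id", "mysql-1"))` and abort with an error if the result is non-empty, rather than letting the export hang.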

## GCP service account permissions

The assumption is that users of this Helm chart will link the CronJob's
Kubernetes service account to a GCP service account using [Workload
Identity][2]. The GCP service account must have permission to read
the data from the Cloud SQL instance and write to the Cloud Storage
bucket. The permissions needed are described on the ["Exporting and
importing using SQL dump files" page][1].

You will also need to grant the service account associated with the Cloud
SQL instance the right to create storage objects. For more details on this
as well as Terraform code to enable these permissions see [this
StackOverflow answer][6].

[1]: https://cloud.google.com/sdk/gcloud/reference/sql/export/sql

[2]: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity

[3]: https://gcr.io/google.com/cloudsdktool/cloud-sdk

[4]: https://cloud.google.com/sdk/gcloud

[5]: https://en.wikipedia.org/wiki/Cron

[6]: https://stackoverflow.com/a/69561246/268847

[7]: https://cloud.google.com/storage/docs/lifecycle