Disaster recovery
Estafette CI is a pretty resilient system, but of course an incident can happen. Like wiping out a complete namespace by accident. This will not only loose you all of the pipeline build version numbers and references to the logs, but also the master encryption/decryption key.
Recreate/repair Helm release
If your Estafette CI release somehow managed to become corrupt or deleted you can reinstall it with the instructions below. Do ensure you have the original values.yaml
file stored somewhere. And also have any of the secrets used during installation at hand.
helm repo add estafette https://helm.estafette.io
helm repo update
# check diff
helm diff upgrade estafette-ci estafette/estafette-ci -n estafette-ci --values local-values.yaml --set api.secret.secretDecryptionKey=<base64 encoded secret decryption key>
# apply changes
helm upgrade --install estafette-ci estafette/estafette-ci -n estafette-ci --create-namespace --values local-values.yaml --timeout 600s --set api.secret.secretDecryptionKey=<base64 encoded secret decryption key> --set db-migrator.enable=false
Any of the secret values you might use can be passed with the --set
argument. Make sure you've stored these commands somewhere secure, like in a password manager, so you don't have to figure out all of the secrets when trying to restore functionality.
Note: the db-migrator
component needs to be disabled if the database needs to be restored from a backup, otherwise it already creates database tables which makes it impossible for a backup to be restored.
CockroachDB restore backup
If you've following the instructions in the Production / high availability section you've already set up a daily backup for Estafette's Cockroachdb database.
After having followed the instructions above you hopefully still have all of your data, but in case you threw away all the Persistent Volume Claims for the database you will have to restore it from your backup.
Note: You can't restore a backup in a database that already has a schema, so this section is only useful for recovering from scratch. If your database somehow got corrupted you will have to shrink the statefulset to 0, delete the pvc's and restore it to it's original number of replicas before continuing with the following steps.
In order to connect to the Cockroachdb database to perform queries, the Helm chart has a db-client
subchart, that's disabled by default. You can enabled it by setting values:
db-client:
enabled: true
db-migrator:
enabled: false
This will spin up a pod named estafette-ci-db-client
which you can ‘log in to’ with the following command:
kubectl exec -it estafette-ci-db-client -n estafette-ci \
-- ./cockroach sql \
--certs-dir=/cockroach-certs \
--host=estafette-ci-db-public
From here on you can follow the instructions as listed by the documentation of Cockroachdb.
For example in order to create a scheduled backup to Google Cloud Storage
- Get a keyfile for the service account used for the scheduled backup
- Execute the following query in the
estafette-ci-db-client
:
RESTORE
FROM 'gs://{bucket name}/db-backups?AUTH=specified&CREDENTIALS={base64 encoded key}';
Once you're done best to disable the db-client again by updating the values to
db-client:
enabled: false
db-migrator:
enabled: true
and making sure to either pass --set db-migrator.enable=true
to helm upgrade
or skip overriding this value completely. This will let the db-migrator
perform any schema changes that still need to happen, although with a daily backup the schema should be up to date.
What doesn't get restored
When recreating your Estafette installations the Bitbucket and Github integration details will not be recovered. This means you will have to go through the instructions at Github integration and/or Bitbucket integration again.