Production / High Availability
By default this Helm chart sets up Estafette CI so that you can have a look at it and try it out without requiring too many resources. However, in order to run it in production you will want to tune some settings so that all components run in High Availability (HA) mode.
Default values
Having a look at the default values reveals how many replicas of each component are installed. Notably, CockroachDB is the only component already running in HA mode, because it's much harder to switch it to HA once it has been initialized. For the other components it's no problem to make the change at a later stage.
api:
  deployment:
    replicaCount: 1
  autoscaling:
    enabled: false
web:
  deployment:
    replicaCount: 1
  autoscaling:
    enabled: false
db:
  statefulset:
    replicas: 3
metrics:
  server:
    replicaCount: 1
queue:
  cluster:
    enabled: false
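If you want to inspect the full set of defaults yourself, you can render the chart's values with a regular Helm command (a sketch, assuming the chart is installed from the estafette Helm repository; adjust the repository URL and chart name to your setup):

helm repo add estafette https://helm.estafette.io
helm show values estafette/estafette-ci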
High Availability
In order to have all the components that handle browser requests, webhooks, storage or internal communication run in High Availability mode, use the following values:
api:
  deployment:
    replicaCount: 3
  autoscaling:
    enabled: true
web:
  deployment:
    replicaCount: 3
  autoscaling:
    enabled: true
db:
  statefulset:
    replicas: 3
metrics:
  server:
    replicaCount: 2
queue:
  cluster:
    enabled: true
    replicas: 3
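You can then apply these overrides with a regular Helm upgrade (a sketch; the release name, namespace and values file name are assumptions to adapt to your setup):

helm upgrade estafette-ci estafette/estafette-ci \
  --namespace estafette-ci \
  --values values-ha.yaml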
Resources
It also makes sense to set resource requests and limits for each component, so it gets the CPU and memory it needs:
api:
  deployment:
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        memory: 256Mi
web:
  deployment:
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        memory: 128Mi
db:
  statefulset:
    resources:
      requests:
        cpu: 2000m
        memory: 12Gi
      limits:
        memory: 12Gi
metrics:
  server:
    resources:
      requests:
        cpu: 1000m
        memory: 6Gi
      limits:
        memory: 6Gi
queue:
  nats:
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        memory: 128Mi
Do note that these are just example values; you should tune resources over time and monitor them closely to see whether you're overprovisioning or underprovisioning each of the components.
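A quick way to compare actual usage against these requests is the Kubernetes metrics API, assuming a metrics server runs in your cluster:

kubectl top pods -n estafette-ci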
Avoid Docker Hub rate limits
Since Docker Hub lowered rate limits for anonymous pulls (see https://www.docker.com/increase-rate-limits) it's good practice to use image pull secrets for all of the containers pulled from Docker Hub. You can do so by creating a Docker Hub account and generating a token. Once you have those, apply the following values:
api:
  image:
    credentials:
      registry: docker.io
      username: '<docker hub user>'
      password: '<docker hub token>'
web:
  image:
    credentials:
      registry: docker.io
      username: '<docker hub user>'
      password: '<docker hub token>'
db:
  image:
    credentials:
      registry: docker.io
      username: '<docker hub user>'
      password: '<docker hub token>'
cron-event-sender:
  image:
    credentials:
      registry: docker.io
      username: '<docker hub user>'
      password: '<docker hub token>'
hanging-job-cleaner:
  image:
    credentials:
      registry: docker.io
      username: '<docker hub user>'
      password: '<docker hub token>'
db-migrator:
  image:
    credentials:
      registry: docker.io
      username: '<docker hub user>'
      password: '<docker hub token>'
db-client:
  image:
    credentials:
      registry: docker.io
      username: '<docker hub user>'
      password: '<docker hub token>'
metrics:
  imagePullSecrets:
  - name: estafette-ci-api.registry
queue:
  imagePullSecrets:
  - name: estafette-ci-api.registry
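After applying these values you can check that the pull secret referenced above actually exists (a sketch; the secret is expected to be created by the api subchart from the image credentials):

kubectl get secret estafette-ci-api.registry -n estafette-ci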
In order to ensure builds also use credentials to raise the Docker pull quota, add a credential of type container-registry-pull in the following way:
api:
  config:
    files:
      credentials.yaml: |
        credentials:
        - name: 'docker-hub-pull'
          type: 'container-registry-pull'
          username: '<docker hub user>'
          password: '<docker hub token>'
These will be used both as an image pull secret for the build/release jobs themselves and for pulling stage container images inside each build/release.
Store logs in Cloud Storage
By default, full build and release logs are stored in the database. With large numbers of builds this can put a lot of stress on the database, and it's also more costly than storing logs in Cloud Storage.
First go through the following steps (a sketch of the matching gcloud commands follows the list):
- Create a cloud storage bucket
- Create a service account
- Give the service account read/write permissions on the storage bucket
- Get a keyfile for the service account
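If your bucket lives on Google Cloud, the steps above roughly translate to the following commands (a sketch; the project id, bucket name and the estafette-ci-logs service account name are placeholders to replace with your own):

# create the cloud storage bucket
gsutil mb -p <project id> gs://<bucket name>

# create the service account
gcloud iam service-accounts create estafette-ci-logs --project <project id>

# give the service account read/write permissions on the bucket
gsutil iam ch serviceAccount:estafette-ci-logs@<project id>.iam.gserviceaccount.com:roles/storage.objectAdmin gs://<bucket name>

# get a keyfile for the service account
gcloud iam service-accounts keys create service-account-key.json \
  --iam-account estafette-ci-logs@<project id>.iam.gserviceaccount.com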
Then update the values by adding the following:
api:
  deployment:
    extraEnv:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /iam/service-account-key.json
    - name: ESCI_APISERVER_LOGWRITERS
      value: cloudstorage
    - name: ESCI_APISERVER_LOGREADER
      value: cloudstorage
    - name: ESCI_INTEGRATIONS_CLOUDSTORAGE_ENABLE
      value: 'true'
    - name: ESCI_INTEGRATIONS_CLOUDSTORAGE_PROJECTID
      value: '<google cloud project id with bucket>'
    - name: ESCI_INTEGRATIONS_CLOUDSTORAGE_BUCKET
      value: '<bucket name>'
    extraSecrets:
    - key: iam
      mountPath: /iam
      b64encoded: true
      data:
        service-account-key.json: '{base64 encoded key file}'
This will ensure the api component has a service account keyfile that allows it to read and write logs from and to a cloud storage bucket.
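To produce the base64 encoded keyfile value for the extraSecrets data, for example on Linux:

base64 -w0 service-account-key.json

(on macOS use base64 -i service-account-key.json instead)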
Note: logs already stored in the database won't be moved over, and won't be shown when navigating to the respective log pages either, so it's best to configure this before you run any pipelines in a new installation.
Store your helm values
For disaster recovery it makes sense to keep your values file stored somewhere, and to save the unencrypted secrets used in it somewhere secure as well. You want this in order to be able to manually install the Helm release at any time, in case any of the pipelines you usually use to upgrade Estafette CI are broken themselves.
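One way to capture the values of the currently deployed release for safekeeping (a sketch, assuming release name estafette-ci in namespace estafette-ci):

helm get values estafette-ci -n estafette-ci > estafette-ci-values.yaml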
See the Disaster recovery section for more detail on how to restore functionality.
CockroachDB backup
The most critical part of Estafette CI to safeguard for disaster recovery is the data stored in the database. The default database used by Estafette is CockroachDB. You'll have to set up a backup schedule as documented at https://www.cockroachlabs.com/docs/stable/manage-a-backup-schedule.html in order to have daily backups of the database.
In order to connect to the CockroachDB database to perform queries, the Helm chart has a db-client subchart that's disabled by default. You can enable it by setting the following values:
db-client:
  enabled: true
This will spin up a pod named estafette-ci-db-client which you can ‘log in to’ with the following command:
kubectl exec -it estafette-ci-db-client -n estafette-ci \
  -- ./cockroach sql \
  --certs-dir=/cockroach-certs \
  --host=estafette-ci-db-public
From here on you can follow the instructions in the CockroachDB documentation. For example, in order to create a scheduled backup to Google Cloud Storage:
- Create a cloud storage bucket
- Create a service account
- Give the service account read/write permissions on the storage bucket
- Get a keyfile for the service account
- Execute the following query in the estafette-ci-db-client:
CREATE SCHEDULE daily_backup
FOR BACKUP INTO 'gs://{bucket name}/db-backups?AUTH=specified&CREDENTIALS={base64 encoded key}'
WITH revision_history
RECURRING '@daily'
WITH SCHEDULE OPTIONS first_run = 'now';
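You can verify the schedule got created with a standard CockroachDB statement:

SHOW SCHEDULES;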
Once you're done, it's best to disable the db-client again by updating the values to:
db-client:
  enabled: false
Back up decryption key
In case you're already using Estafette secrets in build manifests and centrally configured credentials, you'll need a backup of the encryption/decryption key. You can get its value with:
kubectl get secret --namespace estafette-ci estafette-ci-api -o jsonpath="{.data.secretDecryptionKey}" | base64 --decode ; echo
Best to store this securely in a password manager. This is the most sensitive piece of data used by Estafette; once it leaks, anyone can decrypt any of the Estafette secrets used by the system.