Configs/Secrets Rotation #23

Open

opened 2023-08-21 10:59:54 +02:00 by so-rose · 0 comments

*Both secrets and configs will be referred to as configs. The usage is identical.*

When redeployed, Docker stacks can pick up on service changes (including changes to configs and secrets) and choose what to restart and when. Relying on this makes niceties like rolling updates easy (see [`update_config`](https://docs.docker.com/compose/compose-file/compose-file-v3/#update_config)).

Unfortunately, `configs` cannot be changed while being used. To rotate a `config`, one must:

- Create a new `config`, with a unique name, in the Swarm.
- In `docker-compose.yml`, replace the (also unique) name for the old `config` with the unique name for the new `config`.
- Deploy this altered `docker-compose.yml` file.
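
As a minimal sketch of what such a rotation looks like in the stack file, assume a hypothetical stack `mystack` with a config `mystack__app` whose versions are told apart by a numeric suffix. The service name, image, and target path are illustrative as well. Between two deployments, only the external name changes:

```yaml
# docker-compose.yml (sketch; all names are hypothetical)
version: "3.7"

services:
  app:
    image: example/app:latest
    configs:
      # The service refers to the config by its stack-local key...
      - source: app_config
        target: /etc/app/config.yml

configs:
  app_config:
    # ...while the Swarm object behind it is swapped by changing `name`.
    external: true
    name: mystack__app_v2  # previously: mystack__app_v1
```

The new config would have to exist in the Swarm (for example, created with `docker config create`) before the altered file is deployed.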

This is where complexities arise.

# Implementation

In practice, here's the way to do it (separately for `config` and `secret`):

- [ ] In `docker-compose.yml`, specify `config` names using Jinja2 templating: `"{{ config_latest['<stack_name>__<id>'] }}"`
- [ ] In `docker-compose.yml`, specify `secret` names using Jinja2 templating: `"{{ secret_latest['<stack_name>__<id>'] }}"`
- [ ] Load `docker-compose.yml` using `template` instead of `file`.
- [ ] Load configs using `template` instead of `file`.
- [ ] Specify `rolling_versions` in [`community.docker.docker_config`](https://docs.ansible.com/ansible/latest/collections/community/docker/docker_config_module.html). Set `versions_to_keep` to 2.
- [ ] Register the result of looped config creation as `config_result`.
- [ ] Looping over `config_result.results`, add to (and/or create) the dict `config_latest`, with the config name (with the `_vX` postfix removed) as key and `item.config_name` as value.
- [ ] Remove the stack stop step.
- [ ] Copy/paste all of this to the corresponding secrets `role`.
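
The checklist above could translate into tasks roughly like the following. This is a sketch under assumptions, not the role's actual code: the `stack_name` and `stack_configs` variables, the template paths, and the location of the rendered stack file are hypothetical. It also assumes a `community.docker` release recent enough to support `rolling_versions` and to return `config_name`.

```yaml
# tasks/main.yml (sketch; variable names and paths are hypothetical)

# Assumed input shape:
#   stack_configs:
#     - { name: "mystack__app", src: "configs/app.yml.j2" }

- name: Create or rotate configs with rolling versions
  community.docker.docker_config:
    name: "{{ item.name }}"
    # Render the config body as a template rather than reading a plain file.
    data: "{{ lookup('ansible.builtin.template', item.src) }}"
    rolling_versions: true
    versions_to_keep: 2
    state: present
  loop: "{{ stack_configs }}"
  register: config_result

- name: Map each base config name to its latest versioned name
  ansible.builtin.set_fact:
    config_latest: >-
      {{ config_latest | default({})
         | combine({item.item.name: item.config_name}) }}
  loop: "{{ config_result.results }}"

- name: Render the stack file so config names resolve via config_latest
  ansible.builtin.template:
    src: docker-compose.yml.j2
    dest: "/tmp/{{ stack_name }}-docker-compose.yml"
    mode: "0600"

- name: Deploy the stack without stopping it first
  community.docker.docker_stack:
    name: "{{ stack_name }}"
    state: present
    compose:
      - "/tmp/{{ stack_name }}-docker-compose.yml"
```

Here the dictionary key is taken from the original loop item (`item.item.name`), which should be equivalent to stripping the `_vX` postfix from `item.config_name`. The secrets role would mirror this with `community.docker.docker_secret` and a `secret_latest` dictionary.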

In total, the procedure will be modified to be:

- Configs are read & installed with a postfix `_vX`. If a config already exists and its content has changed, the new config is created with an incremented version postfix `_vX`. If no change is needed, no new config is created. Only the previous config is kept alongside the latest one (to allow rollback).
- The latest actual config name is registered in a dictionary, `config_latest`, which maps each config name to its latest actual name: `'<stack_name>__<id>' => '<stack_name>__<id>_vX'`.
- When deploying the stack, every use of a config/secret name is lightly templated to refer to the latest config (which may or may not differ from the one used in the previous deployment).
- During deployment, services are restarted one by one if they need a new config/secret.
  - If the service fails as a result of the config update, `rollback_config` in the stack will enable reverting to the previous version without human input.
  - If there are several replicas, `update_config` in the stack will enable re-creating one replica at a time, ensuring no downtime even in the case of failure (in which case, rolling back to the last working version can be configured).
  - These built-in Compose niceties are opt-in; thus, applications that don't work with them simply don't specify them in their stacks.
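
For illustration, a templated stack file that uses `config_latest` together with the opt-in rolling-update settings might look like this. The service, image, replica count, and timing values are hypothetical, and `mystack__app` stands in for `<stack_name>__<id>`.

```yaml
# docker-compose.yml.j2 (sketch; rendered by Ansible before deployment)
version: "3.7"

services:
  app:
    image: example/app:latest
    configs:
      - source: app_config
        target: /etc/app/config.yml
    deploy:
      replicas: 3
      update_config:
        parallelism: 1            # re-create one replica at a time
        delay: 10s
        failure_action: rollback  # revert automatically if the update fails
      rollback_config:
        parallelism: 1

configs:
  app_config:
    external: true
    # Resolves to the latest versioned name, e.g. mystack__app_v3.
    name: "{{ config_latest['mystack__app'] }}"
```

A stack that cannot tolerate rolling updates would simply leave out the `update_config` and `rollback_config` keys.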

# References / Resources

A bit of inspiration for how this solution came to be:

- https://docs.ansible.com/ansible/latest/collections/community/docker/docker_config_module.html#ansible-collections-community-docker-docker-config-module-parameter-rolling-versions
- https://stackoverflow.com/questions/29512443/register-variables-in-with-items-loop-in-ansible-playbook
- https://gist.github.com/pwalkr/bf3d96de629337afbb333a0e1fd0b800
- https://docs.docker.com/engine/reference/commandline/stack_deploy/#examples
- https://anthonymineo.com/rotating-your-docker-secrets-can-be-easy-if-you-plan-for-it/
- https://docs.docker.com/engine/swarm/secrets/#example-rotate-a-secret

## Why Version, Not Hash?

Keeping a unique, incrementing int ID is the right choice here:

- Low-entropy secrets shouldn't be a thing, but keeping a non-secret hash of them makes such oversights much, much worse, operationally speaking (**note: this kind of obfuscation won't save you, but there's no reason to make lapses in common sense extra deadly**).
- Rolling versions are built into the Ansible module that creates configs, which in turn can easily report the names it chose, for templating in the stack.
- Versions make it easy to keep an operational sense of how many changes are made to services, based on which config versions they're using.
- Semantically speaking, the stack wants to deploy the "latest" config. Versions don't need any extra metadata to communicate which is the latest; just pick the biggest number! *In practice this isn't so important, but it's a cleaner mental model.*