Automated Backup/Restore Procedures #17
NOTE: This requires the ability to make sidecar containers via labels.
Backups are non-negotiable in any kind of non-test setting, and backups are not backups until restoration has been tested (and ideally, re-tested by automation with some frequency).
We can fold this into this infrastructure design easily, like so:
1. Implement templated volume mounting as `{{ vol_latest['<stack_name>__<id>'] }}`.
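    A minimal sketch of how this could look inside a stack's compose template; the `vol_latest` mapping and the `mystack__appdata` key are illustrative assumptions:

    ```yaml
    # docker-compose.yml.j2 (hypothetical stack template)
    services:
      app:
        image: myorg/app:latest
        volumes:
          - appdata:/data

    volumes:
      appdata:
        external: true
        # vol_latest maps '<stack_name>__<id>' to the actual volume name
        name: "{{ vol_latest['mystack__appdata'] }}"
    ```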
2. Mandate the creation of "backup volumes" for each created volume in the `deploy_volume_*` roles, and make them easily templatable, using ex. `{{ vol_backup_for['<stack_name>__<id>'] }}`. This should refer to an S3-backed, crypt-enabled, rclone-mounted volume whose write operations aren't cached.
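    One way such a backup volume could be realized, assuming the rclone Docker volume plugin (aliased `rclone`) and an `s3crypt` crypt remote layered over the S3 backend; all names are illustrative:

    ```yaml
    volumes:
      appdata_backup:
        driver: rclone
        driver_opts:
          # crypt remote over S3, configured in rclone.conf
          remote: "s3crypt:backups/mystack__appdata"
          # write straight through, no local caching
          vfs_cache_mode: "off"
    ```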
3. Implement periodic backups in each stack's `docker-compose.yml`, as a sidecar container per volume, each ensuring the contents of that volume are backed up to its backup volume.
    - Ex. a `debian-slim` container mounted as a sidecar, with both the persistent volume and the backup volume mounted, running a `while` loop on a timer.
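    A minimal sketch of such a sidecar (volume names and the daily cadence are just placeholders):

    ```yaml
    services:
      appdata-backup:
        image: debian:stable-slim
        volumes:
          - appdata:/data:ro
          - appdata_backup:/backup
        # $$ escapes $ for compose; tar + timestamp, once a day
        command: >
          sh -c 'while true; do
            tar czf "/backup/appdata-$$(date -u +%Y%m%dT%H%M%SZ).tar.gz" -C /data . ;
            sleep 86400;
          done'

    volumes:
      appdata:
      appdata_backup:
        external: true
    ```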
4. Design boilerplate sidecar services for backing up one persistent volume to its backup volume, for ex. postgres, redis, mysql, sqlite, "just tar the folder", etc.
    - The simplest boilerplate: `tar`ing the volume and `date`-stamping the file.
    - Cover both services with native dump tooling (`pg_dump`, taking an rdb snapshot, redis aof for low-latency databases, etc.) and those that don't have a network-fs backing ("I need full POSIX and only local volumes are good enough"), shipping either over to the rclone-mounted backup bucket.
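    Ex. a postgres-flavored boilerplate; the service name, credentials, and cadence are illustrative (use secrets rather than inline passwords in practice):

    ```yaml
    services:
      postgres-backup:
        image: postgres:16
        environment:
          PGPASSWORD: "change-me"
        volumes:
          - pg_backup:/backup
        # dump over the stack network instead of tar'ing live datafiles
        command: >
          sh -c 'while true; do
            pg_dump -h postgres -U app -d app -Fc -f "/backup/app-$$(date -u +%Y%m%dT%H%M%SZ).dump";
            sleep 86400;
          done'
    ```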
5. In the `deploy_volume_*` role, allow restoring a backup of a particular ID (a datetime), or the latest backup (sort the backups and pick the first one), while deploying the `docker-compose.yml` stack file. The procedure itself should take the form of a `docker-compose.yml` snippet, templated with the desired backup ID, to be run locally with `docker compose` - NOT as a Swarm stack, just as a glorified one-off script. The snippet can be specified directly as a YAML dictionary in the role invocation. Running it via `docker compose` also buys us `depends:` support on top of the usual `until` loops.
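    The templated restore snippet could look like this, with `backup_id` filled in by the role; run it locally, never as a Swarm stack:

    ```yaml
    # restore.yml.j2 - docker compose -f restore.yml run --rm restore
    services:
      restore:
        image: debian:stable-slim
        volumes:
          - appdata:/data
          - appdata_backup:/backup:ro
        # wipe the volume, then unpack the chosen backup into it
        command: >
          sh -c 'find /data -mindepth 1 -delete &&
          tar xzf "/backup/appdata-{{ backup_id }}.tar.gz" -C /data'
    ```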
6. Implement the 1 in the 3-2-1 backup scheme, by deploying a dedicated stack to back up all the backup volumes to off-site cold storage.
    - Alternatively, replicate each `rclone`-mounted volume to two backends. Depending on needs, this kind of thing can be up/down-prioritized.
    - The "2" is the `rclone`-mounted volume to which the snapshot-like backups are written. It should be on a different physical device than the volume itself (ex. if using MinIO as the `rclone` S3 backend, put these volumes on `R2`), depending on how much one trusts the provider (ex. `AWS` should be pretty sturdy for most small clusters, and in the rare case where it isn't, Backup-Backups are available too).
    - `duplicity` does all of this with no friction: https://geek-cookbook.funkypenguin.co.nz/recipes/duplicity/
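    A sketch of the dedicated off-site stack, assuming a `coldstorage:` rclone remote and a weekly cadence; names are illustrative:

    ```yaml
    services:
      offsite-sync:
        image: rclone/rclone:latest
        volumes:
          - ./rclone.conf:/config/rclone/rclone.conf:ro
          - appdata_backup:/backup/appdata:ro
        # weekly one-way sync of every mounted backup volume to cold storage
        entrypoint: >
          sh -c 'while true; do
            rclone sync /backup coldstorage:offsite-backups;
            sleep 604800;
          done'
    ```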
7. Pair any "volume restore" with a "forced rm of the stack". We need to guarantee that we do not fuck with the volumes of running stacks!
8. Implement an option, in each volume deployment role, to include restoration of a backup (ex. the latest) in the deployment process of any particular stack, should none be found (or should the matter be forced).
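    What the restore-on-deploy option could look like in a role invocation; the role and variable names here are hypothetical:

    ```yaml
    - hosts: swarm_managers
      roles:
        - role: deploy_volume_rclone
          vars:
            stack_name: mystack
            volume_id: appdata
            # 'latest', or a datetime ID like '20240101T000000Z'
            restore_backup: latest
            # set true to wipe and restore even if volume data exists
            force_restore: false
    ```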
Future Work

- Per-volume overrides for all of the above in the `deploy_volume_*` role invocation. Very much advanced & oriented to mission-critical use cases.