# Ansible / Dev TODO

## Cluster/Ansible Setup

- [x] Setup Playbook
- [x] Root as local var: `work/dtu/python-support/*`
- [x] Get 2 DO Droplets
- [x] Provision DNS
- [ ] Key Fingerprint as local var
- [x] Setup Wireguard wg0 between DO Droplets
- [ ] Setup unattended-upgrades
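
For reference, a minimal sketch of what a templated `wg0.conf` for one droplet could look like, assuming `wg-quick` and one `[Peer]` section per other droplet. The helper `render_wg0`, the keys, addresses, port and hostname are all placeholders for illustration, not the playbook's actual values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    public_key: str   # placeholder: the other droplet's WireGuard public key
    allowed_ips: str  # the other droplet's wg0 address, e.g. "10.0.0.2/32"
    endpoint: str     # the other droplet's public host and ListenPort


def render_wg0(private_key: str, address: str, listen_port: int,
               peers: List[Peer]) -> str:
    """Render a wg-quick style config: one [Interface], one [Peer] per droplet."""
    lines = [
        "[Interface]",
        f"PrivateKey = {private_key}",
        f"Address = {address}",
        f"ListenPort = {listen_port}",
    ]
    for peer in peers:
        lines += [
            "",
            "[Peer]",
            f"PublicKey = {peer.public_key}",
            f"AllowedIPs = {peer.allowed_ips}",
            f"Endpoint = {peer.endpoint}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-droplet setup: this is droplet 1's view of droplet 2.
    print(render_wg0("<droplet-1-private-key>", "10.0.0.1/24", 51820,
                     [Peer("<droplet-2-public-key>", "10.0.0.2/32",
                           "droplet-2.example.com:51820")]))
```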

## Swarm

- [x] Install Docker
- [x] Check Swarm ports on wg0: https://docs.docker.com/engine/swarm/swarm-tutorial/
- [x] Init Swarm manager & worker
- [x] Install rclone volume plugin: https://rclone.org/docker/
- [ ] Label big one as 'storage'
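
Per the linked Swarm tutorial, nodes need TCP 2377 (cluster management), TCP and UDP 7946 (node-to-node communication) and UDP 4789 (overlay network traffic). A sketch that builds firewall rules accepting those only on `wg0`, assuming the droplets use `ufw` (this file doesn't say which firewall is in use); the helper name `ufw_swarm_rules` is hypothetical.

```python
from typing import List, Tuple

# Ports listed in Docker's Swarm tutorial for traffic between hosts.
SWARM_PORTS: List[Tuple[int, str]] = [
    (2377, "tcp"),  # cluster management (manager nodes)
    (7946, "tcp"),  # node-to-node communication
    (7946, "udp"),  # node-to-node communication
    (4789, "udp"),  # overlay network (VXLAN) traffic
]


def ufw_swarm_rules(iface: str = "wg0") -> List[List[str]]:
    """Return ufw argv lists that accept Swarm traffic only on `iface`."""
    return [
        ["ufw", "allow", "in", "on", iface, "to", "any",
         "port", str(port), "proto", proto]
        for port, proto in SWARM_PORTS
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for argv in ufw_swarm_rules():
        print(" ".join(argv))
```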

## Stack: cleanup

- [x] Security Audit
- [x] **Deploy Stack**

## Stack: mesh

- [x] Install Configs
- [x] **Deploy Stack**
- [x] rclone `acme.json` to R2 w/crypt
- [ ] Security Audit

## Stack: site-support

- [x] Generate Configs
- [x] Install Configs
- [x] **Deploy Stack**
- [ ] Security Audit

## Stack: updater

- [ ] config: main
- [ ] config: cleanup
- [ ] config: mesh
- [ ] config: site-support
- [ ] Install Configs
- [ ] **Deploy Stack**
- [ ] Security Audit

## Stack: auth

- [ ] Write Stack
- [ ] storage: authentik-postgres
- [ ] storage: authentik-redis
- [ ] *Test Deploy*
- [ ] configs: Blueprints (export from prototyping)
- [ ] Install Configs
- [ ] role: API Setup of Things
- [ ] **Deploy Stack**
- [ ] updater: Integrate update-check
- [ ] Security Audit

## Stack: s3

- [ ] Write Stack
  - https://geek-cookbook.funkypenguin.co.nz/recipes/minio/
  - Restrict to 'storage' label.
- [ ] ...?
- [ ] Install Configs
- [ ] Install Secrets
- [ ] storage: minio
- [ ] *Test Deploy*
- [ ] role: API Setup of Things
- [ ] **Deploy Stack**
- [ ] auth: Integrate OIDC
  - https://min.io/docs/minio/container/operations/external-iam.html
  - https://goauthentik.io/integrations/services/minio/
- [ ] updater: integrate
- [ ] Security Audit
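
The 'storage' node label from the Swarm section and the "Restrict to 'storage' label" note could fit together like this: label the big node once, then give storage-bound services (e.g. minio) a placement constraint. A sketch assuming the label is `storage=true`; the node name and both helper names are hypothetical.

```python
from typing import Any, Dict, List


def label_node_cmd(node: str, key: str = "storage", value: str = "true") -> List[str]:
    """argv for adding a node label; must be run against a Swarm manager."""
    return ["docker", "node", "update", "--label-add", f"{key}={value}", node]


def with_storage_constraint(service: Dict[str, Any], key: str = "storage",
                            value: str = "true") -> Dict[str, Any]:
    """Return a copy of a compose service dict pinned to nodes carrying the label."""
    constraint = f"node.labels.{key} == {value}"
    deploy = dict(service.get("deploy", {}))
    placement = dict(deploy.get("placement", {}))
    constraints = list(placement.get("constraints", []))
    if constraint not in constraints:  # keep it idempotent across re-runs
        constraints.append(constraint)
    placement["constraints"] = constraints
    deploy["placement"] = placement
    return {**service, "deploy": deploy}
```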

## Stack: chat

- [ ] Write Stack
  - https://geek-cookbook.funkypenguin.co.nz/recipes/minio/
  - Restrict to 'storage' label.
- [ ] ...?
- [ ] Install Configs
- [ ] Install Secrets
- [ ] storage: zulip-postgres
- [ ] storage: zulip-rabbitmq
- [ ] storage: zulip-redis
- [ ] s3: zulip
- [ ] *Test Deploy*
- [ ] auth: Integrate OIDC
  - https://zulip.readthedocs.io/en/latest/production/authentication-methods.html#openid-connect
  - Backup SAML: https://goauthentik.io/integrations/services/zulip/
- [ ] role: API Setup of Things
- [ ] **Deploy Stack**
- [ ] updater: Integrate
- [ ] Security Audit

## Stack: git

- [ ] Install Configs
- [ ] Install Secrets
- [ ] *Test Deploy*
- [ ] storage: gitea-redis
- [ ] storage: gitea-postgres
- [ ] storage: gitea-meilisearch
  - https://www.meilisearch.com/docs/learn/cookbooks/docker
- [ ] s3: gitea
- [ ] s3 via rclone: gitea (repositories)
- [ ] role: API Setup of Things
- [ ] **Deploy Stack**
- [ ] Configure gitea-actions w/auto-setup
- [ ] manual: Migrate docker-mdbook, site-support.

## Bonus

- Play with `uptime`.
- Backups!
# Playbook Creation Notes
- [x] mesh should use a non-`local` driver.
- [ ] Implement rolling updates for services within stacks whose configs have changed.
  - Note the `rolling_versions` option of the `docker_config` Ansible module.
  - With a little information-gathering, I'm certain we can avoid actually stopping stacks on deploy and instead only do the secret rotation as described in the Docker documentation: https://docs.docker.com/engine/swarm/secrets/#example-rotate-a-secret
  - NOTE that the rclone volume stuff is always gonna need a manual stop/start. It's janky. Such is life.
- [ ] Automatic R2 Bucket Creation
- [ ] Only do the delays when we actually need to stop stacks / unmount volumes
- [ ] Encrypted use of R2 bucket.
  - https://rclone.org/crypt/
- [ ] Templated security.txt in site-support
- [ ] Templated limits so we don't kill the demo hosts, e.g. in site-support :)
- [ ] Please, please, a nice README.md in site-support?
- [ ] Move DNS stuff out to the stacks. Trust me!
- [ ] Invest in some delegation to roles. These playbooks be gettin messy.
- [ ] Figure out how to handle concurrent access to `acme.json` in Traefik. For now I've set Traefik to one replica and `vfs_cache_mode=full` (I think `none` may be wonky given how Traefik uses this file?)
  - Needs more testing!
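
The rotation pattern from the linked Docker docs boils down to: create a new, versioned object; swap it into the service with one `docker service update`; then delete the old one. A sketch that only builds those commands, for either a secret or a config, assuming versioned names like `*_v2`; the helper `rotation_commands` and every name in it are hypothetical. Keeping `target` fixed means the container still sees the same path, so only the affected service gets a rolling update rather than the whole stack being stopped.

```python
from typing import List


def rotation_commands(kind: str, service: str, old: str, new: str,
                      src_file: str, target: str) -> List[List[str]]:
    """Return argv lists for a create -> swap -> remove rotation.

    kind is "secret" or "config"; target is the name/path the container
    sees, which stays the same across versions.
    """
    if kind not in ("secret", "config"):
        raise ValueError(f"unsupported kind: {kind!r}")
    return [
        # 1. Create the new, versioned object alongside the old one.
        ["docker", kind, "create", new, src_file],
        # 2. Swap it into the service in a single rolling update.
        ["docker", "service", "update",
         f"--{kind}-rm", old,
         f"--{kind}-add", f"source={new},target={target}",
         service],
        # 3. Remove the old object (Docker refuses while a service still uses it).
        ["docker", kind, "rm", old],
    ]
```

One caveat: a later `docker stack deploy` reconciles services back to the compose file, so the compose file would also need to reference the new name.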