Ansible / Dev TODO

Cluster/Ansible Setup

  • Setup Playbook
  • Root as local var: work/dtu/python-support/*
  • Get 2 DO Droplets
  • Provision DNS
  • Key Fingerprint as local var
  • Set up WireGuard wg0 between DO Droplets
  • Set up unattended-upgrades

Swarm

Stack: cleanup

  • Security Audit
  • Deploy Stack

Stack: mesh

  • Install Configs

  • Deploy Stack

  • rclone acme.json to R2 w/crypt

  • Security Audit

Stack: site-support

  • Generate Configs

  • Install Configs

  • Deploy Stack

  • Security Audit

Stack: updater

  • config: main

  • config: cleanup

  • config: mesh

  • config: site-support

  • Install Configs

  • Deploy Stack

  • Security Audit

Stack: auth

  • Write Stack

  • storage: authentik-postgres

  • storage: authentik-redis

  • Test Deploy

  • configs: Blueprints (export from prototyping)

  • Install Configs

  • role: API Setup of Things

  • Deploy Stack

  • updater: Integrate update-check

  • Security Audit

Stack: s3

Stack: chat

Stack: git

  • Install Configs

  • Install Secrets

  • Test Deploy

  • storage: gitea-redis

  • storage: gitea-postgres

  • storage: gitea-meilisearch

  • s3: gitea

  • s3 via rclone: gitea (repositories)

  • role: API Setup of Things

  • Deploy Stack

  • Configure gitea-actions w/auto-setup

  • manual: Migrate docker-mdbook, site-support.

Bonus:

  • Play with uptime.
  • Backups!

Playbook Creation Notes

  • mesh should use a non-local driver.

  • Implement rolling updates to services within stacks, whose configs have changed.

    • Note rolling_versions (and versions_to_keep) in the docker_config Ansible module.
    • With a little information-gathering, I'm certain we can prevent actually stopping stacks on deploy and instead only do the secret rotation as described in the Docker documentation: https://docs.docker.com/engine/swarm/secrets/#example-rotate-a-secret
    • NOTE that the rclone volume stuff is always gonna need manual stop/start. Is jank. Such is the life.
  • Automatic R2 Bucket Creation

  • Only do the delays when we actually need to stop stacks / unmount volumes

  • Encrypted use of R2 bucket.

  • Templated security.txt in site-support

  • Templated limits to not kill the demo hosts in e.g. site-support :)

  • Please, please, a nice README.md in site-support?

  • Move DNS stuff out to the stacks. Trust me!

  • Invest in some delegation to roles. These playbooks be gettin messy.

  • Figure out a way to deal with concurrent access to acme.json in Traefik. For now I've set it to one replica and vfs_cache_mode=full (I think none may be wonky with this particular need of Traefik?)

    • Needs more testing!
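
Sketch for the rolling-updates note above, assuming the docker_config option meant there is `rolling_versions` from the `community.docker` collection. With it, a changed config is created under a new versioned name rather than replaced in place, so the stack can be redeployed against the new name without first being stopped. Every name below (`swarm_managers`, `site_support_nginx`, `files/nginx.conf`, the `web` service and its image) is a placeholder, not taken from the actual playbooks, and this is untested against this cluster.

```yaml
# Hypothetical playbook fragment; all names and paths are placeholders.
- name: Roll a config to a new version without stopping the stack
  hosts: swarm_managers                 # assumed inventory group
  tasks:
    - name: Create a new config version only when the content changed
      community.docker.docker_config:
        name: site_support_nginx        # placeholder base name
        data: "{{ lookup('ansible.builtin.file', 'files/nginx.conf') }}"
        rolling_versions: true          # appends an increasing version suffix
        versions_to_keep: 2             # prune older versions
        state: present
      register: nginx_config

    - name: Redeploy the stack against the versioned config name
      community.docker.docker_stack:
        name: site-support
        state: present
        compose:
          - version: "3.8"
            services:
              web:                      # placeholder service
                image: nginx:stable
                configs:
                  - source: nginx_conf
                    target: /etc/nginx/nginx.conf
            configs:
              nginx_conf:
                external: true
                # config_name is returned by docker_config and carries the suffix
                name: "{{ nginx_config.config_name }}"
```

If this holds up, the service spec only changes when the config name does, and Swarm should then update the affected service per its update settings. The same pattern should apply to `community.docker.docker_secret`, which has the same two options. Stacks on rclone volumes would still need the manual stop/start.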