Concurrent acme.json for distributed Traefik #14
Reference: python-support/python-support-infra#14
The remote S3-backed rclone volume currently uses vfs_cache_mode=full to ensure performance and full POSIX compatibility, unfortunately by cheating a bit: all file operations are cached locally, and mirrored to the S3 bucket whenever rclone finds it convenient. This is fine, so long as only one Traefik replica is enough to do the job. The moment there are >1 replicas, concurrency bugs start nipping at it.

When considering solutions, note that write performance is not important: acme.json is only written when SSL certs are renewed. Read performance is an open question; how much are the certs cached in memory?

Solution Idea 1
The start of a solution would be if vfs_cache_mode=off worked; this would ensure that all file ops on acme.json are handled via a round-trip to the S3 provider. Unfortunately, this still leaves several Traefik replicas with a potentially nasty race condition (which might get really bad if atomic file locking isn't something guaranteed by rclone).

Solution Idea 2
As is sometimes done with mail servers, it may also be possible to run a standalone replicas=1 Traefik service solely responsible for renewing certificates. It would have the only token with bucket write permissions. One would need to solve how to signal existing Traefik containers to reload their SSL certs.

All the "normal" global-replicated Traefik containers would have a read-only token, and utilize a --read-only mount. Incidentally, if read performance is critical, vfs_cache_mode=full will probably be okay; one might have to accept the edge case of a potential few seconds every two months where an old cache results in one replica serving the old (note, not yet expired) certificate while another serves the new.

In this setup, make sure to be careful about keys if implementing #10.
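
One possible starting point for the open "how to signal a reload" question in Solution Idea 2 is sketched below, under assumptions: something next to each read-only replica polls the mounted acme.json and reacts when its content changes. The helper names (file_digest, check_once, watch), the polling interval, and the on_change callback are all hypothetical; what the callback should actually do to make Traefik pick up the new certs is deliberately left open, since that part is unsolved here.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Optional


def file_digest(path: Path) -> Optional[str]:
    """Return the SHA-256 hex digest of the file, or None if it does not exist."""
    try:
        return hashlib.sha256(path.read_bytes()).hexdigest()
    except FileNotFoundError:
        return None


def check_once(
    path: Path, last: Optional[str], on_change: Callable[[], None]
) -> Optional[str]:
    """Fire on_change if the file exists and its digest differs from `last`.

    Returns the digest to remember for the next check. A missing file is
    treated as "no news" so a brief mount hiccup does not trigger a reload.
    """
    current = file_digest(path)
    if current is None:
        return last
    if current != last:
        on_change()
    return current


def watch(path: Path, on_change: Callable[[], None], interval: float = 60.0) -> None:
    """Poll `path` forever, calling on_change whenever its content changes.

    Assumption: polling through the read-only mount eventually observes the
    writer's update; with a local VFS cache this may lag behind the bucket.
    """
    last = file_digest(path)  # do not fire for the state seen at startup
    while True:
        time.sleep(interval)
        last = check_once(path, last, on_change)
```

Comparing digests rather than modification times is a deliberate choice in this sketch: timestamps reported through an S3-backed mount may not be reliable, while the file is small enough that hashing it once a minute costs nothing noticeable.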