Plan for Networking w/Mesh, Encryption, Low-Overhead, LB, Fine-Grained ACLs #32
Goals of the Networking Solution
The goals of our networking solution are as follows:
- **Two-Sided Service-Service-Protocol-Port ACL**: Service replicas should only be able to send packets to other services when a statically-defined, fine-grained Access Control List allows it.
- **Service-to-Replica Load Balancing**: Service replicas should only address services. The solution should ensure that the packet actually reaches the "best suited" replica of a service.
- **Multi-Cluster Container-Level Network**: Service replicas should all abstractly be thought of as existing on a hierarchy of big, virtual switches.
- Named `nftables` sets should be used to route packets to the "best suited" replicas. Independent, verifiable logic should keep these named sets updated on each node, in a manner decided on by the Swarm leader and the host node of the container in question. (A sketch of how the ACL and such named sets could be encoded follows this list.)
- **Optimized for Complex, Dynamic Topology**: When one replica sends a packet to another, the packet should "hop" along the lowest-cost path. Doing so minimizes latency and maximizes throughput in complex and dynamic topologies, such as geo-distributed clusters with ever-changing layouts, clusters with a local wifi-only component and a cloud-backed component, etc.
- **Simple Enough to Reason About / Low-As-Possible Overhead**: The solution should be simple enough to reason about as a whole, and, within that constraint, the protocol-stack overhead should be as low as possible while still achieving the goals above.
- **E2E-Encrypted Replica-Replica Communication**: Swarm logic (`docker secrets`) should be used to hand each service replica the key to its encrypted network interface, thus inheriting Swarm's strong security guarantees and conveniences like secret rotation (see #23).

`root` on a node can still access the secret by just `exec`ing into the container; at some point the key does need to be used to encrypt/decrypt packets. But other attacks, such as stealing the hard drive, stop working, because the actual secret never really leaves the Swarm's encrypted Raft store. Additionally, we no longer need a ton of dangerous logic to rotate the secret.
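As a rough illustration of how the ACL and named-set goals above could be encoded, here is a minimal Python sketch that expands a `service-service-protocol-port` ACL into an `nftables` named set of address/protocol/port concatenations and loads it atomically. Everything here is a hypothetical placeholder for illustration: the table, chain, and set names (`svcmesh`, `svc_acl`), the replica addresses, and the idea of regenerating the whole table on each update.

```python
#!/usr/bin/env python3
"""Sketch: render a service-service-protocol-port ACL into an nftables named set.

All names (table 'svcmesh', set 'svc_acl'), addresses, and services below are
made-up placeholders; the replica lists would really come from the Swarm.
"""
import subprocess

# Statically-defined ACL: (source service, destination service, protocol, destination port).
ACL = [
    ("web", "api", "tcp", 8080),
    ("api", "db", "tcp", 5432),
]

# Hypothetical snapshot of each service's replica addresses on the container network.
REPLICAS = {
    "web": ["10.66.0.11", "10.66.0.12"],
    "api": ["10.66.0.21"],
    "db":  ["10.66.0.31"],
}

def render_ruleset() -> str:
    """Expand the service-level ACL into concrete address tuples and emit an nft script."""
    elements = [
        f"{src_ip} . {dst_ip} . {proto} . {port}"
        for src_svc, dst_svc, proto, port in ACL
        for src_ip in REPLICAS[src_svc]
        for dst_ip in REPLICAS[dst_svc]
    ]
    return "\n".join([
        # The add/delete pair makes the script idempotent: the table is replaced
        # in one transaction, so stale ACL entries can never linger.
        "add table inet svcmesh",
        "delete table inet svcmesh",
        "table inet svcmesh {",
        "  set svc_acl {",
        "    type ipv4_addr . ipv4_addr . inet_proto . inet_service",
        "    elements = { " + ", ".join(elements) + " }",
        "  }",
        "  chain forward {",
        "    type filter hook forward priority 0; policy drop;",
        "    ct state established,related accept",
        "    ip saddr . ip daddr . meta l4proto . th dport @svc_acl accept",
        "  }",
        "}",
    ]) + "\n"

def apply_ruleset() -> None:
    """Feed the generated script to nft as a single atomic transaction."""
    subprocess.run(["nft", "-f", "-"], input=render_ruleset(), text=True, check=True)

if __name__ == "__main__":
    print(render_ruleset())
```

The same pattern would extend to the load-balancing goal: live-updated data such as replica weights or link quality can sit in further named sets or maps that a templated ruleset references, so the rules themselves never have to change at runtime.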
What's Wrong with Overlays?
This cannot be achieved with `docker overlay` networking:
- `docker overlay` networks are limited to 256 containers per network segment, because they can only be made as `/24` subnets - and that's only the start of the problems.
- Addressing of services relies on a DNS server run by the Swarm, which applies some non-obvious load-balancing logic and can interact strangely with `nftables`. This is a problem for ACLs more fine-grained than service-to-service.
Proposed Solution
Since we control the entire infrastructure, we can do better. The solution here involves:
- Each service replica receives a `docker secret`, which is a key to a `MACsec`-enabled end of a `veth` tunnel that dominates its network namespace.
- Each `veth` tunnel is part of a `B.A.T.M.A.N. Advanced` network, which provides shortest-path routing across a "huge L2 switch of all containers in the cluster".
- Where parts of the cluster cannot reach each other directly (e.g. because they are out of `wlan` radio range), `GENEVE` tunnels are established statically to allow `batadv` to find good ways of "hopping" to containers on another L2 network. (A sketch of this bring-up follows below.)
- `nftables` is run on all nodes with templated rulesets, which presume that a named set containing the `service-service-protocol-port` ACLs is available, along with a named set reflecting live-updated `batadv` link quality and the desired load-balancing weights derived from it, etc.
- A mechanism keeps these `nftables` sets updated with minimal latency on all nodes. (This mechanism can't be Raft; it's too latency-sensitive. But a latency-optimized, security-conscious peer-to-peer protocol would be very welcome. A sketch of the node-local update step follows below.)

Details and Step-by-Step to be Written!
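As a very rough starting point for those details, here is a Python sketch of the per-replica / per-node data-plane bring-up: reading MACsec key material from the mounted Docker secret, wrapping the container's `veth` end in a `MACsec` device, and attaching a static `GENEVE` tunnel to the `batadv` mesh on the host. Every interface name, address, VNI, and secret layout below is a hypothetical placeholder; a real setup would also need one receive channel per peer and proper key rotation.

```python
#!/usr/bin/env python3
"""Sketch: per-replica / per-node data-plane bring-up (all names/addresses hypothetical)."""
import pathlib
import subprocess

def sh(*args: str) -> None:
    """Run one plain iproute2/batctl command, failing loudly on error."""
    subprocess.run(args, check=True)

def setup_replica_macsec(secret_name: str, peer_mac: str) -> None:
    """Inside the replica's network namespace: wrap the veth end in a MACsec device.

    Assumes the Docker secret mounted at /run/secrets/<name> holds two 16-byte
    hex keys (tx then rx), one per line, and that 'eth1' is the container end
    of the dedicated veth pair.
    """
    tx_key, rx_key = pathlib.Path(f"/run/secrets/{secret_name}").read_text().split()
    sh("ip", "link", "add", "link", "eth1", "macsec0", "type", "macsec", "encrypt", "on")
    sh("ip", "macsec", "add", "macsec0", "tx", "sa", "0", "pn", "1", "on", "key", "01", tx_key)
    # One receive secure channel per peer; a real setup would loop over all peers.
    sh("ip", "macsec", "add", "macsec0", "rx", "port", "1", "address", peer_mac)
    sh("ip", "macsec", "add", "macsec0", "rx", "port", "1", "address", peer_mac,
       "sa", "0", "pn", "1", "on", "key", "02", rx_key)
    sh("ip", "link", "set", "macsec0", "up")

def setup_host_mesh(remote_node_ip: str, vni: str = "66") -> None:
    """On the host: a static GENEVE tunnel to a remote node, attached to batman-adv."""
    sh("ip", "link", "add", "gnv0", "type", "geneve", "id", vni, "remote", remote_node_ip)
    sh("ip", "link", "set", "gnv0", "up")
    # Let batadv treat the tunnel as another "hop" candidate alongside wlan/ethernet links.
    sh("batctl", "if", "add", "gnv0")
    sh("ip", "link", "set", "bat0", "up")

if __name__ == "__main__":
    # Example (requires root and the interfaces described above):
    setup_host_mesh("198.51.100.7")
```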
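And a matching sketch of the node-local half of the set-update mechanism, assuming a hypothetical `replica_weight` map in the same hypothetical `svcmesh` table as above. Whatever peer-to-peer protocol ends up distributing updates would call something like this on each node; `nft -f -` applies the whole script as one transaction, so readers never see a half-updated map.

```python
#!/usr/bin/env python3
"""Sketch: atomically refresh a live-updated nftables named map of LB weights."""
import subprocess

def apply_replica_weights(weights: dict[str, int]) -> None:
    """Replace the (hypothetical) 'replica_weight' map in one transaction.

    'weights' maps replica IPv4 addresses to load-balancing weights derived
    from batadv link quality.
    """
    elements = ", ".join(f"{ip} : {w}" for ip, w in weights.items())
    script = "\n".join([
        # Ensure the table and map exist, then flush and refill the map atomically.
        "add table inet svcmesh",
        "add map inet svcmesh replica_weight { type ipv4_addr : mark ; }",
        "flush map inet svcmesh replica_weight",
        f"add element inet svcmesh replica_weight {{ {elements} }}",
    ]) + "\n"
    subprocess.run(["nft", "-f", "-"], input=script, text=True, check=True)

if __name__ == "__main__":
    # Example: weights a peer-to-peer updater might push after a batadv link change.
    apply_replica_weights({"10.66.0.21": 10, "10.66.0.22": 3})
```

How the load-balancing rules actually consume these weights is part of the step-by-step still to be written.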