Our Kubernetes clusters currently run on AWS, though we are actively experimenting with other cloud providers such as GCP. All of our EC2 instances run CoreOS Container Linux, which gives us confidence that the latest security patches are applied as they come out: patches are installed automatically and VMs are rebooted without human intervention. Some basic configuration is applied to each VM at startup, and the Kubelet runs as a systemd unit. Everything else runs on top of Kubernetes.
We use Flannel to establish our layer 3 pod network and Calico to perform network policy enforcement. Both are deployed in the same DaemonSet, ensuring that one replica is deployed to each VM.
Flannel automatically assigns a unique IP address to each pod as it spins up. When a new worker node joins the cluster, Flannel reserves a subnet of the cluster-wide pod network for that node and periodically renews its lease to avoid conflicts with other nodes. These pod IPs are visible when querying for pods with kubectl:
$ kubectl get pods -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP             NODE
nginx-5vrpq   2/2     Running   0          11d   10.XXX.14.3    ip-10-XXX-72-162.eu-west-1.compute.internal
nginx-7j72k   2/2     Running   0          12d   10.XXX.12.2    ip-10-XXX-150-131.eu-west-1.compute.internal
nginx-q2f5j   2/2     Running   0          12d   10.XXX.11.19   ip-10-XXX-1-39.eu-west-1.compute.internal
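To make the per-node subnet idea concrete, here is a minimal sketch in Python. It is a toy model, not Flannel's actual implementation: the `SubnetPool` class, the `10.244.0.0/16` cluster range, the `/24` per-node prefix, and the node names are all assumptions chosen for illustration, and lease renewal and expiry are left out.

```python
import ipaddress


class SubnetPool:
    """Toy model of per-node subnet leasing (illustrative, not Flannel's code)."""

    def __init__(self, cluster_cidr, node_prefix=24):
        # Split the cluster-wide pod network into equally sized node subnets.
        network = ipaddress.ip_network(cluster_cidr)
        self._free = list(network.subnets(new_prefix=node_prefix))
        self._leases = {}

    def lease(self, node):
        """Return the node's subnet, reserving a free one on first request."""
        if node in self._leases:
            return self._leases[node]
        if not self._free:
            raise RuntimeError("pod network exhausted")
        subnet = self._free.pop(0)
        self._leases[node] = subnet
        return subnet

    def pod_ip(self, node, index):
        """Return the address at the given offset inside the node's subnet."""
        return self._leases[node][index]


# Hypothetical cluster range and node names.
pool = SubnetPool("10.244.0.0/16")
for node in ("node-a", "node-b"):
    print(node, pool.lease(node), pool.pod_ip(node, 2))
```

Because every node draws pod addresses from its own subnet, two nodes can never hand out the same pod IP, which matches the pattern in the output above where pods on different nodes sit in different third-octet ranges.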
By default, all pods in a cluster can talk to each other. This raises some security concerns: for example, we might want to restrict public egress traffic from all pods except edge pods or web servers, or isolate highly sensitive pods from the rest of the cluster. Network Policies solve this problem. Calico runs alongside Flannel and lets us specify network policies that limit traffic between pods. These policies are based on high-level constructs such as pod labels, which means that product teams can write their own policies using the same language they use to write deployment files. Calico runs as a privileged container, watching for changes to network policies and rewriting iptables rules to enforce them (in practice, we've seen updates applied very quickly, on the order of 1 or 2 seconds).
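As an illustration of a label-based policy, the sketch below builds a Kubernetes NetworkPolicy manifest in Python and prints it as JSON. The helper names (`isolate_policy`, `selects`), the policy name, and the `app: vault` and `role: api` labels are hypothetical and do not come from our clusters. The `networking.k8s.io/v1` API version is an assumption about the cluster version; older clusters exposed NetworkPolicy under a beta API group.

```python
import json


def isolate_policy(name, namespace, target_labels, allowed_labels):
    """Build a NetworkPolicy that only admits ingress from pods with allowed_labels."""
    return {
        "apiVersion": "networking.k8s.io/v1",  # assumed API version
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods the policy applies to.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            # Only traffic from pods matching this selector is admitted.
            "ingress": [
                {"from": [{"podSelector": {"matchLabels": allowed_labels}}]}
            ],
        },
    }


def selects(match_labels, pod_labels):
    """Mimic matchLabels semantics: every listed key/value must be on the pod."""
    return all(pod_labels.get(k) == v for k, v in match_labels.items())


# Hypothetical example: only pods labelled role=api may reach the vault pods.
policy = isolate_policy("vault-ingress", "default", {"app": "vault"}, {"role": "api"})
print(json.dumps(policy, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed manifest could be piped into `kubectl apply -f -`. The `selects` helper only illustrates how a label selector picks pods; actual enforcement is done by Calico on each node.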