Running Flannel on EKS

Jeremy Cowan
5 min read · Feb 9, 2019


Amazon Elastic Container Service for Kubernetes (EKS) is a managed service from AWS that was launched in 2018. As part of the service, AWS manages the Kubernetes control plane, which consists of a set of master nodes and an etcd database. When you provision a cluster, it comes pre-configured with the AWS VPC Container Networking Interface (CNI) plugin, a Kubernetes networking plugin that assigns IP addresses from your Virtual Private Cloud (VPC) to pods. Using this plugin has several advantages. First, you don’t incur the overhead of encapsulation and de-encapsulation as you do with overlay networks. Second, you can use VPC Flow Logs to capture information about the IP traffic going to and from the pods in your cluster. Third, there’s less contention for network bandwidth because fewer pods share each Elastic Network Interface (ENI). And finally, traffic from the VPC can be routed directly to pods.

The VPC CNI plugin has its own set of challenges, however. The EC2 instance type and size determine the number of pods you can run on an instance, and in some cases attaining higher pod density will force you to over-provision the instance types you use for your worker nodes. Your VPC may also be so IP-constrained that you cannot afford to assign IP addresses from your VPC to your pods, though the VPC CNI custom networking feature attempts to address this by allowing you to specify a separate set of subnets for your pod network.
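To make the pod density limit concrete, the commonly cited formula for the VPC CNI is max pods = ENIs × (IPv4 addresses per ENI − 1) + 2. A quick back-of-the-envelope check, using values for an m5.large (3 ENIs, 10 IPv4 addresses per ENI):

ENIS=3            # ENIs supported by the instance type
IPS_PER_ENI=10    # IPv4 addresses per ENI
echo $(( ENIS * (IPS_PER_ENI - 1) + 2 ))   # prints 29, the pod ceiling for an m5.large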

Despite the VPC CNI’s advantages, folks may still want to use another CNI with EKS. This post explains how to install and configure the flannel CNI with EKS.

Installing flannel

The first step is to create an EKS cluster. I recommend using eksctl because it lets you provision a cluster (and worker nodes) with a single command.

eksctl create cluster --name flannel --ssh-access --nodes 0

When you create an EKS cluster, a daemonset for the VPC CNI plugin, called aws-node, is automatically created. As worker nodes join the cluster, the Kubernetes scheduler schedules an instance of this daemon onto each node. The daemon alters the route table on the instance, affecting its ability to support other network plugins like flannel. Creating a node-less cluster allows you to replace the aws-node daemonset with a different networking plugin before any nodes join the cluster.

The next step is to delete the aws-node daemonset.

kubectl delete ds aws-node -n kube-system
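If you want to confirm the daemonset is gone before proceeding, list the daemonsets in kube-system; aws-node should no longer appear.

kubectl get ds -n kube-system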

Since EKS doesn’t allow you to set the pod CIDR on the API server, we’re going to use an external etcd database to store the network configuration for flannel. To get started with etcd, we first need to install CoreOS’s config transpiler (ct).

brew install coreos-ct

Next, we want to get a token for our single node etcd “cluster”.

export TOKEN=$(curl -sw "\n" 'https://discovery.etcd.io/new?size=1' | cut -d "/" -f 4)
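You can optionally confirm that a discovery token was returned before moving on:

echo $TOKEN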

Execute the following command to create a file named etcd.yaml.

cat << EOF > etcd.yaml
# This config is meant to be consumed by the config transpiler, which will
# generate the corresponding Ignition config. Do not pass this config directly
# to instances of Container Linux.

etcd:
  advertise_client_urls: http://{PUBLIC_IPV4}:2379
  initial_advertise_peer_urls: http://{PRIVATE_IPV4}:2380
  listen_client_urls: http://0.0.0.0:2379
  listen_peer_urls: http://{PRIVATE_IPV4}:2380
  discovery: https://discovery.etcd.io/$TOKEN
EOF

Run the following command to convert the etcd.yaml file into an Ignition configuration. The output will be used to configure CoreOS when it first boots.

ct -platform=ec2 < etcd.yaml >> ec2metadata
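If you have jq installed, you can optionally sanity-check that ct produced valid Ignition JSON before launching the instance; the command below should print the Ignition spec version.

jq .ignition.version ec2metadata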

Launch an instance of CoreOS-stable-1967.4.0. When running this command, replace ami_id, key_name, sg_ids, and subnet_id with values that correspond to the appropriate resources within your AWS environment; the ami_id should be the CoreOS Container Linux stable AMI for your region.

aws ec2 run-instances --image-id <ami_id> --instance-type t2.small --key-name <key_name> --security-group-ids <sg_ids> --subnet-id <subnet_id> --user-data file://ec2metadata

By adding the etcd instance to the worker node security group, you can avoid creating additional security group rules to allow the flannel daemon to read data from your etcd database.
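One way to do this after the instance is launched (the placeholders below are yours to fill in) is to modify the instance’s security group list. Note that --groups replaces the existing list, so include every group the instance still needs.

aws ec2 modify-instance-attribute --instance-id <etcd_instance_id> --groups <worker_node_sg_id> <other_sg_ids>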

After the instance is in a running state, SSH to the instance and execute the following commands:

export ETCDCTL_API=3
etcdctl put /coreos.com/network/config '{"Network":"18.16.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan", "VNI": 1}}'
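You can read the key back to confirm the network configuration was stored:

etcdctl get /coreos.com/network/config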

Log out of the etcd instance and install the flannel CNI. Before running the next command, replace <etcd_ip> with the IP address of your etcd server.

cat << EOF | kubectl apply -f -
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
  - kind: ServiceAccount
    name: flannel
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "18.16.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-amd64
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
        - operator: Exists
          effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
        - name: install-cni
          image: quay.io/coreos/flannel:v0.10.0-amd64
          command:
            - cp
          args:
            - -f
            - /etc/kube-flannel/cni-conf.json
            - /etc/cni/net.d/10-flannel.conflist
          volumeMounts:
            - name: cni
              mountPath: /etc/cni/net.d
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
      containers:
        - name: kube-flannel
          image: quay.io/coreos/flannel:v0.10.0-amd64
          command:
            - /opt/bin/flanneld
          args:
            - --ip-masq
            - --kube-subnet-mgr=false
            - --etcd-endpoints=http://<etcd_ip>:2379
          resources:
            requests:
              cpu: "100m"
              memory: "50Mi"
            limits:
              cpu: "100m"
              memory: "50Mi"
          securityContext:
            privileged: true
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          volumeMounts:
            - name: run
              mountPath: /run
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg
EOF
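Once worker nodes join the cluster (next step), you can confirm that flannel pods are being scheduled onto them; something along these lines should show one kube-flannel pod per node.

kubectl get ds kube-flannel-ds-amd64 -n kube-system
kubectl get pods -n kube-system -l app=flannel -o wide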

Open the EC2 console and increase the desired and maximum count for the autoscaling group that eksctl created for your worker nodes.
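If you prefer the command line to the console, eksctl can scale the nodegroup it created for you; the nodegroup name below is a placeholder you can look up with the first command.

eksctl get nodegroup --cluster flannel
eksctl scale nodegroup --cluster flannel --name <nodegroup_name> --nodes 2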

Testing

Now that you’ve finished configuring flannel, let’s deploy some nginx pods.

kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/application/deployment.yaml

Verify that the pods are getting IP addresses from the CIDR range that you configured.

$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-deployment-67594d6bf6-hnll6 1/1 Running 0 35s 18.16.117.226 ip-192-168-104-172.us-west-2.compute.internal
nginx-deployment-67594d6bf6-mb76m 1/1 Running 0 34s 18.16.184.97 ip-192-168-60-139.us-west-2.compute.internal

If you followed all the steps correctly, the pod IPs above fall within the 18.16.0.0/16 range you stored in etcd rather than the VPC CIDR, confirming that flannel is handling pod networking.
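As an extra check that traffic actually flows across the overlay, you can curl one of the nginx pods from a throwaway client pod. Replace <pod_ip> with one of the pod IPs from the output above; the pod name and curl image here are just an example. A 200 response means the client pod reached nginx over the flannel network.

kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- curl -s -o /dev/null -w "%{http_code}\n" http://<pod_ip>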

Conclusion

The VPC CNI plugin from AWS provides robust networking for Kubernetes pods. Nonetheless, there are situations where using an alternate CNI may be preferable. While this blog outlined the steps to install the flannel CNI on EKS, a similar approach can be used to install other CNIs such as Calico or Cilium.

Jeremy Cowan is a Principal Container Specialist at AWS
