Custom networking with the AWS VPC CNI plug-in
Backdrop
It is not uncommon for large enterprises to have difficulty finding free Classless Inter-Domain Routing (CIDR) blocks for their AWS Virtual Private Clouds (VPCs). A shortage of CIDR blocks often forces these enterprises to deploy new cloud infrastructure into existing VPCs instead of creating new ones. Enterprises may also elect to use an existing VPC because it already has connectivity to other environments, such as an on-premises datacenter, that services within the VPC need to access. Using an existing VPC for EKS, however, presents a variety of challenges. By default, the AWS VPC CNI plug-in assigns each pod a routable IP address from the worker node's subnet, and when these VPCs were created, administrators didn't account for pods consuming IP addresses. As a consequence, many enterprises that tried deploying EKS into existing VPCs eventually discovered they didn't have enough IP addresses available for all the pods they wanted to run.
Two features were recently released to address the IP exhaustion/scarcity issue. The first is a feature of the AWS VPC CNI plug-in that allows you to use a different subnet CIDR range for pods. It is configured by updating the aws-node daemonset with a new environment variable called AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG. When this variable is set to true, the CNI allocates IP addresses from the subnet specified in the ENIConfig assigned to the worker node.
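If you want to set this variable on a running cluster, a one-line kubectl set env should do it (a minimal sketch; aws-node runs in the kube-system namespace on EKS):

kubectl set env daemonset aws-node -n kube-system AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true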
The ENIConfig is a Custom Resource Definition (CRD) introduced in v1.2.1 of the CNI plug-in; this mechanism is otherwise known as custom networking. It allows you to specify the subnet and security groups you want to use for the pods running on a particular worker node. ENIConfigs can be assigned to worker nodes by annotating the node with the k8s.amazonaws.com/eniConfig=<ENIConfig name> key/value pair or by labeling the node. Be aware that when you use this feature, the availability zone (AZ) of the subnet in the ENIConfig has to match the AZ of the worker node. Additionally, only one ENIConfig can be assigned to a worker node at a time.
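For example, assigning an ENIConfig by annotation looks like this (a sketch; the node and ENIConfig names are placeholders):

kubectl annotate node <node-name> k8s.amazonaws.com/eniConfig=<ENIConfig name>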
The other feature allows you to use the 100.64.0.0/10 and 198.19.0.0/16 CIDRs (CG-NAT) with EKS. When these are added to your VPC as extended CIDRs, you can use them in your ENIConfigs to assign IP addresses to pods. This effectively allows you to create an environment where pods no longer consume any RFC1918 IP addresses from your VPC.
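Associating an extended CIDR with a VPC is a single AWS CLI call (a sketch; substitute your own VPC ID):

aws ec2 associate-vpc-cidr-block --vpc-id <vpc-id> --cidr-block 100.64.0.0/16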
Caveats
When you enable the CNI custom networking feature, pods are no longer assigned IP addresses from the worker node’s primary ENI. This results in lower pod densities because you have one less ENI to use for assigning IP addresses to pods.
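To illustrate the impact, the maximum number of pods per node is roughly (number of ENIs × (IPv4 addresses per ENI − 1)) + 2. An m5.large, for example, supports 3 ENIs with 10 IPv4 addresses each, which works out to 3 × 9 + 2 = 29 pods normally, but only (3 − 1) × 9 + 2 = 20 pods with custom networking enabled.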
The CNI custom networking feature should be enabled at the time your cluster is provisioned. If it is enabled after your worker nodes are provisioned, it will only affect new ENIs that get attached to your worker nodes. ENIs that are already attached to a worker node will continue using the worker node’s subnet for assigning IPs to pods.
Rather than doing a rolling replacement of all the worker nodes in your cluster after enabling custom networking, we suggest updating the AWS CloudFormation template in the EKS Getting Started Guide with a custom resource that calls a Lambda function to update the aws-node daemonset with the environment variable to enable custom networking before the worker nodes are provisioned.
New in Version 1.4
A new feature was added to v1.4 of the AWS VPC CNI plug-in that allows you to assign an ENIConfig to a worker node by labeling it. The name of the label the CNI looks for is set by adding the ENI_CONFIG_LABEL_DEF environment variable to the aws-node daemonset. A particularly convenient option is to set ENI_CONFIG_LABEL_DEF to failure-domain.beta.kubernetes.io/zone. With this setting, the CNI uses the value of that label, which is automatically set to the worker node's availability zone, e.g. us-west-2a, as the name of the ENIConfig to apply, so you can simply name your ENIConfigs after your AZs.
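A sketch of how you might configure this on the daemonset (names are the EKS defaults):

kubectl set env daemonset aws-node -n kube-system ENI_CONFIG_LABEL_DEF=failure-domain.beta.kubernetes.io/zone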
Labels, unlike annotations, can be set by adding the --node-labels flag to the kubelet when the instance is being bootstrapped. On EKS, you can make these changes by running bootstrap.sh with the following flags: bootstrap.sh --kubelet-extra-args '--node-labels=<your_label_name>=<ENIConfig_name>'
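In the EC2 user data of a node built from the EKS-optimized AMI, this might look like the following (a minimal sketch; the cluster name is a placeholder, and k8s.amazonaws.com/eniConfig is the CNI's default label name):

#!/bin/bash
/etc/eks/bootstrap.sh <cluster_name> --kubelet-extra-args '--node-labels=k8s.amazonaws.com/eniConfig=<ENIConfig_name>'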
Simplifying the implementation of custom networking
We now have the ability to use a different set of CIDR ranges for assigning IP addresses to pods. The downside is that you still have to manually annotate or label your nodes with an ENIConfig, and you still have to verify that the subnet in the ENIConfig is in the same AZ as the worker node. Since this can be fairly laborious, I've written a Python script that:
- Creates an extended CIDR range in an existing VPC
- Creates a subnet from that CIDR in each AZ in the VPC
- Applies the tag kubernetes.io/cluster/<clustername> to each subnet
- Outputs a set of Kubernetes manifest files to create ENIConfigs named after each AZ in the VPC, e.g. us-west-2a
When you name each ENIConfig after an AZ and set ENI_CONFIG_LABEL_DEF equal to failure-domain.beta.kubernetes.io/zone, the appropriate ENIConfig will automatically be assigned to your worker nodes.
Running the script
Start by cloning the project from GitHub. The script is written in Python 3.6. If you don't have Python 3.x installed on your machine, you can install it from https://www.python.org/downloads/. Once you've cloned the repository, open a terminal, navigate to the project directory, and then run python main.py:
0 vpc-0a60d94b947b43ff8
1 vpc-7bc1da1d
Choose a VPC ID: 0
0 sg-049a6248fc3d0ff3a eksctl-riverrun-cluster-ControlPlaneSecurityGroup-2WHP8Q77EVG6
1 sg-0d0e1b63640f61d44 eksctl-riverrun-nodegroup-ng-82fe96e7-SG-R9BUZRTUV245
2 sg-0d5b4b42673e3b275 eksctl-riverrun-cluster-ClusterSharedNodeSecurityGroup-2Z36V1LEUFKZ
3 sg-0f05b50fc46b4e573 default
Choose a security group: 1
Enter VPC CIDR range, e.g. 100.64.0.0/16: 100.64.0.0/16
Enter subnet mask, e.g. 24: 22
created subnet-09523931143d9681e
created subnet-0dbd7b318558192b0
created subnet-00a36fef183824a06
created subnet-0e0d4fe9177ec7240
created subnet-09873f70828171aa0
created subnet-071a024f54c2f1599
The script will output several Kubernetes manifests for creating ENIConfigs. You should apply these to your cluster before creating your worker nodes. The contents of the manifest look like this:
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-east-1a
spec:
  subnet: subnet-09523931143d9681e
  securityGroups:
    - sg-0d0e1b63640f61d44
To apply the manifests, run kubectl apply -f <filename>.
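Once applied, a quick sanity check confirms that the ENIConfigs were created (eniconfigs is the plural resource name registered by the CRD):

kubectl get eniconfigs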
Conclusion
The use of custom networking will decrease the likelihood of encountering IP exhaustion issues in the future. If you have ideas about how I can improve the script, please consider filing an issue on GitHub, or better yet, submit a pull request!
To learn more about how custom networking affects pod density and max-pods, see "The impacts of using custom networking with the AWS VPC CNI".