Provisioning an EKS cluster with CAPI

In an earlier post I described how to use KinD to run EKS-Distro on your local machine. In this post, I will explain how to use that cluster to bootstrap an EKS cluster in the AWS cloud using the Cluster API (CAPI), and share a few things I learned along the way.

Why you should care about the Cluster API

At this point, you may be thinking to yourself, “Why do I need another way to provision and manage the lifecycle of an EKS cluster? I can already use eksctl, Terraform, CloudFormation, Pulumi, the Cloud Development Kit (CDK), and so on.” It really boils down to consistency. The Cluster API provides a consistent, declarative way to deploy and manage Kubernetes clusters across a variety of different environments. This is largely possible because the Cluster API establishes a common set of schemas (CRDs) and a controller framework that are applicable across providers. Furthermore, EKS Anywhere will likely leverage CAPI to bootstrap clusters into VMware and bare metal environments. Having a consistent, repeatable way to deploy and manage clusters across these different environments will ultimately help simplify operations. For example, imagine using GitOps for cluster lifecycle management in addition to configuration.
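To make the idea of a common schema concrete, below is a minimal, hypothetical sketch of the provider-neutral Cluster object. The name and API versions are illustrative; the point is that on a different provider, essentially only the kind referenced by infrastructureRef changes:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: Cluster
metadata:
  name: example-cluster        # hypothetical name
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSManagedCluster    # would be a different kind (e.g. VSphereCluster) on another provider
    name: example-cluster
```

Everything above the infrastructureRef is the same regardless of where the cluster runs, which is what makes tooling like GitOps pipelines portable across providers.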

Getting started

In October 2020, Weaveworks published a blog that walked through how to create an EKS cluster using the Cluster API Provider for AWS (CAPA). The steps have largely stayed the same, with a couple of minor exceptions which I will describe below. I am providing the instructions here simply for convenience. If you want additional information about each step, please read the Weaveworks blog.

3. Create a file called eks.config with the following contents:

apiVersion: bootstrap.aws.infrastructure.cluster.x-k8s.io/v1alpha1
kind: AWSIAMConfiguration
spec:
  bootstrapUser:
    enable: true
  eks:
    enable: true
    iamRoleCreation: true # Set to true if you plan to use the EKSEnableIAM feature flag to enable automatic creation of IAM roles
    defaultControlPlaneRole:
      disable: false # Set to false to enable creation of the default control plane role

Although I’m leaving the defaults in place, you are free to tailor these for your own environment.

4. Set the environment variables for your environment:

export AWS_REGION=us-east-2 # Used when encoding your credentials in a later step
export AWS_ACCESS_KEY_ID=<access-key-for-bootstrap-user>
export AWS_SECRET_ACCESS_KEY=<secret-access-key-for-bootstrap-user>
export AWS_SESSION_TOKEN=<session-token> # Only needed if you are using multi-factor authentication

5. Prepare the environment for CAPA by running clusterawsadm with the eks.config file to create the required IAM resources in AWS:

clusterawsadm bootstrap iam create-cloudformation-stack --config eks.config

This creates an IAM user called bootstrapper.cluster-api-provider-aws.sigs.k8s.io. The user belongs to an IAM group with the prefix cluster-api-provider. Attached to the group is a policy that grants the controller the ability to provision resources in your AWS account. The policy for the group appears below:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ec2:AllocateAddress",
        "ec2:AssociateRouteTable",
        "ec2:AttachInternetGateway",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CreateInternetGateway",
        "ec2:CreateNatGateway",
        "ec2:CreateRoute",
        "ec2:CreateRouteTable",
        "ec2:CreateSecurityGroup",
        "ec2:CreateSubnet",
        "ec2:CreateTags",
        "ec2:CreateVpc",
        "ec2:ModifyVpcAttribute",
        "ec2:DeleteInternetGateway",
        "ec2:DeleteNatGateway",
        "ec2:DeleteRouteTable",
        "ec2:DeleteSecurityGroup",
        "ec2:DeleteSubnet",
        "ec2:DeleteTags",
        "ec2:DeleteVpc",
        "ec2:DescribeAccountAttributes",
        "ec2:DescribeAddresses",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeInstances",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeImages",
        "ec2:DescribeNatGateways",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeNetworkInterfaceAttribute",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "ec2:DescribeVpcAttribute",
        "ec2:DescribeVolumes",
        "ec2:DetachInternetGateway",
        "ec2:DisassociateRouteTable",
        "ec2:DisassociateAddress",
        "ec2:ModifyInstanceAttribute",
        "ec2:ModifyNetworkInterfaceAttribute",
        "ec2:ModifySubnetAttribute",
        "ec2:ReleaseAddress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "tag:GetResources",
        "elasticloadbalancing:AddTags",
        "elasticloadbalancing:CreateLoadBalancer",
        "elasticloadbalancing:ConfigureHealthCheck",
        "elasticloadbalancing:DeleteLoadBalancer",
        "elasticloadbalancing:DescribeLoadBalancers",
        "elasticloadbalancing:DescribeLoadBalancerAttributes",
        "elasticloadbalancing:DescribeTags",
        "elasticloadbalancing:ModifyLoadBalancerAttributes",
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "elasticloadbalancing:RemoveTags"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Condition": {
        "StringLike": {
          "iam:AWSServiceName": "elasticloadbalancing.amazonaws.com"
        }
      },
      "Action": [
        "iam:CreateServiceLinkedRole"
      ],
      "Resource": [
        "arn:*:iam::*:role/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing"
      ],
      "Effect": "Allow"
    },
    {
      "Condition": {
        "StringLike": {
          "iam:AWSServiceName": "spot.amazonaws.com"
        }
      },
      "Action": [
        "iam:CreateServiceLinkedRole"
      ],
      "Resource": [
        "arn:*:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:*:iam::*:role/*.cluster-api-provider-aws.sigs.k8s.io"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "secretsmanager:CreateSecret",
        "secretsmanager:DeleteSecret",
        "secretsmanager:TagResource"
      ],
      "Resource": [
        "arn:*:secretsmanager:*:*:secret:aws.cluster.x-k8s.io/*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "ssm:GetParameter"
      ],
      "Resource": [
        "arn:aws:ssm:*:*:parameter/aws/service/eks/optimized-ami/*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "iam:GetRole",
        "iam:ListAttachedRolePolicies"
      ],
      "Resource": [
        "arn:aws:iam::*:role/*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "iam:GetPolicy"
      ],
      "Resource": [
        "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "eks:DescribeCluster",
        "eks:ListClusters",
        "eks:CreateCluster",
        "eks:TagResource",
        "eks:UpdateClusterVersion",
        "eks:DeleteCluster",
        "eks:UpdateClusterConfig",
        "eks:UntagResource"
      ],
      "Resource": [
        "arn:aws:eks:*:*:cluster/*"
      ],
      "Effect": "Allow"
    },
    {
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "eks.amazonaws.com"
        }
      },
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}

clusterawsadm also creates a few other IAM roles that get assigned to worker nodes and the EKS control plane later. You can see the complete list of resources clusterawsadm creates by examining the cluster-api-provider-aws-sigs-k8s-io stack in the CloudFormation console or by describing the stack.

aws cloudformation describe-stack-resources --stack-name cluster-api-provider-aws-sigs-k8s-io --region us-east-2 --output table

6. Export your AWS credentials to an environment variable.

export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)
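There is nothing exotic about this variable: it is just an AWS credentials profile, base64-encoded so it can be handed to the provider as a single value. The sketch below reproduces the shape with dummy placeholder values, so no clusterawsadm (and no real credentials) are required:

```shell
# Build a dummy AWS credentials profile (placeholder values, not real keys).
creds='[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMIK7MDENGEXAMPLE'

# Encode it as a single base64 string with no line wrapping.
encoded=$(printf '%s' "$creds" | base64 | tr -d '\n')

# Decoding the value recovers the original profile.
printf '%s\n' "$encoded" | base64 --decode
```

This is handy to know when debugging: if the provider rejects your credentials, decoding AWS_B64ENCODED_CREDENTIALS shows exactly what profile the controller received.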

Enabling CAPA

Run the following commands to install the Cluster API Provider for AWS with EKS support:

export EXP_EKS=true
export EXP_EKS_IAM=true
export EXP_EKS_ADD_ROLES=true
clusterctl init -b kubeadm:v0.3.19 -c kubeadm:v0.3.19 --core cluster-api:v0.3.19 --infrastructure=aws

This will provision a whole slew of resources into your KinD cluster, including the set of providers shown below:

$ kubectl get providers -A
NAMESPACE                       NAME                    TYPE                     PROVIDER      VERSION   WATCH NAMESPACE
capa-eks-bootstrap-system       bootstrap-aws-eks       BootstrapProvider        aws-eks       v0.6.6
capa-eks-control-plane-system   control-plane-aws-eks   ControlPlaneProvider     aws-eks       v0.6.6
capa-system                     infrastructure-aws      InfrastructureProvider   aws           v0.6.6
capi-kubeadm-bootstrap-system   bootstrap-kubeadm       BootstrapProvider        kubeadm       v0.3.19
capi-system                     cluster-api             CoreProvider             cluster-api   v0.3.19

Currently, clusterctl init fetches the latest release of each provider by default and then validates the provider contract using the metadata.yaml file. However, this causes init to fail when you are using an older version of clusterctl, e.g. v0.3.19, and a newer release with a different contract, e.g. v0.4.0, is available. Consequently, you should always pin the versions of the bootstrap, control plane, and core providers, as shown above.

Next, you’ll need to update the capa-eks-control-plane-system-capa-eks-control-plane-manager-role ClusterRole. There is currently a bug in clusterctl init where awsclustercontrolleridentities is missing from the ClusterRole’s RBAC rules. Fixing this involves adding awsclustercontrolleridentities to the rule for the infrastructure.cluster.x-k8s.io API group.

Create a file called control-plane-manager-role.yaml with the following contents:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: capa-eks-control-plane-system-capa-eks-control-plane-manager-role
  labels:
    cluster.x-k8s.io/provider: control-plane-aws-eks
    clusterctl.cluster.x-k8s.io: ''
rules:
  - verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
    apiGroups:
      - ''
    resources:
      - secrets
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - cluster.x-k8s.io
    resources:
      - clusters
      - clusters/status
  - verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
    apiGroups:
      - controlplane.cluster.x-k8s.io
    resources:
      - awsmanagedcontrolplanes
  - verbs:
      - get
      - patch
      - update
    apiGroups:
      - controlplane.cluster.x-k8s.io
    resources:
      - awsmanagedcontrolplanes/status
  - verbs:
      - create
      - get
      - list
      - patch
      - watch
    apiGroups:
      - ''
    resources:
      - events
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - infrastructure.cluster.x-k8s.io
    resources:
      - awsclustercontrolleridentities
      - awsclusterroleidentities
      - awsclusterstaticidentities
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - infrastructure.cluster.x-k8s.io
    resources:
      - awsmachinepools
      - awsmachinepools/status
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - infrastructure.cluster.x-k8s.io
    resources:
      - awsmachines
      - awsmachines/status
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - infrastructure.cluster.x-k8s.io
    resources:
      - awsmanagedclusters
      - awsmanagedclusters/status
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - infrastructure.cluster.x-k8s.io
    resources:
      - awsmanagedmachinepools
      - awsmanagedmachinepools/status

Then apply the updated ClusterRole to your cluster by running the following command:

kubectl apply -f control-plane-manager-role.yaml

Creating the EKS cluster

1. Set the environment variables used to populate the cluster template:

export AWS_REGION=us-east-1
export AWS_SSH_KEY_NAME=default
export KUBERNETES_VERSION=v1.17.0
export WORKER_MACHINE_COUNT=1
export AWS_NODE_MACHINE_TYPE=t2.medium

clusterctl config cluster managed-test --flavor eks > capi-eks.yaml

Setting the flavor flag to eks instructs clusterctl to use the cluster-template-eks.yaml template. The link to cluster-template-eks.yaml is broken in the Weaveworks blog. Fortunately, a copy of it is still available in GitHub. Running the clusterctl command substitutes the placeholder values in the template with the values assigned in the export statements shown above. If you want to add to or modify the template, the schema for the EKS control plane can be found in the CAPA book. Inspect the resulting yaml file to verify that all of the placeholders have been populated before continuing to the next step.
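For illustration, here is an abridged, hypothetical sketch (not the complete generated manifest) of the AWSManagedControlPlane that ends up in capi-eks.yaml, with the exported values substituted in; field names follow the v1alpha3 schema:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: AWSManagedControlPlane
metadata:
  name: managed-test-control-plane   # derived from the cluster name passed to clusterctl
spec:
  region: us-east-1                  # from AWS_REGION
  sshKeyName: default                # from AWS_SSH_KEY_NAME
  version: v1.17.0                   # from KUBERNETES_VERSION
```

If any of these fields still contain a ${...} placeholder after running clusterctl config, the corresponding environment variable was not exported.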

2. Apply the capi-eks.yaml to your KinD cluster:

kubectl apply -f capi-eks.yaml

The AWS provider for the Cluster API will, by default, create a VPC that spans multiple AZs with public and private subnets. It will also create NAT gateways for each private subnet, configure the necessary route tables, and so on. Once the infrastructure is ready, it provisions an EKS cluster with one t2.medium instance. The nice thing about CAPI is that it runs as a series of control loops that call the AWS APIs directly instead of going through CloudFormation. These loops run until the current state matches the desired state. I actually witnessed this myself when I initially tried creating a cluster. At first there weren’t enough available EIPs in the region. After I released a few of them, the controller finished reconciling the changes, and the cluster and worker node were provisioned. If I had used CloudFormation, the stack creation would have failed and all the changes would have been rolled back automatically. Ordinarily, I’d be concerned about calling the AWS APIs directly; however, the controller implements exponential backoff/retry logic when the API calls get throttled. You can see this in the WaitForWithRetryable function and in other places in the code.
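The backoff pattern itself is simple. The following shell sketch is illustrative only, using a stub function in place of a real AWS call, but it shows the same retry-with-increasing-delay behavior the controller implements in Go:

```shell
count=0
do_api_call() {
  # Stand-in for a throttled AWS API call: fails twice, then succeeds.
  count=$((count + 1))
  [ "$count" -ge 3 ]
}

attempt=0
max_attempts=5
delay=1
until do_api_call; do
  attempt=$((attempt + 1))
  if [ "$attempt" -ge "$max_attempts" ]; then
    echo "giving up after $attempt attempts" >&2
    exit 1
  fi
  sleep "$delay"
  delay=$((delay * 2))   # exponential backoff: 1s, 2s, 4s, ...
done
echo "succeeded after $count calls"
```

Doubling the delay between attempts is what keeps the controller from hammering a throttled API while still converging once capacity frees up.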

Incidentally, you can see the various actions being performed upon creation of a managed control plane by looking at the reconcileNormal method in awsmanagedcontrolplane_controller.go.

If you want to create a cluster with a managed node group, use the eks-managedmachinepool flavor/template as shown in the example below.

clusterctl config cluster capi-eks-quickstart --flavor eks-managedmachinepool --kubernetes-version v1.17.3 --worker-machine-count=3 > capi-eks-quickstart.yaml

3. To use the newly created cluster, get the generated kubeconfig from the KinD cluster with the following command:

kubectl --namespace=default get secret managed-test-user-kubeconfig \
-o jsonpath={.data.value} | base64 --decode \
> managed-test.kubeconfig

4. By default, the generated kubeconfig file uses aws-iam-authenticator. Assuming you have aws-iam-authenticator and kubectl installed, you are ready to use your new EKS cluster:

kubectl --kubeconfig managed-test.kubeconfig get pods -A

Conclusion

The Cluster API (CAPI) is yet another way to provision and manage the lifecycle of Kubernetes clusters across different environments. Its provider model allows different constituencies to independently add support for managed Kubernetes services such as EKS. As we’ve seen, CAPI leverages Kubernetes primitives, such as CRDs and reconciliation loops, to manage Kubernetes itself. With CAPI, you could conceivably use RBAC to restrict who can create clusters! And finally, given that EKS-Anywhere (EKS-A) is likely to use CAPI in the future, knowing how CAPI works will help those who are planning to adopt EKS-A down the road.
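For instance, because clusters are represented as ordinary custom resources, restricting who can create them is just standard Kubernetes RBAC. A hypothetical sketch (the role name is illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-creator   # hypothetical name
rules:
  - apiGroups: ["cluster.x-k8s.io"]
    resources: ["clusters"]
    verbs: ["create", "get", "list", "watch"]
```

Bind this role only to the teams allowed to provision clusters, and everyone else can still read cluster objects through a narrower role.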


Jeremy Cowan is a Principal Container Specialist at AWS