Provisioning an EKS cluster with CAPI
In an earlier post I described how to use KinD to run EKS-Distro on your local machine. In this post, I will explain how to use that cluster to bootstrap an EKS cluster in the AWS cloud using the Cluster API (CAPI), and share a few things I learned along the way.
Why you should care about the Cluster API
At this point, you may be thinking to yourself, “Why do I need another way to provision and manage the lifecycle of an EKS cluster? I can already use eksctl, Terraform, CloudFormation, Pulumi, the Cloud Development Kit (CDK), and so on.” It really boils down to consistency. The Cluster API provides a consistent, declarative way to deploy and manage Kubernetes clusters across a variety of environments. This is largely possible because the Cluster API establishes a common set of schemas (CRDs) and a controller framework that apply across providers. Furthermore, EKS Anywhere will likely leverage CAPI to bootstrap clusters into VMware and bare metal environments. Having a consistent, repeatable way to deploy and manage clusters across these different environments will ultimately help simplify operations. For example, imagine using GitOps for cluster lifecycle management in addition to configuration.
Getting started
In October 2020, Weaveworks published a blog that walked through how to create an EKS cluster using the Cluster API Provider for AWS (CAPA). The steps have largely stayed the same, with a couple of minor exceptions that I will call out along the way. I am providing the instructions here simply for convenience. If you want additional information about each step, please read the Weaveworks blog.
1. Download and install clusterctl and clusterawsadm. You can install clusterctl by following these instructions. You can install the latest release of clusterawsadm from GitHub.
2. Create a KinD cluster. See my previous post on running EKS-D with KinD.
3. Create a file called eks.config with the following contents:
apiVersion: bootstrap.aws.infrastructure.cluster.x-k8s.io/v1alpha1
kind: AWSIAMConfiguration
spec:
  bootstrapUser:
    enable: true
  eks:
    enable: true
    iamRoleCreation: true # Set to true if you plan to use the EKSEnableIAM feature flag to enable automatic creation of IAM roles
    defaultControlPlaneRole:
      disable: false # Set to false to enable creation of the default control plane role
Although I’m leaving these values in place, you are free to tailor them for your own environment.
4. Set the environment variables for your environment; they will be used, along with the eks.config file, to create the required IAM resources in AWS:
export AWS_REGION=us-east-2 # This is used to help encode your environment variables
export AWS_ACCESS_KEY_ID=<access-key-for-bootstrap-user>
export AWS_SECRET_ACCESS_KEY=<secret-access-key-for-bootstrap-user>
export AWS_SESSION_TOKEN=<session-token> # If you are using Multi-Factor Auth.
5. Prepare the environment for CAPA by running clusterawsadm:
clusterawsadm bootstrap iam create-cloudformation-stack --config eks.config
This creates an IAM user called bootstrapper.cluster-api-provider-aws.sigs.k8s.io. The user belongs to an IAM group with the prefix cluster-api-provider. Attached to the group is a policy that grants the controller the ability to provision resources in your AWS account. The policy for the group appears below:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"ec2:AllocateAddress",
"ec2:AssociateRouteTable",
"ec2:AttachInternetGateway",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateInternetGateway",
"ec2:CreateNatGateway",
"ec2:CreateRoute",
"ec2:CreateRouteTable",
"ec2:CreateSecurityGroup",
"ec2:CreateSubnet",
"ec2:CreateTags",
"ec2:CreateVpc",
"ec2:ModifyVpcAttribute",
"ec2:DeleteInternetGateway",
"ec2:DeleteNatGateway",
"ec2:DeleteRouteTable",
"ec2:DeleteSecurityGroup",
"ec2:DeleteSubnet",
"ec2:DeleteTags",
"ec2:DeleteVpc",
"ec2:DescribeAccountAttributes",
"ec2:DescribeAddresses",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInstances",
"ec2:DescribeInternetGateways",
"ec2:DescribeImages",
"ec2:DescribeNatGateways",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeNetworkInterfaceAttribute",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs",
"ec2:DescribeVpcAttribute",
"ec2:DescribeVolumes",
"ec2:DetachInternetGateway",
"ec2:DisassociateRouteTable",
"ec2:DisassociateAddress",
"ec2:ModifyInstanceAttribute",
"ec2:ModifyNetworkInterfaceAttribute",
"ec2:ModifySubnetAttribute",
"ec2:ReleaseAddress",
"ec2:RevokeSecurityGroupIngress",
"ec2:RunInstances",
"ec2:TerminateInstances",
"tag:GetResources",
"elasticloadbalancing:AddTags",
"elasticloadbalancing:CreateLoadBalancer",
"elasticloadbalancing:ConfigureHealthCheck",
"elasticloadbalancing:DeleteLoadBalancer",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeLoadBalancerAttributes",
"elasticloadbalancing:DescribeTags",
"elasticloadbalancing:ModifyLoadBalancerAttributes",
"elasticloadbalancing:RegisterInstancesWithLoadBalancer",
"elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
"elasticloadbalancing:RemoveTags"
],
"Resource": [
"*"
],
"Effect": "Allow"
},
{
"Condition": {
"StringLike": {
"iam:AWSServiceName": "elasticloadbalancing.amazonaws.com"
}
},
"Action": [
"iam:CreateServiceLinkedRole"
],
"Resource": [
"arn:*:iam::*:role/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing"
],
"Effect": "Allow"
},
{
"Condition": {
"StringLike": {
"iam:AWSServiceName": "spot.amazonaws.com"
}
},
"Action": [
"iam:CreateServiceLinkedRole"
],
"Resource": [
"arn:*:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot"
],
"Effect": "Allow"
},
{
"Action": [
"iam:PassRole"
],
"Resource": [
"arn:*:iam::*:role/*.cluster-api-provider-aws.sigs.k8s.io"
],
"Effect": "Allow"
},
{
"Action": [
"secretsmanager:CreateSecret",
"secretsmanager:DeleteSecret",
"secretsmanager:TagResource"
],
"Resource": [
"arn:*:secretsmanager:*:*:secret:aws.cluster.x-k8s.io/*"
],
"Effect": "Allow"
},
{
"Action": [
"ssm:GetParameter"
],
"Resource": [
"arn:aws:ssm:*:*:parameter/aws/service/eks/optimized-ami/*"
],
"Effect": "Allow"
},
{
"Action": [
"iam:GetRole",
"iam:ListAttachedRolePolicies"
],
"Resource": [
"arn:aws:iam::*:role/*"
],
"Effect": "Allow"
},
{
"Action": [
"iam:GetPolicy"
],
"Resource": [
"arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
],
"Effect": "Allow"
},
{
"Action": [
"eks:DescribeCluster",
"eks:ListClusters",
"eks:CreateCluster",
"eks:TagResource",
"eks:UpdateClusterVersion",
"eks:DeleteCluster",
"eks:UpdateClusterConfig",
"eks:UntagResource"
],
"Resource": [
"arn:aws:eks:*:*:cluster/*"
],
"Effect": "Allow"
},
{
"Condition": {
"StringEquals": {
"iam:PassedToService": "eks.amazonaws.com"
}
},
"Action": [
"iam:PassRole"
],
"Resource": [
"*"
],
"Effect": "Allow"
}
]
}
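If you want to verify the bootstrap user and group outside of CloudFormation, the standard IAM CLI calls work. This is just a quick sketch; the user name below is the one created by clusterawsadm as noted above:

# Show the bootstrap user and the IAM group(s) it belongs to
aws iam get-user --user-name bootstrapper.cluster-api-provider-aws.sigs.k8s.io
aws iam list-groups-for-user --user-name bootstrapper.cluster-api-provider-aws.sigs.k8s.io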
clusterawsadm also creates a few other IAM roles that get assigned to worker nodes and the EKS control plane later. You can see the complete list of resources clusterawsadm creates by examining the cluster-api-provider-aws-sigs-k8s-io stack in the CloudFormation console or by describing the stack.
aws cloudformation describe-stack-resources --stack-name cluster-api-provider-aws-sigs-k8s-io --region us-east-2 --output table
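If you are only interested in the IAM roles the stack created, a JMESPath query narrows the same command down. This assumes the default stack name and the us-east-2 region used above:

# List only the IAM roles created by the CAPA bootstrap stack
aws cloudformation describe-stack-resources --stack-name cluster-api-provider-aws-sigs-k8s-io --region us-east-2 \
  --query "StackResources[?ResourceType=='AWS::IAM::Role'].PhysicalResourceId" --output table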
6. Export your AWS credentials to an environment variable.
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)
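The value is simply a base64-encoded AWS credentials profile that the provider will consume. If you want to sanity-check what the controller will receive, you can decode it; note that this prints your secret access key to the terminal:

# Decode the encoded credentials profile (prints secrets, use with care)
echo "$AWS_B64ENCODED_CREDENTIALS" | base64 --decode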
Enabling CAPA
Run the following commands to install the Cluster API Provider for AWS with EKS support:
export EXP_EKS=true
export EXP_EKS_IAM=true
export EXP_EKS_ADD_ROLES=true
clusterctl init -b kubeadm:v0.3.19 -c kubeadm:v0.3.19 --core cluster-api:v0.3.19 --infrastructure=aws
This will provision a whole slew of resources into your KinD cluster, including the set of providers seen below:
$ kubectl get providers -A
NAMESPACE NAME TYPE PROVIDER VERSION WATCH NAMESPACE
capa-eks-bootstrap-system bootstrap-aws-eks BootstrapProvider aws-eks v0.6.6
capa-eks-control-plane-system control-plane-aws-eks ControlPlaneProvider aws-eks v0.6.6
capa-system infrastructure-aws InfrastructureProvider aws v0.6.6
capi-kubeadm-bootstrap-system bootstrap-kubeadm BootstrapProvider kubeadm v0.3.19
capi-system cluster-api CoreProvider cluster-api v0.3.19
Currently, clusterctl init gets the latest release of each provider by default and then validates the provider contract using the metadata.yaml file. However, this causes init to fail when you are using an older version of clusterctl, e.g. v0.3.19, and a newer release with a different contract is available, e.g. v0.4.0. Consequently, you should always specify the versions to use for the bootstrap, control plane, and core providers, as in the clusterctl init command above.
Next, you’ll need to update the capa-eks-control-plane-system-capa-eks-control-plane-manager-role ClusterRole. There is currently a bug in clusterctl init where awsclustercontrolleridentities is missing from the ClusterRole RBAC rules. Fixing this involves adding awsclustercontrolleridentities to the rule for the infrastructure.cluster.x-k8s.io API group.
Create a file called control-plane-manager-role.yaml with the following contents:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: capa-eks-control-plane-system-capa-eks-control-plane-manager-role
  labels:
    cluster.x-k8s.io/provider: control-plane-aws-eks
    clusterctl.cluster.x-k8s.io: ''
rules:
- verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
  apiGroups:
  - ''
  resources:
  - secrets
- verbs:
  - get
  - list
  - watch
  apiGroups:
  - cluster.x-k8s.io
  resources:
  - clusters
  - clusters/status
- verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
  apiGroups:
  - controlplane.cluster.x-k8s.io
  resources:
  - awsmanagedcontrolplanes
- verbs:
  - get
  - patch
  - update
  apiGroups:
  - controlplane.cluster.x-k8s.io
  resources:
  - awsmanagedcontrolplanes/status
- verbs:
  - create
  - get
  - list
  - patch
  - watch
  apiGroups:
  - ''
  resources:
  - events
- verbs:
  - get
  - list
  - watch
  apiGroups:
  - infrastructure.cluster.x-k8s.io
  resources:
  - awsclustercontrolleridentities
  - awsclusterroleidentities
  - awsclusterstaticidentities
- verbs:
  - get
  - list
  - watch
  apiGroups:
  - infrastructure.cluster.x-k8s.io
  resources:
  - awsmachinepools
  - awsmachinepools/status
- verbs:
  - get
  - list
  - watch
  apiGroups:
  - infrastructure.cluster.x-k8s.io
  resources:
  - awsmachines
  - awsmachines/status
- verbs:
  - get
  - list
  - watch
  apiGroups:
  - infrastructure.cluster.x-k8s.io
  resources:
  - awsmanagedclusters
  - awsmanagedclusters/status
- verbs:
  - get
  - list
  - watch
  apiGroups:
  - infrastructure.cluster.x-k8s.io
  resources:
  - awsmanagedmachinepools
  - awsmanagedmachinepools/status
Then apply the updated ClusterRole to your cluster by running the following command:
kubectl apply -f control-plane-manager-role.yaml
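A quick way to confirm the fix took effect is to look for the identity resources in the live ClusterRole (awsclustercontrolleridentities is the one the bug leaves out):

# Verify the identity resources now appear in the ClusterRole
kubectl get clusterrole capa-eks-control-plane-system-capa-eks-control-plane-manager-role -o yaml | grep -A 3 awsclustercontrolleridentities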
Creating the EKS cluster
1. Run the following to generate the YAML for the eks flavor. Ensure that you set the environment variables accordingly:
export AWS_REGION=us-east-1
export AWS_SSH_KEY_NAME=default
export KUBERNETES_VERSION=v1.17.0
export WORKER_MACHINE_COUNT=1
export AWS_NODE_MACHINE_TYPE=t2.medium
clusterctl config cluster managed-test --flavor eks > capi-eks.yaml
Setting the flavor flag to eks instructs clusterctl to use the cluster-template-eks.yaml template. The link to cluster-template-eks.yaml is broken in the Weaveworks blog; fortunately, a copy of it is still available on GitHub. Running the clusterctl command substitutes the placeholder values in the template with the values assigned in the export statements shown above. If you want to add to or modify the template, the schema for the EKS control plane can be found in the CAPA book. Inspect the resulting YAML file to verify that all of the placeholders have been populated before continuing to the next step (see the quick check below).
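Because the placeholders in clusterctl templates use shell-style ${VAR} syntax, a quick grep is one way to catch any that were left unset:

# Fail loudly if any ${VAR} placeholders survived templating
grep -n '\${' capi-eks.yaml && echo "unresolved placeholders found" || echo "all placeholders populated"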
2. Apply the capi-eks.yaml to your KinD cluster:
kubectl apply -f capi-eks.yaml
The AWS provider for the Cluster API will, by default, create a VPC that spans multiple AZs with public and private subnets. It will also create NAT gateways for each private subnet, configure the necessary route tables, and so on. Once the infrastructure is ready, it provisions an EKS cluster with one t2.medium instance. The nice thing about CAPI is that it runs as a series of control loops that call the AWS APIs directly instead of going through CloudFormation. These loops run until the current state matches the desired state. I actually witnessed this myself when I initially tried creating a cluster: at first there weren’t enough available EIPs in the region, but after releasing a few of them, the controller finished reconciling the changes and the cluster and worker node were provisioned. If I had used CloudFormation, the stack creation would have failed and all the changes would have been automatically rolled back. Ordinarily, I’d be concerned about calling the AWS APIs directly; however, the controller implements exponential backoff/retry logic when the API calls get throttled. You can see this in the WaitForWithRetryable function and in other places in the code.
Incidentally, you can see the various actions being performed upon creation of a managed control plane by looking at the reconcileNormal method in awsmanagedcontrolplane_controller.go.
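While reconciliation is in progress, you can follow along from the management (KinD) cluster. A minimal sketch, assuming the managed-test cluster name used above and the typical <cluster-name>-control-plane naming for the control plane object:

# Watch the cluster and EKS control plane objects reconcile
kubectl get clusters,awsmanagedcontrolplanes -A
kubectl describe awsmanagedcontrolplane managed-test-control-plane # conditions and events show what the controller is still working on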
If you want to create a cluster with a managed node group, use the eks-managedmachinepool flavor/template as shown in the example below.
clusterctl config cluster capi-eks-quickstart --flavor eks-managedmachinepool --kubernetes-version v1.17.3 --worker-machine-count=3 > capi-eks-quickstart.yaml
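After applying the generated manifest, one way to keep an eye on the node group from the management cluster is via the awsmanagedmachinepools resource that appears in the ClusterRole above:

# Track the status of the managed node group objects
kubectl get awsmanagedmachinepools -A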
3. To use the newly created cluster, get the generated kubeconfig from the KinD cluster with the following command:
kubectl --namespace=default get secret managed-test-user-kubeconfig \
-o jsonpath={.data.value} | base64 --decode \
> managed-test.kubeconfig
4. By default, the generated kubeconfig file uses aws-iam-authenticator. Assuming you have aws-iam-authenticator and kubectl installed, you are ready to use your new EKS cluster:
kubectl --kubeconfig managed-test.kubeconfig get pods -A
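Once the worker node has joined, a quick node check against the new cluster should show it:

# Confirm the worker node registered with the EKS control plane
kubectl --kubeconfig managed-test.kubeconfig get nodes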
Conclusion
The Cluster API (CAPI) is yet another way to provision and manage the lifecycle of Kubernetes clusters across different environments. Its provider model allows different constituencies to independently add support for managed Kubernetes services such as EKS. As we’ve seen, CAPI leverages Kubernetes primitives, such as CRDs and reconciliation loops, to manage Kubernetes itself. With CAPI, you could conceivably use RBAC to restrict who can create clusters! And finally, given that EKS Anywhere (EKS-A) is likely to use CAPI in the future, knowing how CAPI works will help those who are planning to adopt EKS-A down the road.