Maximizing Amazon EKS efficiency: Auto Mode, Graviton, and Spot instances working together
Source: AWS Containers
Amazon Elastic Kubernetes Service (Amazon EKS) Auto Mode streamlines the operation of your Amazon EKS clusters by automating key infrastructure components. With less manual effort spent on allocating and maintaining infrastructure, teams can focus on higher-level strategic initiatives and application development.
While our previous blog covered the core concepts of Amazon EKS Auto Mode, this blog post dives deeper into optimizing Amazon EKS Auto Mode clusters using AWS Graviton and Amazon EC2 Spot instances. AWS customers adopt AWS Graviton instances to achieve up to 40% better price performance and to use up to 60% less energy in support of their sustainability goals. Additionally, AWS customers use Amazon EC2 Spot instances for eligible workloads to save up to 90% compared with Amazon Elastic Compute Cloud (Amazon EC2) On-Demand prices.
Solution overview
We will cover AWS Graviton and Amazon EC2 Spot implementations on Amazon EKS Auto Mode through the following two scenarios:
- Deploy the retail store application (referenced in the previous blog) using exclusively AWS Graviton (ARM64) instances.
- Deploy the retail store application using a mix of Spot and On-Demand Amazon EC2 instances with the following considerations:
- Self-managed MySQL, a stateful application using persistent volumes, must run on On-Demand instances as it’s not suitable for Spot instances. While running a Relational Database Management System (RDBMS) in Kubernetes is not recommended, we’re using it here solely to demonstrate a stateful workload example.
- All other applications in the retail store application are eligible to run on Amazon EC2 Spot instances.
- All the applications in the retail store application can run on AMD64, ARM64, or a mix of both architectures.
Getting started
Follow these steps sequentially from the previous blog:
- Complete prerequisites
- Create cluster
- Deploy Ingress Class by applying ingress.yaml (ALB Ingress for the retail store app)
- Deploy EBS Storage Class by applying ebs-sc.yaml (Persistent Volume claim for the self-managed MySQL RDBMS)
Steps for scenario 1: Adopting AWS Graviton instances:
- Create custom NodePool: After completing the “Getting started” section, create a custom NodePool named graviton-ondemand in your Amazon EKS Auto Mode cluster. While the predefined system NodePool supports ARM64 architecture and could theoretically be used (with tolerations for its taints) to deploy applications on AWS Graviton-based Amazon EC2 instances, this approach is not recommended. Instead, follow these best practices:
  - Reserve the system NodePool exclusively for:
    - Critical cluster add-ons
    - System-level components
    - Core infrastructure services
  - Create dedicated NodePools for:
    - Specific architecture requirements (like AWS Graviton’s ARM64)
    - Different performance or scaling needs
    - Prioritizing Savings Plans and/or Reserved Instances
  - This separation helps maintain:
    - Clear operational boundaries
    - Better resource allocation
    - Improved capacity availability
    - Easier maintenance and troubleshooting
The graviton-ondemand NodePool provides a dedicated environment exclusively for your ARM64-based workloads.
cat << EOF > graviton-ondemand.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: graviton-ondemand
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
      expireAfter: 336h
      terminationGracePeriod: 24h
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: eks.amazonaws.com/instance-category
          operator: In
          values:
            - c
            - m
            - r
        - key: eks.amazonaws.com/instance-generation
          operator: Gt
          values:
            - "6"
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: eks.amazonaws.com/instance-size
          operator: NotIn
          values: [nano, micro, small]
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    budgets:
      - nodes: 10%
  weight: 10
EOF
kubectl apply -f graviton-ondemand.yaml
Two important attributes of the above NodePool that differentiate it from the general-purpose NodePool created during cluster provisioning are:
- ARM64 is the only supported architecture in this NodePool:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
- This NodePool has a weight of 10, whereas the general-purpose and system NodePools provisioned during Amazon EKS Auto Mode cluster creation have a weight of zero. This higher weight gives this NodePool priority over the general-purpose and system NodePools.
  weight: 10
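To make the weight ordering concrete, here is a minimal sketch that sorts NodePools from highest to lowest weight. The helper name rank_nodepools is hypothetical, the sample input simply restates the three NodePools described above, and a missing weight is assumed to mean zero. Equal-weight pools are listed alphabetically for display only; that tie order says nothing about how the scheduler treats them.

```shell
# rank_nodepools (hypothetical helper): read "name [weight]" lines on stdin
# and print the NodePools highest weight first. A missing weight counts as 0.
rank_nodepools() {
  awk '{ w = ($2 == "" ? 0 : $2); print w, $1 }' |
    sort -k1,1nr -k2,2 |
    awk '{ print $2 " (weight " $1 ")" }'
}

# Sample input restating the NodePools discussed above.
printf '%s\n' 'general-purpose' 'system' 'graviton-ondemand 10' | rank_nodepools

# Against a live cluster (assumes kubectl access), the input could come from:
# kubectl get nodepools -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.weight}{"\n"}{end}' | rank_nodepools
```

With the sample input, graviton-ondemand is listed first, followed by the two zero-weight NodePools.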
- Deploy the retail store application using Helm. Create the values.yaml file to define the customization parameters for your Helm chart, then run the helm install command to deploy the application to your Amazon EKS cluster.
Create the values.yaml file:
cat << EOF > values.yaml
catalog:
  mysql:
    secret:
      create: true
      name: catalog-db
      username: catalog
    persistentVolume:
      enabled: true
      accessMode:
        - ReadWriteOnce
      size: 30Gi
      storageClass: eks-auto-ebs-csi-sc
ui:
  endpoints:
    catalog: http://retail-store-app-catalog:80
    carts: http://retail-store-app-carts:80
    checkout: http://retail-store-app-checkout:80
    assets: http://retail-store-app-assets:80
  autoscaling:
    enabled: true
    minReplicas: 5
    maxReplicas: 10
    targetCPUUtilizationPercentage: 50
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: ui
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: ui
  ingress:
    enabled: true
    className: eks-auto-alb
    annotations:
      alb.ingress.kubernetes.io/healthcheck-path: /actuator/health
checkout:
  endpoints:
    orders: http://retail-store-app-orders:80
EOF
Run the helm install command to deploy the application to the Amazon EKS cluster:
helm install -f values.yaml retail-store-app oci://public.ecr.aws/aws-containers/retail-store-sample-chart --version 0.8.5
Wait a few minutes until all pods reach Running status:
kubectl get pods
NAME READY STATUS RESTARTS AGE
retail-store-app-assets-ff99f9c64-p95cx 1/1 Running 0 3m55s
retail-store-app-carts-6dc4cd6b79-btdj2 1/1 Running 0 3m55s
retail-store-app-carts-dynamodb-5958cf99cb-8dsh9 1/1 Running 0 3m55s
retail-store-app-catalog-5f44f6f487-6wrbx 1/1 Running 0 3m55s
retail-store-app-catalog-mysql-0 1/1 Running 0 3m55s
retail-store-app-checkout-8448fb4cff-298tt 1/1 Running 0 3m55s
retail-store-app-checkout-redis-6977ff5b75-mvxb4 1/1 Running 0 3m55s
retail-store-app-orders-58cddb8dfb-s5mhx 1/1 Running 0 3m55s
retail-store-app-orders-postgresql-0 1/1 Running 0 3m55s
retail-store-app-orders-rabbitmq-0 1/1 Running 0 3m55s
retail-store-app-ui-5c856459f-7n6xn 1/1 Running 0 3m55s
retail-store-app-ui-5c856459f-8frbb 1/1 Running 0 3m40s
retail-store-app-ui-5c856459f-95hpb 1/1 Running 0 3m40s
retail-store-app-ui-5c856459f-9sx87 1/1 Running 0 3m40s
retail-store-app-ui-5c856459f-kjwvt 1/1 Running 0 3m40s
Get the LoadBalancer URL:
kubectl get ingress retail-store-app-ui -o jsonpath="{.status.loadBalancer.ingress[*].hostname}"
Access the application using the URL: http://[result-from-above-command]/
Verify that all worker nodes are AWS Graviton instances:
kubectl get nodes -L kubernetes.io/arch -L node.kubernetes.io/instance-type -L karpenter.sh/capacity-type
NAME STATUS ROLES AGE VERSION ARCH INSTANCE-TYPE CAPACITY-TYPE
i-0a47a83dc76f4e2a9 Ready <none> 4m19s v1.31.12-eks-e386d34 arm64 c7g.large on-demand
i-0a7b99c1942edfc46 Ready <none> 4m11s v1.31.12-eks-e386d34 arm64 c7g.large on-demand
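The visual check above can also be scripted. The following is a minimal sketch, assuming the column layout shown in the output above; the helper name non_arm64_nodes is hypothetical. It finds the ARCH column by its header and prints the name of every node whose architecture is not arm64, so empty output means every node is Graviton-based.

```shell
# non_arm64_nodes (hypothetical helper): read the table printed by
# `kubectl get nodes -L kubernetes.io/arch ...` on stdin and print the
# name of every node whose ARCH column is not arm64.
non_arm64_nodes() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "ARCH") col = i; next }
       col && $col != "arm64" { print $1 }'
}

# Sample input copied from the output shown above.
sample='NAME STATUS ROLES AGE VERSION ARCH INSTANCE-TYPE CAPACITY-TYPE
i-0a47a83dc76f4e2a9 Ready <none> 4m19s v1.31.12-eks-e386d34 arm64 c7g.large on-demand
i-0a7b99c1942edfc46 Ready <none> 4m11s v1.31.12-eks-e386d34 arm64 c7g.large on-demand'

offenders=$(printf '%s\n' "$sample" | non_arm64_nodes)
if [ -z "$offenders" ]; then
  echo "all nodes are arm64"
else
  echo "non-arm64 nodes: $offenders"
fi

# Against a live cluster (assumes kubectl access):
# kubectl get nodes -L kubernetes.io/arch | non_arm64_nodes
```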
Reset Amazon EKS Auto Mode cluster before proceeding to scenario 2
Execute the below commands sequentially to delete the retail-store-app suite and remove the PVCs (Persistent Volume Claims) associated with the stateful sets, which will also delete the underlying Amazon Elastic Block Store (Amazon EBS) volumes. This returns the Amazon EKS Auto Mode cluster to its original state before proceeding to scenario 2.
helm uninstall retail-store-app
kubectl delete pvc/data-retail-store-app-catalog-mysql-0
Before proceeding to scenario 2, let’s recap what we accomplished in scenario 1.
We created a new node pool exclusively for AWS Graviton (ARM64) and assigned it a weight of 10. This will make sure that any eligible workloads deployed in this Amazon EKS cluster will prioritize AWS Graviton instances first for better price-performance. If a suitable AWS Graviton node cannot be found, the workload will fall back to the x86_64 options defined in the default “general-purpose” node pool.
Steps for scenario 2: Adopting Spot instances and handling workload restrictions:
Create a custom node pool named spot in the Amazon EKS Auto Mode cluster.
cat << EOF > spot.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
      expireAfter: 336h
      terminationGracePeriod: 24h
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: eks.amazonaws.com/instance-category
          operator: NotIn
          values:
            - t
        - key: eks.amazonaws.com/instance-generation
          operator: Gt
          values:
            - "4"
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
            - amd64
        - key: eks.amazonaws.com/instance-size
          operator: NotIn
          values: [nano, micro, small]
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    budgets:
      - nodes: 10%
  weight: 20
EOF
kubectl apply -f spot.yaml
Key features of this NodePool:
- Spot Pricing Options Support:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
- Multi-Architecture Support:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
            - amd64
- Priority Weighting: The NodePool has a weight of 20, giving it priority over the “general-purpose” (weight 0), “system” (weight 0), and “graviton-ondemand” (weight 10) NodePools.
  weight: 20
- Deploy the retail store app using Helm. Create the values_spot.yaml file to define the customization parameters for your Helm chart, and then run the helm install command to deploy the application to your Amazon EKS cluster.
Create the values_spot.yaml file with the customization parameters:
cat << EOF > values_spot.yaml
catalog:
  mysql:
    nodeSelector:
      karpenter.sh/capacity-type: on-demand
    secret:
      create: true
      name: catalog-db
      username: catalog
    persistentVolume:
      enabled: true
      accessMode:
        - ReadWriteOnce
      size: 30Gi
      storageClass: eks-auto-ebs-csi-sc
ui:
  endpoints:
    catalog: http://retail-store-app-catalog:80
    carts: http://retail-store-app-carts:80
    checkout: http://retail-store-app-checkout:80
    assets: http://retail-store-app-assets:80
  autoscaling:
    enabled: true
    minReplicas: 5
    maxReplicas: 50
    targetCPUUtilizationPercentage: 80
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: ui
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: ui
  ingress:
    enabled: true
    className: eks-auto-alb
    annotations:
      alb.ingress.kubernetes.io/healthcheck-path: /actuator/health
checkout:
  endpoints:
    orders: http://retail-store-app-orders:80
EOF
Execute the Helm install command to deploy the application to Amazon EKS.
helm install -f values_spot.yaml retail-store-app oci://public.ecr.aws/aws-containers/retail-store-sample-chart --version 0.8.5
Key configuration aspects:
The key configuration aspect in values_spot.yaml is the following:
- MySQL Workload Restriction: Restricts the stateful MySQL workload to “on-demand” instances only:
catalog:
  mysql:
    nodeSelector:
      karpenter.sh/capacity-type: on-demand
Verification steps:
Confirm that everything is in Running state. This step takes a couple of minutes.
kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
retail-store-app-assets-78d4fd49cf-w67cs 1/1 Running 0 69s 192.168.112.224 i-071f5f98fa309963e <none> <none>
retail-store-app-carts-947659c4-zd2sc 1/1 Running 0 33m 192.168.187.65 i-00222cd2a51981fee <none> <none>
retail-store-app-carts-dynamodb-58f675c5c8-k7tfp 1/1 Running 0 33m 192.168.187.67 i-00222cd2a51981fee <none> <none>
retail-store-app-catalog-56bd6bbbd-bpzk6 1/1 Running 0 69s 192.168.112.226 i-071f5f98fa309963e <none> <none>
retail-store-app-catalog-mysql-0 1/1 Running 0 33m 192.168.187.71 i-00222cd2a51981fee <none> <none>
retail-store-app-checkout-696f448554-8lwl8 1/1 Running 0 33m 192.168.187.64 i-00222cd2a51981fee <none> <none>
retail-store-app-checkout-redis-6f5947f4d8-7sbb6 1/1 Running 0 33m 192.168.187.68 i-00222cd2a51981fee <none> <none>
retail-store-app-orders-979cb5b4c-zrxtf 1/1 Running 0 69s 192.168.112.225 i-071f5f98fa309963e <none> <none>
retail-store-app-orders-postgresql-0 1/1 Running 0 68s 192.168.112.228 i-071f5f98fa309963e <none> <none>
retail-store-app-orders-rabbitmq-0 1/1 Running 0 33m 192.168.187.66 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-sqvj9 1/1 Running 0 33m 192.168.187.70 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-tmqhw 1/1 Running 0 21m 192.168.187.73 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-vnpds 1/1 Running 0 33m 192.168.187.69 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-zhtkj 1/1 Running 0 21m 192.168.187.72 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-zhwfv 1/1 Running 0 69s 192.168.112.227 i-071f5f98fa309963e <none> <none>
Get the LoadBalancer URL:
kubectl get ingress retail-store-app-ui -o jsonpath="{.status.loadBalancer.ingress[*].hostname}"
Access the application using the URL: http://[result-from-above-command]/
Verify the node configuration.
kubectl get nodes -L kubernetes.io/arch -L node.kubernetes.io/instance-type -L karpenter.sh/capacity-type
NAME STATUS ROLES AGE VERSION ARCH INSTANCE-TYPE CAPACITY-TYPE
i-00222cd2a51981fee Ready <none> 33m v1.31.12-eks-e386d34 arm64 c7g.large on-demand
i-071f5f98fa309963e Ready <none> 94s v1.31.12-eks-e386d34 arm64 c8gn.large spot
Verifying workload placement
- MySQL stateful application verification: Verify that the MySQL stateful application is running on the “on-demand” capacity type, as specified in values_spot.yaml:
kubectl get pods -l=app.kubernetes.io/component=mysql -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
retail-store-app-catalog-mysql-0 1/1 Running 0 34m 192.168.187.71 i-00222cd2a51981fee <none> <none>
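Confirming the capacity type means cross-referencing the NODE column here with the node listing shown earlier. The sketch below does that join with awk, assuming both outputs were saved to files with the column layouts shown in this post; the helper name capacity_by_pod and the file names are hypothetical. One known limitation: a RESTARTS value such as "1 (5m ago)" would shift the columns, so treat this as an illustration rather than a robust tool.

```shell
# capacity_by_pod NODES_FILE PODS_FILE (hypothetical helper)
# NODES_FILE: saved output of `kubectl get nodes -L karpenter.sh/capacity-type`
# PODS_FILE:  saved output of `kubectl get pods -o wide`
# Prints "<pod> <capacity-type>" for every pod row.
capacity_by_pod() {
  awk '
    FNR == 1 {                               # header row of either file
      for (i = 1; i <= NF; i++) {
        if ($i == "CAPACITY-TYPE") cap = i
        if ($i == "NODE" && !node) node = i  # first NODE, not "NOMINATED NODE"
      }
      next
    }
    NR == FNR { type[$1] = $cap; next }      # first file: node -> capacity type
    { print $1, type[$node] }                # second file: pod and its capacity type
  ' "$1" "$2"
}

# Sample data copied from the outputs shown in this post.
workdir=$(mktemp -d)
cat > "$workdir/nodes.txt" << 'EOF'
NAME STATUS ROLES AGE VERSION ARCH INSTANCE-TYPE CAPACITY-TYPE
i-00222cd2a51981fee Ready <none> 33m v1.31.12-eks-e386d34 arm64 c7g.large on-demand
i-071f5f98fa309963e Ready <none> 94s v1.31.12-eks-e386d34 arm64 c8gn.large spot
EOF
cat > "$workdir/pods.txt" << 'EOF'
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
retail-store-app-catalog-mysql-0 1/1 Running 0 34m 192.168.187.71 i-00222cd2a51981fee <none> <none>
EOF

capacity_by_pod "$workdir/nodes.txt" "$workdir/pods.txt"
```

For the sample data, the MySQL pod resolves to the on-demand node, which matches the nodeSelector set in values_spot.yaml.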
- UI application distribution: You may notice that the UI application pods are distributed across mixed architectures (AMD64 and ARM64) and capacity types (on-demand and spot). Unlike Amazon DynamoDB Local or MySQL, UI pods have no specific restrictions, demonstrating how Amazon EKS Auto Mode efficiently bin-packs pods across available nodes according to scheduling rules:
Note: If the UI pods are not spread across CPU architectures, try increasing the number of pods with the command below:
kubectl scale deployments.apps retail-store-app-ui --replicas 70
This step still doesn’t guarantee that you will get a mixed architecture. Amazon EKS Auto Mode always provisions cost-efficient nodes based on your node pool configuration.
kubectl get pods -l=app.kubernetes.io/name=ui -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
retail-store-app-ui-79d8cf795b-sqvj9 1/1 Running 0 35m 192.168.187.70 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-tmqhw 1/1 Running 0 23m 192.168.187.73 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-vnpds 1/1 Running 0 35m 192.168.187.69 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-zhtkj 1/1 Running 0 23m 192.168.187.72 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-zhwfv 1/1 Running 0 3m21s 192.168.112.227 i-071f5f98fa309963e <none> <none>
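To summarize how the UI pods are spread, a short sketch can count pods per node. It assumes the `kubectl get pods -o wide` column layout shown above; the helper name pods_per_node is hypothetical.

```shell
# pods_per_node (hypothetical helper): read `kubectl get pods -o wide`
# output on stdin and print "<count> <node>", busiest node first.
pods_per_node() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "NODE" && !col) col = i; next }
       { count[$col]++ }
       END { for (n in count) print count[n], n }' |
    sort -k1,1nr -k2,2
}

# Sample input copied from the output shown above.
ui_pods='NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
retail-store-app-ui-79d8cf795b-sqvj9 1/1 Running 0 35m 192.168.187.70 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-tmqhw 1/1 Running 0 23m 192.168.187.73 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-vnpds 1/1 Running 0 35m 192.168.187.69 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-zhtkj 1/1 Running 0 23m 192.168.187.72 i-00222cd2a51981fee <none> <none>
retail-store-app-ui-79d8cf795b-zhwfv 1/1 Running 0 3m21s 192.168.112.227 i-071f5f98fa309963e <none> <none>'

printf '%s\n' "$ui_pods" | pods_per_node

# Against a live cluster (assumes kubectl access):
# kubectl get pods -l=app.kubernetes.io/name=ui -o wide | pods_per_node
```

In the sample, four UI pods sit on the on-demand node i-00222cd2a51981fee and one on the Spot node i-071f5f98fa309963e.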
In scenario 2, we created a node pool exclusively for Amazon EC2 Spot instances that supports both AMD64 and ARM64 CPU architectures. We assigned it a weight of 20, to make sure that stateless workloads are first scheduled on worker nodes using Spot pricing. Only when Spot capacity is unavailable do workloads fall back to the ‘graviton-ondemand’ node pool, followed by the ‘general-purpose’ node pool.
Cleaning up
To avoid incurring future charges, run the steps below sequentially to clean up the resources that were used for this blog post:
helm uninstall retail-store-app
kubectl delete pvc/data-retail-store-app-catalog-mysql-0
eksctl delete cluster --name eks-auto-mode-demo
Conclusion
Although AWS Graviton compute and Spot pricing are not the default choices defined in the “general-purpose” NodePool, customers can add their own custom NodePools by using one of the strategies outlined in this blog. By incorporating AWS Graviton and Amazon EC2 Spot, customers can achieve improved compute efficiency and cost optimization. Additionally, Amazon EKS Auto Mode has a built-in Spot interruption handler and node consolidation mechanism to further enhance these optimizations.
For more information on Amazon EKS Auto Mode capabilities, visit the Amazon EKS documentation.
About the authors
Muru Bhaskaran is a Senior Solutions Architect at AWS, specializing in Graviton adoption and migration strategies. He partners with customers to achieve maximum compute optimization, delivering enhanced performance and cost savings through ARM64-based EC2 instances. A dedicated expert in EC2, EC2 Spot Instances, Amazon EKS, EKS Auto Mode, and Karpenter, Muru guides customers in harnessing these powerful AWS technologies to run their containerized workloads with greater efficiency and scale.
Zakiya Randall is a Senior Technical Account Manager at AWS and she helps customers optimize their cloud infrastructure. She specializes in containers, observability, and modernization. Outside of work, she loves to play golf and visit museums.