AWS EKS Hybrid Nodes Integrate with NVIDIA DGX to Deploy Edge AI
Source: AWS Containers
Modern generative AI applications require deployment closer to where data is generated and business decisions are made, but this creates new infrastructure challenges. Organizations in manufacturing, healthcare, finance, and telecommunications need to deliver low-latency, energy-efficient AI workloads at the edge while maintaining data locality and regulatory compliance. However, managing Kubernetes on-premises adds operational complexity that can slow down innovation.
You can use Amazon Elastic Kubernetes Service (Amazon EKS) Hybrid Nodes to address this by joining on-premises infrastructure to the Amazon EKS control plane as remote nodes. This allows you to accelerate AI workload deployment with consistent operational practices, while addressing latency, compliance, and data residency requirements. EKS Hybrid Nodes removes the complexity and burden of self-managing Kubernetes on-premises so that your team can focus on deploying AI applications and driving innovations. It provides unified workflows and tooling alongside centralized monitoring and enhanced observability across your distributed infrastructure.
EKS Hybrid Nodes enables you to deliver AI capabilities wherever your business demands, such as the following use cases:
- Run low-latency services at on-premises locations, including real-time inference at the edge
- Train models with data that must remain on-premises to meet regulatory compliance requirements
- Deploy inference workloads near source data, such as Retrieval-Augmented Generation (RAG) applications using a local knowledge base
- Repurpose existing hardware investment
This post demonstrates a real-world example of integrating EKS Hybrid Nodes with NVIDIA DGX Spark, a compact and energy-efficient GPU platform optimized for edge AI deployment. In this post we walk you through deploying a large language model (LLM) for low-latency generative AI inference on-premises, setting up node monitoring and GPU observability with centralized management through Amazon EKS. Although this post uses DGX Spark, the architecture and patterns discussed apply to other NVIDIA DGX systems or GPU platforms.
Solution overview
For this demo walkthrough, you create an EKS cluster with EKS Hybrid Nodes enabled, and connect an on-premises DGX Spark as a hybrid node. You install the NVIDIA GPU Operator for Kubernetes to provision GPU resources for the local generative AI inference. Then, you deploy an LLM on the hybrid nodes using NVIDIA NIM, a set of microservices optimized by NVIDIA for accelerated model deployment. You also set up the Amazon EKS Node Monitoring Agent (NMA) to monitor node health and detect GPU-specific issues. Finally, you integrate the NVIDIA Data Center GPU Manager (DCGM) Exporter with Amazon Managed Service for Prometheus and Amazon Managed Grafana to provide GPU metrics observability across hybrid nodes.
The following diagram presents a high-level overview of the architecture of our solution.
Figure 1: Hybrid architecture for deploying GenAI workloads on-premises or at the edge using Amazon EKS Hybrid Nodes with NVIDIA DGX
EKS Hybrid Nodes requires private network connectivity between your on-premises or edge environment and the AWS Region. This connectivity can be established using either AWS Direct Connect or AWS Site-to-Site VPN into your Amazon Virtual Private Cloud (Amazon VPC). The node and pod Classless Inter-Domain Routing (CIDR) blocks for your hybrid nodes and container workloads must be unique and routable across your network environment. You provide these CIDRs as the RemoteNodeNetwork and RemotePodNetwork values when creating the EKS cluster with hybrid nodes.
This walkthrough doesn’t cover hybrid networking prerequisites for EKS Hybrid Nodes. Go to the Amazon EKS user guide for the details.
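Before creating the cluster, it can be worth sanity-checking that the CIDR blocks you plan to use don't collide. The following is a quick sketch (not part of any official tooling; the 10.0.0.0/16 VPC CIDR is an assumption, while the two 192.168.x.0/24 ranges match the ones used later in this walkthrough):

```python
import ipaddress

def validate_hybrid_cidrs(vpc_cidr, remote_node_cidr, remote_pod_cidr):
    """Check that RemoteNodeNetwork and RemotePodNetwork do not overlap
    with each other or with the VPC CIDR (a basic sanity check)."""
    nets = {
        "VPC": ipaddress.ip_network(vpc_cidr),
        "RemoteNodeNetwork": ipaddress.ip_network(remote_node_cidr),
        "RemotePodNetwork": ipaddress.ip_network(remote_pod_cidr),
    }
    problems = []
    names = list(nets)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if nets[a].overlaps(nets[b]):
                problems.append(f"{a} overlaps {b}")
    return problems

# No overlaps with the ranges used in this demo
print(validate_hybrid_cidrs("10.0.0.0/16", "192.168.100.0/24", "192.168.64.0/24"))
# → []
```

An empty list means the three ranges are disjoint; any overlap is reported as a pair of names so you know which value to change.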
Prerequisites
The following prerequisites are necessary to complete this solution:
- Amazon VPC with two private and two public subnets, across two Availability Zones (AZs).
- An EKS cluster with hybrid nodes enabled. Follow the Amazon EKS user guide to deploy.
- On-premises compute nodes running a compatible operating system.
- Private connectivity between the on-premises network and Amazon VPC (through VPN or Direct Connect).
- Two routable RFC-1918 or CGNAT CIDR blocks for RemoteNodeNetwork and RemotePodNetwork.
- Configure the on-premises firewall and the EKS cluster security groups to allow bi-directional communication between the Amazon EKS control plane and the remote node and pod CIDRs, as per the networking prerequisites.
- NVIDIA DGX (or other GPU-enabled) systems as hybrid nodes.
- An NVIDIA NGC account and API key for accessing NIM microservices (see the NVIDIA documentation).
- The following tools installed: AWS CLI, kubectl, Helm, and curl.
Walkthrough
The following steps walk you through this solution.
Prepare EKS Hybrid Nodes
The following three sections walk you through preparations for EKS Hybrid Nodes.
Prepare IAM credentials
- Amazon EKS Hybrid Nodes use temporary AWS Identity and Access Management (IAM) credentials provisioned by AWS Systems Manager hybrid activations or IAM Roles Anywhere to authenticate with the EKS cluster. Follow the Amazon EKS user guide to create the required Hybrid Nodes IAM role (AmazonEKSHybridNodesRole) using either of the two options.
- Create an Amazon EKS access entry with the Hybrid Nodes IAM role to enable your on-premises nodes to join the cluster. Go to Prepare cluster access for hybrid nodes in the Amazon EKS user guide for more details.
aws eks create-access-entry \
--cluster-name <CLUSTER_NAME> \
--principal-arn <HYBRID_NODES_ROLE_ARN> \
--type HYBRID_LINUX
Install nodeadm and join the DGX Spark as hybrid node
- Use the EKS Hybrid Nodes CLI (nodeadm) to bootstrap and install all required components for your hybrid nodes to join the EKS cluster. This demo uses the ARM64 version of nodeadm for the DGX Spark.
curl -OL 'https://hybrid-assets.eks.amazonaws.com/releases/latest/bin/linux/arm64/nodeadm'
chmod +x nodeadm
nodeadm install 1.34 --credential-provider ssm
- Prepare a nodeConfig.yaml configuration file using the temporary IAM credentials generated in the previous section. The following is an example using Systems Manager hybrid activations for hybrid nodes credentials.
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
cluster:
name: <CLUSTER_NAME>
region: <CLUSTER_REGION>
hybrid:
ssm:
activationCode: <SSM_ACTIVATION_CODE>
activationId: <SSM_ACTIVATION_ID>
- Run the nodeadm init command with your nodeConfig.yaml to join your hybrid nodes to the EKS cluster.
nodeadm init --config-source file://nodeConfig.yaml
- For mixed GPU and non-GPU hybrid nodes, we recommend that you add a --register-with-taints=nvidia.com/gpu=Exists:NoSchedule taint to GPU nodes to maximize GPU resource usage. Refer to the documentation regarding how to modify the kubelet configuration using nodeadm.
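As a sketch of what that could look like, the taint can be registered through the kubelet section of the nodeadm configuration. This assumes your nodeadm version supports kubelet flags in NodeConfig — verify the exact schema against the EKS Hybrid Nodes documentation:

```yaml
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: <CLUSTER_NAME>
    region: <CLUSTER_REGION>
  hybrid:
    ssm:
      activationCode: <SSM_ACTIVATION_CODE>
      activationId: <SSM_ACTIVATION_ID>
  kubelet:
    flags:
      # Taint GPU nodes so only GPU workloads (with a matching toleration)
      # are scheduled on them
      - --register-with-taints=nvidia.com/gpu=Exists:NoSchedule
```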
Install Cilium Container Network Interface (CNI)
- Before running workloads on hybrid nodes, you must install a compatible CNI. For this example, we use Cilium because it’s the AWS-supported CNI for EKS Hybrid Nodes.
Create a Cilium configuration file: cilium-values.yaml.
# BGP Control Plane for LoadBalancer services
bgpControlPlane:
enabled: true
# NodePort services
nodePort:
enabled: true
# Node affinity - Run Cilium only on hybrid nodes
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: eks.amazonaws.com/compute-type
operator: In
values:
- hybrid
# IPAM configuration for pod networking
ipam:
mode: cluster-pool
operator:
clusterPoolIPv4PodCIDRList:
- 192.168.64.0/24 # RemotePodNetwork CIDR
clusterPoolIPv4MaskSize: 25
# Cilium Operator configuration
operator:
rollOutPods: true
unmanagedPodWatcher:
restart: false
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: eks.amazonaws.com/compute-type
operator: In
values:
- hybrid
- Install Cilium on EKS Hybrid Nodes using Helm with the preceding configuration.
helm repo add cilium https://helm.cilium.io/
CILIUM_VERSION=1.18.6
helm install cilium cilium/cilium \
--version ${CILIUM_VERSION} \
--values cilium-values.yaml \
--namespace kube-system
- If you’re running webhooks on hybrid nodes, then you must make sure that on-premises Pod CIDRs are routable across the hybrid network environment, using techniques such as BGP routing, static routing, or ARP proxying. This demo uses Cilium BGP control-plane to enable BGP peering between hybrid nodes and on-premises routers, and to advertise Pod CIDRs to the on-premises network.
Apply the following Cilium BGP configuration to your cluster.
---
apiVersion: cilium.io/v2
kind: CiliumBGPClusterConfig
metadata:
name: cilium-bgp
spec:
nodeSelector:
matchExpressions:
- key: eks.amazonaws.com/compute-type
operator: In
values:
- hybrid
bgpInstances:
- name: "cilium-bgp"
localASN: <NODES_ASN>
peers:
- name: "onprem-router"
peerASN: <ONPREM_ROUTER_ASN>
peerAddress: <ONPREM_ROUTER_IP>
peerConfigRef:
name: "cilium-peer"
---
apiVersion: cilium.io/v2
kind: CiliumBGPPeerConfig
metadata:
name: cilium-peer
spec:
timers:
holdTimeSeconds: 30
keepAliveTimeSeconds: 10
gracefulRestart:
enabled: true
restartTimeSeconds: 120
families:
- afi: ipv4
safi: unicast
advertisements:
matchLabels:
advertise: "bgp"
---
apiVersion: cilium.io/v2
kind: CiliumBGPAdvertisement
metadata:
name: bgp-adv-pod
labels:
advertise: bgp
spec:
advertisements:
- advertisementType: "PodCIDR"
- Validate that your nodes are connected to the EKS cluster and in a Ready state.
$ kubectl get nodes -o wide -l eks.amazonaws.com/compute-type=hybrid
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
mi-0e06d30895cfcc155 Ready <none> 17d v1.34.2-eks-ecaa3a6 192.168.100.101 <none> Ubuntu 24.04.3 LTS 6.14.0-1015-nvidia containerd://2.2.1
Install NVIDIA GPU Operator for Kubernetes
The NVIDIA GPU Operator uses the Kubernetes operator framework to automate the lifecycle management of the NVIDIA software components required to provision GPU resources. These components include the NVIDIA drivers (for enabling CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Toolkit, and DCGM-based monitoring, among others.
- Deploy NVIDIA GPU Operator on hybrid nodes using the official Helm chart.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
--namespace gpu-operator \
--create-namespace \
--set driver.enabled=true \
--set toolkit.enabled=true \
--set devicePlugin.enabled=true \
--set gfd.enabled=true \
--set migManager.enabled=true \
--set nodeStatusExporter.enabled=true \
--set dcgmExporter.enabled=true \
--set operator.defaultRuntime=containerd \
--set operator.runtimeClass=nvidia \
--wait
- Wait until all pods in the gpu-operator namespace are running or completed.
$ kubectl get pods -n gpu-operator
NAMESPACE NAME READY STATUS RESTARTS AGE
gpu-operator gpu-feature-discovery-7jvph 1/1 Running 1 (2m39s ago) 15d
gpu-operator gpu-operator-7569f8b499-7k59n 1/1 Running 1 (2m39s ago) 27m
gpu-operator gpu-operator-node-feature-discovery-gc-55ffc49ccc-glq9l 1/1 Running 1 (2m39s ago) 27m
gpu-operator gpu-operator-node-feature-discovery-master-6b5787f695-n92x4 1/1 Running 1 (2m39s ago) 27m
gpu-operator gpu-operator-node-feature-discovery-worker-9wqq5 1/1 Running 1 (2m39s ago) 15d
gpu-operator nvidia-container-toolkit-daemonset-f9brm 1/1 Running 1 (2m39s ago) 15d
gpu-operator nvidia-cuda-validator-nzwmh 0/1 Completed 0 92s
gpu-operator nvidia-dcgm-exporter-hn4vz 1/1 Running 1 (2m39s ago) 15d
gpu-operator nvidia-device-plugin-daemonset-4kb5c 1/1 Running 1 (2m39s ago) 15d
gpu-operator nvidia-node-status-exporter-xpz9j 1/1 Running 1 (2m39s ago) 15d
gpu-operator nvidia-operator-validator-t662d 1/1 Running 1 (2m39s ago) 15d
- The NVIDIA GPU Operator validates the stack using the nvidia-operator-validator and the nvidia-cuda-validator pods. Verify the logs on these pods and confirm that the validations are successful.
$ kubectl logs -n gpu-operator nvidia-operator-validator-t662d
Defaulted container "nvidia-operator-validator" out of: nvidia-operator-validator, driver-validation (init), toolkit-validation (init), cuda-validation (init), plugin-validation (init)
all validations are successful
$ kubectl logs -n gpu-operator nvidia-cuda-validator-nzwmh
Defaulted container "nvidia-cuda-validator" out of: nvidia-cuda-validator, cuda-validation (init)
cuda workload validation is successful
- The GPU within the DGX Spark node is now exposed to the kubelet and is visible in the node’s allocatable resources:
$ kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu" -l eks.amazonaws.com/compute-type=hybrid
NAME GPU
mi-0e06d30895cfcc155 1
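As an optional end-to-end check beyond the operator's own validators, you can schedule a one-off pod that requests the GPU and runs nvidia-smi. The CUDA image tag below is an assumption — substitute a tag available in your registry:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  tolerations:
    # Tolerate the GPU taint recommended earlier for mixed node fleets
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.6.3-base-ubuntu24.04   # assumed tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```

Once the pod completes, kubectl logs gpu-smoke-test should print the familiar nvidia-smi table showing the DGX Spark GPU.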
Deploy NVIDIA NIM for inference on EKS Hybrid Nodes
- To deploy NVIDIA NIM, you must set up an NVIDIA NGC API key and create container registry secrets using the key.
kubectl create secret docker-registry ngc-secret --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=$NGC_API_KEY
kubectl create secret generic ngc-api --from-literal=NGC_API_KEY=$NGC_API_KEY
- Download the NIM Helm chart using the following command:
helm fetch https://helm.ngc.nvidia.com/nim/charts/nim-llm-<version_number>.tgz --username='$oauthtoken' --password=$NGC_API_KEY
cd nim-deploy/helm
- Select a supported model for NVIDIA NIM based on the GPU specification of your hybrid nodes. Create the Helm chart overrides using the NIM container image path, and set the ngcAPISecret and imagePullSecrets using the secrets created in Step 1.
cat > qwen3-32b-spark-nim.values.yaml <<EOF
image:
repository: "nvcr.io/nim/qwen/qwen3-32b-dgx-spark"
tag: 1.0.0-variant
model:
ngcAPISecret: ngc-api
nodeSelector:
eks.amazonaws.com/compute-type: hybrid
resources:
limits:
nvidia.com/gpu: 1
persistence:
enabled: false
imagePullSecrets:
- name: ngc-secret
tolerations:
- key: "nvidia.com/gpu"
operator: "Exists"
effect: "NoSchedule"
EOF
- Deploy a NIM-based LLM using the following command. In this example I’m running a Qwen3-32B image that is specifically optimized for the DGX Spark node.
helm install my-nim nim-llm-1.15.4.tgz -f ./qwen3-32b-spark-nim.values.yaml
This deployment isn’t persistent and doesn’t use a model cache. To implement a model cache, you need to install CSI drivers and configure Persistent Volumes using the on-premises storage infrastructure.
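Once a Persistent Volume provisioner is available, enabling the cache could look like the following values override. The key names follow the nim-llm chart's persistence section, and the local-path StorageClass is an assumption standing in for your on-premises storage — verify both against your chart version:

```yaml
persistence:
  enabled: true
  storageClass: "local-path"   # assumed on-premises StorageClass
  accessMode: ReadWriteOnce
  size: 100Gi                  # sized for the model weights; an assumption here
```

With a cache in place, NIM pods skip re-downloading model weights from NGC after a restart, which significantly shortens startup time at the edge.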
- The NIM pod deployed on hybrid nodes is routable through BGP, so you can directly access its API to test the model.
$ kubectl get pods -o wide | grep nim
my-nim-nim-llm-0 1/1 Running 0 86m 192.168.64.102 mi-0e06d30895cfcc155 <none> <none>
$ curl -X 'POST' \
"http://192.168.64.102:8000/v1/completions" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "Qwen/Qwen3-32B",
"prompt": "What is Kubernetes?",
"max_tokens": 100
}'
The following is an example of the expected response:
{
"id": "cmpl-d5161978bda9401b9b7a4ef0a529b6ce",
"object": "text_completion",
"created": 1770465499,
"model": "Qwen/Qwen3-32B",
"choices": [
{
"index": 0,
"text": " Why do you need it?\n\nKubernetes is a container orchestration system that automates the deployment, scaling, and management of containerized applications. It is an open-source system that was originally developed by Google and is now maintained by the Cloud Native Computing Foundation (CNCF). Kubernetes allows developers to easily deploy and manage applications in a distributed environment, making it a popular choice for organizations that use containerized applications.\n\nOne of the main reasons why Kubernetes is needed is because it provides a way to manage container",
"logprobs": null,
"finish_reason": "length",
"stop_reason": null,
"prompt_logprobs": null
}
],
"service_tier": null,
"system_fingerprint": null,
"usage": {
"prompt_tokens": 4,
"total_tokens": 104,
"completion_tokens": 100,
"prompt_tokens_details": null
},
"kv_transfer_params": null
}
You have successfully deployed an LLM using NVIDIA NIM on your EKS Hybrid Nodes.
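Because the Cilium BGP Control Plane was enabled for LoadBalancer services earlier, a more durable way to expose the NIM API than a pod IP is a LoadBalancer Service whose address Cilium allocates and advertises over BGP. The following is a minimal sketch: the pool CIDR is an assumed spare on-premises range, the Service selector label is an assumption about what the nim-llm chart generates, and CRD API groups vary across Cilium releases — check all three against your environment:

```yaml
# Pool of on-premises IPs that Cilium can assign to LoadBalancer Services
apiVersion: cilium.io/v2
kind: CiliumLoadBalancerIPPool
metadata:
  name: onprem-lb-pool
spec:
  blocks:
    - cidr: 192.168.65.0/28   # assumed spare, routable range
---
# Advertise allocated LoadBalancer IPs over the existing BGP peering;
# the advertise: bgp label matches the CiliumBGPPeerConfig created earlier
apiVersion: cilium.io/v2
kind: CiliumBGPAdvertisement
metadata:
  name: bgp-adv-svc
  labels:
    advertise: bgp
spec:
  advertisements:
    - advertisementType: "Service"
      service:
        addresses:
          - LoadBalancerIP
      selector:
        matchLabels:
          app.kubernetes.io/name: nim-llm   # assumed label on the NIM Service
```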
Configure centralized monitoring and observability for GPU metrics
The following two sections walk you through configuring centralized monitoring and observability for GPU metrics.
Install EKS Node Monitoring Agent
The EKS Node Monitoring Agent (NMA) is bundled into a container image that can be deployed as a DaemonSet across your EKS Hybrid Nodes. It collects node health information and detects GPU-specific issues using the NVIDIA DCGM and NVIDIA Management Library (NVML). It reports health issues by updating node status conditions and emitting Kubernetes events. Go to this AWS Container post to learn more details on NMA.
- To install the NMA on hybrid nodes, use the following AWS CLI command to create the Amazon EKS add-on.
aws eks create-addon --cluster-name <CLUSTER_NAME> --addon-name eks-node-monitoring-agent
- When it’s installed, NMA starts collecting custom node conditions for the EKS Hybrid Nodes. In the following example, you can see that NMA detected the 200 GbE clustering interface (enp1s0f0np0) of the hybrid node is disconnected, because I am using only a single DGX Spark.
kubectl describe node mi-0e06d30895cfcc155 | sed -n '/^Conditions:/,/^Addresses:/p' | head -n -1
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkingReady False Sat, 07 Feb 2026 23:52:59 +1100 Sat, 07 Feb 2026 05:22:59 +1100 InterfaceNotRunning Interface Name: "enp1s0f0np0", MAC: "4c:bb:47:2c:11:1d" is not up
KernelReady True Sat, 07 Feb 2026 05:12:28 +1100 Sat, 07 Feb 2026 05:12:28 +1100 KernelIsReady Monitoring for the Kernel system is active
AcceleratedHardwareReady True Sat, 07 Feb 2026 05:12:28 +1100 Sat, 07 Feb 2026 05:12:28 +1100 NvidiaAcceleratedHardwareIsReady Monitoring for the Nvidia AcceleratedHardware system is active
ContainerRuntimeReady True Sat, 07 Feb 2026 05:12:28 +1100 Sat, 07 Feb 2026 05:12:28 +1100 ContainerRuntimeIsReady Monitoring for the ContainerRuntime system is active
StorageReady True Sat, 07 Feb 2026 05:12:28 +1100 Sat, 07 Feb 2026 05:12:28 +1100 DiskIsReady Monitoring for the Disk system is active
[...]
- NMA also provides an automated log collection method through a Kubernetes custom resource definition (CRD) called NodeDiagnostic. To enable log collection from your hybrid nodes, create a NodeDiagnostic custom resource on your cluster, and refer to the Amazon EKS user guide for more details.
apiVersion: eks.amazonaws.com/v1alpha1
kind: NodeDiagnostic
metadata:
name: <HYBRID_NODE_NAME>
spec:
logCapture:
destination: <S3_PRESIGNED_HTTP_PUT_URL>
Integrate NVIDIA DCGM Exporter with Amazon Managed Service for Prometheus and Amazon Managed Grafana
Beyond node health monitoring, you can use the NVIDIA DCGM Exporter (within the GPU Operator stack) to gather GPU performance metrics and telemetry data that can be scraped by Prometheus. This section shows how to integrate DCGM Exporter with Amazon Managed Service for Prometheus and Amazon Managed Grafana to enable enhanced GPU observability across your EKS Hybrid Nodes.
- Start by creating an Amazon Managed Service for Prometheus workspace.
aws amp create-workspace --alias dgx-spark-metrics --region ap-southeast-2 --query 'workspaceId' --output text
- Next, follow this user guide to create an IAM role that allows Prometheus to ingest the scraped GPU metrics from EKS Hybrid Nodes to the managed workspace. Verify that the role has the following permissions attached.
{
"Version":"2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"aps:RemoteWrite",
"aps:GetSeries",
"aps:GetLabels",
"aps:GetMetricMetadata"
],
"Resource": "*"
}
]
}
- Prepare a Prometheus installation Helm values file as in the following example. Provide the Prometheus ingestion role Amazon Resource Name (ARN) from the last step, update the remoteWrite endpoint path with the managed Prometheus workspace URL, and add the DCGM Exporter scrape configurations.
# RBAC permissions for service discovery
rbac:
create: true
serviceAccounts:
server:
name: amp-iamproxy-ingest-service-account
annotations:
eks.amazonaws.com/role-arn: <AMP-INGEST-ROLE-ARN>
server:
persistentVolume:
enabled: false
remoteWrite:
- url: https://<AWS-Managed-Prometheus-Workspace-URL>/api/v1/remote_write
sigv4:
region: <CLUSTER_REGION>
queue_config:
max_samples_per_send: 1000
max_shards: 200
capacity: 2500
global:
scrape_interval: 30s
external_labels:
cluster: <CLUSTER_NAME>
# Additional scrape configs for DCGM Exporter
serverFiles:
prometheus.yml:
scrape_configs:
# DCGM Exporter - GPU metrics
- job_name: 'dcgm-exporter'
kubernetes_sd_configs:
- role: endpoints # Auto-discover Kubernetes endpoints
namespaces:
names:
- gpu-operator # Look in gpu-operator namespace
relabel_configs:
- source_labels: [__meta_kubernetes_service_name]
regex: nvidia-dcgm-exporter # Match the DCGM exporter service
action: keep
- source_labels: [__meta_kubernetes_pod_node_name]
target_label: node # Add node label to metrics
- Use Helm to deploy Prometheus to hybrid nodes using the preceding values. Prometheus scrapes GPU performance metrics from the DCGM Exporter and remote-writes them to the Amazon Managed Service for Prometheus workspace.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
helm repo update
kubectl create namespace prometheus
helm install prometheus prometheus-community/prometheus \
-n prometheus \
-f ./prometheus-amp-helm-values.yaml
- Follow this guide to create an Amazon Managed Grafana workspace, including the necessary permissions and authentication access through the IAM Identity Center. Then, configure the Grafana workspace to add Amazon Managed Service for Prometheus as a data source.
- Finally, create a new Grafana dashboard (or import one like this) to visualize scraped GPU metrics such as GPU utilization, GPU memory used, and GPU temperature and energy consumption.
Figure 2: Use Amazon Managed Grafana to monitor and visualize GPU metrics and telemetry across hybrid nodes
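For individual dashboard panels, queries over dcgm-exporter's default metrics work well. For example (the metric names below are dcgm-exporter defaults, and the node label comes from the relabel rule in the scrape configuration above):

```promql
# GPU utilization (percent) per hybrid node
avg by (node) (DCGM_FI_DEV_GPU_UTIL)

# GPU framebuffer memory used (MiB)
DCGM_FI_DEV_FB_USED

# GPU temperature (degrees C) and power draw (W)
DCGM_FI_DEV_GPU_TEMP
DCGM_FI_DEV_POWER_USAGE
```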
You can integrate EKS Hybrid Nodes with AWS cloud services to streamline generative AI deployment on-premises by removing the Kubernetes management overhead, while maintaining consistent operational practices with centralized observability across cloud, on-premises, and edge locations.
Cleaning up
To avoid incurring long-term charges, delete the AWS resources created as part of the demo walkthrough.
helm delete my-nim
helm delete prometheus -n prometheus
aws amp delete-workspace --workspace-id <AMP-WORKSPACE-ID> --region <AWS_REGION>
aws grafana delete-workspace --workspace-id <AMG-WORKSPACE-ID> --region <AWS_REGION>
eksctl delete cluster --name <CLUSTER_NAME> --region <CLUSTER_REGION>
Clean up other prerequisite resources that you created if they’re no longer needed.
Conclusion
This post provides a practical example of how Amazon EKS Hybrid Nodes empowers generative AI deployment using your own GPU nodes at on-premises and edge locations. Organizations can use EKS Hybrid Nodes to accelerate AI implementation with data locality and minimal latency, while maintaining consistent management and centralized observability across distributed environments.
To learn more about EKS Hybrid Nodes or running AI/ML workloads on Amazon EKS, explore the following resources:
- EKS Hybrid Nodes user guide
- AWS Blog: A deep dive into Amazon EKS Hybrid Nodes
- AWS re:Invent 2024 session (KUB205) – Bring the power of Amazon EKS to your on-premises applications
- AWS AI on EKS project