Benchmarking KubeVirt performance with virtbench

Source: CNCF

Organizations migrating VM estates from traditional hypervisors to KubeVirt often discover that many Kubernetes observability tools were originally designed around container workloads rather than VM-centric operational metrics. While KubeVirt schedules VMs as pods, the performance variables are fundamentally different—Kubernetes scheduler latency, CSI provisioner throughput, and SDN overlay overhead all interact in ways that standard kubectl metrics and pod-level monitoring do not surface.

Platform engineering teams need quantifiable, reproducible answers to questions that container benchmarks ignore:

  • Time-to-Ready: Wall-clock time from API call to confirmed guest OS network accessibility—not pod/Running.
  • Burst Capacity: Control plane and storage subsystem behavior under concurrent VM creation requests (boot storm).
  • Live Migration Stun Time: Precise network-level interruption window during VMI live migration over the overlay network.

To help measure these operational characteristics, we developed the KubeVirt Performance Benchmarking Toolkit (virtbench), an open-source CLI framework for executing reproducible stress tests across KubeVirt-enabled clusters, including KubeVirt on OpenShift and other environments using CSI-compatible storage.

Benchmarking considerations for VM-based workloads

Standard Kubernetes observability tools can return a healthy status even when VM-class workloads are degraded. Three architectural mismatches explain why:

Pod readiness ≠ VM readiness. The Kubernetes Ready condition is satisfied when the container process starts—often in milliseconds. A KubeVirt VMI is not operationally ready until the guest kernel boots, user-space services initialize, and the guest agent reports a heartbeat. Benchmarks that stop the clock at pod/Running misrepresent actual time-to-ready by minutes in typical deployments. virtbench uses an in-cluster ssh-test-pod to continuously probe the VMI’s guest network stack; the measurement only completes on confirmed SSH reachability.

CSI provisioner load under multi-disk VMs. Production VMs commonly require multiple PVCs per instance—a boot volume, a swap volume, and one or more high-IOPS data volumes. Container-focused benchmarks do not exercise the CSI driver’s ability to concurrently provision and hot-attach multiple block devices to a single VMI. virtbench simulates these configurations, exercising the full DataVolume → PVC → block device attachment pipeline under load.

Overlay-network live migration vs. vMotion. vMotion transfers running memory state over a dedicated high-bandwidth TCP channel. KubeVirt live migration tunnels the same memory transfer through the cluster’s SDN overlay (e.g., OVN-Kubernetes), adding latency and competing with workload traffic. virtbench measures the stun time—the window during which the VMI’s network interface is unavailable—for both sequential and parallel migration scenarios.
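To make the stun-time idea concrete, here is a minimal sketch of how an interruption window could be derived from a log of periodic reachability probes. It is not virtbench's implementation: the `Probe` record, `outage_windows`, and `stun_time` are hypothetical names, and the approach assumes probes are sent at a fixed interval while the migration runs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Probe:
    """One reachability probe (hypothetical record layout)."""
    t: float   # time the probe was sent, in seconds
    ok: bool   # True if the VMI answered


def outage_windows(probes: Iterable[Probe]) -> List[Tuple[float, float]]:
    """Return (start, end) windows during which the VMI did not answer.

    Each window runs from the last successful probe before a failure
    streak to the first successful probe after it, so it is an upper
    bound on the real interruption, accurate to the probe interval.
    A failure streak that never recovers is not reported.
    """
    windows: List[Tuple[float, float]] = []
    last_ok = None
    failing_since = None
    for p in sorted(probes, key=lambda probe: probe.t):
        if p.ok:
            if failing_since is not None:
                windows.append((failing_since, p.t))
                failing_since = None
            last_ok = p.t
        elif failing_since is None:
            # Start the window at the last known-good probe if there is one.
            failing_since = last_ok if last_ok is not None else p.t
    return windows


def stun_time(probes: Iterable[Probe]) -> float:
    """Longest single outage window in seconds, or 0.0 if none occurred."""
    return max((end - start for start, end in outage_windows(probes)), default=0.0)
```

A shorter probe interval tightens the bound but adds probe traffic to the same overlay network that carries the migration, so the interval is itself a tuning decision.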

How virtbench measures VM readiness

virtbench uses a client-side orchestration model with an in-cluster helper pod. The measurement pipeline works as follows:

  1. API Trigger: virtbench submits VirtualMachine objects to the Kubernetes API server, which creates the associated DataVolume and PVC resources via the CSI driver.
  2. State Machine Tracking: The CLI polls VMI status, tracking the full transition chain: Pending → Scheduled → Bound → Running.
  3. Guest Network Probing: An ssh-test-pod deployed inside the cluster issues continuous TCP probes to each VMI’s assigned IP address.
  4. Measurement Completion: The timer closes only on successful TCP handshake—confirming the guest network stack is fully operational.
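Steps 3 and 4 can be sketched as a simple probe loop. This is an illustration under assumptions, not virtbench's code: `wait_for_tcp` is a hypothetical helper, the default interval and timeouts are arbitrary, and the caller would have to run it from somewhere with a route to the VMI's IP address (the article uses an in-cluster pod for that reason).

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout_s: float = 300.0,
                 interval_s: float = 0.5, connect_timeout_s: float = 1.0) -> float:
    """Block until a TCP connect to host:port succeeds.

    Returns the elapsed wall-clock seconds since the call started.
    Raises TimeoutError if no handshake completes within timeout_s.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        try:
            # A completed connect means the guest network stack answered.
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return time.monotonic() - start
        except OSError:
            # Refused, unreachable, or timed out: the guest is not ready yet.
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"{host}:{port} not reachable after {timeout_s}s"
                ) from None
            time.sleep(interval_s)
```

For example, `wait_for_tcp(vmi_ip, 22)` started at the moment the VirtualMachine object is submitted would approximate time-to-ready, assuming the guest runs an SSH server on port 22. The resolution of the result is bounded by `interval_s` plus `connect_timeout_s`.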

Results are emitted as structured JSON and CSV, then rendered into an interactive HTML dashboard for analysis. The tool drives four internal benchmark engines—DataSource Clone, Migration, Capacity, and Failure Recovery—each implemented as a distinct module against the KubeVirt and storage APIs.

Figure 1: virtbench Architecture

Benchmark scenarios included in virtbench

virtbench ships with six ready-made test scenarios:

Scenario | What It Tests
DataSource VM Provisioning | Storage cloning efficiency and volume creation times
Single Node Boot Storm | Node-level capacity under simultaneous VM power-on
Multi-Node Boot Storm | Cluster-wide startup performance; simulates post-outage recovery
Live Migration | VM migration stun time; supports sequential and parallel runs
Chaos Benchmark | Concurrent chaos operations (create, resize, clone, restart, snapshot)
Failure and Recovery | HA validation via Fence Agents Remediation; measures time-to-recovery

Visualizing performance bottlenecks

Results are automatically rendered into a multi-chart HTML dashboard. 

Figure 2: Performance Dashboard

Each chart corresponds to a distinct measurement phase:

  • Creation Duration (blue): Per-VM provisioning latency under sequential load—establishes your storage and scheduler baseline.
  • Boot Storm (green): Provisioning latency under N-concurrent requests—exposes saturation inflection points in the CSI driver or etcd write throughput.
  • Live Migration (orange): Per-VMI migration duration and stun time across the node evacuation sequence—used to bound maintenance window SLAs.
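One way to locate a saturation point in boot-storm data is to compare a tail percentile across concurrency levels. The sketch below is an assumption-laden illustration, not part of virtbench: `percentile` and `first_saturation_point` are hypothetical helpers, the input is assumed to be a mapping from concurrency level to per-VM latencies in seconds, and the 2x threshold is arbitrary.

```python
import math
from typing import Dict, List, Optional


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in the range (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def first_saturation_point(runs: Dict[int, List[float]],
                           factor: float = 2.0) -> Optional[int]:
    """Return the lowest concurrency level whose p95 latency exceeds
    `factor` times the p95 of the lowest-concurrency run, or None."""
    if not runs:
        raise ValueError("no runs")
    levels = sorted(runs)
    baseline = percentile(runs[levels[0]], 95)
    for n in levels[1:]:
        if percentile(runs[n], 95) > factor * baseline:
            return n
    return None
```

With `runs = {1: [30, 31, 32], 10: [33, 35, 40], 50: [60, 90, 120]}`, the function returns `50`: the p95 at 10 concurrent VMs (40) stays under twice the baseline p95 (32), while the p95 at 50 (120) does not.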

A Creation Summary table decomposes end-to-end time into three sub-phases: clone_duration (CSI copy time), running_time (kubelet container start), and ping_time (guest network probe). This breakdown isolates whether a regression originates in the storage layer, the container runtime, or the guest OS init sequence.
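The same decomposition can be used to attribute a regression to one layer. The following is a minimal sketch that assumes each result row is a mapping with the three sub-phase fields named above, in seconds; the row layout and the `phase_medians` and `dominant_regression` helpers are hypothetical and may not match virtbench's actual output schema.

```python
from statistics import median
from typing import Dict, List, Mapping, Tuple

# Sub-phase names as listed in the Creation Summary description.
PHASES = ("clone_duration", "running_time", "ping_time")


def phase_medians(rows: List[Mapping[str, float]]) -> Dict[str, float]:
    """Median of each sub-phase across all VMs in one run."""
    return {phase: median(float(row[phase]) for row in rows) for phase in PHASES}


def dominant_regression(baseline: List[Mapping[str, float]],
                        candidate: List[Mapping[str, float]]) -> Tuple[str, float]:
    """Return (phase, delta_seconds) for the sub-phase whose median grew the most."""
    before = phase_medians(baseline)
    after = phase_medians(candidate)
    deltas = {phase: after[phase] - before[phase] for phase in PHASES}
    worst = max(deltas, key=deltas.get)
    return worst, deltas[worst]
```

If `clone_duration` dominates, the storage layer is the first place to look; a jump in `ping_time` with the other two phases flat points at the guest OS or the network path instead.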

Comparing different benchmarking approaches 

Tool | Focus | virtbench Difference
kube-burner | API/control plane churn (etcd, scheduler) | Measures the data path: clone speeds, OS boot times, network accessibility
fio / iperf wrappers | Raw disk/network micro-benchmarks | Tests component interaction (e.g., network performance during live migration; clone speed during a boot storm)
KubeVirt E2E tests | Binary pass/fail | Quantitative: “How long did the operation take?” Surfaces performance issues before production


Note: A future release will include in-VM fio tooling for I/O benchmarking from inside the guest OS.

Running your first virtbench test

virtbench is designed to integrate into staging CI pipelines, so you can run it before and after infrastructure changes (storage array upgrades, CNI swaps, Kubernetes version bumps) to detect performance regressions before they reach production.
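A before/after comparison in a pipeline usually ends in a pass/fail decision. The sketch below shows one possible gate; it is not a virtbench feature. The `regressed` and `gate` helpers are hypothetical, the inputs are assumed to be per-VM time-to-ready samples in seconds extracted from two runs, and the 10% tolerance is an arbitrary starting point.

```python
import sys
from statistics import mean
from typing import Sequence


def regressed(before: Sequence[float], after: Sequence[float],
              tolerance: float = 0.10) -> bool:
    """True if mean time-to-ready grew by more than `tolerance` (a fraction)."""
    if not before or not after:
        raise ValueError("both runs need at least one sample")
    return mean(after) > mean(before) * (1.0 + tolerance)


def gate(before: Sequence[float], after: Sequence[float],
         tolerance: float = 0.10) -> int:
    """Return a process exit code: 0 lets the pipeline stage pass, 1 fails it."""
    if regressed(before, after, tolerance):
        print(
            f"FAIL: mean time-to-ready {mean(before):.1f}s -> {mean(after):.1f}s",
            file=sys.stderr,
        )
        return 1
    print("OK: within tolerance")
    return 0
```

A CI job would finish with `sys.exit(gate(before, after))` after loading the two sample lists from the stored results. Because single runs are noisy, comparing several runs per side, or a tail percentile instead of the mean, may give a more stable signal.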

Benchmarking data is often highly environment-specific, making shared methodologies and reproducible testing approaches particularly valuable for the broader KubeVirt community.

Contributing and future roadmap

The project is available as open source and welcomes community feedback and contributions: github.com/portworx/kubevirt-benchmark.

Because infrastructure performance characteristics vary widely across storage platforms, CNIs, and hardware profiles, community feedback and benchmark contributions are especially valuable. The project welcomes reproducible test cases, platform comparisons, and additional benchmark modules from operators running KubeVirt in production environments.