Building a Cloud-Native Platform from the Ground Up with Kairos, k0rdent, and bindy
Source: CNCF
As we shared in our earlier post on FluxCD, RBC Capital Markets has been on a deliberate journey to modernize our Kubernetes platform. GitOps with FluxCD gave us a solid deployment foundation. But as our platform grew (today we operate over 50 clusters spanning on-premises VMware environments and multiple clouds), we hit a set of problems that no single off-the-shelf tool was designed to solve together: How do you manage the lifecycle of the clusters themselves? How do you ensure every node is reproducible and tamper-evident at boot? And how do you integrate Kubernetes service discovery with enterprise DNS infrastructure without every record change going through a ticket queue?
This post is about the projects that answered those questions for us, and what we learned building with them inside a regulated financial institution.
The challenge: Platform engineering at scale in a regulated environment
Managing 50+ Kubernetes clusters across hybrid infrastructure is not just an operational challenge; in capital markets it is also a compliance challenge. SOX, PCI-DSS, and Basel III create real requirements around auditability, configuration drift prevention, and network segmentation. Our platform teams cannot afford snowflake nodes, undocumented cluster state, or manual DNS records that accumulate over years.
When we stepped back and looked at what we were spending engineering effort on, three gaps stood out:
- Node configuration drift: VM-based nodes that had been patched and mutated over time were becoming impossible to reason about.
- Cluster provisioning: spinning up new clusters for trading desks or risk teams was a multi-day manual exercise with no single source of truth.
- DNS integration: every new service or ingress endpoint required a manual ticket to our network team, creating a bottleneck and an audit trail that lived outside our GitOps workflow.
We decided to solve each of these from the ground up, using cloud-native projects where they existed and building our own where they did not.
Kairos: Immutable OS for nodes you can trust
The first piece of the puzzle was node immutability. We evaluated several approaches, but Kairos, a CNCF Sandbox project, aligned most directly with what we needed: a Linux distribution designed from first principles to be immutable, declaratively configured, and reproducible.
With Kairos, every node in our fleet boots from an OCI image. That image is built from a known base (in our case RHEL-derived), baked with our approved security configuration, and published to our internal registry. The cloud-config model lets us define node behaviour (SSH keys, network configuration, SSSD authentication against our Active Directory, Kubernetes agent registration) as versioned YAML that flows through FluxCD just like any other platform component.
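To make that concrete, here is a minimal sketch of the shape a Kairos cloud-config takes. The top-level keys follow the upstream Kairos/yip cloud-config schema, but the user name, SSH key, and SSSD stage are placeholders, not our production configuration:

```yaml
#cloud-config
install:
  device: auto          # let the installer pick the target disk
  reboot: true
users:
  - name: platform-admin                       # placeholder account
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... ops@example.com    # placeholder key
stages:
  boot:
    - name: "Enable SSSD for AD authentication"
      systemctl:
        enable:
          - sssd
        start:
          - sssd
```

Because this file is just YAML in Git, changes to node behaviour go through the same review and reconciliation flow as any application manifest.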
A CI/CD pipeline for operating system images
One of the less-discussed challenges of immutable infrastructure is the discipline it demands around image build and validation. We treat our Kairos images exactly like application container images: every change triggers a GitHub Actions pipeline that builds the image, runs integration tests against a live VM, and publishes a new OCI tag only on a clean pass. Nightly builds catch upstream regressions in base packages or the Kairos framework itself before they reach production.
This means our node image pipeline has the same properties we expect from application CI:
- Every commit is tested end-to-end, not just linted or statically analysed.
- Nightly runs validate that the current pinned base image and package set still produces a bootable, correctly configured node.
- OCI tags are immutable artefacts. A tag that passed integration tests is never modified; rollback is a matter of pointing to a prior tag.
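The pipeline described above can be sketched as a GitHub Actions workflow. This is illustrative only: the job layout matches what we described (build, integration-test against a live VM, publish on a clean pass, plus a nightly trigger), but the registry host, script paths, and runner labels are placeholders:

```yaml
# .github/workflows/node-image.yaml (illustrative; registry and paths are placeholders)
name: kairos-node-image
on:
  push:
    branches: [main]
  schedule:
    - cron: "0 2 * * *"   # nightly build catches upstream regressions
jobs:
  build-test-publish:
    runs-on: [self-hosted]
    steps:
      - uses: actions/checkout@v4
      - name: Build OCI image
        run: docker build -t registry.example.internal/platform/kairos-node:${{ github.sha }} .
      - name: Boot image in a test VM and run integration checks
        run: ./ci/integration-test.sh registry.example.internal/platform/kairos-node:${{ github.sha }}
      - name: Publish immutable tag on clean pass
        if: success()
        run: docker push registry.example.internal/platform/kairos-node:${{ github.sha }}
```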
Kubernetes-native VM provisioning with VirtRigaud
The other half of the VMware story is how we actually provision VMs from our Kairos images. Rather than reaching for imperative vSphere tooling, we use VirtRigaud, a Kubernetes operator that provides declarative VM management across multiple hypervisors (vSphere, Libvirt/KVM, and Proxmox) through a unified CRD API.
The model is straightforward: our Kairos-built OCI image is registered as a VMImage CRD, and VMs are expressed as VirtualMachine CRDs referencing that image. FluxCD reconciles these manifests like any other platform resource. The result is that provisioning a new Kairos node on vSphere is semantically identical to deploying a workload: a pull request is opened, reviewed, merged, and reconciled automatically.
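The pair of manifests below is a hypothetical sketch of that pattern. The apiVersion, field names, and provider reference illustrate the declarative shape described above and are not copied from VirtRigaud's API reference:

```yaml
# Hypothetical sketch -- fields illustrate the pattern, not VirtRigaud's exact schema
apiVersion: virtrigaud.example/v1alpha1
kind: VMImage
metadata:
  name: kairos-node-v1-2-3
spec:
  source:
    oci: registry.example.internal/platform/kairos-node:v1.2.3
---
apiVersion: virtrigaud.example/v1alpha1
kind: VirtualMachine
metadata:
  name: trading-desk-node-01
spec:
  provider: vsphere-prod     # credentials isolated in the provider's own pod
  imageRef:
    name: kairos-node-v1-2-3
  resources:
    cpu: 8
    memoryGiB: 32
```

Rolling a node to a new image then amounts to bumping the VMImage reference in Git and letting FluxCD reconcile.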
VirtRigaud’s remote provider architecture also fits our security requirements well: provider credentials are isolated to their own pods, and the controller communicates with them over gRPC/TLS rather than embedding hypervisor credentials centrally.
The operational shift this created was significant:
- Drift is eliminated by design. There is no apt or yum running on production nodes. If a configuration change is needed, a new image is built, integration-tested, and nodes are rolled.
- Audit trails become trivial. Because every node’s configuration is an OCI digest in a registry and every VM is a versioned CRD in Git, we can answer “what was running on that node on that date?” with precision.
- VMware integration is fully GitOps-native. Nodes are provisioned, updated, and decommissioned through the same GitOps workflow as everything else on the platform.
The learning curve was real: getting kernel modules, NetworkManager, and enterprise authentication (SSSD/AD) right inside an immutable image took iteration. But once solved, the result is a node foundation we can genuinely trust, which matters when regulators ask questions.
k0rdent: Cluster lifecycle management as a platform
Immutable nodes solved the “what is running” problem. But we still needed to answer “how do clusters get created, updated, and decommissioned?” consistently across our entire fleet.
k0rdent, built on Cluster API (CAPI), gave us a Kubernetes-native control plane for managing Kubernetes clusters. Rather than treating cluster provisioning as a bespoke scripting exercise, k0rdent models clusters as CRDs. Combined with k0smotron for in-cluster control planes, we can now express our entire cluster topology declaratively, and FluxCD reconciles that state continuously.
Our choice of Kubernetes distribution for workload clusters was k0s, a CNCF Sandbox project. k0s is a fully self-contained, single-binary Kubernetes distribution with no host OS dependencies beyond the kernel. That property matters a great deal when your nodes are running an immutable OS: k0s installs cleanly into a Kairos image without requiring package managers, systemd unit file manipulation at runtime, or any of the host-level assumptions that distributions like kubeadm make. The combination of Kairos and k0s gives us a full node-to-cluster stack where every component is declaratively expressed, OCI-packaged, and reproducible from a clean boot.
k0smotron extends this further by allowing Kubernetes control planes to run as workloads inside the management cluster, meaning even the control plane is expressed as a CRD, reconciled by FluxCD, with no out-of-band state.
The architecture we settled on organizes clusters into a hub-and-spoke model:
- A management cluster runs k0rdent, k0smotron, and the CAPI controllers.
- Workload clusters run k0s, provisioned and decommissioned through CRD manifests stored in Git.
- MetalLB handles load-balancing on bare-metal segments; Traefik provides ingress with consistent configuration across all spoke clusters.
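In this model, a workload cluster is a single manifest in Git. The sketch below follows the general shape of k0rdent's ClusterDeployment API at the time of writing, but the template name, credential, and sizing values are placeholders rather than our production configuration:

```yaml
# Illustrative sketch; template, credential, and sizing are placeholders
apiVersion: k0rdent.mirantis.com/v1alpha1
kind: ClusterDeployment
metadata:
  name: risk-compute-eu-01
  namespace: kcm-system
spec:
  template: vsphere-standalone-cp-0-1-0   # a shared, governed cluster template
  credential: vsphere-prod-credential
  config:
    controlPlaneNumber: 3
    workersNumber: 5
```

Decommissioning the cluster is the reverse operation: delete the manifest in a pull request and let the reconciler tear the cluster down.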
Beyond day-one provisioning, this approach transformed how we handle day-two operations:
- Cluster upgrades are a pull request. The desired Kubernetes version is updated in a manifest, reviewed, and FluxCD applies it. There is no “who ran what command on which cluster” ambiguity.
- Cluster templates let us standardize configurations for common use cases (trading desk clusters, risk compute clusters, tooling clusters) and spin up new instances in minutes rather than days.
- Compliance posture is consistent by default. Because every cluster is expressed as code, our CEL-based admission webhooks and RBAC policies are applied uniformly at cluster creation time rather than bolted on after the fact.
We are also using k0rdent as the foundation for a spot-computing scheduler that allows donated physical server capacity to be absorbed dynamically into our platform, a capability we plan to share more about in a future post.
bindy: Kubernetes-native DNS operations
The last gap, and the one where no existing project fully covered our requirements, was DNS. In capital markets, DNS is not a commodity concern. Our trading applications, market data feeds, and risk systems use DNS extensively, and the enterprise infrastructure that serves them has been built and maintained over decades.
At RBC Capital Markets, that infrastructure is Infoblox, an enterprise DDI platform that is deeply integrated into our network operations. The integration model, however, was built for a world before Kubernetes: every DNS record request went through a ticketing workflow, routed to the network team, and processed on a timescale measured in hours or days. As our platform scaled to 50+ clusters, each spinning up dozens of services and ingress endpoints, that provisioning lag became a genuine operational bottleneck, and the paper trail for DNS changes lived entirely outside our GitOps audit trail.
bindy, a Kubernetes operator written in Rust using kube-rs, was built by Erick Bourgeois to bridge this gap: it manages DNS zones and records as first-class Kubernetes resources. The core design philosophy was to make DNS a GitOps citizen, with the same reconciliation guarantees we apply to everything else on the platform:
- Zones and records are CRDs. A DNSZone or ARecord manifest in Git is the source of truth, reconciled continuously by bindy’s controllers.
- RFC 2136 dynamic updates allow bindy to push record changes to the DNS backend without manual intervention or ticket queues.
- bindcar, a sidecar REST API, provides an RNDC interface that bindy’s controllers use for zone lifecycle operations (zone creation, deletion, reload) alongside dynamic updates.
- Multi-controller architecture with strict write boundaries prevents split-brain scenarios. Selection controllers and sync controllers are separated; sync state is stored on the synced resource to support force-reconciliation patterns.
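As an illustration of the CRD model, here is a sketch of a zone and a record as Kubernetes manifests. The apiVersion, zone names, and addresses are placeholders meant to show the shape of the resources, not bindy's published schema:

```yaml
# Illustrative sketch of bindy's CRD shape; apiVersion and values are placeholders
apiVersion: bindy.example/v1alpha1
kind: DNSZone
metadata:
  name: apps-internal
spec:
  zone: apps.example.internal.
---
apiVersion: bindy.example/v1alpha1
kind: ARecord
metadata:
  name: pricing-api
spec:
  zoneRef:
    name: apps-internal    # record belongs to the zone above
  host: pricing-api        # name relative to the zone
  address: 10.20.30.40
  ttl: 300
```

Because these manifests live next to the service that needs them, a new endpoint and its DNS record land in the same pull request.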
The impact has been immediate. DNS records for new services are created automatically as part of the same GitOps workflow that deploys the service itself; provisioning time drops from hours to seconds; and the audit trail is Git history rather than a ticket system. The rigid integration boundary that previously required human coordination on every DNS change is replaced by a reconciliation loop.
bindy is currently being expanded to support compliance scoring (a CRD-based model for zone health) and a future MCP server interface for integration with AI-driven platform tooling.
How the three fit together
What makes this stack coherent is that each layer builds on the same foundational principle: everything is code, reconciled continuously, with no manual state.
Git (source of truth)
└── FluxCD (reconciliation engine)
├── k0rdent / CAPI manifests → cluster lifecycle
├── Kairos cloud-config → node configuration
└── bindy CRDs → DNS records
Kairos ensures every node boots from a known, auditable image. k0rdent ensures every cluster is expressed and managed declaratively. bindy ensures every DNS record is a versioned artefact. FluxCD ties them together as the single reconciliation plane. The result is a platform where drift, at the node, cluster, or network level, is structurally prevented rather than operationally managed.
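In Flux terms, each layer of the diagram above maps to its own Kustomization pointing at a path in the platform repository. This sketch uses the standard Flux Kustomization API; the repository name and paths are placeholders:

```yaml
# Illustrative Flux Kustomizations, one per layer; repo name and paths are placeholders
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-lifecycle
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: platform
  path: ./clusters        # k0rdent / CAPI manifests
  prune: true             # removing a manifest removes the resource
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: dns
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: platform
  path: ./dns             # bindy CRDs
  prune: true
```

With prune enabled, deleting a manifest from Git is sufficient to decommission the corresponding node, cluster, or DNS record.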
Challenges and lessons learned
Building this platform taught us several things we wish we had known earlier:
- Immutable OS adoption requires patience with enterprise integration. SSSD, NetworkManager, and corporate CA trust chains all need explicit attention when baking immutable images. Document everything; the day-two operator who debugs a boot failure at 2 AM is often not the person who built the image.
- CRD-based cluster management shifts responsibility left. When cluster provisioning is a pull request, platform teams need to invest in review processes and template governance up front, or the simplicity of “just a YAML file” becomes its own source of drift.
- Building operators in Rust is the right long-term call, but the ecosystem is still maturing. kube-rs is excellent, but patterns for multi-controller architectures with reflector/store caching require deliberate design decisions that the community is still converging on.
Looking ahead
Our platform continues to evolve. Some of the areas we are actively developing:
- SPIRE/SPIFFE integration for workload identity across all 50+ clusters, replacing certificate-per-service approaches with a hub-and-spoke SPIRE architecture that satisfies our zero-trust requirements.
- Foundry, an internal self-service API layer, built in Rust, that will surface cluster and DNS provisioning capabilities to development teams through a governed, event-driven interface.
- Kairos-based spot computing using k0smotron and Kata Containers to absorb donated physical server capacity dynamically.
We are proud to be building on and contributing back to the CNCF ecosystem, and we look forward to continuing to share what we learn. If you are working through similar challenges in a regulated environment, we would love to connect: find us in the Kairos, k0rdent, and FluxCD Slack communities, or reach out directly on LinkedIn.
Erick Bourgeois is Director and Head of Kubernetes Platform Engineering at RBC Capital Markets, managing 50+ Kubernetes clusters across multi-cloud and on-premises environments. He is a KubeCon and FluxCon speaker, FINOS Common Cloud Control member, and open-source developer at github.com/firestoned.