In this blog post, we'll discuss making Kubernetes serverless and global with AWS Fargate on EKS and Admiralty, using multiple Fargate-enabled Kubernetes clusters in multiple regions. In particular, we'll look at scheduling and ingress in this context, and consider alternatives. For a hands-on experience, check out the companion tutorial in the Admiralty documentation.
HA and Performance vs. Developer Efficiency
High availability (HA), performance, and developer efficiency have become table stakes for new developments and modernizations of legacy software applications. Users—wherever they may be—do not tolerate outages and expect low latency and high throughput; developers need to bring their applications to market fast and update them often. Those two trends work against each other: setting up an application for HA and performance and ensuring it stays so takes time. Luckily, Kubernetes and the cloud native ecosystem give developers the building blocks to deploy highly available container-based applications. (Note: in this blog post, "developers" includes operators.)
Making Kubernetes Serverless
To further improve developer efficiency, tools like AWS Fargate on EKS take node management out of Kubernetes, leaving only the application management API. Users submit standard Kubernetes Deployments, Services, and Ingresses, and let AWS spin up right-sized micro-VMs to run the pods and Application Load Balancers (ALBs) to serve traffic. To complete the networking stack, you'd let external-dns configure Route53; maybe one day ACM will auto-provision certificates (for now, Ingress annotations must refer to existing ACM certificates).
Making Kubernetes Global
The tools discussed above assume that the interface with the developer—or more likely their continuous deployment (CD) platform of choice—is the Kubernetes API of a single cluster. However, most organizations run multiple clusters, mainly to make runtime isolation less of a headache, but also for HA, because clusters do fail. Furthermore, going back to low latency "wherever users may be", organizations often run clusters in multiple regions to be closer to their users. This creates a new set of problems. In this blog post, we'll focus on deploying applications to multiple clusters and routing ingress traffic to multiple clusters, possibly in multiple regions. If you want to enable cross-cluster traffic, you'll also need a multi-cluster service mesh (or ad-hoc mTLS and service discovery)—but we'll keep this topic for a future blog post. We also assume that storage is externalized to a global cloud database.
"Deploying applications to multiple clusters" means mapping applications to clusters (for isolation), replicating applications across clusters (for HA and performance), or both. The simplest solution is to implement custom mapping and replication logic as a CD concern, and create standard Kubernetes Deployments in the selected cluster(s). To treat clusters "as cattle", clusters can be registered and labeled, then selected based on those labels. Google Anthos, Rancher Fleet, and kubefed, among others, implement this approach. While simple internally, it adds complexity externally (developers must deal with a new cluster selection API) and lacks expressive power (e.g., how do you manage scale across clusters?).
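To make the label-based approach concrete, here is a minimal sketch in Python. It is not the API of Anthos, Fleet, or kubefed: the `Cluster` registry entry, the `select_clusters` helper, and the cluster names and labels are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A registered cluster and its labels (hypothetical registry entry)."""
    name: str
    labels: dict = field(default_factory=dict)


def select_clusters(clusters, selector):
    """Return the names of clusters whose labels match every key/value in selector."""
    return [
        c.name
        for c in clusters
        if all(c.labels.get(k) == v for k, v in selector.items())
    ]


# Illustrative registry; names and labels are made up.
registry = [
    Cluster("eu-a", {"region": "eu-west-1", "env": "prod"}),
    Cluster("eu-b", {"region": "eu-west-1", "env": "staging"}),
    Cluster("us-a", {"region": "us-west-2", "env": "prod"}),
]

# A CD pipeline would then create the same Deployment in each selected cluster.
prod_targets = select_clusters(registry, {"env": "prod"})
print(prod_targets)
```

Note what the sketch leaves out: nothing here decides how many replicas each selected cluster should run, which is the expressive-power gap mentioned above.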
Admiralty implements a different approach, accepting some internal complexity (cf. multi-cluster pod scheduling algorithm) to keep the user experience simple. With Admiralty, a single standard Kubernetes Deployment created in a management cluster can span multiple workload clusters, if its node selector and other standard scheduling constraints match nodes (or Fargate profiles) in those clusters; no new API is needed. Regarding expressive power, scale is treated as a multi-cluster concept. A horizontal pod autoscaler based on regional throughput can control the scale of a regional Deployment; if a cluster with N of the M pods of that Deployment goes offline, the N pods are immediately migrated to other clusters to keep the current total number of replicas at M. With other tools, independent horizontal pod autoscalers would put unnecessary stress on the application while they progressively react to increased cluster load. The design also lends itself well to batch workloads, e.g., "run this job wherever X GPUs are currently available."
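The "keep the total at M" behavior can be illustrated with a toy model. This is not Admiralty's scheduling algorithm, only a sketch under simple assumptions: the `rebalance` function is hypothetical, placement is a plain mapping of cluster name to pod count, and displaced pods go to whichever surviving cluster currently runs the fewest.

```python
def rebalance(placement, failed):
    """Move the pods of a failed cluster to the least-loaded survivors.

    placement maps cluster name -> number of pods; the total is preserved.
    """
    survivors = {c: n for c, n in placement.items() if c != failed}
    if not survivors:
        raise ValueError("no surviving cluster to receive pods")
    for _ in range(placement.get(failed, 0)):
        # Place each displaced pod on the least-loaded cluster (ties broken by name).
        target = min(survivors, key=lambda c: (survivors[c], c))
        survivors[target] += 1
    return survivors


before = {"us-a": 3, "us-b": 3, "us-c": 2}  # M = 8 replicas in total
after = rebalance(before, failed="us-b")    # N = 3 pods to migrate
print(after, sum(after.values()))
```

The point of the sketch is the invariant rather than the placement heuristic: the sum of pods after the failure equals the sum before it, without any autoscaler having to notice a load increase first.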
Solutions to "routing ingress traffic to multiple clusters" can be split into three categories: global DNS routing policies, regional reverse proxies, and global anycast load balancers.
The simplest/cheapest way to implement global load balancing is to configure global DNS records (e.g., with Route53) using low TTLs and smart routing policies. In this case, each cluster has its own ingress controller (ALB for AWS Fargate on EKS, but other clusters could use different, even multiple ingress controllers, including NGINX, and/or Services of type LoadBalancer). The ingress controllers create cluster-specific endpoints. You can then configure DNS records with a combination of latency-based and weighted routing policies, for global HA and performance. Admiralty Cloud automates this process.
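As a rough illustration of how latency-based and weighted policies combine, the sketch below picks the lowest-latency region that has a healthy endpoint, then makes a weighted random choice within it. It models the decision only and does not call Route53; the `resolve` function, the latency table, and the `example.com` endpoints are assumptions made up for the example.

```python
import random

# Hypothetical latencies in milliseconds, keyed by (client location, cluster region).
LATENCY_MS = {
    ("eu", "eu-west-1"): 20, ("eu", "us-west-2"): 150,
    ("us", "eu-west-1"): 150, ("us", "us-west-2"): 25,
}

# Hypothetical cluster-specific endpoints created by each cluster's ingress controller.
RECORDS = [
    {"endpoint": "alb-eu-a.example.com", "region": "eu-west-1", "weight": 1, "healthy": True},
    {"endpoint": "alb-eu-b.example.com", "region": "eu-west-1", "weight": 1, "healthy": True},
    {"endpoint": "alb-us-a.example.com", "region": "us-west-2", "weight": 3, "healthy": True},
    {"endpoint": "alb-us-b.example.com", "region": "us-west-2", "weight": 1, "healthy": True},
]


def resolve(records, client, rng=random):
    """Latency-based choice of region, then weighted choice of endpoint within it."""
    healthy = [r for r in records if r["healthy"] and r["weight"] > 0]
    if not healthy:
        return None
    regions = {r["region"] for r in healthy}
    best = min(regions, key=lambda region: LATENCY_MS[(client, region)])
    candidates = [r for r in healthy if r["region"] == best]
    weights = [r["weight"] for r in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]["endpoint"]


print(resolve(RECORDS, "eu"))
```

Marking both `eu-west-1` records as unhealthy makes European clients fall through to `us-west-2`, which is the HA half of the story; the low TTL matters because resolvers only see a new answer once the cached one expires.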
If you'd rather not rely on DNS for load balancing, you can use an external reverse proxy or load balancer instead. Open-source Project Contour's Gimbal and uSwitch's Yggdrasil are Envoy-based ingress controllers shared by multiple clusters. With Gimbal, Envoy runs in its own cluster, while the creators of Yggdrasil prefer running Envoy outside of Kubernetes. Unfortunately, neither of them supports global load balancing.
To get the best of both worlds, you'll want to integrate with an anycast load balancer like AWS Global Accelerator, Azure Front Door, Cloudflare, Fastly, or the open source SeeSaw, among others. This is what Google's Ingress for Anthos does between Anthos GKE clusters and Google Cloud Load Balancing, and what Admiralty Enterprise does between Kubernetes clusters on AWS and AWS Global Accelerator, among other integrations.
The companion tutorial walks you through exposing a global service using AWS Fargate on EKS, ALB ingress controllers, the Admiralty open source multi-cluster scheduler, and Admiralty Cloud (to connect clusters and provide DNS load balancing), with copy-paste instructions. Here is a summary:
- Provision a management cluster, and four workload clusters across two regions. The management cluster is a common pattern: Google Anthos calls it the Config Cluster; Rancher Fleet calls it the Fleet Manager, etc.
- Install the Admiralty agent in all clusters and connect their control planes using Admiralty Cloud as a public key directory federating cluster identity providers. In the workload clusters, install the ALB ingress controller and create a Fargate profile.
- Create two regional Deployments, a Service, and an Ingress in the management cluster using fewer than 100 lines of YAML. In turn, Admiralty schedules pods in the workload clusters, has the Services and Ingresses follow them, and configures DNS records, while AWS runs containers on Fargate and configures ALBs in the clusters' respective regions.
- Call the global service endpoint from different regions to test performance, and simulate blue/green cluster upgrades and cluster failures to test HA.
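To give a feel for what the two regional Deployments in the summary above might look like, here is a sketch that builds them as Python dictionaries. It is not copied from the tutorial: the `regional_deployment` helper, the application name, and the image are hypothetical; the `multicluster.admiralty.io/elect` annotation is assumed to be Admiralty's opt-in for multi-cluster scheduling and should be checked against the Admiralty documentation; the region node selector uses the well-known `topology.kubernetes.io/region` label.

```python
import json


def regional_deployment(name, image, region, replicas=2):
    """Build a Deployment manifest (as a dict) pinned to one region via a node selector."""
    labels = {"app": name, "region": region}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{name}-{region}", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {
                    "labels": labels,
                    # Assumed opt-in annotation; verify the key in the Admiralty docs.
                    "annotations": {"multicluster.admiralty.io/elect": ""},
                },
                "spec": {
                    # Standard scheduling constraint: only nodes in this region match.
                    "nodeSelector": {"topology.kubernetes.io/region": region},
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


# Hypothetical application name and image; the regions are examples.
manifests = [
    regional_deployment("hello", "example.com/hello:1.0", region)
    for region in ("us-west-2", "eu-west-1")
]
print(json.dumps(manifests[0], indent=2))
```

Because JSON is valid YAML, the printed output could in principle be applied to the management cluster as-is; the only region-specific parts are the name, the labels, and the node selector.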
HA, performance, and developer efficiency can be reconciled by adopting Kubernetes distributions that go as far as abstracting nodes away, like AWS Fargate on EKS, and integrating them with multi-cluster application management platforms, like Admiralty, that automate deployments, ingress, and other concerns at global scale.
Originally published at The New Stack.