
High Availability Zonal Load Balancing (HAZL)

High Availability Zonal Load Balancing (HAZL) is a dynamic request-level load balancer in Buoyant Enterprise for Linkerd that balances HTTP and gRPC traffic in environments with multiple availability zones. For Kubernetes clusters deployed across multiple zones, HAZL can dramatically reduce cloud spend by minimizing cross-zone traffic.

Unlike other zone-aware options that use Topology Hints (including Istio and open source Linkerd), HAZL never sacrifices reliability to achieve this cost reduction.

In multi-zone environments, HAZL can:

  • Cut cloud spend by eliminating cross-zone traffic both within and across cluster boundaries;
  • Improve system reliability by distributing traffic to additional zones as the system comes under stress;
  • Prevent failures by reacting quickly to increases in latency, before the system begins to fail;
  • Preserve zone affinity for cross-cluster calls, allowing for cost reduction in multi-cluster environments.

Like Linkerd itself, HAZL is designed to "just work". It requires no operator involvement, can be applied to any Kubernetes service that speaks HTTP or gRPC regardless of the number of endpoints or the distribution of workloads and traffic across zones, and in most cases needs no tuning or configuration.

How High Availability Zonal Load Balancing (HAZL) Works

For every endpoint, HAZL maintains a set of data that includes:

  • The zone of the endpoint
  • The cost associated with that zone
  • The recent latency of responses to that endpoint
  • The recent failure rate of responses to that endpoint

For every service, HAZL continually computes a load metric measuring the utilization of the service. When load on a service falls outside the acceptable range, whether due to failures, latency, traffic spikes, or any other reason, HAZL dynamically adds endpoints from other zones to the load balancing pool. When load returns to normal, HAZL automatically shrinks the pool back to just in-zone endpoints.
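
As a rough illustration of this mechanism, the sketch below models the per-endpoint state and the load-driven expansion and contraction of the balancing pool in Go. This is not HAZL's actual implementation or API: every type and function name here is hypothetical, and the load metric is deliberately simplified to a latency and failure-rate check.

```go
// Hypothetical sketch of HAZL-style zonal load balancing; names and the
// load metric are illustrative only, not Buoyant's actual implementation.
package main

import (
	"fmt"
	"time"
)

// endpointState is the per-endpoint data described above.
type endpointState struct {
	addr        string
	zone        string        // zone of the endpoint
	zoneCost    float64       // relative cost associated with that zone
	recentRTT   time.Duration // recent latency of responses
	failureRate float64       // recent failure rate (0.0 to 1.0)
}

// balancer holds the current load-balancing state for one service.
type balancer struct {
	localZone     string
	endpoints     []endpointState
	maxLatency    time.Duration // acceptable latency ceiling (illustrative)
	maxFailurePct float64       // acceptable failure-rate ceiling (illustrative)
}

// overloaded is a simplified stand-in for HAZL's load metric: the in-zone
// endpoints are treated as overloaded if their recent latency or failure
// rate exceeds the configured bounds.
func (b *balancer) overloaded() bool {
	for _, ep := range b.endpoints {
		if ep.zone != b.localZone {
			continue
		}
		if ep.recentRTT > b.maxLatency || ep.failureRate > b.maxFailurePct {
			return true
		}
	}
	return false
}

// pool returns the endpoints currently eligible for load balancing: only
// in-zone endpoints under normal conditions, expanding to all zones while
// the service is under stress.
func (b *balancer) pool() []endpointState {
	if b.overloaded() {
		return b.endpoints // temporarily allow cross-zone traffic
	}
	local := make([]endpointState, 0, len(b.endpoints))
	for _, ep := range b.endpoints {
		if ep.zone == b.localZone {
			local = append(local, ep)
		}
	}
	if len(local) == 0 {
		return b.endpoints // no in-zone endpoints at all; fall back to every zone
	}
	return local
}

func main() {
	b := balancer{
		localZone:     "us-east-1a",
		maxLatency:    200 * time.Millisecond,
		maxFailurePct: 0.05,
		endpoints: []endpointState{
			{addr: "10.0.1.10:8080", zone: "us-east-1a", zoneCost: 0, recentRTT: 30 * time.Millisecond},
			{addr: "10.0.2.10:8080", zone: "us-east-1b", zoneCost: 1, recentRTT: 35 * time.Millisecond},
		},
	}
	fmt.Println("endpoints in pool:", len(b.pool())) // 1: in-zone only

	// Simulate stress: in-zone latency spikes, so the pool expands.
	b.endpoints[0].recentRTT = 500 * time.Millisecond
	fmt.Println("endpoints in pool under stress:", len(b.pool())) // 2: all zones
}
```

In the real system the load metric also reflects traffic volume and zone cost rather than a fixed threshold; the sketch only captures the expand-and-contract shape of the behavior described above.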

In short: under normal conditions, HAZL keeps all traffic within the zone, but when the system is under stress, HAZL will temporarily allow cross-zone traffic until the system returns to normal. We'll see this in the HAZL demonstration.

HAZL will also apply these same principles to cross-cluster / multi-cluster calls: it will preserve zone locality by default, but allow cross-zone traffic if necessary to preserve reliability.

High Availability Zonal Load Balancing (HAZL) vs Topology Hints

HAZL was designed in response to limitations seen by customers using Kubernetes's native Topology Hints (aka Topology Aware Routing) mechanism. These limitations are shared by native Kubernetes balancing (kube-proxy) as well as systems such as open source Linkerd and Istio that use Topology Hints to make routing decisions.

Within these systems, the endpoints for each service are allocated ahead of time to specific zones by the Topology Hints mechanism. This distribution is done at the Kubernetes API level, and attempts to allocate endpoints within the same zone (but note this behavior isn't guaranteed, and the Topology Hints mechanism may allocate endpoints from other zones). Once this allocation is done, it is static until endpoints are added or removed. It does not take into account traffic volumes, latency, or service health (except indirectly, if failing endpoints get removed via health checks).

Systems that make use of Topology Hints, including Linkerd and Istio, use this allocation to decide where to send traffic. This accomplishes the goal of keeping traffic within a zone, but at the expense of reliability: Topology Hints itself provides no mechanism for sending traffic across zones if reliability demands it. The closest approximation in (some of) these systems is manual failover controls that allow the operator to fail traffic over to a new zone.
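
For concreteness, here is a rough sketch, in Go, of how a hint-aware consumer (kube-proxy, or a proxy that honors Topology Hints) might filter endpoints by zone. The `EndpointSlice`, `EndpointHints`, and `ForZone` types are the real Kubernetes discovery/v1 API, but the selection logic is a simplification, not any particular project's actual code; the hints themselves are assigned by the EndpointSlice controller, not by the consumer.

```go
// Simplified illustration of how a Topology Hints consumer selects endpoints.
// The discovery/v1 types are the real Kubernetes API; the filtering logic
// below is a sketch, not kube-proxy's or Linkerd's actual implementation.
package main

import (
	"fmt"

	discoveryv1 "k8s.io/api/discovery/v1"
)

// endpointsForZone returns the addresses whose hints include the local zone.
// The hints are allocated ahead of time and only change when endpoints are
// added or removed; nothing here reacts to latency, traffic volume, or load.
func endpointsForZone(slice discoveryv1.EndpointSlice, localZone string) []string {
	var addrs []string
	for _, ep := range slice.Endpoints {
		if ep.Hints == nil {
			continue // no hints: consumers typically fall back to all endpoints
		}
		for _, z := range ep.Hints.ForZones {
			if z.Name == localZone {
				addrs = append(addrs, ep.Addresses...)
			}
		}
	}
	return addrs
}

func main() {
	slice := discoveryv1.EndpointSlice{
		Endpoints: []discoveryv1.Endpoint{
			{
				Addresses: []string{"10.0.1.10"},
				Hints:     &discoveryv1.EndpointHints{ForZones: []discoveryv1.ForZone{{Name: "us-east-1a"}}},
			},
			{
				Addresses: []string{"10.0.2.10"},
				Hints:     &discoveryv1.EndpointHints{ForZones: []discoveryv1.ForZone{{Name: "us-east-1b"}}},
			},
		},
	}
	// Traffic from zone us-east-1a stays pinned to this statically hinted set.
	fmt.Println(endpointsForZone(slice, "us-east-1a")) // [10.0.1.10]
}
```

Because nothing in this path measures load, the only recourse when the in-zone endpoints are overwhelmed is a manual failover of the kind described above; that is the gap HAZL's dynamic feedback loop is meant to close.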

Finally, Topology Hints has a set of well-known constraints, including:

  • It does not work well for services where a large proportion of traffic originates from a subset of zones.
  • It does not take into account tolerations, unready nodes, or nodes that are marked as control plane or master nodes.
  • It does not work well with autoscaling. The autoscaler may not respond to increases in traffic, or respond by adding endpoints in other zones.
  • It makes no affordance for cross-cluster traffic.

These constraints have real-world implications. As one customer put it when trying Istio + Topology Hints: "What we are seeing in some applications is that they won’t scale fast enough or at all (because maybe two or three pods out of 10 are getting the majority of the traffic and is not triggering the HPA) and can cause a cyclic loop of pods crashing and the service going down."

Return to main document here.