Kubernetes at its core is a resources management and orchestration tool. It is ok to focus day-1 operations to explore and play around with its cool features to deploy, monitor and control your pods. However, you need to think of day-2 operations as well. You need to focus on questions like:
I’m providing in this post a high-level overview of different scalability mechanisms inside Kubernetes and best ways to make them serve your needs. Remember, to truly master Kubernetes, you need to master different ways to manage the scale of cluster resources, that’s the core of promise of Kubernetes.
Configuring Kubernetes clusters to balance resources and performance can be challenging, and requires expert knowledge of the inner workings of Kubernetes. Just because your app or services’ workload isn’t constant, it rather fluctuates throughout the day if not the hour. Think of it as a journey and ongoing process.
Effective kubernetes auto-scaling requires coordination between two layers of scalability: (1) Pods layer autoscalers, this includes Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA); both scale available resources for your containers, and (2) cluster level scalability, which managed by the Cluster Autoscaler (CA); it scales up or down the number of nodes inside your cluster.
As the name implies, HPA scales the number of pod replicas. Most DevOps use CPU and memory as the triggers to scale more pod replicas or less. However, you can configure it to scale your pods based on custom metrics, multiple metrics, or even external metrics.
Consider these as you rollout HPA:
Vertical Pods Autoscaler (VPA) allocates more (or less) cpu or memory to existing pods. Think of it as giving pods some growth hormones :) It can work for both stateful and stateless pods but it is built mainly for stateful services. However, you can use it for stateless pods as well if you would like to implement an auto-correction of resources you initially allocated for your pods. VPA can also reacts to OOM (out of memory) events. VPA requires currently for the pods to be restarted to change allocated cpu and memory. When VPA restarts pods it respects pods distribution budget (PDB) to make sure there is always the minimum required number of of pods. You can set the min and max of resources that the VPA can allocate to any of your pods. For example, you can limit the maximum memory limit to be no more than 8 GB. This is useful in particular when you know that your current nodes cannot allocate more than 8 GB per container. Read the VPA’s official wiki page for detailed spec and design.
VPA has also an interesting feature called the VPA Recommender. It watches the historic resources usage and OOM events of all pods to suggest new values of the “request” resources spec. The Recommender generally uses some smart algorithm to calculate memory and cpu values based on historic metrics. It also provides an API that takes the pod descriptor and provides suggested resources requests.
It worth mentioning that VPA Recommender doesn’t work on setting up the “limit” of resources. This can cause pods to monopolize resources inside your nodes. I suggest you set a “limit” value at the namespace level to avoid crazy consumption of memory or CPU
A few points to consider as you rollout the VPA:
Cluster Autoscaler (CA) scales your cluster nodes based on pending pods. It periodically checks whether there are any pending pods and increases the size of the cluster if more resources are needed and if the scaled up cluster is still within the user-provided constraints. CA interfaces with the cloud provider to request more nodes or deallocate idle nodes. It works with GCP, AWS and Azure. Version 1.0 (GA) was released with Kubernetes 1.8.
Consider these as you roll-out the CA
If you would like to reach nirvana autoscaling your Kubernetes cluster, you will need to use pod layer autoscalers with the CA. The way they work with each other is relatively simple as show in below illustration.
I’ve seen in different forums, such as Kubernetes slack channels and StackOverflow questions, common issues due to some facts that many DevOps miss while getting their feet wet with autoscalers.
HPA and VPA depend on metrics and some historic data. If you don’t have enough resources allocated, your pods will be OOM killed and never get a chance to generate metrics. Your scale may never take place in this case.
Scaling up is the mostly a time sensitive operation. You want your pods and cluster to scale fairly quickly before your users experience any disruption or crashes in your application. You should consider the average time it can take your pods and cluster to scale up.
Do not confuse cloud provider scalability mechanisms with the CA. CA works from within your cluster while cloud provider’s scalability mechanism (such as ASGs inside AWS) work based on nodes allocation. It is not aware of what’s taking place with your pods or application. Using them together will render your cluster unstable and hard to predict behavior.
Written by Mohamed Ahmed, Co-Founder of Magalix. Like what you read? Give Mohamed Ahmed a round of applause on Medium.
Magalix eliminates the complexity of balancing performance with infrastructure capacity, using AI. It is a low-touch service that makes infrastructure self-healing to deliver the maximum value.