Kubernetes at its core is a resource management and orchestration tool. It is fine to spend your day-1 operations exploring and playing around with its cool features to deploy, monitor, and control your pods. However, you need to think about day-2 operations as well, and focus on questions like how you will scale your pods, your applications, and the cluster itself.
In this post, I provide a high-level overview of the different scalability mechanisms inside Kubernetes and the best ways to make them serve your needs. Remember: to truly master Kubernetes, you need to master the different ways of managing the scale of cluster resources; that is the core of the Kubernetes promise.
Configuring Kubernetes clusters to balance resources and performance can be challenging and requires expert knowledge of the inner workings of Kubernetes. Your app or service's workload isn't constant; it fluctuates throughout the day, if not the hour. Think of tuning it as a journey and an ongoing process.
Effective Kubernetes autoscaling requires coordination between two layers of scalability: (1) the pod layer, which includes the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA), both of which scale the resources available to your containers; and (2) the cluster level, which is managed by the Cluster Autoscaler (CA) and scales the number of nodes in your cluster up or down.
As the name implies, HPA scales the number of pod replicas. Most DevOps teams use CPU and memory as the triggers for scaling the number of pod replicas up or down. However, you can configure HPA to scale your pods based on custom metrics, multiple metrics, or even external metrics.
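As a minimal sketch, here is what an HPA manifest targeting average CPU utilization might look like. The Deployment name `web` and the numbers are illustrative assumptions, not from the original article:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU crosses 70%
```

Custom, multiple, or external metrics are configured by adding further entries under `metrics` with `type: Pods`, `type: Object`, or `type: External`.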
High-level HPA workflow
HPA continuously checks the metrics you configured during setup, at a default interval of 30 seconds.
HPA attempts to increase the number of pods if the specified threshold is met.
HPA mainly updates the number of replicas inside the deployment or replication controller.
Consider these as you roll out HPA:
The Vertical Pod Autoscaler (VPA) allocates more (or less) CPU or memory to existing pods. Think of it as giving pods some growth hormones :) It can work with both stateful and stateless pods, but it is built mainly for stateful services. However, you can also use it for stateless pods if you would like to implement auto-correction of the resources you initially allocated. VPA can also react to OOM (out of memory) events. VPA currently requires pods to be restarted in order to change their allocated CPU and memory. When VPA restarts pods, it respects the pod disruption budget (PDB) to make sure the minimum required number of pods is always running. You can set the minimum and maximum resources that VPA can allocate to any of your pods. For example, you can limit the maximum memory to no more than 8 GB. This is particularly useful when you know that your current nodes cannot allocate more than 8 GB per container. Read the VPA's official wiki page for the detailed spec and design.
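The min/max bounds described above are expressed through the VPA object's `resourcePolicy`. A sketch, again assuming a hypothetical Deployment named `web` and nodes that cap out at 8 GB per container:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # hypothetical Deployment to autosize
  updatePolicy:
    updateMode: "Auto"       # VPA restarts pods to apply new requests
  resourcePolicy:
    containerPolicies:
      - containerName: "*"   # apply bounds to all containers in the pod
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 8Gi        # never request more than the node can allocate
```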
VPA also has an interesting component called the VPA Recommender. It watches the historic resource usage and OOM events of all pods to suggest new values for the "request" resource spec. The Recommender uses algorithms that calculate memory and CPU values based on historic metrics. It also provides an API that takes a pod descriptor and returns suggested resource requests.
It is worth mentioning that the VPA Recommender does not set the "limit" of resources. This can cause pods to monopolize resources inside your nodes. I suggest you set a "limit" value at the namespace level to avoid runaway consumption of memory or CPU.
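A namespace-level limit can be enforced with a `LimitRange` object. The namespace name and values below are illustrative assumptions:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: my-app        # hypothetical namespace
spec:
  limits:
    - type: Container
      default:             # "limit" applied when a container declares none
        cpu: "1"
        memory: 1Gi
      defaultRequest:      # "request" applied when a container declares none
        cpu: 100m
        memory: 256Mi
```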
High-level VPA workflow
VPA continuously checks the metrics you configured during setup, at a default interval of 10 seconds.
VPA attempts to change the allocated memory and/or CPU if the threshold is met.
VPA mainly updates the resources inside the deployment or replication controller specs.
When pods are restarted, the new resources are applied to the created instances.
A few points to consider as you rollout the VPA:
Cluster Autoscaler (CA) scales your cluster nodes based on pending pods. It periodically checks whether there are any pending pods and increases the size of the cluster if more resources are needed and if the scaled-up cluster is still within the user-provided constraints. CA interfaces with the cloud provider to request more nodes or deallocate idle nodes. It works with GCP, AWS, and Azure. Version 1.0 (GA) was released with Kubernetes 1.8.
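The user-provided constraints mentioned above are passed to the CA binary as flags. A sketch of the relevant arguments from a CA Deployment spec on AWS, with a hypothetical ASG name:

```yaml
# excerpt from a cluster-autoscaler container spec (AWS example)
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=2:10:my-node-group-asg   # min:max:ASG name (hypothetical)
  - --scan-interval=10s              # how often pending pods are checked
  - --scale-down-unneeded-time=10m   # idle time before a node is removed
```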
High-level CA workflow
The CA checks for pods in a pending state at a default interval of 10 seconds.
If one or more pods are in a pending state because there are not enough available resources on the cluster to allocate them, CA attempts to provision one or more additional nodes.
When the node is granted by the cloud provider, the node is joined to the cluster and becomes ready to serve pods.
Kubernetes scheduler allocates the pending pods to the new node. If some pods are still in the pending state, the process is repeated and more nodes are added to the cluster.
Consider these as you roll out the CA:
If you would like to reach autoscaling nirvana for your Kubernetes cluster, you will need to use the pod-layer autoscalers together with the CA. The way they work with each other is relatively simple, as shown in the illustration below.
HPA or VPA update pod replicas or resources allocated to an existing pod.
If there are not enough nodes to run the pods after the scaling event, CA picks up the fact that some or all of the scaled pods are in a pending state.
CA allocates new nodes.
Pods are scheduled on the provisioned nodes.
In different forums, such as Kubernetes Slack channels and StackOverflow questions, I've seen common issues caused by details that many DevOps teams miss while getting their feet wet with autoscalers.
HPA and VPA depend on metrics and some historic data. If you don't have enough resources allocated, your pods will be OOM-killed and never get a chance to generate metrics. In that case, scaling may never take place.
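One way to avoid this trap is to always declare explicit initial requests so the pod survives long enough to emit metrics. The values in this container-spec fragment are illustrative:

```yaml
# container spec fragment with explicit starting resources
resources:
  requests:
    cpu: 250m        # enough headroom to start and report metrics
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi    # avoids immediate OOM kills before the autoscaler kicks in
```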
Scaling up is the most time-sensitive operation. You want your pods and cluster to scale fairly quickly, before your users experience any disruption or crashes in your application. You should consider the average time it can take your pods and cluster to scale up:
30–60 seconds — target metrics values are updated
30 seconds — HPA checks the metrics values
< 2 seconds — new pods are created and enter the pending state
< 2 seconds — CA sees the pending pods and fires off calls to provision new nodes
Up to 10 minutes — the cloud provider provisions the nodes and Kubernetes waits for them to become ready (this depends on multiple factors, such as provider latency, OS latency, bootstrapping tools, etc.)
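If scale-up reaction time matters, the HPA `behavior` section lets you make upward scaling more aggressive while keeping downward scaling conservative. A sketch with assumed values:

```yaml
# "behavior" section of an autoscaling/v2 HorizontalPodAutoscaler spec
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0    # react immediately to rising metrics
    policies:
      - type: Percent
        value: 100                   # allow doubling the replica count
        periodSeconds: 15            # per 15-second window
  scaleDown:
    stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
```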
Do not confuse cloud provider scaling mechanisms with the CA. CA works from within your cluster, while cloud providers' scaling mechanisms (such as ASGs in AWS) work based on node allocation; they are not aware of what is happening with your pods or application. Using them together will render your cluster unstable and its behavior hard to predict.