Improving Capacity Management in Kubernetes Clusters: Q&A With Mohamed

DevOps, Kubernetes, Capacity Management, InfoQ

This article was originally published at

https://www.infoq.com/news/2020/01/Kubernetes-Capacity-Management/

InfoQ recently spoke with Mohamed Ahmed, the co-founder and CEO of Magalix, a Kubernetes optimization company, to discuss the critical discipline of capacity management across cloud-native infrastructure and applications.

Mohamed Ahmed

Capacity management is a critical discipline for companies that want to run reliable and efficient cloud-native infrastructure. A well-designed cloud-native application should declare the specific resources it needs to operate correctly; however, to achieve maximum performance and availability, engineers must balance user workloads, the application architecture, and the underlying cloud infrastructure. Forming a shared picture of effective Kubernetes and application capacity management is hard, and it requires full team engagement to balance performance against resources and cost.

InfoQ: What Typical Problems Do You Encounter in Kubernetes Capacity Management?

Mohamed Ahmed: Capacity management is like securing your infrastructure and applications. The first problem is waiting until it becomes an issue and then acting reactively. Poor capacity management causes bad application performance, live site incidents (LSIs), and a soaring monthly cloud bill, all of which put teams into firefighting mode. Leaders and engineers should set the right KPIs and priorities to tackle each dimension of capacity management proactively and get out of this vicious cycle.
Teams may also lack a common view of capacity management. For example, developers may look only at their microservices and ignore, or not fully understand, the limits of the underlying infrastructure. It is also easy to focus on one set of metrics without considering the impact on the rest of the system. Conversely, we see engineers cause application downtime by changing resource allocations without considering the impact of those changes on the application's performance and usage patterns.

InfoQ: What Are Some Indicators Of Poor Capacity Management?

Mohamed Ahmed: So, how do you know if your team has room to improve how it manages the capacity of your Kubernetes clusters? In the table below, I break it down into the three areas that any organization should keep an eye on. To accurately assess your team's effectiveness, answer these questions:
  • How frequently does your team get these triggers?
  • How much of your team’s time is spent reacting to these triggers?
  • Do you have a few team members always acting on these triggers, or is it distributed across the whole team?

InfoQ: How Can Engineers Optimize Their Kubernetes Clusters Upfront?

Mohamed Ahmed: If workloads are not highly variable, or teams are fine with a relatively large CPU/memory buffer, they can create a simple resource allocation sheet as I explained in this article. In general, though, optimizing Kubernetes resources is a continuous effort. Workloads change all the time, and frequent code updates change CPU/memory requirements. Updating cluster configurations and the scheduling of containers also changes the overall utilization of the infrastructure, so it is not a fire-and-forget effort. The moment SREs [site reliability engineers] or developers optimize cluster resources, their clusters will start slipping into either an under-utilized or an over-utilized state. With under-utilization, teams end up spending too much on their cloud infrastructure for the value they get. With over-utilization, applications are no longer reliable or performant. Think of the optimization process as a budgeting exercise: you have a finite set of resources that you would like to allocate well. For teams that want to keep their clusters optimized continuously, Magalix has a free offering that automates capacity management for Kubernetes clusters.
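One vendor-neutral way to automate this continuous tuning is the Kubernetes Vertical Pod Autoscaler (VPA), which observes actual usage and adjusts pod requests over time. The sketch below assumes the VPA controller is installed in the cluster; the Deployment name `web-api` and the min/max bounds are hypothetical placeholders:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api          # hypothetical workload to tune
  updatePolicy:
    updateMode: "Auto"     # VPA evicts pods and recreates them with updated requests
  resourcePolicy:
    containerPolicies:
    - containerName: "*"   # apply bounds to all containers in the pod
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "1"
        memory: "1Gi"
```

The `minAllowed`/`maxAllowed` bounds keep the autoscaler's recommendations within a budget the team has agreed on, which mirrors the "budgeting exercise" framing above.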

InfoQ: What Are The Potential Pitfalls Of Specifying Kubernetes Resource Requests And Limits?

Mohamed Ahmed: The major pitfall is treating requests and limits like any other configuration item in pod spec files. They should be updated frequently based on the factors I mentioned earlier; this can be as often as every few hours or every one to two weeks. Setting the request value guarantees minimum resources for your containers. But if the value is set too high, your pods over-provision resources from the virtual machine (VM). If it is set too low, you risk not having enough resources available for your container to run reliably.
Setting limits guarantees that pods won't monopolize VM resources and starve others. Not setting a limit puts your infrastructure and applications at risk of pods being evicted or killed due to resource starvation. Setting limits too low can cause stability or significant performance issues. If your container reaches its memory limit, the operating system (OS) will kill it automatically, which may send the container into a crash loop. If the container's CPU limit is too low, the OS will throttle your container and your application will suffer a significant slowdown, even if the VM has additional CPU cycles the container could use.
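To illustrate the trade-offs above, here is a minimal Deployment snippet that sets both requests and limits. The workload name, image, and all values are hypothetical placeholders that would need to be tuned to the actual workload:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api                      # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
      - name: web-api
        image: example/web-api:1.0   # placeholder image
        resources:
          requests:
            cpu: "250m"              # guaranteed minimum; the scheduler reserves this on the node
            memory: "256Mi"
          limits:
            cpu: "500m"              # above this, the container is throttled
            memory: "512Mi"          # above this, the container is OOM-killed
```

Setting requests too high wastes node capacity (over-provisioning); setting limits too low triggers the throttling and OOM-kill behavior described above, so both pairs of values deserve regular review.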
