Kubernetes capacity management is a core competency of teams shipping cloud-native applications. Proper capacity management delivers a great customer experience, lets teams innovate faster, and maximizes the ROI of your cloud infrastructure. Capacity management, however, is a challenge for many teams for three main reasons:
Developers, DevOps engineers, and engineering managers are the three main roles that directly affect how well capacity is managed. Getting them to agree on an effective approach is a challenge. Each role has its own motivations, and their requirements can conflict. For example, developers are motivated to ship features quickly, which leaves them little time to analyze the resources their code needs or to improve its efficiency. Let’s dig deeper into the motivations of each role.
Developers ship features and fix bugs. In my experience watching teams adopt Kubernetes, developers are motivated by these factors:
DevOps or infrastructure engineers are at the core of making sure that products meet their SLAs. They are in the middle of an ongoing storm of evolving infrastructure, application architecture, and business requirements. DevOps engineers are usually motivated by these requirements:
Engineering managers enable teams to innovate fast to meet business goals as efficiently as possible. They are usually motivated by these requirements:
In many cases, business owners and product managers also affect capacity management and planning, particularly around significant business events. A marketing campaign, for example, may drive unusual traffic. The corresponding business owner should warn developers and DevOps engineers of the expected traffic. The challenge is that there is often only a rough estimate of the number of users or the traffic volume, which is hard to map to specific system requirements. Many teams end up over-provisioning to be on the safe side.
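To make the mapping from a rough traffic estimate to system requirements concrete, here is a minimal sketch in Python. It is not a method from this article: the per-pod throughput, the headroom factor, and the per-pod CPU request are all hypothetical inputs that you would have to measure or choose for your own workload (for example, from a load test).

```python
import math


def replicas_for_traffic(expected_rps: float, rps_per_pod: float,
                         headroom: float = 0.25) -> int:
    """Estimate the pod count needed to serve expected_rps.

    rps_per_pod is an assumed, measured per-pod throughput.
    headroom is an assumed safety buffer (0.25 means 25% extra).
    """
    if expected_rps < 0 or rps_per_pod <= 0 or headroom < 0:
        raise ValueError("inputs must be non-negative and rps_per_pod positive")
    return max(1, math.ceil(expected_rps * (1 + headroom) / rps_per_pod))


def total_cpu_millicores(replicas: int, cpu_request_m: int) -> int:
    """Total CPU the cluster must be able to schedule, in millicores."""
    return replicas * cpu_request_m


# Hypothetical campaign estimate: between 800 and 2000 requests per second.
low = replicas_for_traffic(800, rps_per_pod=150)
high = replicas_for_traffic(2000, rps_per_pod=150)
print(low, high)                                    # 7 17
print(total_cpu_millicores(high, cpu_request_m=250))  # 4250
```

Computing both ends of the estimate shows how wide the gap is between the low and high scenarios. That gap, rather than a single padded number, is what the business owner, developers, and DevOps engineers need to discuss before the event.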
Getting into the vicious cycle of poor capacity management. Teams quickly fall into this cycle when they are reactive most of the time. Reacting to bad performance, Live Site Incidents (LSIs), or the monthly cloud bill keeps your team in constant firefighting mode. You have to break the cycle at some point. Make sure your team has the right KPIs and priorities to proactively tackle each dimension of capacity management.
Lack of a common view about capacity management. Each team member sees the world from their own point of view. For example, developers look at microservices and ignore, or don’t fully understand, the limits of their infrastructure. Focusing on one set of metrics without considering the impact on the rest of the system is also a dangerous practice. We have seen DevOps engineers take applications down while trying to improve CPU utilization. They usually overlook the impact of these changes on the application’s performance and usage patterns.
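One way to avoid optimizing a single metric in isolation is to check a proposed change against both utilization and performance data. The sketch below is an illustration under stated assumptions, not a tool described in this article: the `WorkloadSnapshot` fields, the latency SLO, and the 25% safety margin are hypothetical, and the numbers would come from your own monitoring.

```python
import math
from dataclasses import dataclass


@dataclass
class WorkloadSnapshot:
    cpu_request_m: int     # current CPU request per pod, millicores
    cpu_p95_m: float       # observed 95th-percentile CPU usage, millicores
    p95_latency_ms: float  # observed 95th-percentile latency
    latency_slo_ms: float  # latency target agreed with the team


def safe_cpu_request(s: WorkloadSnapshot, safety_margin: float = 0.25) -> int:
    """Suggest a CPU request that improves utilization without starving the app.

    This sketch only ever lowers a request. It leaves the request unchanged
    when latency already breaches the SLO, and it never goes below peak
    (p95) usage plus the safety margin.
    """
    if s.p95_latency_ms > s.latency_slo_ms:
        return s.cpu_request_m  # performance is already at risk: do not cut
    floor = math.ceil(s.cpu_p95_m * (1 + safety_margin))
    return min(s.cpu_request_m, floor)


healthy = WorkloadSnapshot(cpu_request_m=1000, cpu_p95_m=600,
                           p95_latency_ms=120, latency_slo_ms=200)
print(safe_cpu_request(healthy))  # 750
```

Sizing the same workload from average usage alone would suggest a much lower request, which is how a well-intended utilization fix can throttle an application at peak.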
So, how do you know if your team has room to improve how it manages the capacity of its Kubernetes clusters? The table below breaks this down into three areas that any team should keep an eye on. To properly assess your team’s effectiveness, answer these questions:
We learned that capacity management inside Kubernetes is a collaborative effort. Kubernetes provides a good abstraction of the infrastructure. Your team, however, still has a lot of interaction points. The team still needs to collaborate on capacity allocation, application performance tuning, and, of course, saving on the cost of cloud infrastructure. You can read more about this topic here.
If you are a developer, you need to:
If you are a DevOps engineer, you need to:
If you are an engineering manager, you need to: