Whether you are just getting started with Kubernetes or have fully adopted it across your organization, getting the most out of it while providing the best user experience will always be a challenge. Targeting the best performance or highest efficiency of your Kubernetes cluster too early will slow you down significantly, and you may end up with security holes in your cluster, or with a team that gives up on the technology altogether. Deferring these optimizations for too long, on the other hand, forces you into reactive mode, which creates unnecessary stress for you and your team. It is crucial to have a proper framework for prioritizing the factors that affect your infrastructure and your users’ experience at the right time.
Teams with successful Kubernetes adoption stories were deliberate about the right priorities at each stage of their journey. Whether you are a developer, a DevOps engineer, or an engineering manager, you should plan each step and decide what is in scope and what is out of scope for your team until you are production ready.
In a nutshell, teams that jump on the Kubernetes train go through the three main stages outlined below. I’ll cover them in more detail in a later article, so stay tuned!
You cannot prioritize performance, utilization, and cost efficiency all at once. You will need to stage them throughout your journey. Your goal is to reach the right balance of these three factors before you get to day-2 operations; read this capacity management article to understand how each team member can contribute. Here, let’s focus on how you can manage these critical factors during the Kubernetes adoption journey.
Your goals in day-0 are:
Prometheus is your friend at this stage :) You can install the Prometheus Operator to get the necessary metrics out of your Kubernetes cluster. However, I highly recommend using Kube-Prometheus, which installs the following for you:
At this stage, you should also consider organizing your dashboards by downloading and customizing pre-built Grafana dashboards. They give you a great starting point for monitoring different aspects of your Kubernetes cluster. I also highly recommend installing Prometheus exporters to expose metrics from off-the-shelf containers such as Redis, PostgreSQL, and MongoDB.
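Before building dashboards, it can help to check what Prometheus is actually collecting by querying its HTTP API directly. The sketch below uses only the Python standard library and the documented `/api/v1/query` instant-query endpoint. It assumes Prometheus is reachable at `http://localhost:9090`, for example through `kubectl -n monitoring port-forward svc/prometheus-k8s 9090`, which matches Kube-Prometheus defaults; adjust the namespace and service name for your setup. The helper names are illustrative, not part of any library.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: dict) -> dict:
    """Map each series' labels (as a sorted tuple of pairs) to its sample value."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    # Each sample's "value" is a [unix_timestamp, "value_as_string"] pair.
    return {
        tuple(sorted(series["metric"].items())): float(series["value"][1])
        for series in data["result"]
    }


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run a PromQL instant query against a live server and parse the result."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For example, `instant_query("http://localhost:9090", 'sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))')` would return the CPU cores used per namespace over the last five minutes, using the cAdvisor metrics that Kube-Prometheus scrapes.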
Monitor your cluster’s overall resource allocation:
Monitor your microservices’ KPIs and their impact on resource usage.
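Both checks boil down to simple ratios. The minimal sketch below computes how much of the allocatable capacity Pod requests already claim, and flags services whose CPU cost per request grew compared with a baseline. In practice the inputs would come from Prometheus (for instance the kube-state-metrics series `kube_pod_container_resource_requests` and `kube_node_status_allocatable`) or from `kubectl describe nodes`; here they are passed in directly. `parse_quantity` handles only the common quantity suffixes, and `ServiceWindow`, `efficiency_regressions`, and the 1.2 threshold are hypothetical names and values chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Suffixes used in Kubernetes resource quantities ("Mi" is binary, "M" is decimal, "m" is milli).
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
DECIMAL_SUFFIXES = {"m": 1e-3, "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12, "P": 1e15, "E": 1e18}


def parse_quantity(q: str) -> float:
    """Convert a quantity such as '500m', '2', '512Mi' or '1G' to a float (simplified)."""
    q = q.strip()
    if q[-2:] in BINARY_SUFFIXES:
        return float(q[:-2]) * BINARY_SUFFIXES[q[-2:]]
    if q[-1] in DECIMAL_SUFFIXES:
        return float(q[:-1]) * DECIMAL_SUFFIXES[q[-1]]
    return float(q)  # plain numbers and exponent forms like "1e3"


def allocation_ratio(requests: list[str], allocatable: list[str]) -> float:
    """Share of allocatable capacity already claimed by Pod requests (one resource type)."""
    return sum(map(parse_quantity, requests)) / sum(map(parse_quantity, allocatable))


@dataclass
class ServiceWindow:
    name: str
    cpu_cores: float         # average CPU usage over the window
    requests_per_sec: float  # average throughput over the same window

    def millicores_per_rps(self) -> float:
        """CPU needed to serve one request per second, in millicores."""
        if self.requests_per_sec <= 0:
            return float("inf")
        return self.cpu_cores * 1000 / self.requests_per_sec


def efficiency_regressions(baseline: list[ServiceWindow], current: list[ServiceWindow],
                           threshold: float = 1.2) -> list[str]:
    """Names of services whose CPU per request grew by more than `threshold` times."""
    base = {s.name: s.millicores_per_rps() for s in baseline}
    return sorted(
        s.name for s in current
        if s.name in base and s.millicores_per_rps() > base[s.name] * threshold
    )
```

For instance, `allocation_ratio(["500m", "250m", "1"], ["2", "2"])` compares 1.75 requested cores with 4 allocatable cores. Tracking CPU per request rather than raw CPU separates "more traffic" from "less efficient code".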
Your goals in day-1 should focus on making sure that your application and cluster are ready to scale:
You need to make friends with the tools that manage the scalability of Kubernetes Pods and cluster nodes. Below are a couple of options:
Application-specific KPIs, which can be any of these:
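For Pods, the Horizontal Pod Autoscaler documentation describes its core rule as `desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]`, skipping any change when the ratio is within a tolerance (0.1 by default). For a per-Pod target on an application KPI, such as queue depth with an `AverageValue` target, the same idea reduces to dividing the total by the per-Pod target. The sketch below reproduces only that arithmetic; it ignores stabilization windows, scaling policies, and the handling of missing or unready Pods, and the function names are hypothetical. Node-level scaling is a separate concern, typically handled by the Cluster Autoscaler when Pods cannot be scheduled.

```python
import math


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def desired_replicas_for_ratio(current_replicas: int, current_value: float, target_value: float,
                               min_replicas: int = 1, max_replicas: int = 10,
                               tolerance: float = 0.1) -> int:
    """Utilization-style target, e.g. 90% CPU observed against a 60% target."""
    ratio = current_value / target_value
    if abs(1.0 - ratio) <= tolerance:
        # Close enough to the target: keep the current replica count.
        return clamp(current_replicas, min_replicas, max_replicas)
    return clamp(math.ceil(current_replicas * ratio), min_replicas, max_replicas)


def desired_replicas_for_per_pod_target(total_value: float, target_per_pod: float,
                                        current_replicas: int, min_replicas: int = 1,
                                        max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Per-Pod target on an application KPI, e.g. 50 queued messages per Pod."""
    if current_replicas > 0:
        ratio = total_value / (target_per_pod * current_replicas)
        if abs(1.0 - ratio) <= tolerance:
            return clamp(current_replicas, min_replicas, max_replicas)
    return clamp(math.ceil(total_value / target_per_pod), min_replicas, max_replicas)
```

With 4 replicas at 90% CPU against a 60% target, the ratio is 1.5 and the sketch asks for 6 replicas; with 450 queued messages and a target of 50 per Pod, it asks for 9, subject to the min/max bounds.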
Your goals in day-2 are to scale in a way that maximizes the value of your infrastructure without frustrating your users or other stakeholders in the organization:
It is hard to find a single tool that satisfies all day-2 goals. At this stage, the work becomes more about high-level monitoring and decision making. You can import some billing metrics into Prometheus and chart your cost over time, but at the time of writing no open-source tool could analyze your billing options and possible optimizations. You can rely on commercial tools such as Magalix Node Advisor or CloudHealth reports for insights into billing optimizations.
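One low-effort way to get billing data into Prometheus is to write it in the Prometheus text exposition format to a `.prom` file that node_exporter's textfile collector picks up (enabled with `--collector.textfile.directory`). The sketch below assumes you already know an hourly price per node, for example from your provider's price list; the metric name `cluster_node_cost_usd_per_hour` and the example price are hypothetical. A query such as `sum(cluster_node_cost_usd_per_hour)` can then chart the cluster's hourly spend over time.

```python
import os
import tempfile

METRIC = "cluster_node_cost_usd_per_hour"  # hypothetical metric name


def escape_label(value: str) -> str:
    """Escape a label value as required by the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_cost_metrics(node_costs) -> str:
    """Render (node, instance_type, usd_per_hour) tuples as a single gauge family."""
    lines = [
        f"# HELP {METRIC} Estimated cost of each node, in USD per hour.",
        f"# TYPE {METRIC} gauge",
    ]
    for node, instance_type, usd_per_hour in node_costs:
        labels = f'node="{escape_label(node)}",instance_type="{escape_label(instance_type)}"'
        lines.append(f"{METRIC}{{{labels}}} {usd_per_hour}")
    return "\n".join(lines) + "\n"


def write_textfile(path: str, content: str) -> None:
    """Write to a temp file and rename it, so a scrape never sees a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The ".tmp" suffix keeps the collector (which reads only *.prom) from loading it early.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(content)
    os.replace(tmp_path, path)
```

A cron job or a small sidecar could call `write_textfile("/var/lib/node_exporter/textfile/node_costs.prom", render_cost_metrics(...))` periodically; that directory is only an example and must match the collector's configured path.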
You still need to keep an eye on day-0 and day-1 metrics. But now you want to build higher-level KPIs that track your application’s performance (or user experience), compute resource utilization, and the cost per relevant business transaction. Below are some examples:
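The three kinds of KPI can be combined per reporting window. The sketch below assumes you already aggregate, for each window (say, one day), the attributed infrastructure cost, the number of business transactions (orders, sign-ups, and so on), average CPU used and allocatable, and a p95 latency. `WindowStats`, `business_kpis`, and the SLO value are hypothetical; the p95 helper uses the nearest-rank method with integer arithmetic to avoid floating-point rounding in the rank.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Aggregates for one reporting window; all values are inputs you collect."""
    cost_usd: float               # infrastructure cost attributed to the window
    transactions: int             # business transactions completed in the window
    cpu_used_cores: float         # average CPU actually used
    cpu_allocatable_cores: float  # average CPU allocatable across nodes
    p95_latency_ms: float         # 95th-percentile response time


def p95(samples_ms: list) -> float:
    """95th percentile by the nearest-rank method."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def business_kpis(w: WindowStats, latency_slo_ms: float) -> dict:
    """Cost efficiency, utilization, and user experience for one window."""
    cost_per_1k = w.cost_usd / w.transactions * 1000 if w.transactions else float("inf")
    return {
        "cost_per_1k_transactions_usd": cost_per_1k,
        "cpu_utilization": w.cpu_used_cores / w.cpu_allocatable_cores,
        "latency_slo_met": w.p95_latency_ms <= latency_slo_ms,
    }
```

Reading the three values together is the point: lower cost per transaction is only a win if utilization stays healthy and the latency SLO is still met.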
At Magalix, we can help you in your Kubernetes adoption journey. You can see the performance of your containers, Kubernetes cluster utilization, and a detailed cost analysis in one dashboard. Connect your Kubernetes cluster for free today to get an in-depth analysis of it. You can also run your cluster in Autopilot mode to keep adjusting your capacity proactively based on anticipated workloads.
Magalix eliminates the complexity of balancing performance with infrastructure capacity, using AI. It is a low-touch service that makes infrastructure self-healing to deliver the maximum value.