Understanding the Kubernetes monitoring pipelines is essential for diagnosing run-time problems and managing the scale of your pods and cluster. Monitoring is one of the areas evolving most rapidly inside Kubernetes; many of its pieces are still in flux, hence some confusion. My goal in this article is to clarify it a bit and give you a good starting point.
Kubernetes has two monitoring pipelines: (1) the core metrics pipeline, which is an integral part of Kubernetes and installed with every distribution, and (2) the services monitoring (non-core) pipeline, a separate pipeline on which Kubernetes has little or no dependency. Keep reading to learn why :)
The core monitoring pipeline, sometimes referred to as the resource metrics pipeline, is installed with every distribution. It provides enough detail for other components inside the Kubernetes cluster to run as expected: the scheduler to allocate pods and containers, and the Horizontal and Vertical Pod Autoscalers (HPA and VPA) to make proper pod-scaling decisions.
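As a concrete illustration of what the HPA does with those core metrics, its basic scaling rule computes a desired replica count from the ratio of the observed metric to its target, rounding up. A minimal sketch in Python (the function name and sample values are mine, not from Kubernetes source):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Core HPA rule: scale the replica count by the ratio of the
    observed metric value to its target, rounding up."""
    return math.ceil(current_replicas * current_metric / target_metric)

# e.g. 3 replicas averaging 200m CPU against a 100m target -> 6 replicas
print(desired_replicas(3, 200, 100))  # -> 6
```

This is why the core pipeline only needs current resource usage, not history: the rule is a pure function of the latest observation.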
The way it works is relatively simple:
A few helpful points
The services pipeline is, in abstract terms, relatively simple. Confusion usually comes from the plethora of services and agents that you can mix and match to get your pipeline up and running. Also, you can blame Heapster for that :)
The Services Monitoring Pipeline consists of three main components: (1) a collection agent, (2) a metrics server, and (3) dashboards. I won't talk about alerting here because it has lots of interesting twists of its own :) I plan to discuss it in a separate article.
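To make the collection-agent piece tangible, here is a toy sketch of the Prometheus text exposition format, the wire format agents such as cAdvisor or node-exporter serve over HTTP for the metrics server to scrape. The helper function, metric name layout, and sample values are illustrative, not taken from any real agent:

```python
# Toy sketch: render one sample in the Prometheus text exposition format:
#   metric_name{label1="value1",label2="value2"} value
def render_metric(name: str, labels: dict, value: float) -> str:
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

# Illustrative sample: per-container memory usage on one node
line = render_metric(
    "container_memory_usage_bytes",
    {"namespace": "default", "pod": "web-0", "container": "nginx"},
    52428800,
)
print(line)
# container_memory_usage_bytes{container="nginx",namespace="default",pod="web-0"} 52428800
```

Whatever agent you pick, this flat, label-based shape is what ends up in the metrics server, which is why label quality matters so much downstream.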
Below is the typical workflow, including the most common components:
A Few Notes:
An ideal services pipeline depends on two main factors: (1) collection of relevant metrics, and (2) awareness of continuous changes inside the Kubernetes cluster.
A good pipeline should focus on collecting relevant metrics. There are plenty of agents that can collect OS- and process-level metrics. But you will find very few out there that can collect details about the containers running on a given node, such as the number of running containers, container state, Docker engine metrics, etc. cAdvisor is IMO the best agent for this job so far.
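To show the kind of container-level detail meant here, a toy sketch that counts running containers on a node. The JSON shape is hypothetical and heavily simplified, loosely inspired by the kubelet's summary endpoint; real payloads from cAdvisor or the kubelet are far richer:

```python
import json

# Hypothetical, simplified node stats payload (all names/values are mine).
sample = json.loads("""
{
  "node": "worker-1",
  "pods": [
    {"name": "web-0", "containers": [
        {"name": "nginx", "state": "running", "memory_bytes": 52428800}]},
    {"name": "db-0", "containers": [
        {"name": "postgres", "state": "running", "memory_bytes": 209715200},
        {"name": "backup", "state": "waiting", "memory_bytes": 0}]}
  ]
}
""")

# The per-container detail a generic OS-level agent cannot give you:
running = sum(1 for pod in sample["pods"]
              for c in pod["containers"] if c["state"] == "running")
print(f"running containers on {sample['node']}: {running}")  # -> 2
```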
Awareness of continuous changes means that the monitoring pipeline is aware of individual pod and container instances and can relate them to their parent entities, e.g. Deployments, StatefulSets, Namespaces, etc. It also means that the metrics server is aware of system-wide metrics that should be visible to users, such as the number of pending pods, node status, etc.
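Relating instances to their parent entities boils down to following owner references up the chain, e.g. from a Pod to its ReplicaSet to its Deployment. A minimal sketch with hard-coded sample objects (the names and the dictionary shape are illustrative, standing in for `metadata.ownerReferences` on real Kubernetes resources):

```python
# Illustrative stand-in for metadata.ownerReferences on K8s resources.
objects = {
    ("Pod", "web-7d9f-abc12"):  {"owner": ("ReplicaSet", "web-7d9f")},
    ("ReplicaSet", "web-7d9f"): {"owner": ("Deployment", "web")},
    ("Deployment", "web"):      {"owner": None},
}

def root_owner(kind: str, name: str):
    """Walk owner references until the top-level parent entity."""
    ref = (kind, name)
    while objects[ref]["owner"] is not None:
        ref = objects[ref]["owner"]
    return ref

print(root_owner("Pod", "web-7d9f-abc12"))  # -> ('Deployment', 'web')
```

A pipeline that does this walk can aggregate per-container metrics up to the Deployment level, which is usually the level users actually reason about.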
What about Metrics Visualization?
You can visualize metrics in many different ways. The most common open-source tool that easily integrates with Prometheus is Grafana. The challenge you will face, though, is building proper dashboards to monitor the right metrics. That said, you should have dashboards monitoring the following:
Note: You can get started with these Grafana template dashboards for Kubernetes.
Grafana is not best suited for alerting. I see a lot of teams depending on it to create alerting rules; however, it is not as reliable and comprehensive as Prometheus's Alertmanager.
Heapster is currently causing some confusion given that it is used to show both core pipeline metrics and services metrics. In reality, you can remove Heapster and nothing bad will happen to the core Kubernetes scheduling and orchestration scenarios. It was the default monitoring pipeline and I guess it still is the default in a lot of distributions. But you don’t have to use it at all.
So, the Kubernetes community wanted to make the separation between the core and services monitoring pipelines clearer. Hence, Heapster is being deprecated and replaced by the Metrics Server (MS) as the main source of aggregated core metrics, a change likely to take place in 2019 releases. Think of the MS as a trimmed-down version of Heapster. The major immediate changes are: (1) no historical data or queries, and (2) the elimination of many container-specific metrics, keeping pod-focused metrics only. The Metrics Server is meant to provide only the core metrics needed for core Kubernetes scenarios, such as autoscaling, scheduling, etc.
Infrastore will store Metrics Server historical data with support for simple SQL-like queries. Initially, it will support the metrics collected by the Metrics Server. My guess is that, because the Kubernetes community loves extensibility, they will make it extensible and allow custom metrics to be added to the Metrics Server and its store.