DevOps is a set of practices that combines software development and information technology (IT) operations. Previously, these responsibilities were split between two teams: one that owned the development cycle and another that handled operations management.
DevOps practice pushes us to break large problems into smaller ones. Microservices fit perfectly here: small services build up a component, and these components make up an application. A microservice architecture lets small teams develop functional components one by one. It makes it easier to get to market faster and to scale up and down without impacting the whole system. It’s also better for fault tolerance and is platform and language agnostic. Microservice architecture has its downsides too, though: testing and monitoring are harder to maintain.
In a microservice architecture, we deploy microservices as containers, and for container orchestration we rely heavily on tools like Kubernetes or Docker Swarm. Once deployed, a microservice architecture can have thousands of services talking to each other over the network, which makes it very challenging to monitor. We have to monitor the individual services, not just the service-to-service communication. Fortunately, the large community behind today’s monitoring tools makes monitoring a cluster easier.
Next, we’ll discuss Kubernetes monitoring tools, with overviews for each.
Kubernetes Monitoring Tools
The following are some of the more popular tools used to monitor Kubernetes clusters that we’ll take a look at:
- Prometheus
- Grafana
- Fluentd
- Jaeger
- ELK (Elastic Stack)
Prometheus is an open-source event monitoring tool for containers and microservices. It gathers numerical time-series data. The Prometheus server works by scraping: it calls the metrics endpoint of each node it has been configured to monitor. These metrics are collected at regular intervals and stored locally. The endpoint that gets scraped is exposed on the node itself.
1. Prometheus Data Retention
By default, the data retention period is 15 days, and the lowest supported retention period is 2 hours. Bear in mind that the longer the retention period, the more storage is required. The lowest retention period can be useful when configuring remote storage for Prometheus.
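To get a rough feel for how retention drives storage, the Prometheus documentation offers a rule of thumb: needed disk space is approximately retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below only does that arithmetic; the function name and the ingestion rate in the example are illustrative assumptions, not measured values.

```python
def estimate_disk_bytes(retention_days: float,
                        samples_per_second: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rule-of-thumb estimate: retention (s) * ingest rate * bytes per sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumed workload: 10,000 samples per second kept for the default 15 days.
needed = estimate_disk_bytes(retention_days=15, samples_per_second=10_000)
print(f"{needed / 1e9:.1f} GB")
```

The retention period itself is set on the Prometheus server with the `--storage.tsdb.retention.time` flag.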
2. Prometheus Server
Prometheus has a central component called the Prometheus server. The Prometheus server monitors whatever it is pointed at: an entire Linux server, a stand-alone Apache server, a single process, a database service, or any other system unit you want to monitor.
3. Prometheus Target
The Prometheus server monitors targets, and a target can refer to an array of things. It could be a single server, or a target for probing multiple endpoints. The CPU and memory usage of these units can be used as metrics. The Prometheus server collects these metrics from the specified targets and stores them in a time-series database. The targets to scrape and the scrape interval are defined in a YAML configuration file.
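The pull model described here can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: the target names, metric names, and values are made up, targets are plain Python callables rather than HTTP endpoints, and time is simulated instead of slept.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical target: a function returning its current metric values.
Target = Callable[[], Dict[str, float]]
Store = Dict[Tuple[str, str], List[Tuple[float, float]]]


def scrape_once(targets: Dict[str, Target], timestamp: float, store: Store) -> None:
    """Pull metrics from every target and append (timestamp, value) samples."""
    for target_name, read_metrics in targets.items():
        for metric_name, value in read_metrics().items():
            store[(target_name, metric_name)].append((timestamp, value))


def run_scrapes(targets: Dict[str, Target], interval_seconds: float, rounds: int) -> Store:
    """Simulate `rounds` scrapes spaced `interval_seconds` apart (no real sleeping)."""
    store: Store = defaultdict(list)
    for i in range(rounds):
        scrape_once(targets, timestamp=i * interval_seconds, store=store)
    return store


targets = {
    "web-1": lambda: {"cpu_usage": 0.42, "memory_bytes": 512e6},
    "db-1": lambda: {"cpu_usage": 0.73, "memory_bytes": 2e9},
}
series = run_scrapes(targets, interval_seconds=15, rounds=3)
print(series[("web-1", "cpu_usage")])
```

Each (target, metric) pair ends up with its own list of timestamped samples, which is the essence of a time series.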
4. Prometheus With Grafana
Grafana is multi-platform visualization software that has been available since 2014. It provides graphs and charts for the web, connected to a data source. Prometheus has its own built-in expression browser, but Grafana is one of the industry’s most powerful visualization tools and integrates with Prometheus out of the box.
5. Prometheus In Kubernetes
Prometheus represents data as key-value pairs, which is the same way Kubernetes organizes infrastructure metadata using labels. Metrics are human-readable, in a self-explanatory format, and are published over HTTP. You can check that the metrics are correctly exposed just by using your web browser, or use Grafana for more powerful visualization.
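To make the key-value idea concrete, here is a minimal sketch that reads a few lines in the Prometheus text exposition format and splits each sample into a metric name, a dictionary of labels, and a value. It is deliberately simplified (it ignores optional timestamps and escaped characters inside label values), and the sample metrics are invented for illustration.

```python
import re
from typing import Dict, List, Tuple

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Return (metric name, labels, value) for every sample line in `text`."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Made-up payload, similar to what a /metrics endpoint might return.
EXAMPLE = """
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
process_cpu_seconds_total 12.5
"""
for sample in parse_metrics(EXAMPLE):
    print(sample)
```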
Grafana is multi-platform visualization software, available since 2014. It provides graphs and charts for the web, connected to a data source. It can query and visualize your data regardless of where the data is stored.
1. Visualize
Grafana offers fast, extensible client-side graphs with a number of options. Many plugins are also available to expand your options further and help visualize any desired metrics and logs.
2. Dynamic Dashboards
Create dynamic and reusable dashboards with template variables that appear as dropdowns at the top of your dashboard.
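The idea behind a template variable is that one stored query serves many dropdown choices. Grafana performs this substitution itself; the sketch below only mimics the concept with Python’s `string.Template`, and the query and the `namespace` variable are hypothetical.

```python
from string import Template

# Hypothetical PromQL query with a $namespace placeholder, as a panel might store it.
PANEL_QUERY = Template('sum(rate(http_requests_total{namespace="$namespace"}[5m]))')


def render_query(namespace: str) -> str:
    """Fill the template variable with the value picked from the dropdown."""
    return PANEL_QUERY.substitute(namespace=namespace)


for choice in ("staging", "production"):
    print(render_query(choice))
```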
3. Explore Metrics
Explore your data through ad-hoc queries. Split view and compare different time ranges, queries, and data sources.
4. Explore Logs
Experience the magic of switching from metrics to logs with preserved label filters. Quickly search through all your logs or stream them live.
5. Alerting
Grafana has a built-in alerting engine that lets users attach conditions to a metric. When the metric meets those conditions, Grafana alerts you through communication and chat tools (e.g., Slack), email, or a custom webhook.
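A condition attached to a metric usually combines a threshold with a duration, so that a single spike does not fire an alert. The following is a minimal sketch of that logic, not Grafana’s actual engine; the function name, the sample data, and the assumption that samples are sorted by timestamp are all illustrative.

```python
from typing import List, Tuple


def should_alert(samples: List[Tuple[float, float]], threshold: float,
                 for_seconds: float) -> bool:
    """Return True when every sample in the trailing `for_seconds` window is
    above `threshold` and the series is old enough to cover that window.
    `samples` is a list of (timestamp, value) pairs sorted by timestamp."""
    if not samples:
        return False
    window_start = samples[-1][0] - for_seconds
    if samples[0][0] > window_start:
        return False  # not enough history to judge the full window
    window = [value for ts, value in samples if ts >= window_start]
    return all(value > threshold for value in window)


# Made-up CPU usage samples taken every 60 seconds.
cpu = [(0, 0.50), (60, 0.95), (120, 0.97), (180, 0.96)]
print(should_alert(cpu, threshold=0.9, for_seconds=120))
print(should_alert(cpu, threshold=0.9, for_seconds=180))
```

In the first call the last three samples are all above 0.9, so the condition holds; in the second the window reaches back to the 0.50 sample, so it does not.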
6. Mixed Data Sources
You can use multiple data sources, even custom data sources, on a per-query basis. Grafana is commonly used to monitor and analyze CPU, storage, and memory metrics, among others.
Fluentd is an open-source project used as a unified logging layer, and it is a member project of the Cloud Native Computing Foundation (CNCF). Logs are important in a cluster: they let you understand what’s happening inside your instances. Logs have to be collected from multiple sources, and Fluentd provides an easy way to build a centralized logging system. Fluentd runs on approximately 40 MB of memory and can process about 10,000 events per second.
1. Fluentd with Kubernetes
Fluentd is a standard log aggregator for Kubernetes. It has its own Docker image, along with an edge image for testing, and is the 8th most used image on Docker Hub. Fluentd has to run on each node of the cluster, and Kubernetes provides the DaemonSet object for exactly this purpose: deploying one instance of a service on every node.
2. Use Cases
- Centralizing Apache/Nginx logs: Fluentd can be used to tail access and error logs and ship them to a remote server.
- Syslog alerting: Fluentd can "grep" for events and send out alerts.
- Mobile/web application logging: Fluentd can be used as middleware to enable asynchronous, scalable logging for user action events.
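The first two use cases boil down to turning raw log lines into structured events and then filtering them. The sketch below shows that idea in plain Python rather than in Fluentd’s own configuration; it assumes access logs in the Apache Common Log Format, and the sample lines are made up.

```python
import re
from typing import Dict, List, Optional

# Pattern for the Apache Common Log Format (an assumption about the input).
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line: str) -> Optional[Dict[str, object]]:
    """Turn one access-log line into a structured event, or None if it does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    event: Dict[str, object] = match.groupdict()
    event["status"] = int(match.group("status"))
    return event


def server_errors(lines: List[str]) -> List[Dict[str, object]]:
    """'Grep' for events worth alerting on: here, any 5xx response."""
    events = (parse_access_line(line) for line in lines)
    return [e for e in events if e is not None and e["status"] >= 500]


LINES = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '10.0.0.7 - - [10/Oct/2000:13:56:01 -0700] "POST /checkout HTTP/1.1" 503 -',
    'not a log line',
]
for event in server_errors(LINES):
    print(event["path"], event["status"])
```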
Jaeger is a distributed tracing platform, and the Jaeger Operator is an implementation of a Kubernetes operator. An operator is a method of packaging, deploying, and managing a Kubernetes application. The Jaeger Operator can be installed on Kubernetes-based clusters and can watch for new Jaeger custom resources (CRs) in specific namespaces or across the entire cluster. Typically, there’s only one Jaeger Operator per cluster, but in multi-tenant scenarios there can be at most one Jaeger Operator per namespace. When a new Jaeger CR is detected, an operator tries to establish itself as the owner of the resource by setting a jaegertracing.io/operated-by label on the new CR, with the operator’s namespace and name as the label’s value. The Jaeger agent can run as a sidecar container or, alternatively, as a DaemonSet.
1. Distributed Tracing
A microservice architecture is vast, with many calls leaving the cluster and many calls arriving at the services inside it. Jaeger lets us trace calls from users through to services. It also lets us track application latency, trace the lifecycle of network calls, and identify performance issues.
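A trace is essentially a tree of timed spans that share a trace ID. The sketch below models that structure to show how latency and bottlenecks fall out of it; it is not the Jaeger client API, and the `Span` class, its field names, and the sample spans are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span (the user's request)
    operation: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def trace_latency(spans: List[Span]) -> float:
    """End-to-end latency is the duration of the root span."""
    root = next(s for s in spans if s.parent_id is None)
    return root.duration_ms


def slowest_child(spans: List[Span]) -> Span:
    """The child span that took longest is a first hint at the bottleneck."""
    children = [s for s in spans if s.parent_id is not None]
    return max(children, key=lambda s: s.duration_ms)


# Made-up trace: one user request fanning out to two downstream calls.
trace = [
    Span("t1", "a", None, "GET /checkout", 0, 120),
    Span("t1", "b", "a", "auth-service", 5, 25),
    Span("t1", "c", "a", "orders-db query", 30, 110),
]
print(trace_latency(trace), slowest_child(trace).operation)
```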
Best Practices To Monitor Your Cluster
It’s difficult to monitor distributed systems, and when working with Kubernetes or a microservices architecture, we’re dealing with a comprehensive, distributed system. This system spans multiple nodes, and multiple services are responsible for a single output, which makes it hard to monitor the system as a whole.
We can’t monitor every node or pod by manually logging in and retrieving its metrics, so here are some practices to follow to make the most of monitoring:
1. Use DaemonSets
A DaemonSet is the Kubernetes object used to deploy a pod on each node of the cluster. DaemonSets are used by many monitoring tools, such as Fluentd, Jaeger, or the AppDynamics agent. With Jaeger, for example, the Jaeger agent can be deployed as a DaemonSet to trace calls and services. This way, users can easily gather data from all the nodes in the cluster.
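As a rough illustration of what such an object looks like, the sketch below builds a minimal `apps/v1` DaemonSet manifest as a Python dictionary and prints it as JSON. The name, namespace, and image tag are placeholders chosen for the example, and a real logging agent would also need volume mounts, resource limits, and permissions that are omitted here.

```python
import json


def daemonset_manifest(name: str, image: str, namespace: str = "kube-system") -> dict:
    """Build a minimal apps/v1 DaemonSet manifest as a plain dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Assumed name and image tag, for illustration only.
manifest = daemonset_manifest("fluentd", "fluent/fluentd:edge")
print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output could be saved to a file and applied to a test cluster.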
2. Tags And Labels
Tags and labels are used to filter and interact with Kubernetes objects such as pods, jobs, or cron jobs. They can help make your metrics more useful for debugging.
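Filtering by labels amounts to checking that an object carries every key-value pair in a selector. Here is a minimal sketch of equality-based matching; the pod names and labels are invented, and Kubernetes also supports set-based selectors, which this example does not cover.

```python
from typing import Dict, List


def matches(labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    """True when the object carries every key/value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(objects: List[dict], selector: Dict[str, str]) -> List[str]:
    """Return the names of all objects whose labels satisfy the selector."""
    return [obj["name"] for obj in objects if matches(obj["labels"], selector)]


# Hypothetical pods with their labels.
pods = [
    {"name": "cart-7d9f", "labels": {"app": "cart", "env": "prod"}},
    {"name": "cart-1b2c", "labels": {"app": "cart", "env": "staging"}},
    {"name": "auth-5e6a", "labels": {"app": "auth", "env": "prod"}},
]
print(select(pods, {"app": "cart", "env": "prod"}))
print(select(pods, {"env": "prod"}))
```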
3. Use Service Discovery
Kubernetes deploys services according to scheduling policies, so we don’t know in advance which node our app will be deployed on. You’ll want to use a monitoring system with service discovery, which automatically adapts metric collection as containers move. This will allow you to continuously monitor your applications without interruption.
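At its core, service discovery keeps the list of scrape targets in sync with whatever is currently running. The sketch below shows only that reconciliation step; the function name and the pod addresses are hypothetical, and a real monitoring system would obtain the discovered set from the Kubernetes API.

```python
from typing import Set, Tuple


def reconcile_targets(current: Set[str], discovered: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare monitored targets with freshly discovered ones.
    Returns (targets to start scraping, targets to stop scraping)."""
    return discovered - current, current - discovered


# Hypothetical pod addresses before and after a pod was rescheduled.
monitored = {"10.0.1.4:9100", "10.0.1.5:9100"}
discovered = {"10.0.1.5:9100", "10.0.2.8:9100"}

to_add, to_remove = reconcile_targets(monitored, discovered)
print("start scraping:", sorted(to_add))
print("stop scraping:", sorted(to_remove))
```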
4. Monitor The Kube-System
Some of the most complex issues occur within the Kubernetes cluster itself. They can be the result of DNS bottlenecks, network overload, and, the sum of all fears, etcd. It’s critical to track the degradation of master nodes and identify issues before they happen, particularly around load average, memory, and disk size. We need to monitor the kube-system namespace as closely as possible.
5. Constantly Watch For High Disk Usage
A StatefulSet has no automatic healing for a full disk, so high disk usage always requires attention. Make sure to monitor all disk volumes and root file systems. Node Exporter provides useful metrics for tracking these devices.
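In a cluster, this check would normally be expressed against Node Exporter metrics such as `node_filesystem_avail_bytes` and `node_filesystem_size_bytes`. The local sketch below shows the same threshold logic with Python’s standard library; the 85% threshold and the function names are assumptions for the example.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_disk_high(path: str = "/", threshold_percent: float = 85.0) -> bool:
    """True when usage has reached the (assumed) alerting threshold."""
    return disk_usage_percent(path) >= threshold_percent


percent = disk_usage_percent("/")
status = "HIGH" if is_disk_high("/") else "ok"
print(f"root volume: {percent:.1f}% used ({status})")
```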
- DevOps is a set of practices that combines the work of development and operations teams.
- We’ve seen many recommended tools that adhere to DevOps best practices.
- When it comes to DevOps, microservices, and containers, Kubernetes is the leading tool for orchestrating containers on the fly.
- Microservices architecture breaks larger problems into smaller ones. These small microservices make up the application.
- Prometheus is a tool to monitor containers. Prometheus server uses a scraping principle to get data from the Kubernetes cluster.
- Grafana is a leading visualization tool. It gathers data from multiple sources and shows users a centralized view. These data sources can be used to monitor the Kubernetes cluster.
- Fluentd is an open-source project that’s used as a unified logging layer and is a member project of the Cloud Native Computing Foundation (CNCF).
- Jaeger is a distributed tracing platform, with the Jaeger Operator as its Kubernetes operator implementation. It’s used to trace calls from users to services and shows us where problems occur. Its agent is deployed as a DaemonSet or a sidecar container.