What is a Daemon?
A daemon is a process that runs in the background. Typically, a daemon does not produce visible output for the user and does not accept interactive input. Daemons exist to perform background jobs. For example, on Linux, httpd responds to HTTP requests and sshd grants remote users secure remote-shell access. There are also several kernel daemons that do not accept user input; they exist to perform housekeeping and other essential tasks that the kernel needs to function correctly. Sometimes, users create or install their own daemons and background jobs. For example, logrotate is a popular Linux tool, typically run on a schedule, that archives old log files in configurable paths according to user-defined settings. Another example is log shippers (filebeat, fluentd, etc.) that continuously send logs to a log aggregation service like the ELK stack for analysis and correlation.
Do We Need Daemons in a Kubernetes Cluster?
Kubernetes is often referred to as the “data center operating system.” As we just discussed, an operating system needs daemons to perform background jobs that users do not (and should not) interact with. In Kubernetes, higher-level controllers like Deployments need to continually monitor the number of running Pods so that they can spawn or kill Pods as required. Such a task needs to run as a daemon: a background process that needs no user interaction, is always running, and is chiefly managed by Kubernetes itself. Kubernetes administrators may also need daemons to execute tasks on the running nodes. For that purpose, Kubernetes offers the DaemonSet resource. Like a Deployment, a ReplicaSet, or a StatefulSet, a DaemonSet creates and manages Pods. However, a DaemonSet ensures that one copy of its Pod runs on every node in the cluster.
Why Not Just Install the Daemon on the Node Itself?
Because that’s not how the cluster works. A daemon that’s installed directly on a node is not managed by Kubernetes; instead, it is controlled by and reports to the node’s operating system. Any change to the daemon’s configuration has to be made on every node. If the daemon stops working or reports errors, Kubernetes does nothing for you; you need to configure the OS or some third-party tool to restart the daemon when it fails. But wasn’t Kubernetes designed for exactly that sort of task? That’s why you’d be much better off using a DaemonSet.
Example: Fluentd DaemonSet for Log Collection
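The following is a stripped-down version of the official fluentd DaemonSet definition file. It is written against the extensions/v1beta1 API, which clusters running Kubernetes 1.16 or later no longer serve for DaemonSets: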

    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: fluentd
      namespace: kube-system
      labels:
        k8s-app: fluentd-logging
        version: v1
    spec:
      template:
        metadata:
          labels:
            k8s-app: fluentd-logging
            version: v1
        spec:
          nodeSelector:
            env: prod
          containers:
          - name: fluentd
            image: fluent/fluentd-kubernetes-daemonset:elasticsearch
            volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
          terminationGracePeriodSeconds: 30
          volumes:
          - name: varlog
            hostPath:
              path: /var/log
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers

Let’s have a closer look at the key components of this definition:
- Lines 1-8 specify the usual details that any Kubernetes controller has: the API version that recognizes this controller, the controller type (DaemonSet), in addition to the metadata that features the name and the labels that this controller holds.
- Lines 9-14: We define the labels that the spawned Pods will have.
- Lines 15-17: This DaemonSet needs to run only on the production nodes (labeled env=prod). The nodeSelector helps us with this by selecting only nodes with the env=prod label for running the Pods.
- Lines 18-20 define the container that this Pod runs. The container is named fluentd and uses the fluentd image.
- Lines 21-26 specify the volume mounts that the container has access to. DaemonSets typically run node-level tasks rather than application workloads, so both mounts point at directories on the node:
  - The /var/log volume gives fluentd access to the system logs that are typically found in this path on Linux systems. You should change this path if you want to collect different logs or your system stores its logs in some other location.
  - The /var/lib/docker/containers volume gives fluentd read-only access to the files that Docker keeps for each container on the host, including the container logs.
- Lines 27 to the end of the file set the Pod’s termination grace period and define the volume sources that the container uses. Typically, a DaemonSet uses the hostPath volume type to reach files and directories on the node.
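On Kubernetes 1.16 and later, DaemonSets are served only by the apps/v1 API, so the listing above needs two changes before it can be applied: a different apiVersion, and an explicit spec.selector that matches the Pod template labels. The following is a minimal sketch of that variant, not the official manifest. Everything else is carried over unchanged from the original listing, and the line numbers in the walkthrough above still refer to the original.

```yaml
# Sketch: the same fluentd DaemonSet expressed against apps/v1.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
    version: v1
spec:
  # apps/v1 requires an explicit selector. It must match the labels
  # in spec.template.metadata.labels, or the API server rejects the object.
  selector:
    matchLabels:
      k8s-app: fluentd-logging
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
        version: v1
    spec:
      # Run only on nodes labeled env=prod.
      nodeSelector:
        env: prod
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:elasticsearch
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      # hostPath volumes expose directories of the node to the Pod.
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
```

The selector here matches only the k8s-app label. That is a design choice for this sketch: the selector of an apps/v1 DaemonSet cannot be changed after creation, so leaving the version label out of it keeps that label free to change later.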
How Are DaemonSets Different From ReplicaSets?
DaemonSets and ReplicaSets may look very similar. However, they have several significant differences:
- A DaemonSet runs one Pod on every node. You can limit this to a subset of nodes using a nodeSelector or taints and tolerations, depending on your scenario. With a ReplicaSet, the scheduler places the requested number of Pods on the most suitable node(s); there is no guarantee that every node has a running Pod.
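- DaemonSet Pods are tied to a specific node by the DaemonSet controller rather than placed wherever capacity happens to be available. In older Kubernetes releases, the controller set the nodeName field on each Pod directly, bypassing the scheduler; in current releases, it adds a node affinity term to each Pod and lets the default scheduler bind the Pod to that node. Either way, every eligible node gets its own Pod, which makes DaemonSets ideal for running the Kubernetes system daemons.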
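- Pods created by DaemonSets are treated differently by Kubernetes. For example, they automatically tolerate several node-condition taints, so they keep running on nodes that are not ready or under resource pressure, and the descheduler does not evict them. Cluster-critical daemons are also commonly assigned a high-priority PriorityClass so that they take precedence over the rest of the Pods.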
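The two ways of narrowing or widening where a DaemonSet runs, mentioned in the first point above, can be sketched as a fragment of the Pod template. This is an illustration rather than a complete manifest. The node-role.kubernetes.io/control-plane taint is the one that kubeadm-style clusters place on control-plane nodes in recent releases, and the dedicated=logging taint is a hypothetical example; check which taints your own nodes carry before copying either.

```yaml
# Fragment of a DaemonSet spec (not a complete manifest).
spec:
  template:
    spec:
      # Narrow the set of nodes: only nodes labeled env=prod are eligible.
      nodeSelector:
        env: prod
      # Widen the set of nodes: tolerate taints that would otherwise
      # keep the Pods off certain nodes.
      tolerations:
      # Assumption: control-plane nodes carry this NoSchedule taint.
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      # Hypothetical custom taint (dedicated=logging:NoSchedule).
      - key: dedicated
        operator: Equal
        value: logging
        effect: NoSchedule
```

A toleration only permits scheduling onto a tainted node; it does not attract Pods to it. The nodeSelector still applies, so in this sketch a control-plane node receives a Pod only if it is also labeled env=prod.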
DaemonSet Pods Access Patterns
Most of the time, you don’t need to communicate with Pods spawned by DaemonSets. Those Pods are mainly used for background tasks, housekeeping, log aggregation, and so on. But what if you do want to send an HTTP request to such a Pod and examine the response? For example, a custom log-collection Pod may have a health or status endpoint that reports how many logs it has processed, whether there were errors, and so on. Let’s explore the different options that you have:
- Create a traditional Service and set its Pod selector to the same labels that the DaemonSet uses. The drawback of this approach is that each request is load-balanced to an arbitrary Pod, so you cannot choose which node answers.
- Create a headless Service that uses the same Pod selector as the DaemonSet but is not assigned a cluster IP address. A DNS lookup of the headless Service returns the IP addresses of all the Pods it matches. It is up to you (or the client software you are using) to parse the response and select the appropriate Pod to communicate with.
- Use the hostPort option with the DaemonSet Pods. Using hostPort, you make the Pod accessible through the node’s IP address, with no Service layer in between. This approach has a significant drawback: it depends on the chosen port being free on every node, and it couples clients to node addresses. You can use this method in small or development environments.
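As an illustration of the headless Service option above, here is a minimal sketch that selects the fluentd Pods from the earlier example. The Service name and the port number are hypothetical; the example image is not known to expose a status endpoint on this port, so substitute whatever port your own daemon listens on.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: fluentd-status          # hypothetical name
  namespace: kube-system
spec:
  # "None" makes the Service headless: no cluster IP is allocated
  # and no load balancing is performed.
  clusterIP: None
  # Same label that the DaemonSet puts on its Pods.
  selector:
    k8s-app: fluentd-logging
  ports:
  - name: status
    port: 8080                  # hypothetical status port
    targetPort: 8080
```

Assuming the default cluster domain, a DNS query for fluentd-status.kube-system.svc.cluster.local from inside the cluster then returns one address record per matching Pod, and the client picks the Pod it wants to talk to.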
- When you use Kubernetes, you need to change the way you look at your infrastructure. Since your applications, with all their aspects, run inside the cluster with Kubernetes managing them, there should be little to no coupling with the nodes. In another article, we discussed the CronJob controller, which allows you to run jobs on a periodic schedule. In this article, we explained how DaemonSets let you run daemons that are usually installed and run directly on the nodes.
- A DaemonSet looks and behaves a lot like a ReplicaSet: it spawns Pods and ensures that they’re always running. However, DaemonSets differ in that they deploy their Pods to every node in the cluster (unless you use a nodeSelector or taints and tolerations). Additionally, Pods that are created by DaemonSets are treated differently: for example, the descheduler does not evict them, and they tolerate several node-condition taints by default.
- The most popular usage scenario for DaemonSets is log collection. Many log-aggregation products offer DaemonSet templates for running their client daemons on the nodes.