What is The Singleton Pattern?
The Singleton pattern is one of the creational patterns used in software development. In object-oriented programming, a Singleton class is one that allows only a single instance of itself to be created. The same concept applies outside application development. For example, let’s assume that you are working on a Red Hat Linux box. Red Hat and its variants (CentOS, Fedora, etc.) use yum for package and system updates. Periodically, the operating system contacts its package repositories to download updates for the packages that are currently installed. While this operation is in progress, no other yum process should start. So, if a user tries to use yum to install a package, the system responds that “another installation is already in progress”. The yum process here can be thought of as a “singleton” process: only one instance of yum should be running at any given time. The Singleton pattern is used to maintain consistency and integrity. In the yum example, running multiple instances of the package manager concurrently may break the operating system or make it unstable.
How The Singleton Pattern is Applied
The end goal is to have only one instance of a process running. In our yum example, the Singleton pattern is applied through a lock file, /var/run/yum.pid. Each time the yum command runs, it checks for the presence of this file and tries to “lock” it. If the file cannot be locked, another yum process is using it and, therefore, the new instance does not proceed. How the new instance responds depends on how the application is implemented. A yum command waits until the already-running process exits. Another application may simply exit with a non-zero exit code to signal that concurrent execution is not allowed.
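The lock-file check described above can be sketched in a few lines of Python. This is a minimal illustration and not yum's actual implementation: the function name `try_acquire` is hypothetical, and the sketch assumes a Unix-like system where `fcntl.flock` is available.

```python
import fcntl


def try_acquire(lock_path):
    """Try to take an exclusive lock on lock_path without blocking.

    Returns the open file object (keep it open to keep the lock),
    or None if another holder already has the lock.
    """
    handle = open(lock_path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle
```

An application that prefers to wait, as yum does, could drop `LOCK_NB` so the call blocks until the lock is free. An application that prefers to fail fast would exit with a non-zero code when `try_acquire` returns `None`.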
In the above scenario, the yum process is aware that it is (and should be) the only instance running. Another way of implementing the Singleton pattern is through a wrapper script. We could write a bash script that starts the yum process but first checks whether another yum process is already running (perhaps through a command like ps -fe | grep yum). The difference here is that the application itself (yum) is not aware of the constraint imposed on it. It doesn’t know whether or not another instance is running. Take note of this important implementation difference because we’ll apply the same logic to Kubernetes and discuss the use cases for each method.
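The wrapper approach can be sketched as follows, here in Python rather than bash. The helper names `pids_of` and `run_if_alone` are hypothetical, and the sketch assumes a `ps` that accepts the `-e` and `-o` options.

```python
import subprocess


def pids_of(ps_output, command_name):
    """Return the PIDs whose command name equals command_name.

    ps_output is the text printed by `ps -e -o pid= -o comm=`:
    one process per line, PID first, then the command name.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == command_name:
            pids.append(int(parts[0]))
    return pids


def run_if_alone(command):
    """Start command only if no process with the same name is running.

    Returns the command's exit code, or 1 if another instance was found.
    """
    listing = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    if pids_of(listing, command[0]):
        return 1  # non-zero: concurrent execution is not allowed
    return subprocess.run(command).returncode
```

Note that a check-then-start wrapper leaves a small window in which two wrappers can both see no running instance and both start one, so it gives a weaker guarantee than a lock held by the application itself.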
Implementing The Singleton Pattern in Kubernetes
In a Kubernetes cluster, the common practice is to run and maintain several instances (replicas) of an application for high availability. A web application that runs on only one Nginx instance is vulnerable to downtime if that instance goes down or gets restarted. However, sometimes this may not best serve the needs of your environment. In a microservices architecture, an application is often made up of more than one component. If the application is hosted on Kubernetes, some of those components may need to follow the Singleton pattern when they run.
For example, a component that must consume messages from a message queue sequentially should not have more than one instance running at a time. Let’s see how we can implement the Singleton pattern in Kubernetes using the two methods that we described earlier: from within the application and from outside the application.
Using a ReplicaSet: Non-Aware Application
The simplest method that comes to mind when we need to run a Singleton Pod in Kubernetes is a ReplicaSet. The ReplicaSet controller ensures that a specific number of Pods are running. Setting the replicas count to 1 seems to do the trick. Let’s look at an example:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: frontend
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google_samples/gb-frontend:v3
The above definition uses a ReplicaSet to spawn one Pod. The container running inside the Pod does not know that it is the only instance running. It is not aware of the mechanism that prevents other instances from running. However, a ReplicaSet may not be your best option for implementing the Singleton pattern.
A ReplicaSet May Violate the Singleton Pattern
Because of the way ReplicaSets are designed, they ensure that at least a specific number of Pods is running; they do not strictly enforce a maximum number of running Pods at any given moment. To better understand this limitation, consider what happens when the node hosting the Pod loses contact with the rest of the cluster, for example during a network partition. (A failed liveness probe on its own only makes the kubelet restart the container in place, and a failed readiness probe only removes the Pod from Service endpoints.) Once the Pod on the unreachable node is marked for removal, the ReplicaSet spawns a replacement on a different node, even though the original container may still be running. After some time, the node may rejoin the cluster. If the matched Pod count is then higher than the replica count, the ReplicaSet deletes one of the Pods. However, even for a brief period of time, we had more than one Pod running. The Singleton pattern was violated.
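The sequence above can be replayed with a toy model. This is only a sketch of the reasoning, not the real ReplicaSet controller: the names `reconcile` and `simulate` are hypothetical, and the model assumes that a Pod can drop out of the controller's view while its container keeps running.

```python
def reconcile(desired, visible_pods):
    """Return how many Pods a toy controller would create (+) or delete (-)."""
    return desired - len(visible_pods)


def simulate():
    """Replay the failure sequence and record how many containers run."""
    desired = 1
    running = ["pod-a"]   # containers actually executing
    visible = ["pod-a"]   # Pods the controller currently counts
    history = [len(running)]

    # pod-a drops out of the controller's view but keeps running.
    visible.remove("pod-a")
    if reconcile(desired, visible) > 0:
        running.append("pod-b")
        visible.append("pod-b")
    history.append(len(running))

    # pod-a becomes visible again: one Pod too many, so one is removed.
    visible.append("pod-a")
    if reconcile(desired, visible) < 0:
        running.remove("pod-a")
        visible.remove("pod-a")
    history.append(len(running))
    return history
```

Calling `simulate()` returns `[1, 2, 1]`: the middle value is the window during which two instances exist side by side.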
Using a StatefulSet: Non-Aware Application
ReplicaSets are easier to implement, yet they favor keeping the application highly available over keeping it consistent. So, if your application has strict Singleton requirements, you should use a StatefulSet instead. A StatefulSet is stricter about the order in which its Pods start and terminate. By default, a given Pod does not start until its predecessor is running and ready, and termination proceeds in reverse order. More importantly for a Singleton, the controller does not spawn a replacement for a Pod until the old one has completely shut down. Thus, a StatefulSet provides a stronger Singleton guarantee than a ReplicaSet. However, StatefulSets have their own limitations:
- The volume associated with a Pod is not deleted automatically when the Pod is deleted. As a data-protection measure, it has to be removed manually. This adds some administrative burden.
- Due to their nature, StatefulSets need to use a headless Service. A headless Service does not expose a cluster IP address. Rather, when you resolve a headless Service’s DNS record, it returns a list of the IP addresses of all the Pods it manages. In a Singleton setup that uses only one Pod, clients can call the Pod directly through its stable DNS name instead of going through a load-balancing Service.
Again, the hosted application is not aware that only one instance of it is running and that additional instances are not allowed.
Implementing The Singleton Pattern Through The Application
You can also design the application itself so that only one instance (commonly known as the leader) is active at any given time. Other instances of the application are also running, but they’re in passive mode. When in passive mode, an instance does not respond to client requests. Passive instances constantly wait for the leader to go down so that one of them can be elected as the new leader. Many clustered applications already implement this pattern, such as ZooKeeper, Redis, Consul, and etcd.
etcd is a key-value data store that Kubernetes uses to maintain its state. The data store must be replicated for high availability, but only one instance serves as the single source of truth for the cluster. Applications can use third-party data stores like Consul for this purpose, but they can also use etcd. A prominent example of this type of usage is Apache Camel, which offers a connector to etcd that enables the application to make use of etcd’s locking mechanisms. No matter which leader-election service you’re using, the concept remains the same: one instance of the application is active while the rest are inactive. When the active instance goes down, the remaining instances elect a new one to take over.
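The active/passive behavior can be sketched with a lease that only one instance holds at a time. The `LeaseStore` class below is a hypothetical in-memory stand-in for a lock service such as etcd or Consul; it does not use any real client API, and a real service would perform the check-and-set atomically on the server side.

```python
class LeaseStore:
    """In-memory stand-in for a lock service (assumption, not a real API)."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, now, ttl):
        """Grant or renew the lease if it is free, expired, or already ours."""
        if self.holder in (None, candidate) or now >= self.expires_at:
            self.holder = candidate
            self.expires_at = now + ttl
            return True
        return False


def is_leader(store, name, now, ttl=10.0):
    """An instance performs the singleton work only while this is True."""
    return store.try_acquire(name, now, ttl)
```

Each instance would call `is_leader` periodically, passing the current time. The leader keeps renewing its lease; if it stops (because it crashed, for example), the lease expires and the next passive instance that asks becomes the leader.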
Do not Disrupt My Cluster! The Pod Disruption Budget Pattern
The Singleton pattern describes a scenario where you need only one instance of your application to be running at any given time. But sometimes, you may also want to ensure that a specific number (or percentage) of your Pods remains available. Controllers like ReplicaSets and Deployments work to keep all your Pods healthy and running, but in some scenarios you intentionally bring down a portion of your Pods. The most well-known example of such a scenario is draining a node. Draining refers to gradually evicting Pods from a running node until none are left. Then, the node can be brought down for maintenance. A Pod Disruption Budget (PDB) lets you declare how many Pods must stay available during such operations.
Beware, though, that a Pod Disruption Budget only protects against voluntary Pod eviction. If Pods are lost because a node becomes unhealthy, the PDB is not honored.
How Does a Pod Disruption Budget Work With Other Controllers?
Assume that you have a Deployment that spawns five Pod replicas. You know that your application can tolerate running with four Pods but no fewer than that. If one of those four Pods fails, the whole application breaks. This behavior is not uncommon in clustered applications that need to maintain a specific quorum of nodes. Now, because of a kernel bug on one of the cluster nodes, you installed a new kernel there, and you need to restart the node for the new kernel to take effect. Restarting the node requires moving every Pod running on it to other nodes. Without a Pod Disruption Budget (PDB), draining simply evicts all the Pods on the node. If those Pods are managed by a controller like a Deployment or a ReplicaSet, replacement Pods are automatically spawned on other healthy nodes. However, if the node hosts more than one of the five Pods, the application drops below four running Pods until the replacements are ready, and the quorum is violated. With a PDB, an eviction is refused whenever it would leave fewer than the minimum number of Pods running, so the drain proceeds only as replacement Pods become available.
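The decision described above comes down to simple arithmetic. The following sketch illustrates it; the function names are hypothetical and this is not the Kubernetes eviction code, only a model of the rule that an eviction is allowed while the number of healthy Pods stays at or above the budget.

```python
def can_evict(healthy_pods, min_available):
    """True if evicting one Pod still leaves min_available healthy Pods."""
    return healthy_pods - 1 >= min_available


def drain(pods_on_node, healthy_pods, min_available):
    """Count how many of a node's Pods could be evicted right now.

    Assumes no replacement Pod becomes ready in the meantime.
    """
    evicted = 0
    for _ in range(pods_on_node):
        if not can_evict(healthy_pods, min_available):
            break  # budget exhausted: the eviction request is refused
        healthy_pods -= 1
        evicted += 1
    return evicted
```

With five healthy Pods and a budget of four, `drain(2, 5, 4)` evicts only one of the two Pods on the node. The second eviction has to wait until a replacement Pod is ready elsewhere.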
The following is an example of a Pod Disruption Budget that ensures that at least four of the Pods labeled app=mycluster are available at any given time:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mycluster-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: mycluster
As you can see, the definition starts with the usual fields: apiVersion, kind, and metadata.name. The spec part has two important parameters:
- minAvailable (line 6): the required number of Pods that should be kept running at all times. This number applies only to the Pods matched through the Pod selector (line 7). Notice that you can use a percentage instead of a number, for example, 70%. Alternatively, you can use maxUnavailable, which specifies the highest number of Pods that can be missing. Notice that the two parameters are mutually exclusive: you can use one of them but not both.
- selector: matches all Pods that carry the given label. The PDB applies to those Pods.
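The two parameters can be illustrated with a short calculation. This is a sketch with hypothetical function names, not Kubernetes code. It assumes that percentages are rounded up to whole Pods, which is the rounding described in the Kubernetes documentation for Pod Disruption Budgets.

```python
def resolve(value, total_pods):
    """Turn an absolute number or a percentage string into a Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        # Integer ceiling division: round the percentage up to whole Pods.
        return (total_pods * percent + 99) // 100
    return int(value)


def disruptions_allowed(total_pods, healthy_pods,
                        min_available=None, max_unavailable=None):
    """Return how many Pods may be evicted without breaking the budget."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available, max_unavailable")
    if min_available is not None:
        required = resolve(min_available, total_pods)
    else:
        required = total_pods - resolve(max_unavailable, total_pods)
    return max(0, healthy_pods - required)
```

For five Pods, `minAvailable: 4` and `maxUnavailable: 1` both allow a single disruption, while `minAvailable: "70%"` resolves to four Pods as well, because 3.5 is rounded up.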
Summary
- In many cases, you may need only one instance of your application to be running at a given time. For those application requirements, you should follow the Singleton deployment pattern.
- If your application can tolerate brief violations of the Singleton pattern, you can use a ReplicaSet with the number of replicas set to 1. The drawback of this approach is that ReplicaSets try not to break the availability of the application, even if that means running more Pods than the requested number for a short period. One possible cause of a second, concurrent Pod is the original Pod becoming temporarily unavailable and then coming back.
- If your application needs to follow a stricter Singleton policy, then you should use a StatefulSet. A StatefulSet does not attempt to spawn a new Pod unless the old one has completely shut down. However, StatefulSets have their own limitations.
- Neither of the above approaches is suitable for scaling. You run only one instance of your application on a single Pod and that’s it. So, if only part of your application must run solo while the rest of its activities do not need to be a Singleton, you can use an application-managed locking mechanism. In this approach, the application handles the election of a leader instance. Only the leader instance is allowed to perform the Singleton activity (for example, updating a DB record). When the leader fails, the rest of the instances elect a new leader.