Kubernetes has changed how applications are deployed and operated, and databases are an essential part of most applications. In this article, we’ll look at how to deploy a database on Kubernetes and which approaches are available for doing so.
A database is an organized system for storing and managing data on a computer system. Database engines can create, read, update, and delete data in the database, and a database is controlled by a Database Management System (DBMS). In relational databases, data is modeled in rows and columns; this model became dominant in the ’80s, and SQL is used for writing and querying the data. In the 2000s, non-relational databases became popular. They are referred to as NoSQL because they use different query languages and data models, such as key-value pairs and documents.
Since we’re going to deploy a database in Kubernetes, we need to know what a StatefulSet is. StatefulSet is the workload API object used to manage stateful applications. It manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
Like a Deployment, a StatefulSet manages Pods that are based on an identical container specification. Unlike a Deployment, the Pods maintained by a StatefulSet have a unique, persistent identity and a stable hostname, regardless of which node they are scheduled on. If we want data to persist, we can create a PersistentVolume and use a StatefulSet as part of the solution. Although individual Pods in a StatefulSet are prone to failure, persistent Pod identifiers make it easier to match existing volumes to the new Pods that replace any that have failed.
StatefulSets are valuable for applications that require one or more of the following:
- Stable, unique network identifiers.
- Stable, persistent storage.
- Ordered, graceful deployment and scaling.
- Ordered, automated rolling updates.
A StatefulSet is the usual choice when deploying a database on Kubernetes, but it comes with some limitations:
- The storage for a given Pod must either be provisioned by a PersistentVolume provisioner, based on the requested storage class, or be pre-provisioned by an administrator.
- Deleting a StatefulSet or scaling down its replicas will not delete the volumes associated with it. This is done to ensure the safety of the data.
- StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods.
- A StatefulSet doesn’t provide any guarantees about how its Pods are terminated when the StatefulSet itself is deleted. To get an ordered and graceful termination of the Pods, scale the StatefulSet down to 0 replicas prior to deleting it.
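To make these requirements concrete, here is a minimal sketch of a headless Service and a single-replica StatefulSet for MySQL. It is an illustration only and is not part of the walkthrough later in this article: the volume name data is hypothetical, the password is a placeholder, and the sketch assumes the cluster has a default storage class that can provision the requested volume.

```yaml
# Hypothetical sketch: headless Service plus a single-replica StatefulSet.
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None          # headless: gives each Pod a stable DNS identity
  selector:
    app: mysql
  ports:
  - port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql       # must reference the headless Service above
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.6
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: password  # placeholder only; use a Secret in real usage
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: data       # hypothetical volume name
          mountPath: /var/lib/mysql
  volumeClaimTemplates:    # one PersistentVolumeClaim is created per Pod
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
```

With this manifest the Pod would be named mysql-0 and its claim data-mysql-0, which is how a replacement Pod finds the same volume again after a failure.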
Databases On Kubernetes
We can deploy a database to Kubernetes as a stateful application. Usually, when we deploy Pods they have their own storage, but that storage is ephemeral - if the container is killed, its storage is gone with it.
Kubernetes has an object to tackle that scenario: when we want our data to persist, we attach a persistent volume claim to the Pod. This way, if our container is killed, our data stays in the cluster, and the new Pod can access it.
Pod -> PVC -> PV
- PV = Persistent Volume
- PVC = Persistent Volume Claim
Operators To Deploy Databases To Kubernetes
- MySQL can be deployed using the Kubernetes operator developed by Oracle.
- Crunchy Data provides a PostgreSQL operator for deploying PostgreSQL to Kubernetes.
- MongoDB provides an operator for deploying MongoDB Enterprise to a Kubernetes cluster.
Is It Feasible To Deploy The Database On Kubernetes?
In today’s world, there are more and more companies working on containerized technologies. Before doing a deep dive, let's review our options for running databases.
1. Fully Managed Databases
Fully managed databases are those that you don’t have to provision or manage yourself - the management is done by cloud providers like AWS, Google Cloud, Azure, or DigitalOcean. Managed databases include Amazon Aurora and DynamoDB, or Google Cloud Spanner and Cloud SQL. These databases are the low-ops choice: the cloud provider handles many of the maintenance tasks, such as backups, scaling, and patching. You just have to create a database to build your app on, and let the cloud provider handle the rest for you.
2. Deploying By Yourself On VM, Or On-premises Machines
With this option you can deploy the database to any virtual machine (EC2 or Compute Engine) or on-premises machine, and you’ll have full control. You’ll be able to deploy any version of the database, and you can set your own security and backup plans. On the other hand, this means that you’ll provision, manage, patch, and scale the database on your own, and you’ll need an administrator in place to do so. This adds cost to your infrastructure, but has the advantage of flexibility.
3. Run It On Kubernetes
This is the main point of the article. Deploying the database on Kubernetes is closer to the full-ops option, but you get some benefits from the automation that Kubernetes provides to keep the database application up and running. It’s important to remember that Pods are ephemeral, so the database application is more likely to be restarted or to fail. You’ll also be responsible for the more specific database administration tasks, such as backups and scaling.
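As one illustration of that automation, Kubernetes can restart a database container that stops responding and keep traffic away from it until it accepts connections. The fragment below is a sketch rather than a tuned configuration: it assumes the container image ships the mysqladmin client (the official mysql images do), and the timing values are arbitrary starting points.

```yaml
# Fragment: goes under the mysql container in a Pod template.
livenessProbe:             # restart the container if the server stops answering
  exec:
    command: ["mysqladmin", "ping"]
  initialDelaySeconds: 30  # assumed value; give the server time to start
  periodSeconds: 10
  timeoutSeconds: 5
readinessProbe:            # only route traffic once the port accepts connections
  tcpSocket:
    port: 3306
  initialDelaySeconds: 5
  periodSeconds: 5
```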
Some important points to consider when choosing to deploy the database on Kubernetes are:
- There are some custom resources and operators available to manage the database on Kubernetes.
- Databases that have caching layers and more transient storage are better fits for Kubernetes.
- You have to understand the replication mode available in the database. Asynchronous modes of replication leave room for data loss, because the transactions might be committed to the primary database, but not to the secondary databases.
Above, we have a simple chart that shows what the decision tree looks like when deploying databases on Kubernetes. First, we try to understand whether the database has Kubernetes-friendly features, as MySQL and PostgreSQL do; if so, we look for a Kubernetes operator that packages the database with additional features. The second question is how much operational work is acceptable, given what we’ve seen is needed to run a database on Kubernetes. Do we have a team of site reliability engineers, or would it be more feasible to use a managed database?
Deploy A Stateful Application On Kubernetes:
Step 1: Deploying The MySQL Service

    apiVersion: v1
    kind: Service
    metadata:
      name: mysql
    spec:
      ports:
      - port: 3306
      selector:
        app: mysql
      clusterIP: None

First, we create the Service for the MySQL database on port 3306. It selects all Pods with the label key app and value mysql, and clusterIP: None makes it a headless Service, so its DNS name resolves directly to the Pod.
To create the resource:
kubectl create -f mysql_service.yaml
Step 2: Deploying The MySQL Deployment

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: mysql
    spec:
      selector:
        matchLabels:
          app: mysql
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
          - image: mysql:5.6
            name: mysql
            env:
              # Use secret in real usage
            - name: MYSQL_ROOT_PASSWORD
              value: password
            ports:
            - containerPort: 3306
              name: mysql
            volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql
          volumes:
          - name: mysql-persistent-storage
            persistentVolumeClaim:
              claimName: mysql-pv-claim

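This Deployment creates a Pod from the mysql image with the 5.6 tag, sets the root password through the MYSQL_ROOT_PASSWORD environment variable, and exposes container port 3306. It also mounts a volume backed by the persistent volume claim mysql-pv-claim at /var/lib/mysql. We’ll create that claim in the upcoming steps; until it exists, the Pod stays in the Pending state.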
To create the resource:
kubectl create -f mysql_deployment.yaml
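The comment in the manifest above says to use a Secret in real usage. A minimal sketch of what that could look like follows; the Secret name mysql-secret and the key root-password are hypothetical names chosen for this example, and the value is a placeholder.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret          # hypothetical name
type: Opaque
stringData:
  root-password: change-me    # placeholder value
---
# Fragment: replaces the literal env entry in the Deployment's container spec.
env:
- name: MYSQL_ROOT_PASSWORD
  valueFrom:
    secretKeyRef:
      name: mysql-secret
      key: root-password
```

This keeps the password out of the Deployment manifest, although the Secret manifest itself still needs to be kept out of version control or created by other means.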
Step 3: Creating Our Persistent Volume

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: mysql-pv-volume
      labels:
        type: local
    spec:
      storageClassName: manual
      capacity:
        storage: 20Gi
      accessModes:
      - ReadWriteOnce
      hostPath:
        path: "/mnt/data"

This creates a PersistentVolume that we’ll attach to the Pod so that the data survives restarts. The volume provides 20Gi of storage with the ReadWriteOnce access mode. It is backed by the host path /mnt/data on the node, which is where all our data will reside.
To create the resource:
kubectl create -f persistence_volume.yaml
Step 4: Creating Our Persistent Volume Claim

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mysql-pv-claim
    spec:
      storageClassName: manual
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi

This creates the PersistentVolumeClaim, which requests 20Gi from the PersistentVolume we created above, using the same storage class (manual) and the same access mode, ReadWriteOnce.
To create the resource:
kubectl create -f pvClaim.yaml
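A hostPath volume keeps the data on one node’s filesystem, so it is mainly suited to single-node test clusters. On a cluster with a dynamic volume provisioner, the manually created PersistentVolume can be skipped and the claim alone is enough. The sketch below assumes a storage class named standard exists; that name is hypothetical and varies between clusters.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: standard   # hypothetical; check `kubectl get storageclass`
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```

Because the claim keeps the name mysql-pv-claim, the Deployment from Step 2 can reference it unchanged.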
Step 5: Test The MySQL Database
kubectl run -it --rm --image=mysql:5.6 --restart=Never mysql-client -- mysql -h mysql -ppassword
This command creates a new Pod in the cluster running a MySQL client, and connects it to the server through the Service. If it connects, you know your stateful MySQL database is up and running.

    Waiting for pod default/mysql-client-274442439-zyp6i to be running, status is Pending, pod ready: false
    If you don't see a command prompt, try pressing enter.
    mysql>

Feel free to clone our repository if you don’t want to write the manifests yourself, or want a quick walkthrough.
Conclusion
- Stateful applications are those that store the user’s state for the next session. The saved data is called application state.
- StatefulSet is a Kubernetes workload API object that’s used to manage stateful applications, and provides guarantees about the ordering and uniqueness of the Pods.
- Deleting a StatefulSet gives no guarantees about how its Pods are terminated. For an ordered, graceful shutdown, scale the StatefulSet down to 0 replicas before deleting it.
- A database on Kubernetes is deployed with a persistent volume, which is used to persist data as long as your cluster is running. This means the data withstands the destruction of the Pod, and any new Pod that’s created will start using the volume again.
- Fully managed databases are operated by cloud providers and their site reliability engineers, so we don’t have to provision or manage them. These databases come with extra cost, but are the best option if you want to focus on your application rather than spending your time on operations.
- You can also deploy the database the traditional way, on a VM from a cloud provider or on an on-premises machine. With this option, you’ll have to handle all of your database operations yourself, such as scaling, provisioning, and patching.
- Finally, we demonstrated deploying a database on Kubernetes, using a persistent volume to ensure data safety.