Kubernetes is a major, modern improvement in how applications are developed and run, and databases are an essential part of most applications. In this article, we’ll see how to deploy a database in Kubernetes and which approaches we can use to do so.
A database is a system for storing and managing data on a computer system. Database engines can create, read, update, and delete data in the database. A database is controlled by a Database Management System (DBMS). In most databases, data is modeled in rows and columns; these are called relational databases, and they became dominant in the ’80s. SQL is used for writing and querying data. In the 2000s, non-relational databases became popular. They are referred to as NoSQL because they use query languages other than SQL, and many of them store data as key-value pairs.
In this article, we’re going to deploy a database in Kubernetes, so we need to know what a StatefulSet is. A StatefulSet is the workload API object used to manage stateful applications. It manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
Like a Deployment, a StatefulSet manages Pods that are based on an identical container specification. Pods that are maintained by a StatefulSet have a unique, persistent identity and a stable hostname, regardless of which node they are scheduled on. If we want storage to persist, we can create a PersistentVolume and use a StatefulSet as part of the solution. Although individual Pods in a StatefulSet are prone to failure, the persistent Pod identifiers make it easier to match existing volumes to the new Pods that replace any that have failed.
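To make this concrete, here is a minimal sketch of what a single-replica MySQL StatefulSet could look like. It is an illustration, not the manifest used in the walkthrough later in this article: the volume name data is hypothetical, the image, label, and 20Gi size simply mirror the examples further down, and it assumes a headless Service named mysql exists and that the cluster has a default StorageClass able to satisfy the claim.

```yaml
# Sketch only: a single-replica MySQL StatefulSet.
# Assumes a headless Service named "mysql" and a default StorageClass.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql          # headless Service that gives each Pod a stable DNS name
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:5.6
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: password   # placeholder; use a Secret in real usage
          ports:
            - containerPort: 3306
              name: mysql
          volumeMounts:
            - name: data        # hypothetical volume name
              mountPath: /var/lib/mysql
  volumeClaimTemplates:         # one PVC per Pod: data-mysql-0, data-mysql-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```

With this layout the Pod is always named mysql-0, and if it is rescheduled the replacement Pod reattaches to the same claim (data-mysql-0), which is the volume-matching behavior described above.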
StatefulSets are valuable for applications that require one or more of the following:
When deploying a database on Kubernetes we need to use StatefulSet, but some of the limitations of using StatefulSet are:
We can deploy a database to Kubernetes as a stateful application. Usually, when we deploy pods they have their own storage, but that storage is ephemeral: if the container is killed, its storage is gone with it.
To tackle that scenario, Kubernetes provides persistent volumes: when we want our data to persist, we attach a persistent volume claim to the pod. This way, if our container is killed, the data remains in the cluster, and the new pod can access it.
Pod -> PVC -> PV
In today’s world, there are more and more companies working on containerized technologies. Before doing a deep dive, let's review our options for running databases.
Fully managed databases are those that you don’t have to provision or manage yourself; a cloud provider such as AWS, Google Cloud, Azure, or DigitalOcean does it for you. Managed databases include Amazon Aurora and DynamoDB on AWS, and Cloud Spanner and Cloud SQL on Google Cloud. They are the low-ops choice: the cloud provider handles many of the maintenance tasks, such as backups, scaling, and patching. You just create a database to build your app and let the cloud provider handle the rest.
With this option you can deploy the database to any virtual machine (EC2 or Compute Engine), and you’ll have full control. You’ll be able to deploy any version of the database, and you can set your own security and backup plans. On the other hand, this means that you’ll have to manage, patch, scale, and provision the database on your own. You’ll also need an administrator to manage the database. This adds cost to your infrastructure, but has the advantage of flexibility.
Here’s the main point: deploying the database in Kubernetes is closer to the full-ops option, but you get some benefits from the automation that Kubernetes provides to keep the database application up and running. It’s important to remember that pods are ephemeral, so the database application is more likely to restart or fail over. You’ll also be responsible for the more specific database administration tasks, such as backups and scaling.
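As one illustration of what "being responsible for backups" can mean in practice, the sketch below schedules a nightly mysqldump as a Kubernetes CronJob. Everything specific in it is an assumption made for the example: the names mysql-backup and mysql-backup-claim are hypothetical, the claim is assumed to already exist, the schedule is arbitrary, and the host name and password simply mirror the MySQL example later in this article.

```yaml
# Sketch only: nightly logical backup of the MySQL server reachable as "mysql".
# mysql-backup and mysql-backup-claim are hypothetical names.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup
spec:
  schedule: "0 2 * * *"            # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: mysql:5.6     # same image as the server, so mysqldump matches
              command: ["/bin/sh", "-c"]
              args:
                - mysqldump -h mysql -uroot -p"$MYSQL_ROOT_PASSWORD" --all-databases > /backup/all-databases.sql
              env:
                - name: MYSQL_ROOT_PASSWORD
                  value: password  # placeholder; use a Secret in real usage
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: mysql-backup-claim   # assumed to exist already
```

This overwrites a single dump file on each run; a real setup would likely rotate files and copy them off the cluster, which is exactly the kind of work a managed database would otherwise do for you.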
Some important points to consider when choosing to deploy the database on Kubernetes are:
Above, we have a simple chart that shows the decision tree for deploying databases on Kubernetes. First, we ask whether the database has Kubernetes-friendly features. For databases such as MySQL or PostgreSQL, we’ll have to find or plan for Kubernetes operators that package the database with additional features. The second question is: how much operational workload is acceptable, given what we’ve seen is needed to deploy a database in Kubernetes? Do we have a team of site reliability engineers, or would it be more feasible to deploy the database on a managed DB?
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  ports:
  - port: 3306
  selector:
    app: mysql
  clusterIP: None
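First, we deploy the Service for the MySQL database on port 3306. It selects all pods that carry the label key app with the value mysql. Because clusterIP is set to None, this is a headless Service: its DNS name resolves directly to the pod IPs.
Next, create the resource:
kubectl create -f mysql_service.yaml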
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - image: mysql:5.6
        name: mysql
        env:
          # Use secret in real usage
        - name: MYSQL_ROOT_PASSWORD
          value: password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pv-claim
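This Deployment creates a pod from the mysql image with the 5.6 tag, sets the root password through the MYSQL_ROOT_PASSWORD environment variable, and exposes container port 3306. It also mounts the persistent volume claim mysql-pv-claim at /var/lib/mysql; we create that claim in the upcoming steps, and the pod stays Pending until the claim exists.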
To create the resource:
kubectl create -f mysql_deployment.yaml
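The comment in the Deployment says to use a Secret in real usage. A minimal sketch of that change might look like the following; the Secret name mysql-secret and the key root-password are hypothetical names chosen for this example, and the value is a placeholder.

```yaml
# Sketch only: keep the root password in a Secret instead of the manifest.
# mysql-secret and root-password are hypothetical names.
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
type: Opaque
stringData:
  root-password: change-me   # placeholder value
# In the Deployment's container spec, the plain-text env entry
# would then be replaced with a reference to the Secret:
#
#   env:
#   - name: MYSQL_ROOT_PASSWORD
#     valueFrom:
#       secretKeyRef:
#         name: mysql-secret
#         key: root-password
```

The Secret has to exist before the pod starts, otherwise the container cannot be created until it does.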
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 20Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
This creates a PersistentVolume that we’ll attach to the pod so that the data survives restarts. The PersistentVolume offers 20Gi of storage with the ReadWriteOnce access mode. It is backed by the host path /mnt/data, where all our data will reside.
To create the resource:
kubectl create -f persistence_volume.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
This creates the PersistentVolumeClaim, which requests 20Gi from the PersistentVolume we created above, with the same storage class (manual) and the same ReadWriteOnce access mode.
To create the resource:
kubectl create -f pvClaim.yaml
kubectl run -it --rm --image=mysql:5.6 --restart=Never mysql-client -- mysql -h mysql -ppassword
This command creates a new Pod in the cluster running a MySQL client, and connects it to the server through the Service. If it connects, you know your stateful MySQL database is up and running.
Waiting for pod default/mysql-client-274442439-zyp6i to be running, status is Pending, pod ready: false
If you don't see a command prompt, try pressing enter.

mysql>
Feel free to clone our repository if you don’t want to write one by yourself, or want a quick walkthrough: