When discussing Kubernetes Storage, we’re talking about volumes. A Volume is the basic unit of storage in a Kubernetes cluster. But what is a volume and why does Kubernetes use it?
When you deploy a pod to the cluster, this pod may need to store data in files. Think of how a web server logs requests and errors to log files (for example, Apache uses the access.log and error.log files for this purpose). However, like pods themselves, this storage is temporary: when the kubelet restarts the pod, or if you intentionally delete and recreate it, any files created during the pod’s lifetime are lost. Sometimes that is exactly what you want, and you are OK with losing the pod’s temporary files upon restart. Other times, you need those files to survive pod restarts, or even the pod being scheduled to a different node. To address this requirement, you use a Kubernetes Volume.
A Volume is just a directory on a disk that is accessible to the containers inside a pod. Because Kubernetes was designed from the very start to work on multiple platforms and with different hardware providers, it supports many types of volumes. Choosing the suitable volume type depends on the kind of environment you’re deploying the cluster in. A volume is attached to the pod, and all the containers inside that pod have access to it. This way, you can share data between the containers running in the same pod.
You define a volume for the pod in the .spec.volumes section; then you mount it into each container that needs it through .spec.containers[*].volumeMounts.
In the following sections, we discuss the different ways through which you can add a volume to a pod. We also demonstrate some of the most used volume types.
The emptyDir volume is the simplest volume type. Once a pod is assigned to a node, an emptyDir volume is created for it. The backing storage for this volume can be a physical file path on the node or even a portion of the node’s RAM. You cannot use this volume for storing data that you need to persist, because an emptyDir volume is deleted as soon as the pod is removed from the node. If you’re using the node’s memory as the storage medium, restarting the node also wipes any data in the volume.
The most common use case for emptyDir volumes is sharing data between containers in the same pod. You need this data shared only as long as the pod is running, and you don’t care if it is lost when the pod is deleted. Another use case is an application that needs to cache data while it runs. Such data is only needed while the application is running and can safely be discarded when the application is stopped or restarted.
The following pod definition demonstrates the emptyDir volume type:
apiVersion: v1
kind: Pod
metadata:
  name: webserver
spec:
  containers:
  - image: httpd
    name: web-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir: {}
If you need more performance, you can modify the above definition so that the cache is served from the node’s memory:
apiVersion: v1
kind: Pod
metadata:
  name: webserver
spec:
  containers:
  - image: httpd
    name: web-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir:
      medium: Memory
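Earlier we mentioned that the most common use for emptyDir is sharing data between containers in the same pod. Here is a minimal sketch of that pattern; the busybox image, container names, and commands are illustrative, not part of the original example:
apiVersion: v1
kind: Pod
metadata:
  name: shared-cache
spec:
  containers:
  - image: busybox
    name: writer
    # writes the current timestamp into the shared volume every few seconds
    command: ["sh", "-c", "while true; do date > /shared/now.txt; sleep 5; done"]
    volumeMounts:
    - mountPath: /shared
      name: shared-volume
  - image: busybox
    name: reader
    # reads whatever the writer container last wrote
    command: ["sh", "-c", "while true; do cat /shared/now.txt 2>/dev/null; sleep 5; done"]
    volumeMounts:
    - mountPath: /shared
      name: shared-volume
  volumes:
  - name: shared-volume
    emptyDir: {}
Both containers mount the same emptyDir volume, so whatever the writer produces is immediately visible to the reader.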
When you’re hosting your Kubernetes cluster on the cloud, each provider has its own way of provisioning storage. From Elastic Block Storage (EBS) by AWS to Azure Disk by Microsoft Azure to GCE Persistent Disk by Google Cloud, the concept remains the same despite the different implementations.
In the following example, we attach a GCE Persistent Disk to a pod running in a Kubernetes cluster hosted on Google Cloud (the procedure can be equally replicated on other cloud providers using their own methods):
On Google Cloud, you can create a 1 GB Persistent Disk using a command like the following:
gcloud compute disks create --size=1GB --zone=us-central1-a data-vol
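If you want to double-check that the disk was created before using it, you can list the disks in your project; you should see data-vol with its zone and size in the output:
gcloud compute disks list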
Next, we need to create the pod and instruct it to use our volume:
apiVersion: v1
kind: Pod
metadata:
  name: webserver-pd
spec:
  containers:
  - image: httpd
    name: webserver
    volumeMounts:
    - mountPath: /data
      name: data-volume
  volumes:
  - name: data-volume
    gcePersistentDisk:
      pdName: data-vol
      fsType: ext4
Create the pod using kubectl as follows:
kubectl apply -f webserver-pd.yaml
Let’s see our volume inside the pod:
$ kubectl exec -it webserver-pd -- bash
root@webserver-pd:/usr/local/apache2# df -h /data/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        976M  2.6M  958M   1% /data
As you can see, Kubernetes formatted the disk with the ext4 filesystem and mounted it to /data.
Let’s confirm that our data persists across pod restarts and deletions:
root@webserver-pd:/usr/local/apache2# echo "Important data" > /data/myfile.txt
root@webserver-pd:/usr/local/apache2# exit
$ kubectl delete pods webserver-pd
pod "webserver-pd" deleted
$ kubectl apply -f webserver-pd.yaml
pod/webserver-pd created
$ kubectl exec -it webserver-pd -- bash
root@webserver-pd:/usr/local/apache2# cat /data/myfile.txt
Important data
In the above lab, we created a new file on /data (the volume we mounted to the pod) and added some text to it. Then, we deleted and recreated the pod. When displaying the contents of the file on /data, we found that the file still exists with the same data we added in the previous pod run.
Kubernetes supports several other volume types for a variety of environments. For example, the nfs volume type is used when you have an NFS server. If you’re provisioning through iSCSI, you can use the iscsi type. If your Kubernetes cluster is hosted on-prem in your data center, and you use SAN storage over Fibre Channel connections, you can use the fc volume type. For the full list of supported volumes, you can visit the official documentation page.
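For instance, a pod using the nfs volume type might look like the following sketch; the server address and export path here are hypothetical:
apiVersion: v1
kind: Pod
metadata:
  name: webserver-nfs
spec:
  containers:
  - image: httpd
    name: webserver
    volumeMounts:
    - mountPath: /data
      name: nfs-volume
  volumes:
  - name: nfs-volume
    nfs:
      # hypothetical NFS server and export path
      server: nfs.example.com
      path: /exports/data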
Next, let’s look at a volume type you can use if you’re hosting the Kubernetes cluster on your laptop or if you don’t want to use cloud-provisioned storage (perhaps because you’re just testing a scenario). The hostPath volume uses the local disk of the node to back the volume. There are also some production use cases that require this volume type, such as when the pod needs access to the Docker engine on the node (by mounting /var/lib/docker).
The following pod definition creates a pod that has a mount point backed by a directory on the node’s disk. For this example, I’m using my local machine rather than a cloud environment to demonstrate how the files live on the node and are accessible to the pod:
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-pd
spec:
  containers:
  - image: nginx
    name: hostpath-pd
    volumeMounts:
    - mountPath: /data
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      # directory location on host
      path: /home/magalix/data
      # this field is optional
      type: Directory
Apply this definition to a Kubernetes cluster running on your local machine (for example, through Minikube, MicroK8s, or Docker Desktop if you’re on a Mac). Alternatively, you can apply it to an external cluster as long as you can log in directly to the nodes.
The pod has access to /data, which points to a local directory on the node (/home/magalix/data). If you add any files to this directory, they will be automatically visible to the pod and vice versa.
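A quick way to verify this, assuming you can run commands on the node (the file name here is arbitrary):
# on the node
echo "hello from the node" > /home/magalix/data/node-file.txt

# from your workstation
$ kubectl exec -it hostpath-pd -- cat /data/node-file.txt
hello from the node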
In all the above examples, we’ve been using the volume through the pod definition. The volume type is tightly coupled to the pod: the definition must specify which kind of volume the pod uses. This may be just fine for small to medium environments. However, as the scale grows, it becomes more challenging to track which pods are using NFS, which need the cloud disk, and which utilize iSCSI. For that reason, Kubernetes offers Persistent Volumes and Persistent Volume Claims.
Persistent Volumes are used the same way as the other volumes we described earlier, but the pod doesn’t need to care about how the storage is provisioned. The cluster administrator creates the volumes, and pods access them through Persistent Volume Claims, a level of abstraction between the volume and its storage mechanism.
When you use a Persistent Volume Claim, you only care about the requirements of your application, for example, how much space it needs and the access mode. Access modes define how many nodes can access the volume, and how. So, we have:
- ReadWriteOnce (RWO): the volume can be mounted as read-write by a single node.
- ReadOnlyMany (ROX): the volume can be mounted read-only by many nodes.
- ReadWriteMany (RWX): the volume can be mounted as read-write by many nodes.
Those access modes largely depend on the storage provider’s capabilities. For example, some cloud providers do not support multiple nodes having read and write access to the same volume at the same time. So, in a typical environment, the cluster administrator creates several Persistent Volumes.
Let’s modify our previous pod that was using the hostPath volume type to use a Persistent Volume Claim instead. The node’s disk will still back the storage volume. But you’ll notice the level of abstraction that we can achieve.
We start by defining the Persistent Volume:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: myvol
spec:
  storageClassName: manual
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/home/magalix/data"
This definition creates a PV that offers 5 GB of space. Notice the storageClassName parameter, which is one of the ways a Persistent Volume Claim can find a matching Persistent Volume (more on that later).
Apply the above definition to the cluster by using:
kubectl apply -f pv.yaml
We need to create a Persistent Volume Claim that our pod can access:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myvol
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
The PVC searches for a PV with the storageClassName "manual" and requests 1 GB of storage from it. Apply this configuration to the cluster using:
kubectl apply -f pvc.yaml
Currently, you should have a PV and PVC created. You can view them by issuing the following commands respectively:
kubectl get pv  # Get the Persistent Volumes
kubectl get pvc # Get the Persistent Volume Claims
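If the claim matched the volume, both should report a Bound status. The output should look roughly like the following (the exact columns vary between Kubernetes versions, and the names reflect this example):
NAME    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM           STORAGECLASS
myvol   5Gi        RWO            Retain           Bound    default/myvol   manual

NAME    STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS
myvol   Bound    myvol    5Gi        RWO            manual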
It’s time to modify our pod definition to use the Persistent Volume Claim instead of directly accessing the hostPath volume:
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-pd
spec:
  containers:
  - image: nginx
    name: hostpath-pd
    volumeMounts:
    - mountPath: /data
      name: test-volume
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: myvol
It’s important to note that we can’t apply this definition to an already-running pod; we need to delete the pod first:
kubectl delete pods hostpath-pd
Now, we can create a new pod that uses our newly-added PVC:
kubectl apply -f pod-hostpath.yaml
Try adding some files to the directory on your local machine (or log in to the node if the cluster is remotely hosted), then open a shell to the container inside the pod and ensure that the files you added or changed are there.
Although we get the same result we had when we directly accessed the volumes through the pod definition, there are a couple of important advantages here:
- The pod definition no longer needs to know how or where the storage is provisioned; it only references a claim by name.
- The cluster administrator can change the underlying storage (for example, from hostPath to a cloud disk) without modifying the pod definitions that consume it.
We can take volume provisioning even one step further, where you don’t need to pre-provision disks of specific sizes. With Dynamic Volume Provisioning, you only need to define a Storage Class for each volume type you intend to provide to the cluster. For example, your cloud provider may offer different disk types: HDD for operations that do not require speedy I/O and SSD for faster reads and writes. Let’s have an example. The following definition creates a Storage Class named ssd that maps to GCE’s solid-state disk type:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
This definition is specific to Google Cloud. However, there are similar definitions for other cloud providers as well as for on-prem environments (for example, vSphere); the full list can be found in the official documentation. You can choose among the different features the provider offers (for example, I/O speed) and pick a class name that conveys what the volume provides.
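For comparison, a roughly equivalent class on AWS using the in-tree EBS provisioner could look like this (gp2 is AWS’s general-purpose SSD disk type):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2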
Now that we have a Storage Class, we can deploy a pod and a Persistent Volume Claim that uses it:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gce-claim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ssd
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: webserver-pd
spec:
  containers:
  - image: httpd
    name: webserver
    volumeMounts:
    - mountPath: /data
      name: dynamic-volume
  volumes:
  - name: dynamic-volume
    persistentVolumeClaim:
      claimName: gce-claim
By applying the above configuration, you create a Persistent Volume Claim that provides a 10 GB disk to the container inside the pod. The claim searches for a Storage Class with the name you specified and uses it to acquire the requested space.
Now, if you open a shell to the container, you can see that we have a 10 GB filesystem mounted to /data.
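For example (the device name and exact sizes shown here are illustrative; they depend on the provider and filesystem overhead):
$ kubectl exec -it webserver-pd -- df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        9.8G   37M  9.2G   1% /data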