If Kubernetes is the helmsman and Docker is the dock worker, then pods are the sailors of the ship (or cluster).
What Is a Pod in Kubernetes?
A pod is the smallest deployable unit in a Kubernetes cluster, and it may contain one or more containers. Pods are scheduled onto nodes (in Kubernetes terminology, scheduling means assigning a pod to the node where it will run). When you first learn about pods, you may think of them as containers, yet they aren’t. Like a container, a pod is a self-contained logical unit with an isolated environment: it has its own IP address, storage, hostname, and so on. However, a pod can host more than one container, so it can be thought of as a container of containers.
When And Why Host More Than One Container In A Pod?
In the microservices architecture, each module should live in its own space and communicate with other modules following a set of rules. But, sometimes we need to deviate a little from this principle.
Suppose that you have an Nginx web server running, and you need to analyze its logs in real time. The logs to be parsed are produced by GET requests to the web server. The developers created a log watcher application that does this job, and they built a container for it. In typical conditions, you’d have one pod for Nginx and another for the log watcher. However, we need to eliminate any network latency so that the watcher can analyze logs the moment they are available. A solution for this is to place both containers in the same pod.
Having both containers in the same pod allows them to communicate through the loopback interface as if they were two processes running on the same host. They can also share the same storage volumes.
Our First Pod
In order to work with this example, you’ll need a Kubernetes cluster running. Open your favorite text editor, create a new file called pods01.yaml and add the following to it:
apiVersion: v1
kind: Pod
metadata:
  name: webserver
spec:
  containers:
  - name: webserver
    image: nginx:latest
    ports:
    - containerPort: 80
Let’s first have a quick look at each line in this file:
- apiVersion: this is the version of the API used by the cluster. With new versions of Kubernetes being released, new functionality is introduced and, hence, new API versions may be defined. For the pod object, we use API version v1.
- kind: the type of object we are about to create. Here, it is a Pod.
- metadata: here we can define data about the object we are about to create. In this example, we only provide the name of the pod, but you can provide other details like the namespace.
- spec: this part defines the characteristics that a given Kubernetes object should have. It is the cluster’s responsibility to keep the actual state of the object matching the desired configuration. In our example, the spec instructs that this object (the pod) should have one container with some attributes.
- containers: this part is an array to which one or more container specs can be added. In our example, each container spec defines:
  - The name that this container will have.
  - The image on which it is based.
  - The port(s) that the container exposes.
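Because kubectl accepts JSON as well as YAML, the manifest above can also be generated programmatically. The following is a minimal sketch rather than part of the original example; the helper name build_pod_manifest is an assumption made for illustration.

```python
import json


def build_pod_manifest(name, image, port):
    """Return a single-container Pod manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                }
            ]
        },
    }


# Same values as pods01.yaml; build_pod_manifest is a hypothetical helper.
manifest = build_pod_manifest("webserver", "nginx:latest", 80)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

If the printed text is saved to a file (for instance pods01.json, a name chosen here only for illustration), passing that file to kubectl apply -f should describe the same pod as the YAML version.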
Let’s apply this configuration to the cluster. Issue the following command:
kubectl apply -f pods01.yaml
Kubectl should respond with a message confirming that the pod was created.
Kubectl is the client tool that is used to send API requests to Kubernetes. You could also have created the same pod imperatively, using a command like the following:
kubectl run webserver --image=nginx --port=80
However, it is highly recommended that you use YAML (or JSON) files to build and configure your cluster. This will allow you to put your infrastructure under a version control system like Git.
Kubectl can be used for building, deleting, viewing, and updating Kubernetes resources. Behind the scenes, it sends API requests to the cluster. So, technically, you could do without the tool and issue all your cluster commands through an HTTP client like cURL. However, it is strongly recommended that you use kubectl, as it is far easier than sending raw HTTP requests.
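To make the "raw HTTP" alternative concrete, here is a small sketch that builds the REST URLs kubectl would address when listing or fetching pods. The helper name pod_api_url is hypothetical, and the base address assumes that kubectl proxy is running locally on its default port (8001), which exposes the API server without extra authentication.

```python
def pod_api_url(base_url, namespace="default", name=None):
    """Build the REST URL for the pods in a namespace, or for one named pod."""
    url = f"{base_url.rstrip('/')}/api/v1/namespaces/{namespace}/pods"
    if name is not None:
        url = f"{url}/{name}"
    return url


# Assumption: `kubectl proxy` is running locally on its default port, 8001.
PROXY = "http://127.0.0.1:8001"

print(pod_api_url(PROXY))                    # all pods in the "default" namespace
print(pod_api_url(PROXY, name="webserver"))  # a single pod
```

Under that assumption, pointing cURL at either URL should return the same JSON objects that kubectl works with.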
Viewing Your Pods
One of the most repetitive tasks you’ll do as a K8s (k8s is the de facto abbreviation for Kubernetes) administrator is to view the status of your cluster resources. The following command allows you to view the pods running on your cluster:
kubectl get pods
The output should be similar to the following:
NAME        READY   STATUS    RESTARTS   AGE
webserver   1/1     Running   0          19m
- Notice that the READY column displays 1/1, which means that the pod has one container and that this one container is ready.
- The STATUS is Running, which means that the pod’s containers have been started. Together with READY 1/1, it indicates that the pod can receive and respond to requests.
- The RESTARTS column shows how many times the pod’s containers have been restarted. By default, Kubernetes restarts a container whenever it exits.
- The AGE column shows how much time has passed since this pod was created.
Which Node Is This Pod Running On?
You’ll typically run Kubernetes on more than one node. The scheduler, which runs on the master node, decides which node each pod runs on. Kubernetes also allows you to force a pod to be scheduled on a specific node. Let’s say you have recently purchased a powerful machine with a huge amount of memory, and you want your MongoDB pod(s) to always use this node in order to benefit from the increased RAM. But is the pod running on this node as you wished? You can use the following command to make sure it is:
kubectl get pods -o wide
The output should resemble the following:
NAME        READY   STATUS    RESTARTS   AGE   IP         NODE
webserver   1/1     Running   1          1h    10.1.0.4   docker-for-desktop
Now, we have the node name (I’m using Kubernetes on macOS). We also have the internal IP address of the pod.
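The section above mentions forcing a pod onto a particular node. One common way to do that is the nodeSelector field of the pod spec, which restricts scheduling to nodes that carry matching labels. The sketch below is illustrative only: the helper name, the MongoDB pod definition, and the memory=high label are all assumptions, and the target node would first have to be given that label (for example with kubectl label nodes).

```python
import copy
import json


def pin_to_labelled_nodes(manifest, labels):
    """Return a copy of a Pod manifest that only schedules onto nodes
    carrying all of the given labels (via spec.nodeSelector)."""
    pinned = copy.deepcopy(manifest)
    pinned["spec"]["nodeSelector"] = dict(labels)
    return pinned


# Hypothetical MongoDB pod; the name, image tag, and label are illustrative.
mongo = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "mongodb"},
    "spec": {"containers": [{"name": "mongodb", "image": "mongo:latest"}]},
}

pinned = pin_to_labelled_nodes(mongo, {"memory": "high"})
print(json.dumps(pinned, indent=2))
```

After applying such a manifest, kubectl get pods -o wide is the quickest way to confirm that the pod landed on the intended node.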
I Need The Output In JSON
Like other Kubernetes components, kubectl was designed to be modular. You can have its output piped to other tools, and tools commonly use JSON as their preferred communication language. Kubectl can print the output of its subcommands in YAML or JSON instead of the default human-readable table. Let’s have a quick example:
kubectl get pods -o json
The output is too long to be presented here. You can easily pipe it to a JSON parser like jq:
kubectl get pods -o json | jq
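The same JSON can be consumed by any program, not just jq. As a rough sketch, the following Python code reduces a pod list to a few fields per pod. The SAMPLE text is a hand-written, heavily abbreviated stand-in for real kubectl output (it is not captured from a cluster), and the function name summarize_pods is hypothetical.

```python
import json

# Hand-written, heavily abbreviated stand-in for `kubectl get pods -o json`.
SAMPLE = """
{
  "kind": "List",
  "items": [
    {
      "metadata": {"name": "webserver"},
      "spec": {
        "nodeName": "docker-for-desktop",
        "containers": [{"name": "webserver", "image": "nginx:latest"}]
      },
      "status": {
        "phase": "Running",
        "podIP": "10.1.0.4",
        "containerStatuses": [
          {"name": "webserver", "ready": true, "restartCount": 1}
        ]
      }
    }
  ]
}
"""


def summarize_pods(pod_list):
    """Return one summary dict per pod, loosely following the table columns."""
    rows = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        status = pod.get("status", {})
        statuses = status.get("containerStatuses", [])
        ready = sum(1 for s in statuses if s.get("ready"))
        rows.append({
            "name": pod["metadata"]["name"],
            "ready": f"{ready}/{len(spec.get('containers', []))}",
            "phase": status.get("phase"),
            "restarts": sum(s.get("restartCount", 0) for s in statuses),
            "ip": status.get("podIP"),
            "node": spec.get("nodeName"),
        })
    return rows


for row in summarize_pods(json.loads(SAMPLE)):
    print(row)
```

To use it on a live cluster, SAMPLE could be replaced by text read from standard input, with kubectl get pods -o json piped into the script.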
I hope that by now you have come to appreciate the power of kubectl.
The Pod Is Failing But I Don’t Know Why
While the get pods subcommand is useful most of the time, there’s a good chance that you’ll need more details about a pod. Let’s say that you ran kubectl get pods and the output looked like the following:
NAME        READY   STATUS         RESTARTS   AGE
webserver   0/1     ErrImagePull   0          42m
Later on, the output is something like:
NAME        READY   STATUS             RESTARTS   AGE
webserver   0/1     CrashLoopBackOff   0          45m
The get pods subcommand shows that the pod is not ready. But, that’s it. To know what made the pod fail, let’s issue the following command:
kubectl describe pods webserver
The output will be kind of verbose, so I’m not going to show all of it here, only the important parts:
Normal   Pulled  47m                    kubelet, docker-for-desktop  Successfully pulled image "nginx:latest"
Warning  Failed  3m29s (x4 over 4m58s)  kubelet, docker-for-desktop  Failed to pull image "ngnx:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for ngnx, repository does not exist or may require 'docker login'
It’s clear now that the pod was trying to pull an image that was misspelled (ngnx instead of the correct nginx).
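kubectl describe pods will give you more than just the status of the image pull. It prints the pod’s configuration, the state of each container, and the recent events recorded for the pod, which is invaluable when troubleshooting a misbehaving pod. To see the output that the containers themselves produce, use kubectl logs.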
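When troubleshooting needs to be automated, the reason a container is stuck can also be read from the pod's JSON representation, where it appears under status.containerStatuses[].state.waiting.reason. The sketch below shows the idea on a hand-written, abbreviated pod object (not real cluster output); the function name waiting_reasons is hypothetical.

```python
def waiting_reasons(pod):
    """Map container name -> waiting reason for containers stuck in 'waiting'."""
    reasons = {}
    for status in pod.get("status", {}).get("containerStatuses", []):
        waiting = status.get("state", {}).get("waiting")
        if waiting:
            reasons[status["name"]] = waiting.get("reason", "Unknown")
    return reasons


# Hand-written, abbreviated pod object resembling the failing pod above.
failing_pod = {
    "metadata": {"name": "webserver"},
    "status": {
        "containerStatuses": [
            {
                "name": "webserver",
                "ready": False,
                "state": {"waiting": {"reason": "ErrImagePull"}},
            }
        ]
    },
}

print(waiting_reasons(failing_pod))
```

A healthy pod yields an empty dictionary, since running containers report a running state rather than a waiting one.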
Executing Commands Against Pods
If you’re a Docker user, you’ve probably needed to execute commands against running containers, mostly for troubleshooting purposes. A docker exec command may look like this:
docker exec -it webserver /bin/bash
The above command will immediately execute /bin/bash against the container, which - if bash is installed - will open a shell inside the container.
Kubectl has borrowed this functionality from Docker. You can issue a very similar command against the pod as follows:
kubectl exec -it webserver -- /bin/bash
Notice that we used the pod name, not the container name. Remember, pods host containers. We also used -- to tell kubectl that there are no more kubectl options or arguments, and that everything after it is the actual command to be executed inside the pod.
We mentioned before that a pod can contain multiple containers, so which container will kubectl execute bash against? Good question. By default, kubectl selects the first container defined in the pod. If you want to run the command against a specific container in a multi-container pod, you can do this by adding the -c option followed by the container name. More on that later.
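The argument order matters here: kubectl options (including -c) come before the -- separator, and the command to run comes after it. As a small sketch, the hypothetical helper below assembles the argument list in that order. The container name webwatcher anticipates the two-container pod built later in this article.

```python
import shlex


def kubectl_exec_args(pod, command, container=None, interactive=True):
    """Assemble the argument list for `kubectl exec` in the expected order."""
    args = ["kubectl", "exec"]
    if interactive:
        args.append("-it")
    args.append(pod)
    if container is not None:
        args += ["-c", container]  # target a specific container in the pod
    args.append("--")              # everything after this is the command itself
    args += list(command)
    return args


print(shlex.join(kubectl_exec_args("webserver", ["/bin/bash"])))
print(shlex.join(kubectl_exec_args("webserver", ["/bin/bash"], container="webwatcher")))
```

A list like this can be handed directly to subprocess.run, which avoids shell-quoting problems when the command contains spaces.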
Terminating a Pod
By default, Kubernetes will restart a pod’s container if it exits. Let’s see this in action:
$ kubectl exec -it webserver -- /bin/bash -c "kill 1"
$ kubectl get pods
NAME        READY   STATUS      RESTARTS   AGE
webserver   0/1     Completed   2          3h
We killed Nginx, the main process that keeps the container running (hence, it has the PID of 1). When we check the pod’s status, we see that it is no longer Running. After a few seconds, however, the pod is in the Running state again. The AGE column still shows 3 hours because Kubernetes did not recreate the pod; it only restarted the container inside it.
$ kubectl get pods
NAME        READY   STATUS    RESTARTS   AGE
webserver   1/1     Running   3          3h
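The restart seen above is governed by the restartPolicy field of the pod spec, which accepts Always (the default), OnFailure, or Never. The function below is a deliberately simplified model of that decision, written for illustration; it is not how the kubelet is implemented, and it ignores details such as the back-off delay between restarts.

```python
def should_restart(restart_policy, exit_code):
    """Simplified model: is an exited container restarted under this policy?"""
    if restart_policy == "Always":
        return True                # restart regardless of how the container exited
    if restart_policy == "OnFailure":
        return exit_code != 0      # restart only after a non-zero exit code
    if restart_policy == "Never":
        return False
    raise ValueError(f"unknown restartPolicy: {restart_policy!r}")


# Assuming the container exited with code 0, as the Completed status suggests:
for policy in ("Always", "OnFailure", "Never"):
    print(policy, should_restart(policy, exit_code=0))
```

With the default policy, the container comes back even after a clean exit, which matches the behavior observed with the webserver pod.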
But what if we really need the pod to go down? The following command will do the job:
kubectl delete pods webserver
A Kubernetes object configuration file can also be used to terminate an object just like it was used to create it:
kubectl delete -f pods01.yaml
Notice that here we didn’t specify pods as the type of resource that Kubernetes will act upon. This has been intelligently inferred by kubectl from the YAML file.
Both commands may have similar results. However, a YAML file can contain more than one resource. For example, you can specify a deployment, a secret, and a volume in the same YAML file. Calling kubectl against the file will automatically create or terminate all the objects contained in it.
Let’s Add A Second Container To The Pod
As a bonus, let’s demonstrate how a pod can host more than one container. Let’s create another YAML file. Call it pods02.yaml and add the following:
apiVersion: v1
kind: Pod
metadata:
  name: webserver
spec:
  containers:
  - name: webserver
    image: nginx:latest
    ports:
    - containerPort: 80
  - name: webwatcher
    image: afakharany/watcher:latest
The only change we made here is adding a new container to the containers array. The name of the container is now more important than before, as it will be used whenever you need to execute commands against this specific container in the pod.
Again, let’s apply the configuration by issuing kubectl apply -f pods02.yaml and then kubectl get pods. You should see output close to the following:
NAME        READY   STATUS    RESTARTS   AGE
webserver   2/2     Running   0          8s
Notice that the READY column now has two containers running out of two.
The second container features a parser that will count the number of ‘n’ characters in the HTML returned from the web-server’s home page. In a real-world case, this may be processing JSON output from an API, but let’s keep things simple. The Dockerfile used to create this second container is as follows:
FROM python:3.6
RUN pip install requests
COPY watcher.py /
ENTRYPOINT ["python", "/watcher.py"]
And the watcher.py file contents are:
import requests
import time
r = requests.get("http://127.0.0.1").text
while True:
    num = r.count("n")
    print("There are " + str(num) + " occurrences of 'n'")
    time.sleep(5)
You don’t need any Python knowledge here. Just notice that the GET request (line 3) is directed to 127.0.0.1. This is the loopback interface, but it is shared by the web server and the watcher containers. Communicating with localhost provides as low network latency as possible.
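The script above fetches the page once, before entering the loop. If you wanted the watcher to notice changes to the page, one possible variation is to fetch it again on every iteration and tolerate the web server being briefly unavailable. The sketch below is not the code shipped in the watcher image; it uses only the standard library instead of requests, and the function names and the iterations parameter are assumptions added for illustration.

```python
import time
import urllib.request


def count_occurrences(text, char="n"):
    """Count how many times char appears in text."""
    return text.count(char)


def fetch(url, timeout=2.0):
    """Download url and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def watch(url="http://127.0.0.1", interval=5.0, iterations=None):
    """Poll url repeatedly; iterations=None means loop forever."""
    done = 0
    while iterations is None or done < iterations:
        try:
            num = count_occurrences(fetch(url))
            print(f"There are {num} occurrences of 'n'")
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            print(f"Could not reach {url}: {exc}")
        done += 1
        time.sleep(interval)
```

To use this as the container's entry point, a call to watch() would be added at the bottom of the file. The URL still points at 127.0.0.1, so the request continues to travel over the loopback interface shared with the web server container.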
Since we have two containers in a pod, we will need to use the -c option with kubectl when we need to address a specific container. For example:
kubectl exec -it webserver -c webwatcher -- /bin/bash
- Kubernetes pods are the foundational unit for all higher Kubernetes objects.
- A pod hosts one or more containers. It can be created using either a command or a YAML/JSON file.
- Use kubectl to create pods, view the running ones, modify their configuration, or terminate them. Kubernetes will attempt to restart a failing pod by default.
- If a pod keeps failing to start, we can use the kubectl describe command to find out what went wrong.
- Finally, we had a demonstration showing how and why a pod can host more than one container.