If Kubernetes is the helmsman and Docker is the dock worker, then pods are the sailors of the ship (or cluster).
What Is a Pod in Kubernetes?
A pod is the smallest deployable unit in a Kubernetes cluster. A pod may contain one or more containers, and pods are scheduled (in Kubernetes terminology, scheduling means deploying) to nodes. When you first learn about pods, you may think of them as containers, yet they aren’t. Like a container, a pod is a self-contained logical unit with an isolated environment: it has its own IP address, storage, hostname, and so on. However, a pod can host more than one container, so a pod can be thought of as a container of containers.
When And Why Host More Than One Container In A Pod?
In the microservices architecture, each module should live in its own space and communicate with other modules following a set of rules. But, sometimes we need to deviate a little from this principle.
Suppose that you have an Nginx web server running (see the illustration below). We need to analyze the Nginx logs in real time; the logs we need to parse are generated by GET requests to the web server. The developers created a log-watcher application that does this job, and they built a container for it. Under typical conditions, you’d have one pod for Nginx and another for the log watcher. However, we need to eliminate any network latency so that the watcher can analyze logs the moment they are available. A solution is to place both containers in the same pod.
Having both containers in the same pod allows them to communicate through the loopback interface as if they were two processes running on the same host. They can also share the same storage volumes.
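To make the idea concrete, here is a minimal sketch of such a pod. The names, the watcher image, and the emptyDir volume are illustrative assumptions (this is not the manifest used later in this article): both containers mount the same volume, so the watcher can read the log files Nginx writes.

```yaml
# Illustrative sketch only: an Nginx container and a hypothetical
# log-watcher container sharing a single emptyDir volume.
apiVersion: v1
kind: Pod
metadata:
  name: webserver-with-watcher
spec:
  volumes:
  - name: logs              # lives as long as the pod does
    emptyDir: {}
  containers:
  - name: webserver
    image: nginx:latest
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  - name: log-watcher
    image: log-watcher:latest   # hypothetical image name
    volumeMounts:
    - name: logs
      mountPath: /logs
```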
Our First Pod
In order to work with this example, you’ll need a Kubernetes cluster running. Open your favorite text editor, create a new file called pods01.yaml and add the following to it:
apiVersion: v1
kind: Pod
metadata:
  name: webserver
spec:
  containers:
  - name: webserver
    image: nginx:latest
    ports:
    - containerPort: 80
Let’s first have a quick look at each line in this file:
- apiVersion: this is the version of the API used by the cluster. With new versions of Kubernetes being released, new functionality is introduced and, hence, new API versions may be defined. For the pod object, we use API version v1.
- metadata: here we can define data about the object we are about to create. In this example, we only provide the name of the pod, but you can provide other details, like the namespace, as well.
- The spec part defines the characteristics that a given Kubernetes object should have. It is the cluster’s responsibility to update the status of the object to always match the desired configuration. In our example, the spec instructs that this object (the pod) should have one container with some attributes.
- containers: the containers part is an array where one or more container specs can be added. For example:
- The name that this container will have.
- The image on which it is based.
- The port(s) that will be open.
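As an aside, metadata can carry more than a name. A common addition is labels, key-value pairs that kubectl and other objects use to select pods. The sketch below is illustrative (the namespace and label values are assumptions, not part of pods01.yaml):

```yaml
# Illustrative metadata: namespace and labels are optional extras.
metadata:
  name: webserver
  namespace: default
  labels:
    app: webserver
    tier: frontend
```

With labels in place, you could list only matching pods with kubectl get pods -l app=webserver.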
Let’s apply this configuration to the cluster. Issue the following command:
kubectl apply -f pods01.yaml
You should see output similar to the following:

pod/webserver created

Alternatively, you could have created the same pod imperatively, without a YAML file, using a single command:

kubectl run webserver --image=nginx --port=80
However, it is highly recommended that you use YAML (or JSON) files to build and configure your cluster. This will allow you to put your infrastructure under a version control system like Git.
Kubectl can be used for building, deleting, viewing, and updating Kubernetes resources. In fact, kubectl sends API requests to the cluster behind the scenes, so technically you could get rid of the tool and issue all your cluster commands through an HTTP client like cURL. However, it is strongly recommended that you use kubectl, as it is far easier than sending raw HTTP requests.
Viewing Your Pods
One of the most repetitive tasks you’ll do as a K8s (k8s is the de facto abbreviation for Kubernetes) administrator is to view the status of your cluster resources. The following command allows you to view the pods running on your cluster:
kubectl get pods
The output should be similar to the following:
NAME        READY   STATUS    RESTARTS   AGE
webserver   1/1     Running   0          19m
- Notice that the READY column displays 1/1, which means that one container out of one in this pod is ready.
- The STATUS is Running, which means that the pod is in service and can receive and respond to requests.
- The RESTARTS column shows how many times this pod was restarted. By default, Kubernetes will attempt to restart a pod whenever it fails.
- The AGE column shows how much time has passed since this pod was created.
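The restart behavior described above is governed by the pod's restartPolicy field, which defaults to Always. A minimal sketch of overriding it (the pod name, image, and command here are illustrative; the other valid values are Never and OnFailure):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: one-shot          # illustrative name
spec:
  restartPolicy: OnFailure   # restart containers only when they exit non-zero
  containers:
  - name: task
    image: busybox
    command: ["sh", "-c", "echo done"]
```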
Which Node Is This Pod Running On?
You’ll typically run Kubernetes on more than one node. The master node is responsible for scheduling pods on nodes. Kubernetes allows you to force a pod to be scheduled on a specific node. Let’s say you have recently purchased a powerful machine with a huge amount of memory, and you want your MongoDB pod(s) to always use this node in order to benefit from the increased RAM. But, is the pod running on this node as you wished? You can use the following command to make sure it is:
kubectl get pods -o wide
The output should resemble the following:
NAME        READY   STATUS    RESTARTS   AGE   IP         NODE
webserver   1/1     Running   1          1h    10.1.0.4   docker-for-desktop
Now, we have the node name (I’m using Kubernetes on macOS). We also have the internal IP address of the pod.
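Pinning a pod to a node, as in the MongoDB scenario above, is usually done by labeling the node and adding a nodeSelector to the pod spec. The label key and value below are assumptions for illustration:

```yaml
# First label the node: kubectl label nodes <node-name> hardware=highmem
apiVersion: v1
kind: Pod
metadata:
  name: mongodb
spec:
  nodeSelector:
    hardware: highmem   # schedule only on nodes carrying this label
  containers:
  - name: mongodb
    image: mongo:latest
```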
I Need The Output In JSON
Like other Kubernetes components, kubectl was designed to be modular: you can pipe its output to other tools, and JSON is commonly their preferred interchange format. Kubectl allows you to get the output of its subcommands in either YAML or JSON instead of the default plain-text table. Let’s have a quick example:
kubectl get pods -o json
The output is too long to be presented here. You can easily pipe it to a JSON parser like jq:
kubectl get pods -o json | jq
I hope that by now you have come to appreciate the power of kubectl.
The Pod Is Failing But I Don’t Know Why
While the get pods subcommand is useful most of the time, there’s a good chance that you’ll need more details about a pod. Let’s say that you ran kubectl get pods and the output looked like the following:
NAME        READY   STATUS         RESTARTS   AGE
webserver   0/1     ErrImagePull   0          42m
Later on, the output is something like:
NAME        READY   STATUS             RESTARTS   AGE
webserver   0/1     ImagePullBackOff   0          45m
The get pods subcommand shows that the pod is not ready. But, that’s it. To know what made the pod fail, let’s issue the following command:
kubectl describe pods webserver
The output will be kind of verbose, so I’m not going to show all of it here, only the important parts:
Normal    Pulled   47m                     kubelet, docker-for-desktop   Successfully pulled image "nginx:latest"
Warning   Failed   3m29s (x4 over 4m58s)   kubelet, docker-for-desktop   Failed to pull image "ngnx:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for ngnx, repository does not exist or may require 'docker login'
It’s clear now that the pod was trying to pull an image that was misspelled (ngnx instead of the correct nginx).
kubectl describe pods will give you more than just the image status: it also prints the pod’s recent events, as shown above. For the logs that the containers themselves produce, use kubectl logs webserver, which is invaluable when troubleshooting a misbehaving pod.
Executing Commands Against Pods
If you’re a Docker user, you’ve probably needed to execute commands against running containers, mostly for troubleshooting purposes. A Docker exec command may look like this:
docker exec -it webserver /bin/bash
The above command will immediately execute /bin/bash against the container, which - if bash is installed - will open a shell inside the container.
Kubectl has borrowed this functionality from Docker. You can issue a very similar command against the pod as follows:
kubectl exec -it webserver -- /bin/bash
Notice that we used the pod name, not the container name. Remember, pods host containers. We also used -- to inform kubectl that we’re done passing subcommands and options, and that what follows is the actual command to be executed inside the pod.
We mentioned before that a pod can contain multiple containers, so which container will kubectl execute bash against? Good question. Kubectl will select the first container in the pod. If you want to run the command against a specific container in a multi-container pod, add the -c <container-name> option. More on that later.
Terminating a Pod
By default, Kubernetes will restart a pod if it exits. Let’s see this in action:
$ kubectl exec -it webserver -- /bin/bash -c "kill 1"
$ kubectl get pods
NAME        READY   STATUS      RESTARTS   AGE
webserver   0/1     Completed   2          3h
We killed Nginx, the main process that keeps the container running (hence, it has a PID of 1). When we check the pod’s status, we see that it is no longer Running. But after a few seconds, the pod is back in the Running state. The AGE column still reflects 3 hours because Kubernetes did not recreate the pod; it only restarted it.
$ kubectl get pods
NAME        READY   STATUS    RESTARTS   AGE
webserver   1/1     Running   3          3h
But what if we really need the pod to go down? The following command will do the job:
kubectl delete pods webserver
A Kubernetes object configuration file can also be used to terminate an object just like it was used to create it:
kubectl delete -f pods01.yaml
Notice that here we didn’t specify pods as the type of resource that Kubernetes will act upon. This has been intelligently inferred by kubectl from the YAML file.
Both commands have the same result here. However, a YAML file can contain more than one resource. For example, you can specify a deployment, a secret, and a volume in the same YAML file. Calling kubectl against the file will automatically create or terminate all the objects contained in it.
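For illustration, a single file can hold several resources separated by --- lines; kubectl apply (or delete) then acts on all of them at once. The resource names and ConfigMap contents below are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webserver
spec:
  containers:
  - name: webserver
    image: nginx:latest
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: webserver-config   # hypothetical second resource in the same file
data:
  welcome.html: "<h1>Hello</h1>"
```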
Let’s Add A Second Container To The Pod
As a bonus, let’s demonstrate how a pod can host more than one container. Let’s create another YAML file. Call it pods02.yaml and add the following:
apiVersion: v1
kind: Pod
metadata:
  name: webserver
spec:
  containers:
  - name: webserver
    image: nginx:latest
    ports:
    - containerPort: 80
  - name: webwatcher
    image: afakharany/watcher:latest
The only change we made here is adding a new container to the containers array. The container name is now more important than before, as it will be used whenever you need to execute commands against this specific container in the pod.
Again, let’s apply the configuration:

kubectl apply -f pods02.yaml

Then run kubectl get pods. You should see output close to the following:
NAME        READY   STATUS    RESTARTS   AGE
webserver   2/2     Running   0          8s
Notice that the READY column now has two containers running out of two.
The second container features a parser that counts the number of ‘n’ characters in the HTML returned from the web server’s home page. In a real-world case, this might be processing JSON output from an API, but let’s keep things simple. The Dockerfile used to create this second container is as follows:
FROM python:3.6
RUN pip install requests
COPY watcher.py /
ENTRYPOINT ["python", "/watcher.py"]
And the watcher.py file contents are:
import requests
import time
r = requests.get("http://127.0.0.1").text
while True:
    num = r.count("n")
    print("There are " + str(num) + " occurrences of 'n'")
    time.sleep(5)
You don’t need any Python knowledge here. Just notice that the GET request (line 3) is directed to 127.0.0.1, the loopback interface, which is shared by the web server and the watcher containers. Communicating over localhost provides the lowest possible network latency.
Since we have two containers in a pod, we will need to use the -c option with kubectl when we need to address a specific container. For example:
kubectl exec -it webserver -c webwatcher -- /bin/bash
- Kubernetes pods are the foundational unit for all higher Kubernetes objects.
- A pod hosts one or more containers. It can be created using either a command or a YAML/JSON file.
- Use kubectl to create pods, view the running ones, modify their configuration, or terminate them. Kubernetes will attempt to restart a failing pod by default.
- If the pod repeatedly fails to start, we can use the kubectl describe command to find out what went wrong.
- Finally, we had a demonstration showing how and why a pod can host more than one container.