Running Containers Securely Under Kubernetes

 

When you decide to use containers to run your software, you save money and effort, and you can use orchestration systems like Kubernetes to increase your efficiency even further. However, switching to containers also means adopting a different mindset when dealing with issues like security. Many people are fooled into thinking that containers are just lighter forms of virtual machines. While containers can give you the illusion that they run on their own, isolated from the host’s operating system and from other containers, the truth is, they aren’t.

A container, in the end, is just a process running on your operating system. It uses powerful features of the Linux kernel, namely namespaces and cgroups, to provide a level of isolation.

The above introduction is necessary so that you can understand the recommendations that we’ll lay out in this article. When you think of a container as a process rather than a full-blown machine, the coming security recommendations will make a lot of sense.
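
You can see this for yourself by starting a container and then looking for its process in the host’s process table (a quick sketch; the container name is arbitrary and the PID will differ on your machine):

$ docker run -d --name just-a-process alpine sleep 1000
$ ps -ef | grep 'sleep 1000'

The sleep process shows up in the host’s process list like any other process; only its namespaces set it apart.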

Recommendation 01: Never Run The Container As Root

If you examine the well-known daemons that run on Linux, you’ll find that most of them don’t use the root account as the process owner. Take Apache for example: although you do need the root account to start httpd (or apache2 if you’re running on Ubuntu), the daemon itself spawns child processes that are owned by a non-privileged user, typically created for this purpose. So, on Ubuntu, if you are running Apache, you’ll find multiple child processes owned by a user called www-data. It is those processes that receive and respond to requests, and they are the ones that listen on port 80. The following is a real-world example from a production machine:

ubuntu:~/ $ ps -ef | grep apache
root       1526      1  0 Sep21 ?        00:02:50 /usr/sbin/apache2 -k start
www-data  89643   1526  0 17:11 ?        00:00:00 /usr/sbin/apache2 -k start
www-data  89656   1526  0 17:11 ?        00:00:00 /usr/sbin/apache2 -k start
www-data  90837   1526  0 17:33 ?        00:00:00 /usr/sbin/apache2 -k start
www-data  93493   1526  0 18:18 ?        00:00:00 /usr/sbin/apache2 -k start
www-data  95117   1526  0 18:47 ?        00:00:00 /usr/sbin/apache2 -k start
www-data  95960   1526  0 19:02 ?        00:00:00 /usr/sbin/apache2 -k start
www-data  95961   1526  0 19:02 ?        00:00:00 /usr/sbin/apache2 -k start
www-data  96675   1526  0 19:14 ?        00:00:00 /usr/sbin/apache2 -k start
www-data  96902   1526  0 19:18 ?        00:00:00 /usr/sbin/apache2 -k start
www-data  97487   1526  0 19:28 ?        00:00:00 /usr/sbin/apache2 -k start

So, we agreed that Linux containers are nothing more than processes running in their own isolated namespaces. Hence, they should also run as a non-privileged user unless root is strictly required.

Now, let’s look at a practical example: a container that runs a simple shell command:

FROM alpine:latest
RUN apk update && addgroup -S mygroup && adduser -S myuser -G mygroup
USER myuser
ENTRYPOINT ["sh","-c","sleep 100000"]

Note that in this Dockerfile, we are creating a normal, non-privileged user myuser and using it to start the container application. Let’s build and run this image:

$ docker build --no-cache -t secured .
Sending build context to Docker daemon  2.048kB
Step 1/4 : FROM alpine:latest
 ---> 965ea09ff2eb
Step 2/4 : RUN apk update && addgroup -S mygroup && adduser -S myuser -G mygroup
 ---> Running in 85305e2feaec
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/community/x86_64/APKINDEX.tar.gz
v3.10.3-36-g7cacb7930a [http://dl-cdn.alpinelinux.org/alpine/v3.10/main]
v3.10.3-37-gc81e885c62 [http://dl-cdn.alpinelinux.org/alpine/v3.10/community]
OK: 10339 distinct packages available
Removing intermediate container 85305e2feaec
 ---> 5c18a66c1445
Step 3/4 : USER myuser
 ---> Running in cf1525e4e653
Removing intermediate container cf1525e4e653
 ---> 7d5095f0fd6b
Step 4/4 : ENTRYPOINT ["sh","-c","sleep 100000"]
 ---> Running in 0331864a7ead
Removing intermediate container 0331864a7ead
 ---> 794b14aa26fb
Successfully built 794b14aa26fb
Successfully tagged secured:latest
$ docker run -d secured
337e45006106fc823e69ba9bc1459ca81e4ac51cbbe9616cdcec7421b5198fbb
$ docker container ls                                                                                                                          
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
337e45006106        secured             "sh -c 'sleep 100000'"   9 seconds ago       Up 7 seconds                            lucid_williams
$ docker exec -it 337e45006106 sh                                                                                                              
/ $ id
uid=100(myuser) gid=101(mygroup) groups=101(mygroup)
/ $

Note that when we log into the container by running the docker exec command, we are dropped into a shell that does not use the root user. Any command that we execute (or that an attacker executes) is constrained by that user’s privileges.
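
If you run the same image under Kubernetes, you can enforce non-root execution at the Pod level as well, through the container’s securityContext. Below is a minimal sketch (the Pod name is hypothetical); with runAsNonRoot set, the kubelet refuses to start the container if it would run as root:

apiVersion: v1
kind: Pod
metadata:
  name: secured-pod
spec:
  containers:
    - name: app
      image: secured:latest
      securityContext:
        runAsNonRoot: true              # kubelet rejects the container if it runs as UID 0
        allowPrivilegeEscalation: false # prevent setuid binaries from gaining extra privileges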

Recommendation 02: Image Pull Policy

When running a container through a Kubernetes Pod, the image, by default, gets pulled only if it is not already present on the node (the default imagePullPolicy is IfNotPresent, unless the image tag is latest). Subsequent requests for the same image are fulfilled by the cached copy on the node. While this may seem like a convenient way to save time (and bandwidth) whenever you recreate the Pod, it carries a serious security risk if the node is used to host other Pods.

Assume that you are using an image called middleware_client:1.5. The image contains the authentication logic necessary for accessing the middleware part of your application. To protect your image, you place it in a private registry that requires credentials for pulling images. The first time your middleware pods run, they pull the image from the registry using the credentials that you specified. However, this means that any other pods scheduled on the same node can use the cached middleware_client:1.5 image without having to authenticate themselves to the image registry. This is illustrated in the following diagram:

 

[Figure: Running Securely Under Kubernetes]

 


To address this risk, you should use the imagePullPolicy: Always parameter in the Pod definition or in the pod template of higher-level controllers like Deployments, StatefulSets, etc.

An example of a pod that addresses this risk may look as follows:

apiVersion: v1
kind: Pod
metadata:
  name: middlewareclient
spec:
  containers:
    - name: uses-middleware-image
      image: middleware_client:1.5
      imagePullPolicy: Always
      command: [ "echo", "SUCCESS" ]
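
If you would rather enforce this behavior cluster-wide instead of setting it on every Pod, the AlwaysPullImages admission controller rewrites the pull policy of every created pod to Always. It is enabled through the API server’s admission plugins flag (a sketch; the full plugin list depends on your cluster setup):

kube-apiserver --enable-admission-plugins=NodeRestriction,AlwaysPullImages ...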

Recommendation 03: Control Traffic To And From The Pods

Once you are running your containers inside Kubernetes Pods, you should start thinking about how your pods will communicate with other entities on the network. Kubernetes offers Network Policies to control and restrict how pods handle incoming and outgoing communication. However, before using this feature, you must ensure that your network add-on supports it. Kubernetes does not ship with a network controller of its own; instead, you choose among a number of providers that offer network add-ons. Most of the well-known providers support network policies, namely Calico, Weave Net, and Cilium. So, what is a network policy good for, and how is it implemented? Let’s have a quick example.

Consider the definition:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: jailed
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

As the name suggests (jailed), this is a NetworkPolicy resource that locks down pod connections. Any pod matched by this policy is denied both incoming and outgoing traffic. Note that the empty braces {} in podSelector do not mean that no pod is targeted; on the contrary, an empty selector matches all pods in the policy’s namespace. In other words, any pod created in the default namespace will have neither incoming nor outgoing network access. Note also that multiple network policies can work together: policies are additive, so when several policies select the same pod, the pod’s allowed traffic is the union of what those policies allow. Hence, we can combine the lockdown policy that we defined above with more specific policies that explicitly allow certain connections, to fine-tune our network access control.
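
A quick way to verify the lockdown (the pod name and the target URL are arbitrary): launch a throwaway pod in the namespace and attempt an outbound request; with the jailed policy in place, the request should fail:

$ kubectl run test --image=alpine --restart=Never -- sleep 3600
$ kubectl exec test -- wget -qO- -T 3 http://example.com   # times out: egress is denied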

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-mysql
spec:
  podSelector:
    matchLabels:
      role: db
  ingress:
    - from:
      - podSelector:
          matchLabels:
            role: app
      ports:
      - protocol: TCP
        port: 3306

Let’s have a closer look at this definition:

Lines 6-8: we specify that this policy will be applied to pods that have the label role=db.

The rest of the definition specifies which type of traffic we want to control (ingress). We allow connections coming from other pods that have the label role=app; additionally, those application pods can access the database pods only on port 3306 (the default port for MySQL).

So, this explicitly allows traffic from the application pods to the database pods on port 3306. But what about the rest of the communication? Since no other policy that selects the role=db pods allows anything else, the only other policy in effect is the catch-all lockdown we created earlier. In other words, all other incoming connections are blocked.

However, the above means that our clients will not be able to access our application, since no policy allows ingress traffic to the role=app pods. Let’s create a third policy that matches our application pods:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: external-app
spec:
  podSelector:
    matchLabels:
      role: app
  ingress:
    - from: []

Now, we have our hypothetical application secured. If attackers gain access to our application pod, they can only reach the database, and only on port 3306. They cannot reach other pods, which limits the severity of the damage they can do.
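
One caveat worth noting: since the jailed policy also denies egress for every pod, the application pods need a matching egress allowance before they can actually open connections to the database (and, if they resolve the database by service name, they also need egress to DNS). A minimal sketch of such a policy, with the hypothetical name allow-app-egress-mysql, might look as follows:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-egress-mysql
spec:
  podSelector:
    matchLabels:
      role: app
  egress:
    - to:
      - podSelector:
          matchLabels:
            role: db
      ports:
      - protocol: TCP
        port: 3306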

Network policies can also be used to restrict access to the metadata API of the cloud provider. If you don’t know what the metadata API is, it’s simply a way through which cloud providers offer information about the resources that you are using. For example, if you’re using an EC2 instance on AWS, issuing a GET request to http://169.254.169.254 returns a lot of information about the instance you are running in, like the DNS name, the external IP address, the IAM role that the instance is using (if any), and so on. Obviously, you don’t want such critical data to be available to a pod unless it’s strictly required. Hence, you can use a NetworkPolicy to restrict access to this IP address unless the pod needs it.
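
A minimal sketch of such a policy (the name deny-metadata-access is hypothetical) allows all egress except traffic destined for the metadata address; you would apply it in a namespace whose pods otherwise need outbound access:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-metadata-access
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
      - ipBlock:
          cidr: 0.0.0.0/0
          except:
          - 169.254.169.254/32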


TL;DR

  • A crucial part of your Kubernetes security posture is ensuring that your pods run in a controlled and secure manner.
  • In this article, we presented three recommendations for securing your Pods in a Kubernetes cluster. The first one dealt with running the containers as a user other than root. Most base images, like alpine, ubuntu, etc., are baked so that the default user is root. This is convenient because you typically need to execute privileged tasks, like installing packages, before reaching the ENTRYPOINT part. Once you no longer need root, use the USER instruction to switch to a non-privileged user.
  • The next recommendation dealt with ensuring that the image always gets pulled when the pod is recreated. This ensures that no pod can use a cached image without passing through the due authentication process with the image registry.
  • The last recommendation we discussed was the NetworkPolicy resource. Network policies are a large topic on their own. We briefly explained how they can be used to ensure that your pods receive traffic only from the sources that are supposed to send it; for example, ensuring that the database pods receive traffic from the application pods only.

 

Mohamed Ahmed

Jan 28, 2020