The core function of Kubernetes is resource scheduling. The Kubernetes scheduler makes sure that containers get enough resources to execute properly. This process is governed by scheduling policies. Before digging into how the scheduler works, let’s make sure we understand the basic constructs of resource definition, allocation, and restriction inside a Kubernetes cluster.
Kubernetes has only two built-in manageable resources: CPU and memory. CPU is specified in units of cores, and memory is specified in bytes. These two resources play a critical role in how the scheduler allocates pods to nodes. Memory and CPU can be requested, allocated, and consumed. You should always set the right CPU and memory values. Doing so keeps you in control of your cluster and makes sure that a misbehaving application does not eat into the capacity available for other pods in your cluster.
Requests & Limits
Kubernetes uses the requests & limits structure to control resources such as CPU and memory.
- Requests are what the container is guaranteed to get. For example, if a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource.
- Limits, on the other hand, are the resource threshold a container may never exceed. The container is only allowed to go up to the limit, and then it is restricted.
CPU is a compressible resource, which means that once your container reaches the limit, it will keep running, but the operating system will throttle it and keep de-scheduling it from the CPU. Memory, on the other hand, is a non-compressible resource. Once your container reaches the memory limit, it will be terminated, aka OOM (Out of Memory) killed. If your container keeps getting OOM killed, Kubernetes will report that it is in a crash loop.
The limit can never be lower than the request. Kubernetes will throw an error and won’t let you run the container if your request is higher than the limit.
TIP: Set requests and limits at the container level. This gives you more control and a more efficient distribution of containers. If your main container consumes gigabytes of memory and a sidecar container needs only a few megabytes, setting the requests and limits at the pod level will give them the same amount of memory. Use Magalix Autopilot to automatically set and update the values of limits and requests of your containers.
Defining CPU Requests & Limits
To specify a CPU request for a Container, include the resources:requests field in the Container’s resource manifest. Similarly, to specify a CPU limit, include resources:limits.
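As a minimal sketch of the two fields described above, the manifest below gives a single container a CPU request of half a core and a CPU limit of one core. The Pod name, container name, and image are placeholders chosen for illustration, and the values are assumptions rather than recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo            # hypothetical Pod name
spec:
  containers:
  - name: app               # hypothetical container name
    image: nginx            # placeholder image
    resources:
      requests:
        cpu: "500m"         # 500 millicores = 0.5 core, used for scheduling
      limits:
        cpu: "1"            # the container is throttled above one core
```

The values `500m` and `0.5` are equivalent ways of writing half a CPU unit.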
CPU Units at different cloud providers
A CPU unit inside Kubernetes is equivalent to one hyperthread if you are running on a bare-metal Intel processor with Hyperthreading. It is important to understand how this maps to the CPU capacities of the major cloud providers. Misinterpreting these units may cause bad performance inside your Kubernetes cluster. The table below is a quick mapping of different cloud providers' CPU units.
|Infrastructure|1 CPU Equivalent|
|---|---|
|Bare-metal Intel processor with Hyperthreading|1 Hyperthread|
Define Memory Requests & Limits
To specify a memory request for a Container, include the resources:requests field in the Container’s resource manifest. To specify a memory limit, include resources:limits. For example, a Container might have a memory request of 100 MiB and a memory limit of 200 MiB. If you want to learn more about memory requests & limits, see the reference in the Kubernetes documentation.
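A manifest along these lines would express a memory request of 100 MiB and a memory limit of 200 MiB. This is only a sketch: the Pod name, container name, and image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo         # hypothetical Pod name
spec:
  containers:
  - name: app               # hypothetical container name
    image: nginx            # placeholder image
    resources:
      requests:
        memory: "100Mi"     # the scheduler only picks nodes with 100 MiB free to allocate
      limits:
        memory: "200Mi"     # going above 200 MiB gets the container OOM killed
```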
Meaning of Memory
Limits and requests for memory are measured in bytes. You can express memory as a plain integer or as a fixed-point number using one of these suffixes: E, P, T, G, M, k. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki.
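The difference between a plain integer and a suffixed value is easy to get wrong, so here is a small illustrative fragment of a container spec. The numbers are chosen only to show the arithmetic: `123Mi` is 123 × 2^20 = 128,974,848 bytes, whereas `129M` would be 129 × 10^6 = 129,000,000 bytes.

```yaml
# Fragment of a container spec, for illustration only
resources:
  requests:
    memory: "128974848"     # plain integer, in bytes
  limits:
    memory: "123Mi"         # power-of-two suffix: 123 * 1048576 = 128974848 bytes
```

Here the request and the limit describe exactly the same amount of memory, written in two different notations.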
Consider a Pod with two containers. Each Container has a request of 0.5 CPU and 300 MiB of memory, and a limit of 1 CPU and 500 MiB of memory. Summed over both containers, the Pod has a request of 1 CPU and 600 MiB of memory and a limit of 2 CPU and 1000 MiB of memory.
Ideally, you want your team members to always set limits and requests. But in the real world, your team will forget to do so :) Engineers can easily forget to set the resources, or someone can slip back into the old habit of over-provisioning to be on the safe side.
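A two-container Pod with these values could be sketched as follows. The Pod name, container names, and images are hypothetical; only the resource figures come from the text.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: two-containers      # hypothetical Pod name
spec:
  containers:
  - name: main              # hypothetical container name
    image: nginx            # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "300Mi"
      limits:
        cpu: "1"
        memory: "500Mi"
  - name: sidecar           # hypothetical container name
    image: busybox          # placeholder image
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "500m"
        memory: "300Mi"
      limits:
        cpu: "1"
        memory: "500Mi"
```

Because the values are set per container, each one can later be tuned independently, for example by shrinking the sidecar's memory request.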
You can prevent these scenarios. You can set up ResourceQuotas and LimitRanges at the Namespace level.
You can lock down namespaces using ResourceQuotas. A ResourceQuota lets you restrict the total CPU and memory that the containers inside a namespace may use. A quota for compute resources has four sections, and configuring each of them is optional.
- requests.cpu is the maximum combined CPU requests, in millicores, for all the containers in the Namespace. For example, with a value of 500m you can have as many containers as you want as long as the total requested CPU in the Namespace does not exceed 500m.
- requests.memory is the maximum combined memory requests for all the containers in the Namespace. For example, with a value of 100 MiB you can have as many containers as you want as long as the total requested memory in the Namespace does not exceed 100 MiB.
- limits.cpu is the maximum combined CPU limits for all the containers in the Namespace.
- limits.memory is the maximum combined Memory limits for all containers in the Namespace.
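The four sections above could be combined into a ResourceQuota like the sketch below. The request values match the 500m and 100 MiB figures used in the list; the quota name, the namespace, and both limit values are assumptions made for illustration.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota       # hypothetical quota name
  namespace: development    # hypothetical namespace
spec:
  hard:
    requests.cpu: "500m"    # combined CPU requests in the namespace
    requests.memory: 100Mi  # combined memory requests in the namespace
    limits.cpu: "700m"      # assumed value: combined CPU limits
    limits.memory: 500Mi    # assumed value: combined memory limits
```

Keep in mind that once a quota covers requests or limits, pods created in that namespace need to specify the corresponding values (directly or through LimitRange defaults), otherwise they are rejected.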
TIP: If you are using a production and a development Namespace, it is recommended to avoid defining a quota on the production Namespace and to define strict quotas on the development Namespace. You don’t want your production containers to be throttled or evicted because the dev environment needs more resources!
Unlike a quota, a LimitRange applies to an individual container. This can help prevent your team members from creating super tiny or super large containers inside the Namespace. A LimitRange has four optional sections.
- The default section sets up the default limits for a container in a pod. Containers that don’t explicitly set limits themselves will get assigned these values by default.
- The defaultRequest section sets up the default requests for a container in a pod. Containers that don’t explicitly set requests themselves will get assigned these values by default.
- The max section sets up the maximum limits that a container in a Pod can set. The default section cannot be higher than this value. Also, limits set on a container cannot be higher than this value. If max is set and the default section is not, containers that don’t explicitly set limits themselves will get assigned the max values as their limit.
- The min section sets up the minimum requests that a container in a Pod can set. The defaultRequest section cannot be lower than this value. Also, requests set on a container cannot be lower than this value. If min is set and the defaultRequest section is not, the min value becomes the defaultRequest value too.
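As a sketch of how the four sections fit together, a LimitRange might look like the manifest below. The object name, the namespace, and every value are assumptions picked only so that min ≤ defaultRequest ≤ default ≤ max holds.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits    # hypothetical name
  namespace: development    # hypothetical namespace
spec:
  limits:
  - type: Container
    default:                # default limits for containers that set none
      cpu: 600m
      memory: 100Mi
    defaultRequest:         # default requests for containers that set none
      cpu: 100m
      memory: 50Mi
    max:                    # highest limit a container may set
      cpu: 1000m
      memory: 200Mi
    min:                    # lowest request a container may set
      cpu: 10m
      memory: 10Mi
```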
The Lifecycle of a Kubernetes Pod
It is important to understand how pods get granted resources and what happens when they exceed what is allocated or the overall capacity of your cluster so you can tune your containers and cluster capacity correctly. Below is the typical pod scheduling workflow:
- You allocate resources to your pod through the spec and run the kubectl apply command.
- The Kubernetes scheduler looks for a Node to run your pod.
- For each node, Kubernetes checks if the node has enough resources to fulfill the resource requests.
- The pod is scheduled onto one of the nodes that has enough resources. When several nodes qualify, the scheduler scores them and picks the best fit rather than simply taking the first one.
- The pod goes into the pending state if none of the available nodes has enough resources.
- If you are using the Cluster Autoscaler (CA), your cluster will scale the number of nodes to allocate more capacity.
- If a node is running pods where the sum of their limits is more than its available capacity, Kubernetes goes into the Overcommitted State.
- Because CPU can be compressed, Kubernetes will make sure your containers get the CPU they requested and will throttle the rest.
- Memory cannot be compressed, so Kubernetes needs to start making decisions on what containers to terminate if the Node runs out of memory.
What if Kubernetes Goes into Overcommitted State?
Keep in mind that Kubernetes optimizes for the whole system’s health and availability. When it goes into the overcommitted state, Kubernetes may decide to kill pods. Generally, if a pod is using more resources than it requested, that pod becomes a candidate for termination. Here are the conditions that you should keep in mind:
- A pod does not have CPU and memory requests defined. Kubernetes considers such a pod to be using more than it requested by default.
- A pod has a request defined and is using more than it requested, but is still under its limit.
- If multiple pods have gone over their requests, Kubernetes will rank them based on their priority. The lowest-priority pods are terminated first.
- If all pods have the same priority, Kubernetes starts with the pod that exceeds its requested resources the most.
- Kubernetes may kill pods that are within their requests if a critical component, such as the kubelet or the Docker engine, is using more resources than what is reserved for it. These are rare cases.
- Setting a CPU or memory request higher than any of the cluster nodes can handle will leave the container in the pending state indefinitely, until a large enough node is available.
- Setting a limit higher than a single node can handle puts your node(s) at risk of becoming inaccessible due to CPU overload. Such a high limit allows one or more containers to monopolize all the resources in the node. Other containers will starve and you may not even be able to ssh into the node to kill those containers.
- Setting the CPU limit to a very small value may result in heavy throttling of your container. You want to watch CPU throttling metrics, specifically cpu.cfs_period_us and cpu.cfs_quota_us.
- You should always review the resources you allocate to your containers. They change with your workloads and the updates you make to your code base. Read our experience dynamically managing resources. Use Magalix Autopilot to eliminate that overhead, and get detailed performance and capacity analysis.