
Understanding Kubernetes Memory Metrics

Kubernetes collects a lot of metrics data about resource usage within the cluster (CPU, memory, network, disk). In this article, we will focus on the various memory metrics collected by cAdvisor, and on which of them trigger an OOMKill when we apply memory limits to pods.


Kubernetes Metrics For Memory

When you exec into your container and navigate to the following directory, you will find all the container's memory information you may need. The usage, limits, cache, the number of times your container got OOMKilled, etc. can all be found in this directory:

cd /sys/fs/cgroup/memory/

Listing the contents of this directory shows the following control files:

root@test-8589c7679b-bfjrn:/sys/fs/cgroup/memory# ls
cgroup.clone_children   memory.kmem.limit_in_bytes          memory.kmem.tcp.usage_in_bytes   memory.oom_control          memory.use_hierarchy
cgroup.event_control    memory.kmem.max_usage_in_bytes      memory.kmem.usage_in_bytes       memory.pressure_level       notify_on_release
cgroup.procs            memory.kmem.slabinfo                memory.limit_in_bytes            memory.soft_limit_in_bytes  tasks
memory.failcnt          memory.kmem.tcp.failcnt             memory.max_usage_in_bytes        memory.stat
memory.force_empty      memory.kmem.tcp.limit_in_bytes      memory.move_charge_at_immigrate  memory.swappiness
memory.kmem.failcnt     memory.kmem.tcp.max_usage_in_bytes  memory.numa_stat                 memory.usage_in_bytes

Each file contains a piece of information about the memory. You can read more about them here.

Our interest is in the memory.stat file:

root@test:/sys/fs/cgroup/memory# cat memory.stat
cache 32575488
rss 33964032
rss_huge 0
shmem 1757184
mapped_file 16625664
dirty 0
writeback 0
pgpgin 19272
pgpgout 2871
pgfault 16137
pgmajfault 66
inactive_anon 1757184
active_anon 33927168
inactive_file 27709440
active_file 2433024
unevictable 0
hierarchical_memory_limit 4089446400
total_cache 32575488
total_rss 33964032
total_rss_huge 0
total_shmem 1757184
total_mapped_file 16625664
total_dirty 0
total_writeback 0
total_pgpgin 19272
total_pgpgout 2871
total_pgfault 16137
total_pgmajfault 66
total_inactive_anon 1757184
total_active_anon 33927168
total_inactive_file 27709440
total_active_file 2433024
total_unevictable 0

cAdvisor reads these numbers from the file and uses them to calculate the memory metrics that are collected by Prometheus.
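As a rough illustration of what cAdvisor does with this file, the key/value lines of memory.stat can be parsed like this (a simplified sketch, not cAdvisor's actual parser):

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryStat turns the key/value lines of memory.stat into a map,
// e.g. "total_rss 33964032" -> stats["total_rss"] == 33964032.
func parseMemoryStat(content string) map[string]uint64 {
	stats := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue
		}
		if v, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
			stats[fields[0]] = v
		}
	}
	return stats
}

func main() {
	// Two lines taken from the memory.stat dump above.
	sample := "total_rss 33964032\ntotal_inactive_file 27709440\n"
	stats := parseMemoryStat(sample)
	// total_rss is the value cAdvisor exposes as container_memory_rss.
	fmt.Println(stats["total_rss"])
}
```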

What Differentiates Them From Each Other?

In this section, we will discuss the difference between container_memory_working_set_bytes and container_memory_rss:

1. container_memory_rss:

From the cAdvisor code, container_memory_rss is defined as the total amount of anonymous and swap cache memory (it includes transparent hugepages), and it equals the value of total_rss from the memory.stat file. You can access the code here:

ret.Memory.RSS = s.MemoryStats.Stats["total_rss"]

Please take note: do not confuse this RSS with the "resident set size". To learn more, check the kernel documentation:

Note: Only anonymous and swap cache memory is listed as part of the ‘rss’ stat.

This should not be confused with the true 'resident set size' or the amount of physical memory used by the cgroup. 'rss + file_mapped' will give you the resident set size of the cgroup.

Note: file and shmem may be shared among other cgroups. In that case, file_mapped is accounted for only when the memory cgroup is the owner of the cache page.
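Applying that formula to the sample memory.stat dump above gives the cgroup's true resident set size:

```go
package main

import "fmt"

func main() {
	// Values taken from the memory.stat dump above.
	rss := uint64(33964032)        // anonymous + swap cache memory
	mappedFile := uint64(16625664) // file-backed mapped memory
	// Per the kernel documentation, rss + file_mapped is the cgroup's
	// true resident set size.
	fmt.Println(rss + mappedFile) // 50589696
}
```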

2. container_memory_working_set_bytes:

From the cAdvisor code, the working set memory is defined as the amount of working set memory, which includes recently accessed memory, dirty memory, and kernel memory. Therefore, the working set is less than or equal to "usage".

This value is obtained by deducting total_inactive_file from usage; the code can be accessed here:

inactiveFileKeyName := "total_inactive_file"
if cgroups.IsCgroup2UnifiedMode() {
	inactiveFileKeyName = "inactive_file"
}
workingSet := ret.Memory.Usage
if v, ok := s.MemoryStats.Stats[inactiveFileKeyName]; ok {
	if workingSet < v {
		workingSet = 0
	} else {
		workingSet -= v
	}
}
ret.Memory.WorkingSet = workingSet

Note: The value reported by kubectl top pods comes from the container_memory_working_set_bytes metric.
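To make that subtraction concrete, here is the same logic as a standalone function, fed with total_inactive_file from the sample dump above (the usage value is hypothetical, since memory.usage_in_bytes was not shown):

```go
package main

import "fmt"

// workingSet mirrors cAdvisor's logic: usage minus inactive file cache,
// floored at zero so it never underflows.
func workingSet(usage, inactiveFile uint64) uint64 {
	if usage < inactiveFile {
		return 0
	}
	return usage - inactiveFile
}

func main() {
	usage := uint64(66539520)        // assumed value for illustration
	inactiveFile := uint64(27709440) // total_inactive_file from the dump
	fmt.Println(workingSet(usage, inactiveFile)) // 38830080
}
```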

Which Should You Monitor: The memory_working_set Or The memory_rss?

Now that we know how cAdvisor gets the 'RSS' and 'working_set_bytes' values, which one is advisable to monitor?

  • It is worth mentioning that if you are using resource limits on your pods, then the best thing to do is to monitor both of them in order to prevent your pods from being OOMKilled.
  • If container_memory_rss increases to the limit, it will lead to an OOMKill.
  • Likewise, if container_memory_working_set_bytes increases to the limit, it will also lead to an OOMKill.
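A simple alerting check along these lines would flag a container when either metric approaches the limit (a hedged sketch; `nearOOM` and the 90% threshold are our own choices, not Kubernetes defaults):

```go
package main

import "fmt"

// nearOOM reports whether rss or working set has crossed a fraction of
// the memory limit, e.g. threshold 0.9 for 90%.
func nearOOM(rss, workingSet, limit uint64, threshold float64) bool {
	cutoff := float64(limit) * threshold
	return float64(rss) >= cutoff || float64(workingSet) >= cutoff
}

func main() {
	limit := uint64(4089446400) // hierarchical_memory_limit from the dump
	// Sample values far below the limit: no alert.
	fmt.Println(nearOOM(33964032, 38830080, limit, 0.9)) // false
}
```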

How Can That Be Monitored? Magalix Holds The Solution To That:

Magalix will prevent your pods from being OOMKilled due to lack of memory by collecting both metrics from your cluster and increasing your memory limits (capacity) should they tend to become full.



