
The Importance of Using Labels in Your Kubernetes Specs: A Guide


Even a small Kubernetes cluster may have hundreds of Containers, Pods, Services and many other Kubernetes API objects.

It quickly becomes tedious to page through kubectl output to find your own objects. Labels address this problem well.

The primary reasons you should use labels are:

  • They let you logically organize all your Kubernetes workloads in all your clusters.
  • They let you selectively filter kubectl output down to just the objects you need.
  • They help you understand the layers and hierarchies of all your API objects.
  • They give Open Policy Agent (OPA) a rich set of information to work with, which makes it a more flexible admission controller.

In the rest of this article, we'll elaborate on these benefits.

Labels versus Annotations

Labels and annotations are sometimes confused. Having a quick look at the documentation makes this understandable.

Labels:

"metadata": {
  "labels": {
    "key1" : "value1",
    "key2" : "value2"
  }
}

Annotations:


"metadata": {
  "annotations": {
    "key1" : "value1",
    "key2" : "value2"
  }
}

Fortunately it is easy to consult the documentation to see the difference:

"Labels are key/value pairs that are attached to objects, such as pods. Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users, but do not directly imply semantics to the core system. Labels can be used to organize and to select subsets of objects."

"You can use Kubernetes annotations to attach arbitrary non-identifying metadata to objects. Clients such as tools and libraries can retrieve this metadata."

"You can use either labels or annotations to attach metadata to Kubernetes objects. Labels can be used to select objects and to find collections of objects that satisfy certain conditions. In contrast, annotations are not used to identify and select objects. The metadata in an annotation can be small or large, structured or unstructured."

 
Example labels:

"release" : "stable"
"release" : "canary"

"environment" : "dev"
"environment" : "qa"
"environment" : "production"

Example annotations:

standbyphone: 000-000 0000
developer: Neil Armstrong
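
As a rough illustration of the difference, the sketch below models object metadata as plain Python dictionaries and selects objects by label only. The object names and the select_by_labels helper are hypothetical and not part of any Kubernetes client library. The annotations are carried along but never consulted for selection.

```python
# Hypothetical in-memory objects; names and values are illustrative only.
objects = [
    {"name": "web-1",
     "labels": {"release": "stable", "environment": "production"},
     "annotations": {"developer": "Neil Armstrong"}},
    {"name": "web-2",
     "labels": {"release": "canary", "environment": "production"},
     "annotations": {"standbyphone": "000-000 0000"}},
    {"name": "web-3",
     "labels": {"release": "stable", "environment": "qa"},
     "annotations": {}},
]


def select_by_labels(items, **wanted):
    """Return the names of items whose labels contain every wanted key/value pair."""
    return [
        item["name"]
        for item in items
        if all(item["labels"].get(key) == value for key, value in wanted.items())
    ]


# Selection looks at labels only; annotations play no part in it.
print(select_by_labels(objects, environment="production"))
print(select_by_labels(objects, environment="production", release="canary"))
```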

Let's now focus more deeply on labels and how to use them.

Well-Known Labels, Annotations and Taints

Before we create our own labels, let us look at some labels that Kubernetes creates automatically.

Kubernetes automatically creates these labels on nodes:

kubernetes.io/arch
Example: 
kubernetes.io/arch=amd64

kubernetes.io/os
Example: 
kubernetes.io/os=linux

node.kubernetes.io/instance-type 
Example: 
node.kubernetes.io/instance-type=m3.medium


topology.kubernetes.io/region and topology.kubernetes.io/zone

Example 1: 
topology.kubernetes.io/region=us-east-1

Example 2: 
topology.kubernetes.io/zone=us-east-1c

See the full list here.

These labels now allow us to filter our nodes in the following interesting ways:

1. List all Linux nodes:

kubectl get nodes -l 'kubernetes.io/os=linux'

2. List all nodes with instance type m3.medium:

kubectl get nodes -l 'node.kubernetes.io/instance-type=m3.medium'

3. List all nodes in a specific region:

kubectl get nodes -l 'topology.kubernetes.io/region=us-east-1'

4. List all nodes in specific regions:

kubectl get nodes -l 'topology.kubernetes.io/region in (us-east-1, us-west-1)'

If we apply the following labels to all our Pods, we can filter the kubectl output as follows:

"release" : "stable"
"release" : "canary"

"environment" : "dev"
"environment" : "qa"
"environment" : "production"

kubectl get pods -l 'environment in (production), release in (canary)'

kubectl get pods -l 'environment in (production, qa)'

kubectl get pods -l 'environment notin (qa)'

In a complex environment with multiple Kubernetes clusters, many nodes and even more namespaces, the ability to filter kubectl output is a major timesaver.

In addition, Job, Deployment, ReplicaSet and DaemonSet support set-based selectors as well:

selector:
  matchLabels:
    component: redis
  matchExpressions:
    - {key: tier, operator: In, values: [cache]}
    - {key: environment, operator: NotIn, values: [dev]}
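
To make the semantics of such a selector concrete, here is a minimal Python sketch that evaluates matchLabels and matchExpressions against a set of labels. The matches function is a hypothetical helper written for this article, not part of any Kubernetes library. It assumes all requirements are ANDed together and that NotIn is also satisfied when the key is absent.

```python
def matches(selector, labels):
    """Return True when labels satisfy every requirement in the selector."""
    # Every matchLabels pair must be present with exactly that value.
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    # Every matchExpressions entry must hold as well (logical AND).
    for expr in selector.get("matchExpressions", []):
        key, op, values = expr["key"], expr["operator"], expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = labels.get(key) not in values  # a missing key also satisfies NotIn
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


# The selector from the YAML above, written as a Python dictionary.
selector = {
    "matchLabels": {"component": "redis"},
    "matchExpressions": [
        {"key": "tier", "operator": "In", "values": ["cache"]},
        {"key": "environment", "operator": "NotIn", "values": ["dev"]},
    ],
}

print(matches(selector, {"component": "redis", "tier": "cache", "environment": "production"}))
print(matches(selector, {"component": "redis", "tier": "cache", "environment": "dev"}))
```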

For more information on this powerful selector syntax, see here.

Organize All your Kubernetes Workloads in All your Clusters

First, a quick refresher on Amazon AWS terminology:

  1. Region: A physical location around the world.
  2. Availability Zone: A group of data centers inside a region.

This means that your containers have the following hierarchy:

Region -> Availability Zone -> Kubernetes cluster -> Namespace -> Deployment -> Pod -> Containers

This is a partial list: you could also have organisations and departments, teams, Azure network security groups, and so on.

You can add labels at every level in this hierarchy.

This enables you to understand the full global scope of all the layers and hierarchies of all your API objects.

When you combine this with label selectors, you have a practically unlimited number of ways to filter your Kubernetes workloads.

Example: Find Pods by Labels to Get their Pod Logs

Given a namespace and a label query that identifies the Pods you are interested in, you can get the logs for all of those Pods. If the query matches more than one Pod, the command fetches the logs for each Pod in turn.

ns='qa' ; label='release=canary' ; kubectl get pods -n "$ns" -l "$label" -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | xargs -I {} kubectl -n "$ns" logs {}

Here ns holds your namespace and label holds your label query.

kubectl get pods -n "$ns" -l "$label" -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'

This command gets a list of Pod names in the $ns namespace with the label release=canary. It outputs the Pod names, one per line.

| xargs -I {} kubectl -n "$ns" logs {}

This part of the command receives the list of Pod names and shows the logs of each Pod.

xargs is a standard Linux shell utility. With -I {} it runs the given command once for each input line, one after another.

The point of this example is that you can use shell scripting to process lists of API objects very selectively. This becomes more useful the more clusters, namespaces and API objects you have.
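
The same two-step pattern can be scripted in other languages. The Python sketch below only builds the kubectl argument lists and does not run them, so it works without a cluster. The function names and the sample Pod names are hypothetical. To execute the commands you could pass each list to subprocess.run, assuming kubectl is installed and configured.

```python
def get_pods_command(namespace, label):
    """Build the argv that lists matching Pod names, one per line."""
    # The raw string keeps the backslash so kubectl's jsonpath sees \n itself.
    jsonpath = r'jsonpath={range .items[*]}{.metadata.name}{"\n"}{end}'
    return ["kubectl", "get", "pods", "-n", namespace, "-l", label, "-o", jsonpath]


def logs_commands(namespace, listing):
    """Turn the listing (one Pod name per line) into one 'kubectl logs' argv per Pod."""
    names = [line.strip() for line in listing.splitlines() if line.strip()]
    return [["kubectl", "-n", namespace, "logs", name] for name in names]


# Sample listing standing in for the output of the first command (made-up Pod names).
sample_listing = "web-canary-1\nweb-canary-2\n"

print(get_pods_command("qa", "release=canary"))
for command in logs_commands("qa", sample_listing):
    print(command)
```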

Kubernetes Recommended Labels

The official Kubernetes documentation recommends that you use the following labels, each prefixed with app.kubernetes.io/:

  • name: the name of the application
  • instance: a unique name identifying the instance of the application
  • version: the current version of the application (for example, a semantic version)
  • component: the component within your logical architecture
  • part-of: the name of the higher-level application this object is part of
  • managed-by: the tool used to manage the application, for example helm

Example from the documentation:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: mysql
    app.kubernetes.io/instance: mysql-abcxzy
    app.kubernetes.io/version: "5.7.21"
    app.kubernetes.io/component: database
    app.kubernetes.io/part-of: wordpress
    app.kubernetes.io/managed-by: helm

You should define such a list of labels that all API objects at your company must use.
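
One way to check such a company-wide list is a small script that reports which required labels a manifest lacks. The sketch below is an assumption about how that check could look. REQUIRED_LABELS reuses the six recommended keys, and missing_labels and the sample manifests are hypothetical. Manifests are represented as already-parsed dictionaries to keep the example free of third-party YAML libraries.

```python
# The six recommended keys, used here as a hypothetical company-wide requirement.
REQUIRED_LABELS = (
    "app.kubernetes.io/name",
    "app.kubernetes.io/instance",
    "app.kubernetes.io/version",
    "app.kubernetes.io/component",
    "app.kubernetes.io/part-of",
    "app.kubernetes.io/managed-by",
)


def missing_labels(manifest, required=REQUIRED_LABELS):
    """Return the required label keys that the manifest's metadata does not carry."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    return [key for key in required if key not in labels]


# The StatefulSet from the documentation example, as a parsed dictionary.
statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {
        "labels": {
            "app.kubernetes.io/name": "mysql",
            "app.kubernetes.io/instance": "mysql-abcxzy",
            "app.kubernetes.io/version": "5.7.21",
            "app.kubernetes.io/component": "database",
            "app.kubernetes.io/part-of": "wordpress",
            "app.kubernetes.io/managed-by": "helm",
        }
    },
}

# A made-up manifest that carries only one of the required labels.
incomplete = {
    "kind": "Service",
    "metadata": {"name": "web", "labels": {"app.kubernetes.io/name": "web"}},
}

print(missing_labels(statefulset))
print(missing_labels(incomplete))
```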

For example, a WordPress application may use:

  • PersistentVolume.
  • PersistentVolumeClaim.
  • Deployment.
  • Pods.
  • Containers.
  • Service.
  • Ingress.

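You relate all the above API objects via app.kubernetes.io/part-of: wordpress.

When you do this, a single command can list those objects in one go. Note that kubectl get all covers only a subset of resource types (such as Pods, Services and Deployments), so the other types are named explicitly here:

kubectl get all,ingress,pvc,pv -l 'app.kubernetes.io/part-of=wordpress'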

Labels Syntax

You may filter on labels in an equality-based manner:

environment = production
tier != frontend

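You may also filter on labels in a set-based manner. A bare key selects objects that have that label key, whatever its value, and a key prefixed with ! selects objects that do not have it:

environment in (production, qa)
tier notin (frontend, backend)
partition
!partition

For more information about the syntax of labels, see here.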
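
To show how the equality-based and set-based forms fit together, here is a minimal Python sketch that parses a selector string and tests it against a set of labels. It is a simplified illustration, not the parser kubectl uses. The function names are hypothetical, and the sketch assumes comma-separated requirements are ANDed and that != and notin also match objects that lack the key.

```python
import re


def parse_selector(text):
    """Split a selector string into (key, operator, values) requirements."""
    requirements = []
    # Split on commas that are not inside parentheses.
    for part in re.split(r",\s*(?![^()]*\))", text):
        part = part.strip()
        if not part:
            continue
        set_based = re.fullmatch(r"(\S+)\s+(in|notin)\s*\(([^)]*)\)", part)
        if set_based:
            values = {value.strip() for value in set_based.group(3).split(",")}
            requirements.append((set_based.group(1), set_based.group(2), values))
        elif "!=" in part:
            key, value = part.split("!=", 1)
            requirements.append((key.strip(), "notin", {value.strip()}))
        elif "=" in part:
            key, value = re.split(r"==?", part, maxsplit=1)
            requirements.append((key.strip(), "in", {value.strip()}))
        elif part.startswith("!"):
            requirements.append((part[1:].strip(), "!exists", set()))
        else:
            requirements.append((part, "exists", set()))
    return requirements


def selector_matches(text, labels):
    """Return True when the labels satisfy every requirement in the selector."""
    for key, op, values in parse_selector(text):
        if op == "in" and labels.get(key) not in values:
            return False
        if op == "notin" and labels.get(key) in values:
            return False
        if op == "exists" and key not in labels:
            return False
        if op == "!exists" and key in labels:
            return False
    return True


pod_labels = {"environment": "production", "tier": "cache"}
print(selector_matches("environment in (production, qa), tier notin (frontend, backend)", pod_labels))
print(selector_matches("environment = production, tier != cache", pod_labels))
```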


 



Some Real Life Examples and Use-Cases to Learn More

One of our past articles, Influencing Kubernetes Scheduler Decisions, uses the word label 34 times.

This article describes how the scheduler selects the best node to host the Pod and how we can influence its decision using labels.

In another of our previous articles, we described how to prevent Kubernetes container images from using the "latest" tag. You can apply the same approach to require that your API objects have certain labels, or to enforce or prevent the use of certain labels.

Enforce that all Kubernetes container images must have a label that is not “latest” using OPA

Example 1: Enforce Having A Specific Label In Any New Namespace from this article is a perfect example of how to use OPA to enforce the use of labels.

Example 2: Guestbook application on Kubernetes

The following YAML files are edited for clarity to show mostly the important label fields.

apiVersion: v1
kind: Service
metadata:
  name: redis-master
  labels:
    app: redis
    tier: backend
    role: master
spec:
  selector:
    app: redis
    tier: backend
    role: master
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-master
spec:
  selector:
    matchLabels:
      app: redis
      tier: backend
      role: master

  template:
    metadata:
      labels:
        app: redis
        tier: backend
        role: master

    spec:
      containers:
      - name: master
        image: k8s.gcr.io/redis:e2e  # or just image: redis

The Service and Deployment above are logically grouped together by sharing the same three labels. Notice the role: master label.

apiVersion: v1
kind: Service
metadata:
  name: redis-slave
  labels:
    app: redis
    tier: backend
    role: slave
spec:
  selector:
    app: redis
    tier: backend
    role: slave
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-slave
spec:
  selector:
    matchLabels:
      app: redis
      tier: backend
      role: slave
  replicas: 2
  template:
    metadata:
      labels:
        app: redis
        tier: backend
        role: slave
    spec:
      containers:
      - name: slave
        image: gcr.io/google_samples/gb-redisslave:v1

This second Service and Deployment pair is logically grouped together by sharing the same three labels. Notice the role: slave label.

Note how all four API objects above use the same app: redis label. This links them all together as one logical unit.

apiVersion: v1
kind: Service
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  type: NodePort 
  selector:
    app: guestbook
    tier: frontend
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  selector:
    matchLabels:
      app: guestbook
      tier: frontend
  replicas: 3
  template:
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google-samples/gb-frontend:v4

This third Service and Deployment pair is logically grouped together by sharing the same two labels: app: guestbook and tier: frontend.

API Object   Name           app         tier       role
service      redis-master   redis       backend    master
deployment   redis-master   redis       backend    master
service      redis-slave    redis       backend    slave
deployment   redis-slave    redis       backend    slave
service      frontend       guestbook   frontend   -
deployment   frontend       guestbook   frontend   -

This gives you great flexibility: by selecting on labels you can pick out just the masters, just the slaves, just the frontend or just the backend components.

This is neater and more flexible than having long names such as:

redis-master-service-backend
redis-master-deploy-backend

redis-slave-service-backend
redis-slave-deploy-backend

guestbook-service-frontend
guestbook-deploy-frontend

With these long names you cannot easily select objects by category, because the category is encoded in the object name. (You would need a separate, complex regular expression for each category you want to select on.)
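
The sketch below illustrates why labels are easier to slice than names. It builds an index from each label key/value pair to the objects that carry it, using the six guestbook objects from the table. The index_by_label helper is hypothetical, and the objects are plain tuples rather than real API objects.

```python
from collections import defaultdict

# (kind, name, labels) for the six guestbook objects in the table.
guestbook = [
    ("Service", "redis-master", {"app": "redis", "tier": "backend", "role": "master"}),
    ("Deployment", "redis-master", {"app": "redis", "tier": "backend", "role": "master"}),
    ("Service", "redis-slave", {"app": "redis", "tier": "backend", "role": "slave"}),
    ("Deployment", "redis-slave", {"app": "redis", "tier": "backend", "role": "slave"}),
    ("Service", "frontend", {"app": "guestbook", "tier": "frontend"}),
    ("Deployment", "frontend", {"app": "guestbook", "tier": "frontend"}),
]


def index_by_label(items):
    """Map every (label key, label value) pair to the objects that carry it."""
    index = defaultdict(list)
    for kind, name, labels in items:
        for key, value in labels.items():
            index[(key, value)].append(f"{kind}/{name}")
    return index


index = index_by_label(guestbook)

# Any category is a single lookup; no per-category regular expression is needed.
print(index[("tier", "backend")])
print(index[("role", "slave")])
print(index[("app", "guestbook")])
```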

Open Policy Agent (OPA) should enforce that network policies exist for the frontend and backend labels. The backend objects must not allow Internet access.

Frontend containers should have stricter security policies than the internally accessed backend deployments (also enforced by OPA), because the frontend is exposed to the world and the backend is not.
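
Real OPA policies are written in Rego. The Python sketch below only illustrates the kind of check such a policy might perform: it reports every tier label value used by a workload that no NetworkPolicy podSelector covers. The function name, the workload list and the policy name are hypothetical.

```python
def tiers_without_policy(workloads, network_policies):
    """Return tier values used by workloads that no NetworkPolicy selects."""
    used = {w["labels"]["tier"] for w in workloads if "tier" in w.get("labels", {})}
    covered = set()
    for policy in network_policies:
        # spec.podSelector.matchLabels decides which Pods a NetworkPolicy applies to.
        match = policy.get("spec", {}).get("podSelector", {}).get("matchLabels", {})
        if "tier" in match:
            covered.add(match["tier"])
    return sorted(used - covered)


workloads = [
    {"name": "redis-master", "labels": {"app": "redis", "tier": "backend"}},
    {"name": "redis-slave", "labels": {"app": "redis", "tier": "backend"}},
    {"name": "frontend", "labels": {"app": "guestbook", "tier": "frontend"}},
]

# A made-up policy that covers only the backend tier.
network_policies = [
    {
        "kind": "NetworkPolicy",
        "metadata": {"name": "backend-no-egress"},
        "spec": {
            "podSelector": {"matchLabels": {"tier": "backend"}},
            "policyTypes": ["Egress"],
        },
    }
]

# Tiers listed here would be flagged by the check.
print(tiers_without_policy(workloads, network_policies))
```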

Summary

This article answered the question "Why is it Important to Use Labels in Your Workloads’ Specs?"

You have seen the power of labels:

  • To logically organize
  • To enable selective filters
  • To understand API object hierarchies
  • To enable Open Policy Agent to use labels as a data field to selectively apply policies.

As you have seen above, the organization and filter functionalities of Kubernetes labels are awesome.

Using labels gives your logs more contextual information, lets monitoring tools select the specific workloads you are interested in, and gives bash scripts a richer set of information to work from.

You can use the Open Policy Agent (OPA) as an admission controller. OPA can allow or deny API object creation based on the value of its labels. More on this next time.

