Even a small Kubernetes cluster may have hundreds of Containers, Pods, Services and many other Kubernetes API objects.
It quickly becomes tedious to page through screens of kubectl output to find your object. Labels address this problem perfectly.
The primary reasons to use labels:
- They enable you to logically organize all your Kubernetes workloads across all your clusters.
- They enable you to selectively filter kubectl output down to just the objects you need.
- They enable you to understand the layers and hierarchies of all your API objects.
- They enable Open Policy Agent (OPA) to act as a more flexible admission controller, using the rich information labels provide.
In the rest of this article, we'll elaborate on these benefits.
Labels versus Annotations
Labels and annotations are sometimes confused, and a quick look at the documentation makes this understandable: their syntax is identical.
Labels:
"metadata": {
"labels": {
"key1" : "value1",
"key2" : "value2"
}
}
Annotations:
"metadata": {
"annotations": {
"key1" : "value1",
"key2" : "value2"
}
}
Fortunately it is easy to consult the documentation to see the difference:
"Labels are key/value pairs that are attached to objects, such as pods. Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users, but do not directly imply semantics to the core system. Labels can be used to organize and to select subsets of objects."
The documentation on annotations adds:
"You can use Kubernetes annotations to attach arbitrary non-identifying metadata to objects. Clients such as tools and libraries can retrieve this metadata."
"You can use either labels or annotations to attach metadata to Kubernetes objects. Labels can be used to select objects and to find collections of objects that satisfy certain conditions. In contrast, annotations are not used to identify and select objects. The metadata in an annotation can be small or large, structured or unstructured."
Example labels:
"release" : "stable"
"release" : "canary"
"environment" : "dev"
"environment" : "qa"
"environment" : "production"
Example annotations:
standbyphone: 000-000 0000
developer: Neil Armstrong
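You can also attach labels and annotations imperatively with kubectl. Here is a minimal sketch, assuming an existing Pod with the hypothetical name my-pod:
kubectl label pod my-pod environment=qa release=canary
kubectl annotate pod my-pod developer="Neil Armstrong" standbyphone="000-000 0000"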
Let's now focus more deeply on labels and how to use them.
Well-Known Labels, Annotations and Taints
Before we create our own labels let us look at some labels that Kubernetes creates automatically.
Kubernetes automatically creates these labels on nodes:
kubernetes.io/arch
Example:
kubernetes.io/arch=amd64
kubernetes.io/os
Example:
kubernetes.io/os=linux
node.kubernetes.io/instance-type
Example:
node.kubernetes.io/instance-type=m3.medium
topology.kubernetes.io/region
Example:
topology.kubernetes.io/region=us-east-1
topology.kubernetes.io/zone
Example:
topology.kubernetes.io/zone=us-east-1c
See the full list in the Kubernetes documentation under Well-Known Labels, Annotations and Taints.
These labels now allow us to filter our nodes in the following interesting ways:
1. List All Linux Nodes:
kubectl get nodes -l 'kubernetes.io/os=linux'
2. List all nodes with instance type m3.medium:
kubectl get nodes -l 'node.kubernetes.io/instance-type=m3.medium'
3. List all nodes in a specific region:
kubectl get nodes -l 'topology.kubernetes.io/region=us-east-1'
4. List all nodes in specific regions:
kubectl get nodes -l 'topology.kubernetes.io/region in (us-east-1, us-west-1)'
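To see which of these well-known labels your own nodes actually carry, you can print them directly; --show-labels and -L (label columns) are standard kubectl flags:
kubectl get nodes --show-labels
kubectl get nodes -L kubernetes.io/os,topology.kubernetes.io/zone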
If we apply labels like these to all our Pods, we can filter the kubectl output as follows:
"release" : "stable"
"release" : "canary"
"environment" : "dev"
"environment" : "qa"
"environment" : "production"
kubectl get pods -l 'environment in (production), release in (canary)'
kubectl get pods -l 'environment in (production, qa)'
kubectl get pods -l 'environment notin (qa)'
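You can also combine several equality requirements in one selector; kubectl treats the comma as a logical AND. A small sketch using the example labels above:
kubectl get pods --all-namespaces -l 'environment=production,release=stable' --show-labels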
In a complex environment with multiple Kubernetes clusters, many nodes, and even more namespaces, it is easy to see that the ability to filter kubectl output is a major time saver.
Job, Deployment, ReplicaSet, and DaemonSet also support set-based selectors in their spec:
selector:
  matchLabels:
    component: redis
  matchExpressions:
    - {key: tier, operator: In, values: [cache]}
    - {key: environment, operator: NotIn, values: [dev]}
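For ad hoc queries, the matchExpressions above translate into an equivalent kubectl selector. A sketch, assuming your Pods actually carry these component, tier and environment labels:
kubectl get pods -l 'component=redis,tier in (cache),environment notin (dev)'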
You can find more information on this powerful selector syntax in the Kubernetes documentation on labels and selectors.
Organize All your Kubernetes Workloads in All your Clusters
First, a quick refresher on Amazon AWS terminology:
- Region: A physical location around the world.
- Availability Zone: A group of data centers inside a region.
This means that your containers have the following hierarchy:
Region -> Availability Zone -> Kubernetes cluster -> Namespace -> Deployment -> Pod -> Containers
This is a partial list: you could also have organizations, departments, teams, Azure network security groups, and so on.
You can attach labels at every level of this hierarchy.
This enables you to understand the full global scope of all the layers and hierarchies of all your API objects.
When you combine this with label selectors, you have an almost unlimited number of ways to filter your Kubernetes workloads.
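For example, you could label entire namespaces with the team or department that owns them and then filter at that level. A sketch using hypothetical namespace and label names:
kubectl label namespace payments department=finance
kubectl get namespaces -l department=finance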
Example: Find Pods by Labels to Get their Pod Logs
Given a namespace "your-namespace" and a label query that identifies the pods you are interested in, you can get the logs for all of those pods. If the pod isn't unique, it will fetch the logs for each pod in parallel.
ns='qa' ; label='release=canary' ; kubectl get pods -n $ns -l $label -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | xargs -I {} kubectl -n $ns logs {}
Let's break this command down. The first part,
kubectl get pods -n $ns -l $label -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
lists the Pods in the $ns namespace that carry the release=canary label and prints their names, one per line.
| xargs -I {} kubectl -n $ns logs {}
This second part receives the list of Pod names and fetches the logs of each Pod in turn.
xargs is a standard Linux tool that turns the output of one command into arguments for another command; it is a staple of Linux shell scripting.
The point of this example is that label selectors combined with Linux shell scripting let you process precisely chosen lists of API objects. This becomes more useful the more clusters, namespaces, and API objects you have.
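Depending on your kubectl version, you may not even need xargs: kubectl logs accepts a label selector directly (the number of Pods it reads from is capped by --max-log-requests). A sketch using the same namespace and label as above:
kubectl logs -n qa -l release=canary --prefix --tail=20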
Kubernetes Recommended Labels
The official Kubernetes documentation recommends that you use the following labels:
- app.kubernetes.io/name: the name of the application
- app.kubernetes.io/instance: a unique name identifying this instance of the application
- app.kubernetes.io/version: the current version of the application (a semantic version number, for example)
- app.kubernetes.io/component: the component within your logical architecture
- app.kubernetes.io/part-of: the name of the higher-level application this object is part of
- app.kubernetes.io/managed-by: the tool used to manage the application (Helm, for example)
Example from the documentation:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: mysql
    app.kubernetes.io/instance: mysql-abcxzy
    app.kubernetes.io/version: "5.7.21"
    app.kubernetes.io/component: database
    app.kubernetes.io/part-of: wordpress
    app.kubernetes.io/managed-by: helm
You should define a similar list of labels that all API objects at your company must carry.
For example, a WordPress application may use:
- PersistentVolume.
- PersistentVolumeClaim.
- Deployment.
- Pods.
- Containers.
- Service.
- Ingress.
You relate all of the above API objects via app.kubernetes.io/part-of: wordpress.
When you do this, the single command below lists all those objects in one go:
kubectl get all -l 'app.kubernetes.io/part-of=wordpress'
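Note that kubectl get all only covers the common workload types; it does not include PersistentVolumes, PersistentVolumeClaims or Ingresses. For the full WordPress stack you would name the resource types explicitly, for example:
kubectl get deploy,pods,svc,ingress,pvc,pv -l 'app.kubernetes.io/part-of=wordpress'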
Labels Syntax
You may filter on labels in an equality-based manner:
environment = production
tier != frontend
You may also filter on labels in a set-based manner:
environment in (production, qa)
tier notin (frontend, backend)
us-west-1
!us-west-1
The last two select objects that have (or, with !, do not have) a label with the key us-west-1, regardless of its value.
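Set-based and equality-based requirements can be mixed in one selector, with the comma again acting as a logical AND. A quick sketch (quote the selector so the shell does not interpret ! or the parentheses):
kubectl get pods -l 'environment in (production, qa),tier notin (frontend, backend)'
kubectl get pods -l '!us-west-1'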
You can find more information about label syntax in the Kubernetes documentation.
Some Real Life Examples and Use-Cases to Learn More
In one of our past articles, Influencing Kubernetes Scheduler Decisions, the word label occurs 34 times.
This article describes how the scheduler selects the best node to host the Pod and how we can influence its decision using labels.
In another of our previous articles, we described how to prevent Kubernetes container images from using the "latest" tag. You can use the same approach to require certain labels on your API objects, or to enforce or prevent the use of specific labels.
Example 1 from that article, "Enforce Having A Specific Label In Any New Namespace", is a perfect example of how to use OPA to enforce the use of labels.
Example 2: Guestbook application on Kubernetes
The following YAML files are trimmed for clarity to show mainly the relevant label fields.
apiVersion: v1
kind: Service
metadata:
  name: redis-master
  labels:
    app: redis
    tier: backend
    role: master
spec:
  selector:
    app: redis
    tier: backend
    role: master
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-master
spec:
  selector:
    matchLabels:
      app: redis
      tier: backend
      role: master
  template:
    metadata:
      labels:
        app: redis
        tier: backend
        role: master
    spec:
      containers:
      - name: master
        image: k8s.gcr.io/redis:e2e # or just image: redis
The Service and Deployment above are logically grouped together by carrying the same three labels. Notice the role: master label.
apiVersion: v1
kind: Service
metadata:
  name: redis-slave
  labels:
    app: redis
    tier: backend
    role: slave
spec:
  selector:
    app: redis
    tier: backend
    role: slave
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-slave
spec:
  selector:
    matchLabels:
      app: redis
      tier: backend
      role: slave
  replicas: 2
  template:
    metadata:
      labels:
        app: redis
        tier: backend
        role: slave
    spec:
      containers:
      - name: slave
        image: gcr.io/google_samples/gb-redisslave:v1
This second Service and Deployment pair is logically grouped together by the same three labels. Notice the role: slave label.
Note how all of the API objects above share the same app: redis label. This links them together as one logical unit.
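Thanks to this shared label, a single selector lists the whole Redis group in one go:
kubectl get all -l app=redis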
apiVersion: v1
kind: Service
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  type: NodePort
  selector:
    app: guestbook
    tier: frontend
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  selector:
    matchLabels:
      app: guestbook
      tier: frontend
  replicas: 3
  template:
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google-samples/gb-frontend:v4
        resources:
This third Service and Deployment pair is logically grouped together by the same two labels: app: guestbook and tier: frontend.
API Object | Name         | app       | tier     | role   |
service    | redis-master | redis     | backend  | master |
deployment | redis-master | redis     | backend  | master |
service    | redis-slave  | redis     | backend  | slave  |
deployment | redis-slave  | redis     | backend  | slave  |
service    | frontend     | guestbook | frontend | -      |
deployment | frontend     | guestbook | frontend | -      |
This enables great flexibility: using label selectors you can pick just the masters, just the slaves, just the frontend, or just the backend components.
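A few sketches of such selections, assuming the objects above are deployed in the current namespace:
kubectl get svc,deploy -l 'app=redis,role=master'
kubectl get svc,deploy -l role=slave
kubectl get svc,deploy -l tier=frontend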
This is neater and more flexible than having long names such as:
redis-master-service-backend
redis-master-deploy-backend
redis-slave-service-backend
redis-slave-deploy-backend
guestbook-service-frontend
guestbook-deploy-frontend
Using such long names, you cannot easily select objects by category, since the category is encoded in the object name. (You would need a separate, often complex regular expression for each category you want to select on.)
Open Policy Agent (OPA) could also enforce that network policies exist for the frontend and backend labels, for example requiring that backend objects do not allow Internet access. Frontend containers, which are exposed to the world, should have stricter security policies than the internal-only backend Deployments; OPA can enforce this as well.
Summary
This article answered the question "Why is it Important to Use Labels in Your Workloads’ Specs?"
You have seen the power of labels:
- To logically organize your workloads
- To selectively filter kubectl output
- To understand API object hierarchies
- To enable Open Policy Agent (OPA) to selectively apply policies based on label data
As you have seen above, the organization and filter functionalities of Kubernetes labels are awesome.
With labels in place, your logs contain more contextual information, your monitoring tools can select exactly the workloads you are interested in, and your shell scripts have a richer set of information to work with.
You can use the Open Policy Agent (OPA) as an admission controller. OPA can allow or deny API object creation based on the value of its labels. More on this next time.