
Kubernetes Cost Optimization 101


Over the past two years at Magalix, we have focused on building our system, introducing new features, and scaling our infrastructure and microservices. During this time, we took a look at our Kubernetes clusters' utilization and found it to be very low. We were paying for resources we didn't use, so we started a cost-saving practice to increase cluster utilization, make use of the resources we already had, and pay less to run our clusters.

In this article, I will discuss the top five techniques we used to better utilize our Kubernetes clusters on the cloud and eliminate wasted resources, thus saving money. In the end, we were able to cut our monthly bill by more than 50%!

1. Applying Workload Right-Sizing

Kubernetes manages and schedules pods based on their containers' resource specs:

  • Resource Requests: the Kubernetes scheduler uses them to place pods on nodes that have enough spare capacity
  • Resource Limits: containers are NOT allowed to use more than their resource limits

Resource requests and limits are container-scoped specs, so multi-container pods define separate resource specs for each container:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: magalix
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          requests:        # used by the scheduler to pick a node with enough capacity
            cpu: 100m
            memory: 100Mi
          limits:          # hard caps the container is never allowed to exceed
            cpu: 1
            memory: 1Gi

Kubernetes schedules pods based on resource requests and other restrictions, without impairing availability. The scheduler uses CPU and memory resource requests to place workloads on the right nodes, controlling which pod runs on which node and whether multiple pods can be scheduled together on a single node.

Every node type has its own allocatable CPU and memory capacity. Assigning unnecessarily high CPU or memory resource requests leaves you running underutilized pods on each node, which in turn leads to underutilized nodes.

To right-size our workloads, we compared resource requests and limits against actual usage, then changed the requests to something closer to the actual utilization while adding a small safety margin.
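
For instance, suppose monitoring showed the nginx container above actually peaking at around 60m CPU and 400Mi of memory (hypothetical numbers, for illustration). Its resource specs could then be adjusted to sit just above real usage:

# Hypothetical right-sized values for the nginx container: requests sit
# slightly above observed peak usage (~60m CPU, ~400Mi memory), and the
# limits keep enough headroom for short bursts.
resources:
  requests:
    cpu: 75m
    memory: 500Mi
  limits:
    cpu: 250m
    memory: 750Mi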

2. Choosing the Right Worker Nodes

Every Kubernetes cluster has its own particular workload utilization. Some clusters use memory more heavily than CPU (e.g., database and caching workloads), while others use CPU more heavily than memory (e.g., user-interactive and batch-processing workloads).

Cloud providers such as GCP and AWS offer various node types that you can choose from.

Choosing the wrong node size for your cluster can end up costing you. For instance, nodes with a high CPU-to-memory ratio running memory-intensive workloads can easily starve for memory and trigger node auto scale-up, adding yet more CPUs that we don't need.

Calculating the right ratio of CPU-to-memory isn’t easy; you will need to monitor and know your workloads well.

For example, GCP offers general-purpose, compute-optimized, and memory-optimized machine families with various vCPU counts and CPU-to-memory ratios.


Just keep in mind that 1 vCPU is far more expensive than 1 GB of memory. The clusters I manage have plenty of memory, so I try to make sure that when there is a pending pod, it is pending on CPU (the expensive resource); that way, any node scale-up the autoscaler triggers is driven by CPU demand rather than by memory we don't actually lack.

To see the cost difference between CPU and memory, let us look at GCP's N2 machine pricing. GCP gives you the freedom to choose a custom machine type, priced as:

 (# vCPU x 1vCPU price) + (# GB memory x 1GB memory price)

It's clear here that 1 vCPU costs roughly 7.44 times as much as 1 GB of memory.
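
As a worked example with illustrative prices (actual prices vary by region and change over time, so check the current price list): suppose 1 vCPU costs $0.0331 per hour and 1 GB of memory costs $0.00445 per hour. A custom 8 vCPU / 32 GB N2 machine would then cost:

 (8 x $0.0331) + (32 x $0.00445) = $0.2648 + $0.1424 ≈ $0.41 per hour

and the per-unit price ratio is $0.0331 / $0.00445 ≈ 7.44, which is why shaving off a vCPU saves far more than shaving off a gigabyte of memory.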


You can run multiple worker node pools with different node types and sizes, and control which workloads run on which node pool using Kubernetes taints and tolerations.
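
For example, a memory-optimized node pool can be tainted so that only workloads which explicitly tolerate the taint are scheduled onto it. Here is a minimal sketch, assuming a hypothetical taint key workload-type and a matching node label; the pod and namespace names are illustrative:

# The memory-optimized nodes are tainted, e.g. via the node pool config or:
#   kubectl taint nodes <node-name> workload-type=memory-intensive:NoSchedule
# Pods meant for that pool add a matching toleration, plus a nodeSelector
# so they don't land on the other pools either.
apiVersion: v1
kind: Pod
metadata:
  name: redis-cache
  namespace: magalix
spec:
  nodeSelector:
    workload-type: memory-intensive    # assumes the nodes carry this label
  tolerations:
  - key: workload-type
    operator: Equal
    value: memory-intensive
    effect: NoSchedule
  containers:
  - name: redis
    image: redis:6.2
    resources:
      requests:
        cpu: 250m
        memory: 2Gi

The toleration lets the pod onto the tainted nodes, while the nodeSelector keeps it off every other pool.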




3. Autoscaling Workloads

Autoscaling is great because it scales your workloads up and down and shuts down idle nodes to save you money while you sleep.

Many cases can benefit from autoscaling:

  • Variable-load web applications: a good example is a web application that receives variable traffic throughout the day: traffic increases during certain hours and decreases at night.

    Kubernetes comes with the Horizontal Pod Autoscaler (HPA), which can scale a workload's replicas based on the ratio of its CPU (and, in newer API versions, memory) utilization to the resource request. Kubernetes keeps monitoring the target resource referenced in scaleTargetRef and scales the replica count up or down to keep utilization around targetCPUUtilizationPercentage (75% in the example below).
    In this example, the frontend deployment scales up to 20 replicas under high load and down to 4 replicas under low load.
    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: frontend
      namespace: magalix
    spec:
      maxReplicas: 20   # upper bound under high load
      minReplicas: 4    # lower bound under low load
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: frontend
      targetCPUUtilizationPercentage: 75   # target average CPU usage relative to requests
    
  • Event-driven workers: background workers that need to run in multiple replicas when there are messages in a Kafka topic or a message queue, and can be scaled down to zero when there are no messages.

    Compared to the HPA, Kubernetes Event-driven Autoscaling (KEDA) is more advanced: it can integrate with Prometheus, PostgreSQL, Kafka, Redis, and many more to scale based on richer metrics from multiple data sources.

    In this example, we installed this KEDA ScaledObject custom resource to scale the replicas of the eventer worker deployment based on Kafka consumer lag: it scales down to 0 when there are no messages to consume, and adds 1 replica for every 10,000 lagged messages, up to 8 replicas when the lag exceeds 80,000:
    apiVersion: keda.k8s.io/v1alpha1   # KEDA v1 API; newer KEDA releases use keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      labels:
        deploymentName: eventer
      name: eventer
      namespace: magalix
    spec:
      cooldownPeriod: 10     # seconds to wait after the last active trigger before scaling to zero
      maxReplicaCount: 8
      minReplicaCount: 0     # allows scale-to-zero when there is no lag
      pollingInterval: 15    # seconds between metric checks
      scaleTargetRef:
        deploymentName: eventer
      triggers:
      - metadata:
          metricName: eventer
          query: sum(kafka_consumergroup_lag{consumergroup="eventer-group"})
          serverAddress: http://prometheus-server.monitoring.svc.cluster.local
          threshold: "10000"   # target lag per replica: replicas ≈ total lag / 10,000
        type: prometheus
    

4. Autoscaling Worker Nodes

After your workloads scale down, you will notice that the number of running pods drops, but this won't save you money unless the worker nodes scale down with them.

Some cloud providers offer node autoscaling out of the box on their node pools; otherwise, the Cluster Autoscaler can manage worker node autoscaling for you (a minimal configuration sketch follows the list below).

  • GCP: Kubernetes Engine → Cluster → Node Pool
  • AWS: EKS → Cluster → Node Group
  • Azure: Kubernetes services → Node pools → Scale → Automatic
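
If you run the Cluster Autoscaler yourself, the key settings are the node group boundaries and the scale-down behavior. Below is a minimal sketch of the relevant container flags, assuming an AWS node group named my-node-group (a hypothetical name; the full Deployment manifest comes from the upstream cluster-autoscaler project):

# Fragment of a cluster-autoscaler Deployment's container spec (sketch only).
containers:
- name: cluster-autoscaler
  image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=2:10:my-node-group              # min:max:node-group-name (hypothetical)
  - --scale-down-utilization-threshold=0.5  # scale down nodes below 50% utilization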

The result of scaling workloads and worker nodes together is a node count that follows the actual load, climbing during peak hours and dropping back down during quiet ones.


5. Purchasing Commitment/Saving Plans

Running a managed Kubernetes service is relatively cheap; what costs the most is the worker nodes' compute.

  • GCP offers “Commitment Plans” on a certain number of vCPUs, memory, GPUs, and local SSDs for 1 or 3 years, which can save up to 57% of the compute cost
  • AWS offers “Compute Savings Plans”, where you commit to a consistent hourly amount of compute spend, as well as “Reserved Instances”, where you commit to a certain machine type, both for 1 or 3 years and with possible savings of up to 60%
  • Azure offers “Azure Reserved VM Instances”, similar to AWS Reserved Instances, with possible savings of up to 60%

Conclusion

As we saw in this article, there are multiple factors and considerations when trying to reduce your cluster cost, and going through the whole process can yield huge savings. We managed to reduce our cluster's daily cost by 56%, and you can do the same!
