Magalix Agent - Deep Dive

This article provides technical details about how the Magalix agent works and how it collects and uses cluster data. Reading time: about 5 minutes.

The Magalix agent connects your cluster to the Magalix backend. It collects the minimum data needed to provide recommendations and reporting that improve your Kubernetes cluster's resource usage and production readiness. On demand, the agent can also apply some of those recommendations automatically, which reduces the chance of human error and lowers your operational overhead. This document explains in detail how the Magalix agent operates within your clusters.

The Big Picture

When you register to use Magalix, we provide you with a command that installs our magalix-agent on your Kubernetes cluster. The Magalix agent collects metrics periodically and retrieves the specs of your nodes and workloads. Upon installation, the agent opens a WebSocket connection to the Magalix agent-gateway. It sends the collected data to the Magalix backend and receives executable actions (e.g. changing a given container's limit/request to a more optimal value, based on a user request).


Magalix Agent Data Workflow

Here are more step-by-step details: 

  1. The agent authenticates with the Magalix agent-gateway to establish a WebSocket connection.
  2. Initially, the agent communicates with the customer's Kubernetes API server (over HTTPS) to get the specs for the following objects: pods, controllers, namespaces, and nodes. The agent then registers a hook to listen for updates to Kubernetes objects and sends those updates back to Magalix.
  3. The agent also collects metrics (e.g. CPU and memory usage) every minute by calling the API server, which proxies to each node's kubelet / cAdvisor.
  4. In the case of a network outage, the agent buffers data for up to one hour.
  5. The agent also sends its logs to the Magalix agent-gateway.
  6. The agent executes recommendations that the customer triggers from our console.

Communication Protocol

The Magalix agent connects to the Magalix backend over WebSockets on port 80 at https://agent-gateway.magalix.com. We generate a secret that is shared with the agent as part of the YAML file. Each cluster gets a unique YAML file that is generated on the fly when requested. Access to that file expires after 4 hours if it is not used to connect a cluster. To authenticate, the agent sends its account ID and cluster ID, and the agent-gateway responds with a question. The agent appends the secret to the question and hashes the resulting string with SHA-512. The agent-gateway then validates the hash against the secret stored on our side.
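The question-and-hash exchange described above can be illustrated with a minimal Python sketch. It is not the agent's actual implementation: the function names, the UTF-8 encoding, and the hex encoding of the digest are assumptions made for illustration, since this article does not specify the wire format.

```python
import hashlib
import hmac


def answer_challenge(question: str, secret: str) -> str:
    """Agent side: append the secret to the question and hash with SHA-512.

    The UTF-8 encoding and hex output are assumptions for this sketch.
    """
    return hashlib.sha512((question + secret).encode("utf-8")).hexdigest()


def verify_answer(question: str, stored_secret: str, answer: str) -> bool:
    """Gateway side: recompute the hash from the stored secret and compare."""
    expected = answer_challenge(question, stored_secret)
    # compare_digest performs a constant-time comparison, so the check
    # does not leak how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), answer.encode("utf-8"))
```

In this scheme the secret itself never travels over the connection; only the hash of the question plus the secret does, and a fresh question for each connection keeps an old answer from being reused.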

Data Collection

The Magalix agent registers a webhook with the Kubernetes API-master endpoint to get the specs for the following objects: pods, controllers, namespaces, and nodes. It also collects metrics every minute (e.g. CPU usage, CPU throttling, memory usage, CPU request/limit, memory request/limit).

The agent has get, list, patch, and watch privileges on the following resources:

"nodes", "nodes/stats", "nodes/metrics", "nodes/proxy", "namespaces", "pods", 
"limitranges", "deployments", "replicationcontrollers", "statefulsets",
"daemonsets", "replicasets", "jobs", "cronjobs" 

The Data Magalix Agent Does Not Collect

We also have listing permissions on all other Kubernetes objects, but we do not have permission to read them.  

The Magalix agent does not have access to any of your containers' volumes, VM details, cluster secrets, or networking interfaces. The Magalix agent also cannot exec into any of your containers.

Collection Frequency 

We collect metrics every minute. We collect Kubernetes objects on agent startup and whenever an object changes. If the agent loses its connection to our agent-gateway, it buffers data for up to one hour.
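As a rough illustration of the one-hour buffering behavior, here is a minimal Python sketch of a time-bounded buffer. The class name `OutageBuffer`, its methods, and the choice to discard the oldest entries first are all assumptions for illustration; the article only states that data is buffered for up to one hour.

```python
import time
from collections import deque


class OutageBuffer:
    """Hypothetical buffer that keeps payloads for at most max_age_seconds."""

    def __init__(self, max_age_seconds=3600.0, clock=time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock
        self._items = deque()  # (timestamp, payload) pairs, oldest first

    def add(self, payload):
        """Store a payload collected while the gateway is unreachable."""
        self._items.append((self._clock(), payload))
        self._evict()

    def _evict(self):
        """Drop entries older than the retention window (oldest first)."""
        cutoff = self._clock() - self._max_age
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()

    def drain(self):
        """Return the surviving payloads in order and empty the buffer."""
        self._evict()
        payloads = [payload for _, payload in self._items]
        self._items.clear()
        return payloads
```

With a one-minute collection interval, a buffer like this would hold roughly 60 metric batches before the oldest ones start to age out; `drain()` would be called once the connection is re-established.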

Data Retention 

We keep metrics and entity specs in our backend systems for 30 days. All data is anonymized through the UUIDs we assign to each entity, metric, and recommendation.

Auditing Magalix Agent

The Magalix agent logs all of its interactions with the Kubernetes APIs. You can run it in verbose mode to log every single interaction; these logs are saved locally. The agent sends error logs and other critical logs to our backend to help our support team monitor the health of installed agents. The entries below describe the two major log types the Magalix agent currently collects.

 

Log type: Container spec retrieval error
Class: Error
Included data:
  • Cluster-ID
  • Container ID
  • Container Name
  • Pod Name
  • Namespace name
Log retention period:
  • 24 hours in our backend
  • 30 days for local logs
Saved location:
  • Local file (agent.log)
  • Magalix backend

Log type: Recommendation Automation
Class: Info
Included data:
  • Recommendation Id
  • Service Id
  • Container Id
  • Namespace name
  • Service Name
  • Workload Kind
  • Dry Run Flag
  • Recommendation Values
Log retention period:
  • 24 hours in our backend
  • 30 days for local logs
Saved location:
  • Local file (agent.log)
  • Magalix backend

 

You can access the Magalix agent's logs by running the following command:

kubectl logs [pod id] -n kube-system

Controlling Agent’s Privileges

The Magalix agent needs only the minimum privileges required to make the necessary API calls to the Kubernetes master node. These access rights are required to collect enough data and metrics to assess your cluster's capacity and configuration. The entries below list the privileges the Magalix agent needs to perform all necessary read and update operations, and what happens if each one is denied.


Privilege: Get
Object: "nodes", "nodes/stats", "nodes/metrics", "nodes/proxy", "namespaces", "pods", "limitranges", "deployments", "replicationcontrollers", "statefulsets", "daemonsets", "replicasets", "jobs", "cronjobs"
Impact if denied: We can't provide any recommendations

Privilege: List
Object: "nodes", "nodes/stats", "nodes/metrics", "nodes/proxy", "namespaces", "pods", "limitranges", "deployments", "replicationcontrollers", "statefulsets", "daemonsets", "replicasets", "jobs", "cronjobs"
Impact if denied: We can't provide any recommendations

Privilege: List
Object: *.*
Impact if denied: Not currently used, so no impact on the product features

Privilege: Watch
Object: "nodes", "nodes/stats", "nodes/metrics", "nodes/proxy", "namespaces", "pods", "limitranges", "deployments", "replicationcontrollers", "statefulsets", "daemonsets", "replicasets", "jobs", "cronjobs"
Impact if denied: We won't get updates, and our recommendations will only be based on the old spec of the objects we collected

Privilege: Patch
Object: "nodes", "nodes/stats", "nodes/metrics", "nodes/proxy", "namespaces", "pods", "limitranges", "deployments", "replicationcontrollers", "statefulsets", "daemonsets", "replicasets", "jobs", "cronjobs"
Impact if denied: We can't run automation

You can edit the agent's role at any time using the following command:

kubectl edit clusterrole magalix-agent
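To make the effect of removing a privilege concrete, the following Python sketch builds a ClusterRole manifest for the resources listed in this article with a chosen set of verbs. It is an illustration, not the role shipped in the Magalix agent YAML file: the helper name `build_cluster_role` is hypothetical, the mapping of resources to API groups reflects standard Kubernetes groupings and may differ from the shipped file, and the broad list permission on all other objects is left out.

```python
import json

# Resources from this article, grouped by their standard Kubernetes API group.
# The grouping is an assumption; the shipped agent YAML may organize it differently.
CORE = ["nodes", "nodes/stats", "nodes/metrics", "nodes/proxy",
        "namespaces", "pods", "limitranges", "replicationcontrollers"]
APPS = ["deployments", "statefulsets", "daemonsets", "replicasets"]
BATCH = ["jobs", "cronjobs"]


def build_cluster_role(verbs, name="magalix-agent"):
    """Return a ClusterRole manifest granting `verbs` on the listed resources."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {"apiGroups": [""], "resources": CORE, "verbs": list(verbs)},
            {"apiGroups": ["apps"], "resources": APPS, "verbs": list(verbs)},
            {"apiGroups": ["batch"], "resources": BATCH, "verbs": list(verbs)},
        ],
    }


# Omitting "patch" yields a read-only role: recommendations keep working,
# but the agent can no longer run automation.
read_only = build_cluster_role(["get", "list", "watch"])
print(json.dumps(read_only, indent=2))
```

Because kubectl accepts JSON manifests as well as YAML, the printed output could be reviewed and then applied in place of editing the role interactively.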

What's Next

Let Magalix help you and your team implement best practices for Kubernetes and cloud-native workloads. Explore what you can do with the Magalix Console.

Connect Your First Cluster


References

  1. What is Magalix?
  2. What is the Magalix Agent?
  3. The Anatomy of Magalix Agent YAML File
  4. Our Agent source code -  https://github.com/MagalixCorp/magalix-agent/
  5. Sample Magalix Agent YAML File https://github.com/MagalixCorp/magalix-agent/blob/master/magalix-agent.yaml