This article provides technical details about how the Magalix agent works and how it collects and uses cluster data.
The Magalix agent connects your cluster with the Magalix backend. It collects the minimum data needed to provide recommendations and reporting that improve your Kubernetes cluster's resource usage and production-readiness. On demand, the agent also automates some of the recommendations to reduce possible human error and your overall operational overhead. This document explains in detail how the Magalix agent operates within your clusters.
The Big Picture
When you register to use Magalix, we provide you with a command that installs our magalix-agent on your Kubernetes cluster. The Magalix agent collects metrics periodically and gets the specs of the nodes and workloads. Upon installation, the agent opens a WebSocket connection with the Magalix agent-gateway. It sends the collected data to the Magalix backend and receives executable actions (e.g. changing a given container limit/request to a more optimal value based on a user request).
Here are more step-by-step details:
- The agent authenticates with the Magalix agent-gateway to establish a WebSocket connection.
- Initially, the agent communicates with the customer’s Kubernetes API-server (over HTTPS) to get the specs for the following: pods, controllers, namespaces, and nodes. The agent then registers a hook to listen for any updates to k8s objects and sends those updates back to Magalix.
- The agent also collects metrics (e.g. CPU and memory usage) every minute by calling the API-server (proxy to each node kubelet / cAdvisor).
- In the case of a network outage, the agent buffers data for up to one hour.
- The agent also sends its logs to the Magalix agent-gateway.
- The agent executes recommendations that the customer triggers from our console.
Communication Protocol
The Magalix agent connects to the Magalix backend using WebSockets on port 80 at https://agent-gateway.magalix.com. We generate a secret that is shared with the agent as part of the YAML file. Each cluster gets a unique YAML file that is generated on the fly when requested. Access to that file expires after 4 hours if it is not used to connect a cluster. The agent authenticates by sending its account ID and cluster ID. The agent-gateway responds with a question. The agent appends the secret to the question and hashes the resulting string with SHA-512. The agent-gateway then validates the hash against the secret stored on our side.
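The question-and-hash exchange described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: the function names `answer_challenge` and `gateway_validates` are hypothetical, and the UTF-8 encoding and hex output format are assumptions, since the text only states that the secret is appended to the question and the result is hashed with SHA-512.

```python
import hashlib
import hmac


def answer_challenge(question: str, secret: str) -> str:
    """Agent side: append the shared secret to the question and hash it."""
    # Assumption: UTF-8 encoding and a hex digest; the real wire format
    # is not specified in this article.
    return hashlib.sha512((question + secret).encode("utf-8")).hexdigest()


def gateway_validates(question: str, stored_secret: str, received: str) -> bool:
    """Gateway side: recompute the hash from the stored secret and compare."""
    expected = answer_challenge(question, stored_secret)
    # Constant-time comparison, so timing does not reveal how much matched.
    return hmac.compare_digest(expected, received)


if __name__ == "__main__":
    response = answer_challenge("example-question", "example-secret")
    print(gateway_validates("example-question", "example-secret", response))
```

Because only the hash travels over the connection, the secret itself is never sent after the YAML file has been applied.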
Data Collection
The Magalix agent registers a webhook with the Kubernetes API-master endpoint to get the specs for the following objects: pods, controllers, namespaces, and nodes. It also collects metrics every minute (e.g. CPU usage, CPU throttled, memory usage, CPU request/limit, memory request/limit).
The agent has get, list, patch, and watch privileges to gather information about these objects:
"nodes", "nodes/stats", "nodes/metrics", "nodes/proxy", "namespaces", "pods"
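To make the role of the "nodes/proxy" privilege more concrete, the sketch below builds the standard Kubernetes API-server proxy paths that reach a node's kubelet summary and cAdvisor metrics endpoints. It is an illustration under assumptions: the helper name `kubelet_proxy_paths` is hypothetical, and this article does not state which of these endpoints the agent actually calls.

```python
from urllib.parse import quote


def kubelet_proxy_paths(node_name: str) -> dict:
    """Return API-server proxy paths for one node's kubelet endpoints."""
    # Percent-encode the node name so it is safe to embed in a URL path.
    node = quote(node_name, safe="")
    base = f"/api/v1/nodes/{node}/proxy"
    return {
        # Kubelet summary API: per-node and per-pod CPU/memory statistics.
        "summary": f"{base}/stats/summary",
        # cAdvisor metrics exposed by the kubelet in Prometheus text format.
        "cadvisor": f"{base}/metrics/cadvisor",
    }


if __name__ == "__main__":
    for name, path in kubelet_proxy_paths("node-1").items():
        print(name, path)
```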
The Data Magalix Agent Does Not Collect
The agent also has list permission on all other Kubernetes objects, but it does not have permission to read them.
The Magalix agent does not have access to any of your containers' volumes, VM details, cluster secrets, or networking interfaces. The agent also has no ability to run exec commands in any of your containers.
Collection Frequency
We collect metrics every minute. We collect k8s objects on agent startup and whenever an object changes. If the agent loses its connection to our agent-gateway, it buffers the data for up to one hour.
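A one-hour outage buffer of this kind could look roughly like the sketch below. The class name `OutageBuffer` and its methods are hypothetical, and the eviction policy (silently dropping entries older than the window) is an assumption; the article only states that data is buffered for up to one hour.

```python
import time
from collections import deque

BUFFER_WINDOW_SECONDS = 60 * 60  # one hour, as stated in the text


class OutageBuffer:
    """Hold payloads while disconnected, keeping at most one window of data."""

    def __init__(self, window=BUFFER_WINDOW_SECONDS, clock=time.monotonic):
        self._window = window
        self._clock = clock  # injectable so the behaviour can be tested
        self._items = deque()  # (timestamp, payload) pairs, oldest first

    def _evict(self, now):
        # Assumption: entries older than the window are discarded.
        while self._items and now - self._items[0][0] > self._window:
            self._items.popleft()

    def add(self, payload):
        now = self._clock()
        self._items.append((now, payload))
        self._evict(now)

    def drain(self):
        """Return everything still inside the window and empty the buffer."""
        self._evict(self._clock())
        payloads = [payload for _, payload in self._items]
        self._items.clear()
        return payloads
```

On reconnection, the caller would send whatever `drain()` returns and then resume normal per-minute reporting.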
Data Retention
We keep metrics and entity specs in our backend systems for 30 days. All data is anonymized through the UUIDs we assign to each entity, metric, and recommendation.
Auditing Magalix Agent
The Magalix agent logs all its interactions with the Kubernetes APIs. You can run it in verbose mode to log every single interaction; these logs are saved locally. The agent sends error logs and other critical logs to our backend to help our support team monitor the health of the installed agents. The table below describes the two major log types the Magalix agent currently collects.
Log type | Class | Included data | Log retention period | Saved location |
Container spec retrieval error | Error | | | |
Recommendation Automation | Info | | | |
You can access the Magalix agent’s logs by running the command below.
kubectl logs [pod id] -n kube-system
Controlling Agent’s Privileges
The Magalix agent needs only the minimum privileges required to make the necessary API calls to the Kubernetes master node. Those access rights are required to collect enough data and metrics to assess your cluster’s capacity and configuration. The table below lists the privileges that the Magalix agent needs to perform all necessary read and update operations.
Privilege | Object | Impact if denied |
Get | "nodes", "nodes/stats", "nodes/metrics", "nodes/proxy", "namespaces", "pods", "limitranges", "deployments", "replicationcontrollers", "statefulsets", "daemonsets", "replicasets", "jobs", "cronjobs" | We can’t provide any recommendations |
List | "nodes", "nodes/stats", "nodes/metrics", "nodes/proxy", "namespaces", "pods", "limitranges", "deployments", "replicationcontrollers", "statefulsets", "daemonsets", "replicasets", "jobs", "cronjobs" | We can’t provide any recommendations |
List | *.* | Not currently used, so no impact on the product features |
Watch | "nodes", "nodes/stats", "nodes/metrics", "nodes/proxy", "namespaces", "pods", "limitranges", "deployments", "replicationcontrollers", "statefulsets", "daemonsets", "replicasets", "jobs", "cronjobs" | We won’t get updates, and our recommendations will be based only on the old specs of the objects we collected |
Patch | "nodes", "nodes/stats", "nodes/metrics", "nodes/proxy", "namespaces", "pods", "limitranges", "deployments", "replicationcontrollers", "statefulsets", "daemonsets", "replicasets", "jobs", "cronjobs" | We can’t run automation |
You can edit the agent’s role at any time using the command below.
kubectl edit clusterrole magalix-agent -n kube-system
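Before removing a verb from the role, it can help to check what the privileges table says you would lose. The sketch below encodes the "Impact if denied" column as a lookup; the names `IMPACT_IF_DENIED` and `impacts_of` are hypothetical and exist only for illustration.

```python
# Impact of denying each verb on the listed objects, taken from the
# privileges table in this article.
IMPACT_IF_DENIED = {
    "get": "We can't provide any recommendations",
    "list": "We can't provide any recommendations",
    "watch": "We won't get updates; recommendations rely on old specs",
    "patch": "We can't run automation",
}


def impacts_of(granted_verbs):
    """Return the impact of every required verb missing from granted_verbs."""
    granted = {verb.lower() for verb in granted_verbs}
    return {
        verb: impact
        for verb, impact in IMPACT_IF_DENIED.items()
        if verb not in granted
    }


if __name__ == "__main__":
    # Example: a read-only role that drops "patch".
    for verb, impact in impacts_of(["get", "list", "watch"]).items():
        print(f"{verb}: {impact}")
```

For instance, a role limited to get, list, and watch still supports recommendations but not automation.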
What's Next
Let Magalix help you and your team implement best practices for Kubernetes and cloud-native. Explore what you can do with the Magalix Console.
References
- What is Magalix?
- What is the Magalix Agent?
- The Anatomy of Magalix Agent YAML File
- Our Agent source code - https://github.com/MagalixCorp/magalix-agent/
- Sample Magalix Agent YAML File https://github.com/MagalixCorp/magalix-agent/blob/master/magalix-agent.yaml