Learn what Magalix monitors and how it makes scalability decisions - 3-minute read
It all starts the moment a cluster is connected to Magalix.
Our AI starts resource monitoring to achieve the best performance with the least resources. The moment containers are provisioned, the AI goes through a continuous process of understanding, analyzing, and generating scalability decisions.
Magalix autoscaling workflow
Below is a high-level description of the workflow that continuously takes place once a Kubernetes cluster is connected.
- The Magalix agent continuously scans for changes in pods, containers, and nodes. It then sends relevant metrics to the Magalix backend for further analysis.
- Metrics data is encrypted and sent through a secure tunnel, and it does not contain any cluster or container identification data. Each component is assigned a random unique ID that is used to reference it at later stages in the workflow.
- Metrics are cleaned up and downsampled in different ways to only keep relevant details for further analysis and visualization.
- Every few data points, predictive models are built or updated. The main purpose of these models is to identify repeatable patterns at different time resolutions. For example, are there hourly patterns when the models look at metrics at 1-minute resolution? Are there seasonal patterns when they look at metrics at 12-hour resolution?
- Metrics are correlated with one another to infer the nature of each container, for example memory-bound vs. CPU-bound, which helps with resource allocation. The correlation engine also learns how containers relate to and depend on each other, and how a change in the CPU of one container may impact another.
- If the inferred patterns along with current resources and dependencies require a change in resources, a container scalability decision is generated.
- If container scalability requires a change in the underlying infrastructure, one or more decisions are generated to update node capacity. These changes can be horizontal (adding more nodes) or vertical (rebooting a node with a new CPU/memory configuration).
- Decisions are consolidated at this point to make sure that they do not overlap unnecessarily and are not redundant.
- Decisions are then sent back to the agent to be executed at the right time.
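The consolidation step in the workflow above can be illustrated with a minimal sketch. It is not Magalix's actual implementation: the `Decision` fields, the example IDs, and the "newest decision wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    # All fields are hypothetical; the real decision format is not documented here.
    target_id: str   # anonymized random ID of a container or node
    resource: str    # "cpu" or "memory"
    value: float     # requested allocation (cores or MiB)
    created_at: int  # timestamp; larger means newer


def consolidate(decisions):
    """Keep only the newest decision for each (target, resource) pair."""
    latest = {}
    for d in sorted(decisions, key=lambda d: d.created_at):
        latest[(d.target_id, d.resource)] = d  # newer entries overwrite older ones
    return sorted(latest.values(), key=lambda d: (d.target_id, d.resource))


pending = [
    Decision("c-7f3a", "cpu", 0.5, 1),
    Decision("c-7f3a", "cpu", 0.75, 2),    # supersedes the decision above
    Decision("c-7f3a", "memory", 512.0, 2),
    Decision("n-19bd", "cpu", 4.0, 3),
]
consolidated = consolidate(pending)
```

Here the two CPU decisions for the same container collapse into one, so the agent would receive three decisions instead of four.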
A Bit More Detail
Magalix collects resource utilization and KPI metrics. It collects CPU, memory, and I/O usage patterns at a set frequency. Metrics and behavioral analysis take place at the level of containers, services, and applications. Our proprietary algorithms simplify and aggregate these metrics and use them as features that are fed into the machine learning models.
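As a rough illustration of what aggregating metrics into coarser features can look like, the sketch below averages raw samples into fixed-width time buckets. The actual algorithms are proprietary, so the `downsample` function, the bucket width, and the sample values are assumptions.

```python
from statistics import mean


def downsample(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets."""
    buckets = {}
    for ts, value in samples:
        start = ts - ts % bucket_seconds  # start of the bucket this sample falls in
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Made-up CPU usage in cores, sampled every 30 seconds.
cpu = [(0, 0.25), (30, 0.75), (60, 0.5), (90, 1.0), (120, 0.5)]
per_minute = downsample(cpu, 60)
```

Running the same function with a larger `bucket_seconds` gives the coarser resolutions at which longer-term patterns become visible.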
When you specify KPIs (Key Performance Indicators), our AI models analyze these KPIs and use them to optimize the resources of your application. For example, if you define an HTTP endpoint as a KPI for your application, Magalix will start building prediction models around the average number of calls and their latency, and will optimize CPU, memory, and I/O for all of the application's containers to stay within the specified KPI.
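To make the HTTP-endpoint example concrete, here is a small sketch of the kind of summary such a KPI could be reduced to. The `kpi_summary` function, its field names, and the latency target are hypothetical and are not part of any Magalix API.

```python
from statistics import mean


def kpi_summary(requests, window_seconds, latency_target_ms):
    """Summarize one window of (timestamp, latency_ms) observations."""
    avg_latency = mean(latency for _, latency in requests)
    return {
        "calls_per_second": len(requests) / window_seconds,
        "avg_latency_ms": avg_latency,
        "within_target": avg_latency <= latency_target_ms,
    }


# Made-up observations for a single endpoint over one minute.
requests = [(0, 100), (15, 200), (30, 300), (45, 200)]
summary = kpi_summary(requests, window_seconds=60, latency_target_ms=250)
```

A series of such summaries over time is the sort of input a prediction model could use to relate call volume and latency to resource allocation.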
Understanding Runtime Architecture
As metrics are collected, our AI models start building a conceptual model of the application’s architecture.
For example, interdependencies between containers and services are mapped, and each container is profiled, marked as CPU-, memory-, or I/O-intensive, and scored accordingly. This understanding gives Magalix the ability to make high-precision scalability decisions under different usage patterns and workloads.
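One simple way to picture container profiling is to score each resource by how much of its limit the container uses on average, then label the container by its highest score. This is only a sketch under that assumption; the scoring Magalix actually uses is not described here, and `profile_container` is a hypothetical name.

```python
def profile_container(usage, limits):
    """Score each resource by average utilization and name the dominant one."""
    scores = {
        name: sum(samples) / len(samples) / limits[name]
        for name, samples in usage.items()
    }
    dominant = max(scores, key=scores.get)
    return dominant, scores


# Made-up usage samples and limits for one container.
usage = {
    "cpu": [0.5, 0.5, 0.5],     # cores
    "memory": [384, 384, 384],  # MiB
    "io": [10, 10, 10],         # MB/s
}
limits = {"cpu": 1.0, "memory": 512, "io": 100}
label, scores = profile_container(usage, limits)
```

In this example the container uses 75% of its memory limit but only 50% of its CPU limit, so it would be labeled memory-intensive.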
Magalix generates hourly custom scalability decisions for each container to meet your application performance and cost goals.
Usage Prediction and Resources Consumption Correlation
The next step is to identify repeatable patterns and account for future changes in workloads. The prediction models enable the rest of the AI pipeline to get ready for future spikes and anticipated changes. This step builds on the previous one and analyzes a series of scalability decisions to achieve the best possible performance with the least amount of resources. Decisions are based on the conceptual architecture generated in the previous step, the profile of each component, and the anticipated intensity of workloads.
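Identifying a repeatable pattern can be sketched with plain autocorrelation, which measures how strongly a series resembles a shifted copy of itself. This is a textbook technique used here for illustration, not a description of Magalix's prediction models; the function names and the synthetic load are assumptions.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    mu = mean(series)
    variance = sum((x - mu) ** 2 for x in series)
    if variance == 0 or lag >= len(series):
        return 0.0
    covariance = sum(
        (series[i] - mu) * (series[i + lag] - mu)
        for i in range(len(series) - lag)
    )
    return covariance / variance


def dominant_period(series, candidate_lags):
    """Return the candidate lag with the highest autocorrelation."""
    return max(candidate_lags, key=lambda lag: autocorrelation(series, lag))


# Synthetic load that repeats every 4 samples.
load = [1, 3, 5, 3] * 6
period = dominant_period(load, [2, 3, 4, 5])
```

With 1-minute samples, a strong peak at a lag of 60 would suggest an hourly pattern; the same check on 12-hour samples could surface longer seasonal cycles.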
Execute and Learn
Through the Magalix nervous system, Magalix AI continuously evaluates pending decisions. If the anticipated conditions are still valid, it executes the scalability decisions and monitors their impact on the application’s KPIs, resource consumption, and the cost of keeping the application within its performance goals. It uses these observations as feedback to the prediction models and decision-making engine.
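The evaluate, execute, and learn loop can be pictured with the following sketch. It assumes a decision is "still valid" when the observed load is close to the predicted load, and it uses simple exponential smoothing as a stand-in for the feedback step; both choices, along with the function names and thresholds, are illustrative assumptions.

```python
def still_valid(predicted, observed, tolerance=0.2):
    """True if the observed load is within a relative tolerance of the prediction."""
    return abs(observed - predicted) <= tolerance * predicted


def update_forecast(predicted, observed, learning_rate=0.5):
    """Move the forecast toward the observation (simple exponential smoothing)."""
    return predicted + learning_rate * (observed - predicted)


# Made-up requests-per-second figures.
predicted_rps = 100.0
observed_rps = 110.0

execute = still_valid(predicted_rps, observed_rps)             # decide whether to act
next_forecast = update_forecast(predicted_rps, observed_rps)   # feed the observation back
```

If the observed load had drifted well outside the tolerance, the decision would be held back and re-evaluated against an updated forecast instead of being executed.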