A step-by-step guide to improving your cloud-native application performance, achieving the best Kubernetes utilization, and saving up to 80% on cloud infrastructure using Magalix
Application performance, Kubernetes cluster utilization, and cost efficiency are crucial for the success of your cloud-native technology. It is a challenge to bring the whole team on board for this journey and keep them agile. Engineering teams and DevOps need to manage a lot of moving parts while figuring out a proper migration path to cloud-native. They need to maintain legacy systems and redesign them to meet the best containerization and operational patterns.
Smart engineers proactively anticipate and tackle challenges that will slow their team down.
Improving your users’ experience, giving your team reasonable control of Kubernetes cluster resources, and maximizing the return on investment of your company’s cloud infrastructure are crucial for your agility and success. This step-by-step guide shows how to achieve these three goals quickly and with little effort.
What Makes Teams Successful in Their Cloud-Native Journey
Companies that emphasize the importance of innovating their customer experience are quick to see the value of adopting the cloud-native development model.
According to IBM’s research on cloud-native adoption, below are the most important factors for the success of any team in their cloud-native journey:
- Flexibility and speed, which includes the flexibility to add features and to scale resources up or down to meet user demand and achieve your SLA.
- Application performance, which covers user experience and the team’s ability to improve app quality, automate application tuning, and reduce dependencies.
- Working seamlessly and getting the most value out of the public cloud, which means that applications and teams can use feature-rich platforms and public cloud products to achieve business goals efficiently and in the shortest time possible.
Achieving these goals with the current tools is hard and time-consuming. Teams need to learn about the fast-evolving cloud-native stack while innovating fast. Offloading some of this work in the early days of adopting a cloud-native architecture is important for you and your team. You want to move fast on what really matters to your business.
Magalix provides a cloud-native solution to monitor your application's performance, keep Kubernetes cluster utilization tuned to business needs, and get the most value out of your cloud provider. Magalix features are unlocked with a single command that installs our agent. Within a few minutes, you will get the full picture of your applications and Kubernetes cluster. Once your Kubernetes cluster is connected, you will get the following:
- Automatic tuning of your application performance and resource needs by adjusting allocated resources proactively and as needed.
- Detailed capacity analysis and the best ways to keep your team on top of their applications’ performance and resource utilization.
- Detailed analysis of cluster node capacity and the current cost model. You can save up to 80% of your current spending on Kubernetes clusters.
- Team-oriented view of your Kubernetes cluster. You can invite the rest of your team to manage application performance and infrastructure capacity together.
- A comprehensive set of metrics and dashboards to see a full picture across all your clusters and applications.
How Can You Improve App Performance, Optimize Kubernetes Capacity, And Save On Cloud Bill Using Magalix
The only prerequisite to start optimizing your Kubernetes cluster is to have admin rights on your existing Kubernetes cluster(s). You will go through four simple steps, plus a bonus one, to get a deeper understanding of your application performance, cluster utilization, and cost optimization.
- Step 1 - Connect Your Kubernetes Cluster to Magalix Backend
- Step 2 - Check for Performance Bottlenecks and Capacity Waste
- Step 3 - Tune Performance and Capacity with Magalix Autopilot
- Step 4 - Save Time and Money with the Node Advisor Recommendations
- Step 5 (bonus) - Bring Other Team Members On-Board
Step 1 - Connect Your Kubernetes Cluster to Magalix Backend
To get started, you will need to install the Magalix agent in your Kubernetes cluster. The installation process is very simple. Copy the provided kubectl command and paste it inside your cluster’s terminal.
1. Get Magalix Agent Kubectl Installation Command
Note: The provided URL to the agent’s deployment.yaml file is valid for 4 hours only and specific to your cluster. You should not share or reuse this file to connect other clusters. You can connect as many clusters as you need but each will need its own unique YAML file.
2. Run the Kubectl Command
Run the kubectl command in your Kubernetes terminal to provision Magalix Agent.
You should get a confirmation in your terminal that the command executed without any issues. The following takes place in the background:
- The Magalix agent deployment controller is created.
- Necessary secrets that the agent needs, such as cluster ID, access token, etc., are created.
- Service account, cluster role, and cluster role binding are created. The Magalix agent needs these to have enough permissions to read metrics and configurations. These are required to get any recommendations.
If your cluster is restricted from downloading files directly from the Internet, you can download the YAML file and apply it yourself. To learn more about the content and what the YAML file does, please read our reference documentation.
- Magalix agent is open source and you can examine, or even better contribute to its code base.
- Magalix agent is scanned for security vulnerabilities before it is made publicly available.
- During your free trial, you can connect as many clusters as you want. There is no restriction on that.
3. Install Magalix Agent through GCP Marketplace
You can install the Magalix agent through GCP Marketplace. If you have a GKE cluster, you can install the agent with a single click. Visit the Magalix page on GCP Marketplace to install the agent on any of your GKE clusters.
Click on the CONFIGURE button at the Magalix agent GCP marketplace page to install Magalix agent.
The next screen will ask you to insert basic details to configure the agent.
- Create a cluster or select an existing Kubernetes cluster. If you choose to create a new cluster, you will need to come back again to this page and configure Magalix agent.
- Namespace. Magalix agent is installed in the default namespace. You can install it in a separate namespace if you want to apply any namespace-level resources or security restrictions.
- App instance name allows you to give it a more descriptive name.
- Email address is the email address that you will use to access Magalix console and dashboards. Installing the Magalix agent through GCP Marketplace automatically creates a Magalix account for you.
- If you already have a Magalix account, enter your email and password in the text boxes below to associate this cluster with your existing Magalix account.
- If you don’t have an account with Magalix, a random password will be generated and sent to the email address you entered here.
- Existing Password is needed only if you already have a Magalix account. It will be used to automatically log in and associate the connected cluster with your account at Magalix.
- AGENT_SERVICE_ACCOUNT will be automatically created with all needed access rights.
- It is recommended to leave the default value of this field to avoid credentials issues with Magalix agent.
Once you have entered the necessary inputs, the agent will be automatically installed and configured in your cluster. Check your inbox for the welcome email and log in at https://console.magalix.com to make sure that your cluster is properly connected.
Step 2 - Check for Performance Bottlenecks and Capacity Waste
Once data starts flowing to the Magalix backend, a lot of magical steps take place. Magalix will analyze the performance, capacity, and cost efficiency of your Kubernetes cluster. It will build several dashboards to give you a global overview of these aspects across your clusters. Take a tour of Magalix console to familiarize yourself with what you can observe and achieve with it.
1. Performance Bottlenecks
Let’s check if your applications have any performance bottlenecks. The performance analysis card in your home dashboard tells you whether your containers were throttled recently. Magalix tracks two metrics from the cgroup cpu.stat file to detect CPU performance bottlenecks:
- nr_throttled, which is the number of times the container has been throttled/limited, and
- throttled_time, which is the total time duration (in nanoseconds) for which containers of the group have been throttled.
The higher these values, the worse your container is going to perform. Magalix tracks these metrics across your whole cluster and suggests whether you need to adjust the CPU resources allocated to containers and microservices.
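To make the two counters concrete, here is a minimal Python sketch that parses text in the cpu.stat layout and derives a throttling ratio. It is an illustration only, not how Magalix computes its metrics. It assumes the cgroup v1 field names, including nr_periods, which is not mentioned above; the helper names and the sample values are made up for the example, and the file location and field names can differ between cgroup versions and container runtimes.

```python
def parse_cpu_stat(text):
    """Parse 'key value' lines in the cpu.stat layout into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttling_summary(stats):
    """Return the share of throttled scheduler periods and the throttled seconds."""
    periods = stats.get("nr_periods", 0)
    throttled = stats.get("nr_throttled", 0)
    ratio = throttled / periods if periods else 0.0
    seconds = stats.get("throttled_time", 0) / 1e9  # nanoseconds -> seconds
    return {"throttled_ratio": ratio, "throttled_seconds": seconds}


# Made-up sample values for illustration (not real measurements).
sample = """nr_periods 1000
nr_throttled 250
throttled_time 5000000000
"""

summary = throttling_summary(parse_cpu_stat(sample))
print(summary)
```

In this sample the container was throttled in a quarter of its scheduler periods, for a total of five seconds. A ratio that keeps growing over time is usually a hint that the CPU limit is too low for the workload.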
The home screen performance analysis card lists the top throttled containers for the last 24 hours. The CPU cores deficit/sec is an estimate of how many cores should have been allocated to the container to avoid throttling. In the example below, the forecaster-worker had 20 instances and on average each instance needed 2 additional cores. The side chart shows collective CPU throttling over time and whether containers are having excessive loads at certain times of the day.
Click on any of the throttled containers to see a more detailed analysis. You will see action items to improve that container’s performance. In the snapshot below, you will notice that the forecaster-worker is throttled multiple times due to relatively low CPU limits. The Magalix prediction model, shown as the grey area, indicates that the same CPU load will continue for the next 4 - 8 hours. Scroll down to see the recommendations.
In the snapshot below, you will notice a summary of generated recommendations and an area chart to help you decide what to do next. The summary card gives you an idea of the generated recommendations/decisions to improve resource management for that particular container. The area chart compares the used, recommended, and currently allocated CPU/memory.
Click on the number of generated decisions. You will see recommendations to improve the performance of this container, as in the sample snapshot below.
Click on any of these decisions to see full decision analysis. The decision analysis will help you understand how this recommendation could have improved your container’s performance. In the below decision analysis example, you will notice that Magalix is suggesting two changes:
- Increase the CPU limits to reduce CPU throttling, and
- Reduce allocated memory since there is a lot of memory waste.
In the next step, you will see how to resolve all those performance issues automatically, without manually going through all of these screens and numbers. But first, let’s see how you can check for any capacity waste.
2. Wasted Capacity
Capacity is wasted when containers are allocated more resources than they use. Such waste is quite common, because engineers like to be on the safe side and plan for the worst-case scenario. Let’s go back to the home dashboard and take a look at the capacity analysis card. This card provides an overview of:
- Generated decisions and their impact on saving your CPU/memory over the last 24 hours, and
- Charts comparing CPU/memory usage, Magalix recommended allocation, and what is currently allocated. In the snapshot below, you will notice that Magalix could have saved a combined total of around 1.7k cores and 3 TB of memory over the last 24 hours if the Magalix autopilot had been on.
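The idea of wasted capacity can be sketched in a few lines of Python. This is a simplified illustration, not Magalix's model: it treats waste as the gap between what is allocated and what is actually used, whereas the card above compares against a recommended allocation. The ContainerUsage type, the helper names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContainerUsage:
    name: str
    cpu_allocated: float     # cores
    cpu_used: float          # cores
    mem_allocated_gb: float
    mem_used_gb: float


def wasted(allocated, used):
    """Capacity allocated but not used; never negative."""
    return max(allocated - used, 0.0)


def cluster_waste(containers):
    """Sum wasted CPU cores and memory (GB) over all containers."""
    cpu = sum(wasted(c.cpu_allocated, c.cpu_used) for c in containers)
    mem = sum(wasted(c.mem_allocated_gb, c.mem_used_gb) for c in containers)
    return cpu, mem


# Hypothetical containers for illustration only.
containers = [
    ContainerUsage("api", 4.0, 1.0, 8.0, 2.0),
    ContainerUsage("worker", 2.0, 1.5, 4.0, 3.0),
]

cpu_waste, mem_waste = cluster_waste(containers)
print(f"wasted CPU: {cpu_waste} cores, wasted memory: {mem_waste} GB")
```

A real recommendation would normally keep some headroom above observed usage, so the reclaimable capacity is typically smaller than this raw gap.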
Click on the decisions to see the full list of recommendations to save the identified wasted capacity.
You will see a list of decisions to improve utilization. You can filter by an entity and decision impact. Click on any decision to see the detailed analysis of its impact.
You will see the same decision analysis card. In the example below, Magalix recommends reducing allocated CPU and memory to save some resources. The card helps you assess the recommendations by showing the historic and predicted metrics.
Step 3 - Tune Performance And Capacity With Magalix Autopilot
Magalix autopilot handles resource allocation automatically. It brings recommendations to life and executes them for you at the right times. Magalix proactively adjusts allocated resources to improve performance and resource utilization. Generated decisions are due in the near future, usually within 1 to 2 hours of generation time. When you enable the autopilot, the agent will execute the next set of decisions for your containers.
Autopilot is enabled at the level of the namespace. Once enabled, all decisions moving forward will be executed for you. You can enable the autopilot either from the cluster’s dashboard or from the namespace dashboard.
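The namespace-level behavior described above can be pictured with a small Python sketch. It is purely illustrative and makes no claim about how the Magalix agent is implemented: the Decision type, its fields, and the decisions_to_execute helper are hypothetical. The sketch only shows the selection rule, which is that a decision is executed when its namespace has autopilot enabled and its due time has arrived.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Decision:
    namespace: str
    container: str
    due_at: datetime


def decisions_to_execute(decisions, autopilot_namespaces, now):
    """Keep decisions that are due and belong to an autopilot-enabled namespace."""
    return [
        d for d in decisions
        if d.namespace in autopilot_namespaces and d.due_at <= now
    ]


# Hypothetical decisions; autopilot is enabled for the "prod" namespace only.
now = datetime(2020, 1, 1, 12, 0)
decisions = [
    Decision("prod", "forecaster-worker", now - timedelta(minutes=5)),  # due
    Decision("prod", "api", now + timedelta(hours=1)),                  # not due yet
    Decision("staging", "api", now - timedelta(minutes=5)),             # autopilot off
]

ready = decisions_to_execute(decisions, {"prod"}, now)
print([d.container for d in ready])
```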
Step 4 - Save Time and Money with the Node Advisor Recommendations
After applying the previous three steps, your cloud-native applications should now run without any performance penalties and with optimal resource utilization. It is time now to save some money and share some cool numbers with your manager and team :)
Magalix does a detailed analysis of your cluster nodes and recommends the right capacity to achieve the highest ROI (Return on Investment) out of your cloud infrastructure. It will suggest the best combination of nodes and their sizes for the best performance and utilization at the lowest cost possible. The analysis covers:
- How much you could save if you applied Magalix recommendations, i.e. Annual savings in the cost of infrastructure.
- How much capacity could be eliminated, i.e., the number of cores and the memory in GB that the proposed plan will save you.
- A comparison between the hourly cost of your current infrastructure (red area) and Magalix recommended infrastructure (green area).
- Estimated improvements in CPU/memory utilization after applying Magalix recommendations compared to current infrastructure. For example, if your average CPU utilization is 20% and the new plan will raise it to 30%, you could achieve a 50% improvement in your overall utilization.
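The utilization example above is a relative improvement, not a difference in percentage points. A minimal Python sketch of that arithmetic follows; the function name is only illustrative.

```python
def relative_improvement(current_pct, proposed_pct):
    """Relative change from current to proposed utilization, in percent."""
    if current_pct <= 0:
        raise ValueError("current utilization must be positive")
    return (proposed_pct - current_pct) / current_pct * 100


# The example from the text: 20% average CPU utilization raised to 30%.
improvement = relative_improvement(20, 30)
print(f"{improvement:.0f}% improvement")
```

The utilization rises by 10 percentage points, which is half of the original 20%, hence the 50% figure.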
Click on the Show Reports link to see the detailed recommendation and analysis of different options. In the snapshot below, Magalix Node Advisor recommends changing node types and counts based on the workload analysis it has done over the last 12 to 24 hours. You will notice that the suggested capacity is lower than the current capacity, which improves CPU utilization and provides around 45% cost savings based on on-demand pricing.
Magalix Node Advisor compares the pricing of different plans and how much you would save with each one. In the snapshot below, it compares three of the Google Cloud billing plans.
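A comparison like this boils down to simple arithmetic on hourly costs. The Python sketch below shows one way to work it out. The plan names and every price are hypothetical placeholders chosen for easy arithmetic, not real Google Cloud prices or Magalix output, and the savings helper is illustrative only.

```python
HOURS_PER_YEAR = 24 * 365


def savings(current_hourly, plan_hourly):
    """Annual savings and percentage saved when moving to a cheaper plan."""
    diff = current_hourly - plan_hourly
    return {
        "annual_savings": diff * HOURS_PER_YEAR,
        "percent": diff * 100 / current_hourly,
    }


# Hypothetical hourly costs in USD for illustration only.
current_hourly = 10.0
plans = {
    "on_demand": 5.5,
    "one_year_commitment": 4.0,
    "three_year_commitment": 3.0,
}

report = {name: savings(current_hourly, price) for name, price in plans.items()}
best = max(report, key=lambda name: report[name]["annual_savings"])

for name, result in report.items():
    print(f"{name}: ${result['annual_savings']:,.0f}/year ({result['percent']:.0f}%)")
print("largest savings:", best)
```

With these placeholder numbers, the on-demand option works out to a 45% saving. Longer commitments save more in this example, but they trade flexibility for price, so the cheapest plan is not automatically the right one.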
1. Tweak Node Advisor Optimization Policies
You can configure the Node Advisor optimization policy for each of your clusters. The Node Advisor initially works with the default configuration to maximize your savings. The default configuration centers on increasing the density of your containers by decreasing the number of nodes whenever possible. However, you can configure it to match your specific environment needs. To tweak the Node Advisor, go to the cluster dashboard and click on the settings icon at the top right corner.
Below is the Node Advisor settings box. Each cluster has its own Node Advisor settings. Setting changes are applied in the next analysis cycle.
- Magalix currently supports capacity and cost analysis on AWS, Azure, Google Cloud, and IBM.
- For on-prem and unsupported cloud infrastructure, Magalix provides recommendations on needed capacity without the cost analysis piece. It suggests the best per-node capacity, i.e. the number of cores and memory in GB, and the total number of nodes for your cluster.
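For a capacity-only recommendation like the one described for on-prem clusters, the core question is how many nodes of a given size cover the workload. The Python sketch below is a rough sizing calculation under stated assumptions, not Magalix's algorithm: the nodes_needed helper, the 20% headroom, the candidate node sizes, and the workload totals are all hypothetical.

```python
import math


def nodes_needed(total_cores, total_mem_gb, node_cores, node_mem_gb, headroom_pct=20):
    """Nodes required to fit the workload plus headroom on both CPU and memory."""
    cores = total_cores * (100 + headroom_pct) / 100
    mem = total_mem_gb * (100 + headroom_pct) / 100
    # The tighter of the two resources decides the node count.
    return max(math.ceil(cores / node_cores), math.ceil(mem / node_mem_gb))


# Hypothetical workload and node sizes: name -> (cores, memory in GB).
workload_cores, workload_mem_gb = 40, 100
candidates = {"small": (4, 16), "medium": (8, 32), "large": (16, 64)}

plan = {
    name: nodes_needed(workload_cores, workload_mem_gb, cores, mem)
    for name, (cores, mem) in candidates.items()
}
print(plan)
```

In this example CPU is the limiting resource for every candidate size. A real recommendation would also weigh availability and scheduling constraints, for instance avoiding a cluster made of very few very large nodes.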
Step 5 (Bonus) - Bring Other Team Members On-Board
It is even cooler to bring the rest of your team on board and give them visibility into the performance and resource utilization of their applications. You can invite as many people as you want to Magalix console. It will always be free to add team members.
Click on Invite Users at the top right menu.
Insert emails and names of users you would like to invite to Magalix. They will get their invitations immediately. They need to register using the email address you provided. They cannot use any other emails.
1. Manage Team Members
You can remove, de-activate, or re-activate your team members’ access to Magalix console at any time. Go to the top right menu and click on Manage Users to make any changes to your team.
If you haven’t connected your Kubernetes cluster yet, register and connect your existing cluster. Within a few minutes, cluster metrics will start flowing and our AI engine will begin analyzing your containers and infrastructure.
Get your 14-day, full-access free trial now!