When it comes to managing the behavior of your containers, Kubernetes does a good job of taking care of common use cases. My journey with Kubernetes started sometime in 2015, pre-1.0, so I’ve had my fair share of spinning up clusters and deploying containers.
A while back, I started working in an environment where the build system overwrote the latest tag every time an image was built. If you have been through something similar, you probably know where this is going. If you are currently doing this in practice, let me explain why bad things can happen and why the result becomes a nightmare to troubleshoot.
The thing about nightmares is that even when they last only a short time, they seem to go on forever. This one was no exception. I was fairly new and unaware that the latest tag was overwritten with every build. One evening, while troubleshooting an unresponsive pod, I deleted it. No problem, I thought; it will come right back. What I didn't notice was that the replacement pod had pulled a newer version of latest and started it. Unless imagePullPolicy is set explicitly, a container whose image uses the latest tag (or no tag at all) is pulled with the Always policy, so every new pod fetches whatever latest points to at that moment.
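The reason a recreated pod can silently pick up a different image is Kubernetes' defaulting rule for imagePullPolicy: when the field is omitted, an image tagged latest or carrying no tag gets Always, while any other tag gets IfNotPresent. The sketch below models that rule for illustration only; the helper names split_image and default_pull_policy are hypothetical, and the image parsing is simplified (it splits off an optional @digest and treats a colon as a tag separator only after the last slash, so a registry port is not mistaken for a tag).

```python
from typing import Optional, Tuple


def split_image(image: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split an image reference into (name, tag, digest); missing parts are None."""
    name, digest = image, None
    if "@" in name:
        # Everything after "@" is a content digest such as sha256:...
        name, digest = name.split("@", 1)
    tag = None
    # A tag can only appear after the last "/", so a registry port
    # like registry.example.com:5000 is not mistaken for a tag.
    if ":" in name.rsplit("/", 1)[-1]:
        name, tag = name.rsplit(":", 1)
    return name, tag, digest


def default_pull_policy(image: str) -> str:
    """Model of the pull policy Kubernetes assigns when imagePullPolicy is omitted."""
    _, tag, digest = split_image(image)
    if tag is None and digest is None:
        tag = "latest"  # no tag and no digest means an implicit :latest
    return "Always" if tag == "latest" else "IfNotPresent"


for ref in ["myapp", "myapp:latest", "myapp:1.4.2",
            "registry.example.com:5000/team/myapp"]:
    print(f"{ref:40} -> {default_pull_policy(ref)}")
```

Running it prints Always for myapp, myapp:latest and the registry-qualified reference without a tag, and IfNotPresent for myapp:1.4.2. In other words, every pod restart for the first three can pull a different image if the tag has been overwritten in the meantime.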
Although the service came up properly, that version of the container was not ready for prime time. When I tested the public endpoint, I got inconsistent results. In a highly available setup where several replicas sit behind a load balancer, inconsistent responses usually mean that one of the replicas in the pool differs from the rest. I went through a series of checks but could not identify the problem. All the instances of this pod were using latest, and based on the logs, everything seemed normal. Why was one different?
After tracing the problem all the way back to the build system, I was finally able to identify the issue. I couldn't alter the tags in the build system, since I didn't fully understand the downstream implications, and doing so wouldn't have helped anyway, because I needed an older version. Scaling down wasn't an option either, since it might remove one of the correctly versioned pods. The only thing I could do was find the right version of the source code and manually rebuild it to replace latest, forcing me to roll forward. I also got lucky picking the right version of the code.
When speaking with the rest of the team at Magalix, I found out that others had experienced similar stories. We agreed that, to save others from going through the same thing, we should build a policy in our KubeAdvisor that checks that every container image has an explicit tag (since omitting the tag defaults to latest) and that the tag isn't latest. You definitely wouldn't want to end up in the situation I was in, so let Magalix help you avoid the mistakes that have already been made in other environments.
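The check described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not KubeAdvisor's implementation of image_tag_policy: it assumes the pod spec is available as a Python dict shaped like a Kubernetes PodSpec (with containers and initContainers lists), and the helper names split_image, image_tag_violation and check_pod_spec are made up for this example.

```python
from typing import Dict, List, Optional, Tuple


def split_image(image: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split an image reference into (name, tag, digest); missing parts are None."""
    name, digest = image, None
    if "@" in name:
        name, digest = name.split("@", 1)
    tag = None
    # Only a colon after the last "/" separates a tag (not a registry port).
    if ":" in name.rsplit("/", 1)[-1]:
        name, tag = name.rsplit(":", 1)
    return name, tag, digest


def image_tag_violation(image: str) -> Optional[str]:
    """Return a description of the problem, or None if the reference is acceptable."""
    _, tag, digest = split_image(image)
    if digest is not None:
        return None  # assumption: a digest-pinned image cannot be overwritten
    if tag is None:
        return "no tag set (defaults to latest)"
    if tag == "latest":
        return "tag is latest"
    return None


def check_pod_spec(pod_spec: Dict) -> List[str]:
    """Collect violations for every init and regular container in a pod spec."""
    violations = []
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key, []):
            problem = image_tag_violation(container["image"])
            if problem is not None:
                violations.append(f"{container['name']} ({container['image']}): {problem}")
    return violations


spec = {
    "initContainers": [{"name": "init", "image": "busybox:latest"}],
    "containers": [
        {"name": "web", "image": "myapp"},
        {"name": "proxy", "image": "envoyproxy/envoy:v1.27.0"},
    ],
}
for line in check_pod_spec(spec):
    print(line)
```

For the sample spec this reports the init container (tag is latest) and the web container (no tag set), while the proxy container with a pinned version tag passes. Accepting digest-pinned references goes slightly beyond the policy as described; it is a design choice here because a digest always resolves to the same image content.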
KubeAdvisor was built with governance in mind. By enabling our out-of-the-box best practices, like our image_tag_policy, we can help you build and maintain a strong governance-as-code library.