The availability of elastic cloud computing, scalable cloud storage, and Infrastructure as a Service (IaaS) from a variety of cloud providers presents unique opportunities for companies looking to stay competitive in a new age where software is conquering the world. However, modern software systems run 24/7 and are expected to be highly available, responsive, and infinitely scalable, which presents unique challenges to traditional development and operations teams.
With regard to these opportunities and challenges, companies tend to operate in one of two modes: innovation mode or firefighting mode. In innovation mode, the development and operations teams focus on creating new solutions, moving quickly from ideation to production, and generating business value. In firefighting mode, the development and operations teams are out of sync (occasionally even hostile toward each other), projects are delayed by constant interruptions to engineering teams, and a continuous stream of production issues and instabilities leaves customers unsatisfied.
What is Governance?
Governance refers to the ability of the operations team to verify and enforce certain policies and standards across the entire organization or within specific clusters. Reducing variation in your infrastructure reduces both your maintenance cost and your attack surface. Standardization also enables automation of common tasks and improves efficiency at an organizational level. The policies and standards you want to enforce come from your organization’s established guidelines, agreed-upon conventions, and industry best practices. They can also be derived from tribal knowledge that has accumulated over the years within your operations and development teams.
Why is Governance Important?
As a business, you want to focus on innovation that differentiates your business and generates value for your customers, rather than spending time and resources on maintaining your infrastructure. In short, you want to be in innovation mode rather than firefighting mode. In a cloud-native ecosystem, decisions are decentralized and made at a rapid pace. A governance framework allows your company to move fast while minimizing risk, controlling costs, and driving efficiency, transparency, and accountability across the organization.
There are three key dimensions that need to be defined in order to establish a governance framework for your organization:
- Targets: the clusters, workloads, or entities where you want to apply governance
- Policies: the rules or standards you want to validate against your specified targets
- Triggers: the catalyst, that is, when the policy should be checked (e.g., after a git push, before a Kubernetes deployment, every 24 hours, or whenever an object spec changes in the cluster)
Once you have defined your targets, policies, and triggers, you need a way to enforce them to ensure compliance. Doing so manually is a sure way to keep your organization in permanent firefighting mode. The best way to implement this framework is with tools that automate the compliance checks based on the triggers you defined and that provide a user-friendly way to manage policies and their targets.
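To make the three dimensions concrete, here is a minimal Python sketch of how targets, policies, and triggers might fit together in an automated checker. It assumes Kubernetes objects have already been parsed into dictionaries (for example, from YAML); the `Policy` class, the `Trigger` values, and the `evaluate` function are hypothetical names for illustration, not part of any particular tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List

Manifest = Dict[str, Any]  # a parsed Kubernetes object


class Trigger(Enum):
    """Hypothetical trigger types mirroring the examples in the text."""
    GIT_PUSH = "after git push"
    PRE_DEPLOY = "before Kubernetes deployment"
    SCHEDULED = "every 24 hours"
    ON_CHANGE = "after an object spec changes"


@dataclass
class Policy:
    name: str
    target: Callable[[Manifest], bool]  # selects the objects the policy applies to
    check: Callable[[Manifest], bool]   # returns True when an object complies
    triggers: List[Trigger]             # when the policy should be evaluated


def evaluate(policies: List[Policy], objects: List[Manifest], trigger: Trigger) -> List[str]:
    """Run every policy registered for `trigger` against its matching targets."""
    violations = []
    for policy in policies:
        if trigger not in policy.triggers:
            continue
        for obj in objects:
            if policy.target(obj) and not policy.check(obj):
                name = obj.get("metadata", {}).get("name", "<unnamed>")
                violations.append(f"{policy.name}: {obj.get('kind')}/{name}")
    return violations


# Example: a Deployment replica-count policy checked before deployment and daily.
min_replicas = Policy(
    name="min-replicas",
    target=lambda o: o.get("kind") == "Deployment",
    # spec.replicas defaults to 1 when omitted.
    check=lambda o: o.get("spec", {}).get("replicas", 1) >= 2,
    triggers=[Trigger.PRE_DEPLOY, Trigger.SCHEDULED],
)
objs = [{"kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 1}}]
print(evaluate([min_replicas], objs, Trigger.PRE_DEPLOY))  # ['min-replicas: Deployment/web']
```

Keeping the target selector separate from the check mirrors the split between "targets" and "policies", so the same check can be reused against a different set of objects.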
Examples of Governance Policies
Governance can span multiple areas of your operational environments. You want your Kubernetes clusters to be reliable and secure, you want to control who has access, and you want to limit the usage of available infrastructure resources. You also want to enforce certain rules for your network ingress and egress traffic. In this section, we cover some examples of policies you might want to enforce as part of your governance. Treat them as a starting point, not a complete or comprehensive list.
1. Reliability
You want to ensure and improve the continuity of your business applications by making your systems highly available and fault tolerant. Here are some examples of policies you can enforce in your Kubernetes clusters:
- Verify that spec.replicas is 2 or greater, to give your ReplicaSets the redundancy needed for fault tolerance.
- Ensure that affinity.podAntiAffinity is set in your Deployment’s pod template, to avoid scheduling multiple pods from the same Deployment on the same node.
- Check that readinessProbe and livenessProbe are defined in your container specs, so that only pods that are ready receive traffic and containers that become unhealthy are restarted.
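The reliability checks above can be sketched as a plain function over a parsed Deployment manifest. This is an illustration of the rules, not how any specific policy engine implements them; the function name and message wording are hypothetical.

```python
from typing import Any, Dict, List


def check_reliability(deployment: Dict[str, Any]) -> List[str]:
    """Return reliability violations for a parsed Deployment manifest."""
    problems = []
    spec = deployment.get("spec", {})
    # spec.replicas defaults to 1 when omitted, which gives no redundancy.
    if spec.get("replicas", 1) < 2:
        problems.append("spec.replicas should be 2 or greater")
    pod_spec = spec.get("template", {}).get("spec", {})
    if "podAntiAffinity" not in pod_spec.get("affinity", {}):
        problems.append("spec.template.spec.affinity.podAntiAffinity is not set")
    for container in pod_spec.get("containers", []):
        for probe in ("readinessProbe", "livenessProbe"):
            if probe not in container:
                problems.append(f"container {container.get('name')!r} has no {probe}")
    return problems


deployment = {
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 3,
        "template": {"spec": {
            "containers": [{
                "name": "app",
                "image": "registry.example.com/app:1.4.2",
                "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
            }],
        }},
    },
}
for problem in check_reliability(deployment):
    print(problem)
```

For this manifest, the replica count passes, while the missing anti-affinity rule and the missing livenessProbe are both reported.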
2. Security
Define rules and conditions related to access and privilege that pods must meet to be allowed to run in your cluster. For example:
- Control which user IDs and group IDs your containers run as by checking runAsUser and runAsGroup.
- Enforce allowPrivilegeEscalation=false and runAsNonRoot=true, so that containers do not run as root and neither a container nor its child processes can gain more privileges than their parent process.
- Require that containers have a read-only root filesystem by ensuring that readOnlyRootFilesystem is set to true.
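A minimal sketch of these security checks, assuming a parsed pod spec and hypothetical allow-lists of user and group IDs. It merges the pod-level securityContext with each container’s own, letting the container’s values win for overlapping fields such as runAsUser, runAsGroup, and runAsNonRoot; unset fields are treated as violations.

```python
from typing import Any, Dict, List, Set


def check_pod_security(pod_spec: Dict[str, Any],
                       allowed_uids: Set[int],
                       allowed_gids: Set[int]) -> List[str]:
    """Return security violations for each container in a pod spec."""
    problems = []
    pod_ctx = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name")
        # Container-level settings override pod-level ones where both exist.
        ctx = {**pod_ctx, **container.get("securityContext", {})}
        if ctx.get("runAsUser") not in allowed_uids:
            problems.append(f"{name}: runAsUser {ctx.get('runAsUser')} is not allowed")
        if ctx.get("runAsGroup") not in allowed_gids:
            problems.append(f"{name}: runAsGroup {ctx.get('runAsGroup')} is not allowed")
        if ctx.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if ctx.get("runAsNonRoot") is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        if ctx.get("readOnlyRootFilesystem") is not True:
            problems.append(f"{name}: readOnlyRootFilesystem must be true")
    return problems


pod_spec = {
    "securityContext": {"runAsUser": 1000, "runAsGroup": 3000, "runAsNonRoot": True},
    "containers": [
        {"name": "app", "securityContext": {
            "allowPrivilegeEscalation": False, "readOnlyRootFilesystem": True}},
        {"name": "sidecar", "securityContext": {"runAsUser": 0}},
    ],
}
print(check_pod_security(pod_spec, allowed_uids={1000}, allowed_gids={3000}))
```

Here the app container passes, while the sidecar is flagged for running as UID 0 and for leaving privilege escalation and the root filesystem unrestricted.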
3. Networking
Check that best practices are applied and your network rules are followed for the Ingress and Service objects defined in your cluster:
- Avoid using hostPort for any pod: because each hostIP, hostPort, and protocol combination must be unique, it limits the number of places the pod can be scheduled. Avoid hostNetwork for the same reason.
- Ensure that the selector.k8s-app of each publicly exposed load balancer matches only the specific controllers that need to be exposed.
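The two networking rules above can be sketched as follows. The allow-list of exposable `k8s-app` values is a hypothetical organization setting, and the sketch only inspects Services of type `LoadBalancer`.

```python
from typing import Any, Dict, Iterable, List


def check_host_networking(pod_spec: Dict[str, Any]) -> List[str]:
    """Flag hostNetwork and any container port that binds a hostPort."""
    problems = []
    if pod_spec.get("hostNetwork", False):
        problems.append("hostNetwork is enabled")
    for container in pod_spec.get("containers", []):
        for port in container.get("ports", []):
            if "hostPort" in port:
                problems.append(f"{container.get('name')}: hostPort {port['hostPort']} is set")
    return problems


def check_public_load_balancer(service: Dict[str, Any],
                               exposable_apps: Iterable[str]) -> List[str]:
    """Flag LoadBalancer Services whose selector.k8s-app is not approved for exposure."""
    spec = service.get("spec", {})
    if spec.get("type") != "LoadBalancer":
        return []
    app = spec.get("selector", {}).get("k8s-app")
    if app not in set(exposable_apps):
        return [f"service {service['metadata']['name']!r} exposes k8s-app={app!r}"]
    return []


pod = {"hostNetwork": True,
       "containers": [{"name": "proxy", "ports": [{"containerPort": 80, "hostPort": 8080}]}]}
svc = {"metadata": {"name": "admin"},
       "spec": {"type": "LoadBalancer", "selector": {"k8s-app": "admin-ui"}}}
print(check_host_networking(pod))
print(check_public_load_balancer(svc, exposable_apps=["frontend"]))
```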
4. Access Control
Implement role-based access control (RBAC) and enforce your policies:
- Disallow use of the default namespace, forcing every object in your cluster to be assigned to a proper namespace that you have defined.
- Check that no RoleBinding grants patch access to users you haven’t approved.
- Check for rules.apiGroups, rules.resources, and rules.verbs combinations that might violate any of your access control policies.
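A minimal sketch of two of these access-control checks. It assumes namespaced objects only, treats a missing metadata.namespace as a violation (an assumption: such an object would land in the current context’s namespace, often `default`), and resolves a RoleBinding’s roleRef by name against a dictionary of Roles; ClusterRole references and apiGroups/resources matching are left out for brevity.

```python
from typing import Any, Dict, List, Set


def check_namespace(obj: Dict[str, Any]) -> List[str]:
    """Flag namespaced objects that use, or would fall back to, the default namespace."""
    metadata = obj.get("metadata", {})
    if metadata.get("namespace") in (None, "default"):
        return [f"{obj.get('kind')}/{metadata.get('name')}: set an explicit, non-default namespace"]
    return []


def grants_verb(role: Dict[str, Any], verb: str) -> bool:
    """True if any rule in the Role allows `verb`, directly or through the '*' wildcard."""
    return any(verb in rule.get("verbs", []) or "*" in rule.get("verbs", [])
               for rule in role.get("rules", []))


def check_role_binding(binding: Dict[str, Any],
                       roles_by_name: Dict[str, Dict[str, Any]],
                       approved_users: Set[str]) -> List[str]:
    """Flag users who receive patch access through a RoleBinding without approval."""
    role = roles_by_name.get(binding.get("roleRef", {}).get("name"))
    if role is None or not grants_verb(role, "patch"):
        return []
    return [
        f"user {s.get('name')!r} gets patch via RoleBinding {binding['metadata']['name']!r}"
        for s in binding.get("subjects", [])
        if s.get("kind") == "User" and s.get("name") not in approved_users
    ]


role = {"metadata": {"name": "editor"},
        "rules": [{"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "patch"]}]}
binding = {"metadata": {"name": "editors"},
           "roleRef": {"kind": "Role", "name": "editor"},
           "subjects": [{"kind": "User", "name": "alice"}, {"kind": "User", "name": "mallory"}]}
print(check_role_binding(binding, {"editor": role}, approved_users={"alice"}))
print(check_namespace({"kind": "Deployment", "metadata": {"name": "web"}}))
```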
5. Operational Excellence
General best practices that you’ll want to maintain across your cluster, or for certain types of workloads:
- Enforce that certain keys (such as an owner’s name or email) exist under metadata.labels for all your StatefulSets.
- Check that container.image in all your specs references a trusted container registry.
- Do not allow any container.image with the :latest tag; force the use of specific versions.
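The operational rules above can be sketched as one function over a parsed workload manifest. The required label keys and trusted registry prefixes are hypothetical settings. An image with no tag is treated as `:latest`, because that is the tag the container runtime pulls by default, while an image pinned by digest (`@sha256:...`) is accepted.

```python
from typing import Any, Dict, List

# Hypothetical organization settings; replace with your own.
REQUIRED_LABELS = ("owner", "owner-email")
TRUSTED_REGISTRIES = ("registry.example.com/", "ghcr.io/example-org/")


def uses_latest(image: str) -> bool:
    """True if the image is tagged :latest or carries no tag at all."""
    if "@" in image:  # pinned by digest, e.g. app@sha256:...
        return False
    # A registry port such as :5000 never appears in the last path segment.
    last_segment = image.rsplit("/", 1)[-1]
    return ":" not in last_segment or last_segment.endswith(":latest")


def check_operational(obj: Dict[str, Any]) -> List[str]:
    problems = []
    if obj.get("kind") == "StatefulSet":
        labels = obj.get("metadata", {}).get("labels", {})
        problems += [f"missing label {key!r}" for key in REQUIRED_LABELS if key not in labels]
    pod_spec = obj.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        image = container.get("image", "")
        if not image.startswith(TRUSTED_REGISTRIES):
            problems.append(f"{image}: not from a trusted registry")
        if uses_latest(image):
            problems.append(f"{image}: pin a specific version instead of :latest")
    return problems


sts = {"kind": "StatefulSet",
       "metadata": {"name": "db", "labels": {"owner": "data-team"}},
       "spec": {"template": {"spec": {"containers": [
           {"name": "db", "image": "docker.io/library/postgres:latest"},
           {"name": "exporter", "image": "registry.example.com/exporter:0.15.0"},
       ]}}}}
for problem in check_operational(sts):
    print(problem)
```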
Again, these are just a few examples of areas where you can implement governance rules and best practices. At Magalix, we’ve already implemented some of these policies, so you get a basic governance framework out of the box for your organization, which you can expand and customize as you see fit.
Implementing Governance and the Open Policy Agent
A Cloud Native Computing Foundation (CNCF) project, the Open Policy Agent (OPA) is a great tool that lets organizations easily define custom policies for their Kubernetes environments. OPA policies are written in a declarative policy language called Rego. With Rego, you can filter the input (workloads, users, and other entities) to match the “targets” you want, and add assertions that define your “policy.” A common framework and toolset for enforcing your policies lets you evolve them easily and enforce them consistently across your different operational environments. While it is possible to install and configure your own OPA setup, doing so requires considerably more know-how, effort, and maintenance.
Magalix allows you to define and manage these policies and their lifecycle in an easy, user-friendly way. Internally, Magalix uses OPA as its policy execution engine. By default, Magalix runs global policies based on Kubernetes best practices against the relevant Kubernetes objects in your cluster. It also allows you to define custom policies (KubeAdvisor), run them periodically against your cluster, and track violations and compliance over time, with regular updates delivered via a dashboard or through integrations such as Slack. Additionally, Magalix offers webhooks to which you can pass your object specs; we’ll run all relevant policies and respond with any violations. This is a convenient way to integrate policy checks into your CI/CD pipeline and easily prevent policy violations.
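OPA is commonly run as a server and queried through its REST Data API by POSTing a JSON document whose `input` field holds the object to evaluate; the response carries the policy decision under `result`. The sketch below uses only the Python standard library and assumes an OPA server on its default port 8181 with a hypothetical Rego rule exposed at `kubernetes/governance/violations`; the package path, rule name, and helper names are assumptions for illustration.

```python
import json
import urllib.request
from typing import Any, Dict

OPA_URL = "http://localhost:8181"  # OPA's default server address; adjust for your setup


def build_opa_query(policy_path: str, k8s_object: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST to OPA's Data API with the Kubernetes object as `input`."""
    body = json.dumps({"input": k8s_object}).encode("utf-8")
    return urllib.request.Request(
        f"{OPA_URL}/v1/data/{policy_path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_opa(policy_path: str, k8s_object: Dict[str, Any]) -> Any:
    """Send the query and return OPA's decision (None if the rule is undefined)."""
    request = build_opa_query(policy_path, k8s_object)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response).get("result")


# Usage (requires a running OPA server with a matching policy loaded):
# violations = query_opa("kubernetes/governance/violations", {"kind": "Pod", "metadata": {"name": "web"}})
```

When the queried rule is undefined for the given input, OPA omits `result` from its response, so `query_opa` returns `None` in that case.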
To read about how we defined a simple governance framework at Magalix using our own product for a governance issue we faced in the past, stay tuned for our upcoming article, Kubernetes Governance with Magalix.