Cloud-native applications bring innovation to the forefront of your business. Delighting your users with the latest and greatest experiences as fast as possible sometimes comes at the cost of neglecting one of your most important considerations: security. Governing the security of your cluster begins with understanding which pieces are susceptible to attack, and then implementing the right solutions to keep them continuously protected. Whether your teams have just deployed their first microservice to Kubernetes in production or have been managing a global fleet of clusters for several years, security and governance can no longer be afterthoughts.
Kubernetes has played a tremendous role in changing the way we build and manage our applications. It has also changed the way we need to think about governance and security. If you think securing Kubernetes is hard, you are not alone. In a survey conducted on behalf of D2iQ, 47% of respondents said the most challenging aspect of Kubernetes was security. As challenging as it might seem, securing your clusters is not as far out of reach as you may believe.
Although some form of governance applies to all verticals, the implementation is typically commensurate with the business. Where industry regulations are the main driver for a highly governed infrastructure, the added complexity of securing your entire digital footprint can no longer be left to your security teams alone. The journey from monolith to microservices, with what seems like a thousand moving parts and a continuously evolving landscape, creates an increasingly difficult obstacle for just one team to overcome. Just as it took an entire engineering organization to get your business to where it is now, it will take the same level of effort to maintain a secure and compliant environment.
A Zero Trust Philosophy
We live in a connected world. Microservices communicate with other microservices, which in turn can send requests to a SaaS solution over the Internet. How does a microservice verify the identity of an external solution it is communicating with? How does a microservice verify the microservice sitting next to it? In a simpler scenario, a common misspelling of a URL could mean sensitive data is sent to the wrong third party. When do we discover this? How do we discover this? How do we prevent this?
Looking within, are the open source packages imported into the latest build being vetted for malicious code? The open source community is a wonderful resource that helps your teams build the next big thing without reinventing the wheel, but how often is someone sifting through the source code of those third-party packages? The popularity of an open source library could be a signal that the code is safe, but it certainly isn’t a guarantee.
As you embark on your journey towards compliance, start with the mindset that nothing is to be trusted. As services get developed and connect to more services, understanding that everything needs to be secure can help with implementing meaningful governance policies that can be accounted for during development, and not after deployment.
A Multi-Layered Approach to Governance
Mitigating Areas of Risk within the Cluster
Default Kubernetes configurations are fairly insecure. Configuration management tools have made it easier to bootstrap clusters with security features enabled, but not all options are configured during provisioning. To prevent unscheduled outages due to malicious activity and to maintain your service level agreements, here are five areas that deserve specific attention.
1- Kubernetes Control Plane
The Kubernetes control plane is made up of several components that make global decisions within the cluster. etcd, a distributed key/value store, is the component of the control plane that stores the cluster’s state and configuration. Think of it as the backing store for all your cluster’s data. Ensure this data is encrypted at rest, and restrict access to etcd with TLS client certificates or, at a minimum, a username and password combination.
Another important component of the control plane is the Kubernetes API server. Consider it the API layer in front of etcd. Because of the level of access the API server has to etcd, its endpoint is not something you’d want to expose openly to the Internet. Keep it behind your firewall and ensure only authorized users can reach it. Access can be controlled by issuing client certificates for kubectl, the command line tool used to manage your cluster, and by granting specific permissions with Kubernetes’ built-in Role Based Access Control.
2- Nodes and their Operating Systems
Kubernetes runs on Linux or Windows servers. These servers need to be patched as new vulnerabilities are discovered and fixed. One way to update servers at scale is with a configuration management tool. Given the ephemeral nature of the cloud, it may be easier to provision a new node with the latest updates, add it to your cluster, move the workloads over, and decommission the non-compliant node. Develop a workflow that notifies your teams of newly published Common Vulnerabilities and Exposures (CVE) entries so they can patch your nodes safely and promptly.
3- Kubernetes Access Control
Kubernetes comes equipped with Role Based Access Control (RBAC). As the name implies, RBAC is a built-in access control mechanism that lets Kubernetes administrators grant granular access to the Kubernetes API server. Roles and privileges are granted at the namespace level for workload isolation, or at the cluster level for global configuration. As with other access control mechanisms, permissions can be bound to groups rather than to individual users, easing some administrative overhead. A governance stance of “Principle of Least Privilege” fits well with RBAC.
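As a minimal sketch of what least privilege can look like in practice, the Python below builds a namespaced Role and a RoleBinding as plain dictionaries that mirror the shape of rbac.authorization.k8s.io/v1 manifests. The namespace payments, the role name pod-reader, and the group payments-developers are hypothetical, and is_allowed is a simplified review helper, not the authorizer Kubernetes itself uses.

```python
import json
from typing import Dict


def pod_reader_role(namespace: str) -> Dict:
    """Build a namespaced Role that can only read Pods."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "pod-reader", "namespace": namespace},
        "rules": [
            {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]}
        ],
    }


def bind_role_to_group(role: Dict, group: str) -> Dict:
    """Build a RoleBinding that grants the Role to a group, not to individual users."""
    name = role["metadata"]["name"]
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{name}-{group}", "namespace": role["metadata"]["namespace"]},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": "rbac.authorization.k8s.io"}],
        "roleRef": {"kind": "Role", "name": name, "apiGroup": "rbac.authorization.k8s.io"},
    }


def is_allowed(role: Dict, verb: str, resource: str) -> bool:
    """Simplified check for reviews: ignores apiGroups and resourceNames."""
    for rule in role["rules"]:
        verbs = rule.get("verbs", [])
        resources = rule.get("resources", [])
        if (verb in verbs or "*" in verbs) and (resource in resources or "*" in resources):
            return True
    return False


if __name__ == "__main__":
    role = pod_reader_role("payments")  # hypothetical namespace
    binding = bind_role_to_group(role, "payments-developers")  # hypothetical group
    for manifest in (role, binding):
        print(json.dumps(manifest, indent=2))
```

A helper like is_allowed can be handy in code review: asking whether the role permits "delete" on "pods" should come back False, which is exactly the point of starting from the narrowest set of verbs.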
4- Workload Privileges
A primary function of Kubernetes is to schedule workloads. Each workload, at a high level, represents a process or set of processes packaged as a Pod. Pods can be scheduled with elevated privileges, but when a privileged workload is compromised, an attacker can do almost anything they want. A person with malicious intent can potentially spawn their own process within yours, take control of your node, and even take over your cluster.
If you are using Docker to build your container images, the process within the container runs as the root user by default. The situation is exacerbated as development teams create more and more containers. Part of your governance policy should dictate which user each container runs as. When runtime policies are enabled, a workload that violates the policy can be blocked from running.
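To illustrate the kind of rule such a policy might encode, here is a minimal Python sketch that inspects a Pod manifest (as a dictionary) and reports containers that are privileged or that may run as root. The field names follow the Pod securityContext schema; the function name and the exact rules are assumptions for illustration, not a complete admission policy.

```python
from typing import Dict, List


def find_privilege_violations(pod: Dict) -> List[str]:
    """Return human-readable findings for privileged or possibly-root containers."""
    violations: List[str] = []
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext") or {}
    containers = spec.get("containers", []) + spec.get("initContainers", [])

    for container in containers:
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}

        if sc.get("privileged") is True:
            violations.append(f"{name}: runs as a privileged container")

        # Container-level settings take precedence over Pod-level settings.
        run_as_user = sc.get("runAsUser", pod_sc.get("runAsUser"))
        run_as_non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))

        if run_as_user == 0:
            violations.append(f"{name}: explicitly runs as UID 0 (root)")
        elif run_as_user is None and run_as_non_root is not True:
            # Nothing pins the user, so the image default (often root) applies.
            violations.append(f"{name}: may run as root (set runAsNonRoot or runAsUser)")

    return violations
```

Running this against a manifest before it is applied gives developers the same answer a runtime policy would give later, only earlier and cheaper.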
5- Built-In Policies
Kubernetes ships with the ability to implement several types of policies. These policies are designed to help protect the various entities that Kubernetes supports. One type is the Network Policy. Network Policies act like firewalls for your deployments: they allow or block traffic between workloads by label, namespace, or IP address range. Pod Security Policies focus on the workload itself and prevent a Pod from being deployed with elevated privileges, which can mitigate the workload privilege concerns discussed above. (Pod Security Policies were deprecated in Kubernetes 1.21 and removed in 1.25 in favor of Pod Security Admission.) A third type, the audit policy, defines which events from the Kubernetes API server should be logged and what data each entry should include. For use cases beyond the scope of these three, we look to another solution: policy as code.
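To make the Network Policy idea concrete, the sketch below generates a networking.k8s.io/v1 NetworkPolicy manifest that only lets Pods with one set of labels reach Pods with another set on a single TCP port. The labels, namespace, policy name, and port are hypothetical. Keep in mind that Network Policies are only enforced when the cluster's network plugin supports them.

```python
import json
from typing import Dict


def allow_ingress_from(
    name: str,
    namespace: str,
    target_labels: Dict[str, str],
    source_labels: Dict[str, str],
    port: int,
) -> Dict:
    """Build a NetworkPolicy that admits ingress to target Pods only from source Pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": target_labels},
            # Once a Pod is selected for Ingress, traffic not listed below is denied.
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": source_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    policy = allow_ingress_from(
        name="db-allow-api",          # hypothetical policy name
        namespace="payments",         # hypothetical namespace
        target_labels={"app": "db"},
        source_labels={"app": "api"},
        port=5432,
    )
    print(json.dumps(policy, indent=2))
```

Generating manifests from a function like this keeps the firewall-style rules reviewable and repeatable across namespaces.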
Governance with Policy-as-Code
DevOps has introduced a way of reliably delivering the latest features to our users as quickly as possible. The rapid pace of DevOps stretches beyond the grasp of human intervention, requiring automated pipelines to take the place of what was once a coordinated release effort between various teams. The challenge today is no longer managing a deployment, but managing governance in an environment that is endlessly changing. One solution is to write policies as code so that automated pipelines can incorporate compliance checks alongside their existing build and deployment processes.
Policy as code realizes the idea that governance can be developed, packaged, and shipped like any other codebase. These policies can then be integrated with existing CI/CD pipelines, providing the same “shift left” rapid feedback that has proven invaluable in DevOps, but from a security standpoint. In other words, developers can test against policies and immediately detect whether their software is compliant, well in advance of deploying to production.
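A pipeline gate of this kind can be sketched in a few lines of Python. Each policy is an ordinary function that takes a manifest and returns a list of findings, and the gate returns a non-zero exit code when any finding exists. The two example policies (a required owner label and a ban on the latest image tag) and all names here are hypothetical stand-ins for whatever your organization decides to enforce.

```python
import sys
from typing import Callable, Dict, Iterable, List

# A policy takes one manifest and returns zero or more finding messages.
Policy = Callable[[Dict], List[str]]


def _containers(manifest: Dict) -> List[Dict]:
    """Find containers in either a Pod or a workload with a Pod template."""
    spec = manifest.get("spec", {})
    pod_spec = spec.get("template", {}).get("spec", spec)
    return pod_spec.get("containers", [])


def require_owner_label(manifest: Dict) -> List[str]:
    labels = manifest.get("metadata", {}).get("labels") or {}
    return [] if "owner" in labels else ["missing 'owner' label"]


def forbid_latest_tag(manifest: Dict) -> List[str]:
    findings = []
    for container in _containers(manifest):
        image = container.get("image", "")
        last_segment = image.rsplit("/", 1)[-1]  # skip any registry host:port
        if "@" in last_segment:
            continue  # pinned by digest
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else "latest"
        if tag == "latest":
            findings.append(f"container '{container.get('name')}' uses an unpinned image")
    return findings


POLICIES: List[Policy] = [require_owner_label, forbid_latest_tag]


def evaluate(manifests: Iterable[Dict], policies: Iterable[Policy]) -> List[str]:
    """Run every policy against every manifest and collect the findings."""
    policies = list(policies)
    findings = []
    for manifest in manifests:
        name = manifest.get("metadata", {}).get("name", "<unnamed>")
        kind = manifest.get("kind", "Unknown")
        for policy in policies:
            findings.extend(f"{kind}/{name}: {msg}" for msg in policy(manifest))
    return findings


def gate(manifests: Iterable[Dict]) -> int:
    """Return a process exit code: 0 when compliant, 1 otherwise."""
    findings = evaluate(manifests, POLICIES)
    for finding in findings:
        print(finding, file=sys.stderr)
    return 1 if findings else 0
```

Because the policies are plain functions, they can be unit tested, versioned, and reviewed like the rest of the codebase, which is the essence of the approach.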
Open Policy Agent
The policies built into Kubernetes cover several practical use cases, but what about use cases that are specific to your environment or organization? Open Policy Agent (OPA), a Cloud Native Computing Foundation project, is an open-source policy engine that covers common use cases Kubernetes does not. It is also extensible, so you can build custom policies to fit your unique situations. Powered by a declarative language called Rego and accompanied by several command line tools, OPA can be used by developers at commit time, within CI/CD pipelines, and at Kubernetes runtime, providing coverage both before and after deployment.
To lighten the adoption overhead, the open-source community has published many sample policies that reflect best practices and can be adapted by searching for and replacing values to fit your own needs. In the case of the misspelled URL mentioned earlier, an OPA policy can be written to ensure that only a specific URL is accepted. In the event of a misspelling or an unapproved domain, the attempted change is rejected on the fly.
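The logic behind such a URL policy is small. The sketch below expresses it in Python rather than Rego, purely to show the decision being made; a real OPA deployment would state the same rule in Rego. The approved host api.example.com and the function name are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; in practice this would live alongside your policies.
APPROVED_HOSTS = frozenset({"api.example.com"})


def is_approved_endpoint(url: str, approved_hosts=APPROVED_HOSTS) -> bool:
    """Accept only HTTPS URLs whose host exactly matches an approved entry."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in approved_hosts
```

With this rule in place, https://api.example.com/v1/orders is accepted, while the transposed https://api.exmaple.com/v1/orders is rejected before it can ship. Matching on the exact host, rather than on a substring, also rejects look-alike domains that merely contain the approved name.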
Continuously Monitoring for Compliance
Governance is only as good as the policies that have been written. What if no policy exists to prevent a bad practice? What happens when a new Common Vulnerabilities and Exposures (CVE) entry is published? Are you impacted? These aren’t hypothetical questions but real-world examples of what may occur at any time. Organizations such as the Center for Internet Security (CIS) provide consensus-based benchmarks to help assess your current areas of risk, but a single analysis at one moment in time isn’t enough. Maintaining a highly compliant ecosystem requires a solution that continuously monitors your environment. When dealing with regulation, continuous monitoring is a must, as these matters are constantly under scrutiny.
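One simple way to picture the difference between a point-in-time assessment and continuous monitoring is to compare the findings of successive scans. The Python sketch below does only that comparison; the finding identifiers are made-up placeholders, and how the scans are produced (a benchmark tool, a vulnerability scanner) is left as an assumption.

```python
from typing import Dict, Iterable, List


def diff_scans(previous: Iterable[str], current: Iterable[str]) -> Dict[str, List[str]]:
    """Classify findings as new, resolved, or persisting between two scans."""
    prev, curr = set(previous), set(current)
    return {
        "new": sorted(curr - prev),         # appeared since the last scan
        "resolved": sorted(prev - curr),    # fixed since the last scan
        "persisting": sorted(curr & prev),  # still open
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real benchmark or CVE entries.
    yesterday = ["EXAMPLE-001", "EXAMPLE-002"]
    today = ["EXAMPLE-002", "EXAMPLE-003"]
    print(diff_scans(yesterday, today))
```

Alerting on the "new" bucket and tracking how long items stay in "persisting" turns a periodic audit into an ongoing signal.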
Notifying your team of compliance issues is only one part of the big-picture solution; audit trails, remediation, and the prevention of non-compliance are others. AI-based options are also starting to emerge, taking the idea of self-healing infrastructure to an entirely new level. When building a solution, ensure all events are correlated so your teams can assess what happened, when it occurred, and which components were involved.
Enforcing governance end-to-end is challenging. With a multitude of moving parts and pressure to deliver as quickly as possible, governing your clusters can seem like an impossible feat. As your workloads scale, the responsibility for security can no longer be reserved for one team; it becomes a shared responsibility for all.
Taking a zero-trust perspective can prepare you and your organization to view your workloads through a different lens. Starting from the inside out, securing your Kubernetes clusters with built-in features is a great way to protect the core of your application. To keep up with a growing number of workloads, writing policies as code and enforcing them at appropriate stages of your CI/CD pipelines shifts feedback left so governance scales alongside your deployments. Lastly, continuously monitoring clusters for CVEs and compliance violations provides confidence and assurance that your clusters are in the state they need to be in: safe and secure.