Transitioning to microservices has many advantages for teams building large applications, particularly those that must accelerate the pace of innovation, deployments, and time to market. Microservices also provide technology teams the opportunity to secure their applications and services better than they did with monolithic code bases.
Zero-trust security gives these teams a scalable way to enforce security consistently while managing a growing number of microservices and the greater complexity that comes with them. That’s right: Although it seems counterintuitive at first, microservices allow us to secure our applications and all of their services better than we ever did with monolithic code bases. Failing to seize that opportunity will result in insecure, exploitable, and non-compliant architectures that will only become more difficult to secure in the future.
Let’s understand why we need zero-trust security in microservices. We will also review a real-world zero-trust security example by leveraging the Cloud Native Computing Foundation’s Kuma project, a universal service mesh built on top of the Envoy proxy.
Security before microservices
In a monolithic application, every resource that we create can be accessed indiscriminately from every other resource via function calls because they are all part of the same code base. Typically, resources are going to be encapsulated into objects (if we use OOP) that will expose initializers and functions that we can invoke to interact with them and change their state.
For example, if we are building a marketplace application (like Amazon.com), there will be resources that identify users and the items for sale, and that generate invoices when items are sold:
Typically, this means we will have objects that we can use to create, delete, or update these resources via function calls that can be invoked from anywhere in the monolithic code base. While there are ways to restrict access to certain objects and functions (e.g., with public, private, and protected access-level modifiers and package-level visibility), these practices are usually not strictly enforced by teams, and our security should not depend on them.
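As a minimal sketch (the class and field names are hypothetical, based on the marketplace example above), a monolith's resources might look like this in Python, where any code in the same process can create or mutate any object directly:

```python
# Sketch of resources inside a monolithic marketplace code base.
# Any code in the same process can call these initializers and
# mutate these objects' state directly, with no barrier in between.

class User:
    def __init__(self, name):
        self.name = name

class Item:
    def __init__(self, title, price):
        self.title = title
        self.price = price

class Invoice:
    def __init__(self, user, item):
        self.user = user
        self.item = item
        self.total = item.price

# Nothing stops unrelated code elsewhere in the monolith from doing this:
user = User("alice")
item = Item("book", 12.0)
invoice = Invoice(user, item)
invoice.total = 0.0  # any caller can silently change a resource's state
```

Access-level modifiers could narrow this surface, but as noted above, they are rarely enforced consistently enough to act as a security boundary.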
Security with microservices
With microservices, instead of having every resource in the same code base, we will have those resources decoupled and assigned to individual services, with each service exposing an API that can be used by another service. Instead of executing a function call to access or change the state of a resource, we can execute a network request.
By default, this doesn’t change our situation: Without proper barriers in place, every service could theoretically consume the exposed APIs of another service to change the state of every resource. But because the communication medium has changed and it is now the network, we can use technologies and patterns that operate on the network connectivity itself to set up our barriers and determine the access levels that every service should have in the big picture.
Understanding zero-trust security
To implement security rules over the network connectivity among services, we need to set up permissions, and then check those permissions on every incoming request.
For example, we may want to allow the “Invoices” and “Users” services to consume each other (an invoice is always associated with a user, and a user can have many invoices), but only allow the “Invoices” service to consume the “Items” service (since an invoice is always associated with an item), like in the following scenario:
After setting up permissions (we will explore shortly how a service mesh can be used to do this), we then need to check them. The component that will check our permissions will have to determine if the incoming requests are being sent by a service that has been allowed to consume the current service. We will implement a check somewhere along the execution path, something like this:
`if (incoming_service == "items") { deny(); } else { allow(); }`
This check can be done by our services themselves or by anything else on the execution path of the requests, but ultimately it has to happen somewhere.
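The permissions from the Invoices/Users/Items scenario can be sketched as a simple allow list checked on every incoming request. This is an illustrative Python sketch, not a production implementation; the service names come from the example above:

```python
# Allow list mapping each source service to the destinations it may consume.
# Per the example: invoices and users may consume each other, and only
# invoices may consume items.
ALLOWED = {
    "invoices": {"users", "items"},
    "users": {"invoices"},
    "items": set(),  # items is not allowed to consume anyone
}

def is_allowed(source, destination):
    """Check whether `source` is permitted to consume `destination`."""
    return destination in ALLOWED.get(source, set())

# Checks performed on every incoming request, e.g. inside the items service:
print(is_allowed("invoices", "items"))  # -> True
print(is_allowed("users", "items"))     # -> False
```

Of course, such a check is only meaningful if the `source` value cannot be forged, which is exactly the identity problem discussed next.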
The biggest problem to solve before enforcing these permissions is having a reliable way to assign an identity to each service so that when we identify the services in our checks, they are who they claim to be.
Identity is essential. Without identity, there is no security. Whenever we travel and enter a new country, we show a passport that associates our persona with the document, and by doing so, we certify our identity. Likewise, our services also must present a “virtual passport” that validates their identities.
Since the concept of trust is exploitable, we must remove all forms of trust from our systems—and hence, we must implement “zero-trust” security.
In order for zero-trust to be implemented, we must assign an identity to every service instance, which will be used for every outgoing request. The identity acts as the “virtual passport” for that request, confirming that the originating service is indeed who it claims to be. mTLS (mutual Transport Layer Security) can be adopted to provide both identity and encryption on the transport layer. Since every request now provides an identity that can be verified, we can then enforce the permission checks.
The identity of a service is typically assigned as a SAN (Subject Alternative Name) of the originating TLS certificate associated with the request, as in the case of zero-trust security enabled by a Kuma service mesh, which we will explore shortly.
SAN is an extension to X.509 (a standard that is being used to create public key certificates) that allows us to assign a custom value to a certificate. In the case of zero-trust, the service name will be one of those values that is passed along with the certificate in a SAN field. When a request is being received by a service, we can then extract the SAN from the TLS certificate—and the service name from it, which is the identity of the service—and then implement the permission checks knowing that the originating service really is who it claims to be.
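As an illustration, here is a minimal Python sketch of pulling a service identity out of a SAN. The certificate dict below mimics the shape returned by Python's `ssl.SSLSocket.getpeercert()`, and the URI follows the SPIFFE-style `spiffe://<mesh>/<service>` format that Kuma uses for data plane identities; all values are illustrative:

```python
# A decoded peer certificate, in the dict shape produced by
# ssl.SSLSocket.getpeercert(). The SAN carries a SPIFFE-style URI
# encoding the mesh and service name (illustrative values).
peer_cert = {
    "subject": ((("commonName", "invoices"),),),
    "subjectAltName": (("URI", "spiffe://default/invoices"),),
}

def service_identity(cert):
    """Return the service name encoded in a SPIFFE URI SAN, or None."""
    for san_type, san_value in cert.get("subjectAltName", ()):
        if san_type == "URI" and san_value.startswith("spiffe://"):
            # spiffe://<mesh>/<service> -> take the last path segment
            return san_value.rsplit("/", 1)[-1]
    return None

print(service_identity(peer_cert))  # -> invoices
```

The extracted name can then feed the permission check, with the TLS handshake guaranteeing that the identity was not forged.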
Now that we have explored the importance of having identities for our services and we understand how we can leverage mTLS as the “virtual passport” that is included in every request our services make, we are still left with many open topics that we need to address:
- Assigning TLS certificates and identities on every instance of every service.
- Validating the identities and checking permissions on every request.
- Rotating certificates over time to improve security and prevent impersonation.
These are very hard problems to solve because they effectively provide the backbone of our zero-trust security implementation. If not done correctly, our zero-trust security model will be flawed, and therefore insecure.
Moreover, the above tasks must be implemented for every instance of every service that our application teams are creating. In a typical organization, these service instances will include both containerized and VM-based workloads running across one or more cloud providers, perhaps even in our physical datacenter.
The biggest mistake any organization could make is asking its teams to build these features from scratch every time they create a new application. The resulting fragmentation in the security implementations will create unreliability in how the security model is implemented, making the entire system insecure.
Service mesh to the rescue
Service mesh is a pattern that implements modern service connectivity functionality in a way that does not require us to update our applications to take advantage of it. A service mesh is typically delivered by deploying data plane proxies next to every instance (or Pod) of our services, plus a control plane that is the source of truth for configuring those data plane proxies.
The service mesh pattern is based on the idea that our services should not be in charge of managing the inbound or outbound connectivity. Over time, services written in different technologies will inevitably end up having various implementations. Therefore, a fragmented way to manage that connectivity ultimately will result in unreliability. Plus, the application teams should focus on the application itself, not on managing connectivity since that should ideally be provisioned by the underlying infrastructure. For these reasons, service mesh not only gives us all sorts of service connectivity functionality out of the box, like zero-trust security, but also makes the application teams more efficient while giving the infrastructure architects complete control over the connectivity that is being generated within the organization.
Just as we didn’t ask our application teams to walk into a physical data center and manually connect the networking cables to a router/switch for L1-L3 connectivity, today we don’t want them to build their own network management software for L4-L7 connectivity. Instead, we want to use patterns like service mesh to provide that to them out of the box.
Zero-trust security via Kuma
Kuma is an open source service mesh (first created by Kong and later donated to the CNCF) that supports multi-cluster, multi-region, and multi-cloud deployments across both Kubernetes and virtual machines (VMs). Kuma provides more than 10 policies that we can apply to service connectivity (like zero-trust, routing, fault injection, discovery, multi-mesh, etc.) and has been engineered to scale in large distributed enterprise deployments. Kuma natively supports the Envoy proxy as its data plane proxy technology, and ease of use has been a focus of the project since day one.
With Kuma, we can deploy a service mesh that can deliver zero-trust security across both containerized and VM workloads in a single or multiple cluster setup. To do so, we need to follow these steps:
1. Download and install Kuma at kuma.io/install.
2. Start our services and start `kuma-dp` next to them (in Kubernetes, `kuma-dp` is automatically injected). We can follow the getting started instructions on the installation page to do this for both Kubernetes and VMs.
Then, once our control plane is running and the data plane proxies are successfully connecting to it from each instance of our services, we can execute the final step:
3. Enable the mTLS and Traffic Permission policies on our service mesh via the `Mesh` and `TrafficPermission` Kuma resources.
In Kuma, we can create multiple isolated virtual meshes on top of the same service mesh deployment, which is typically done to support multiple applications and teams on the same service mesh infrastructure. To enable zero-trust security, we first need to enable mTLS on the `Mesh` resource of choice via its `mtls` property.
In Kuma, we can either let the system generate its own certificate authority (CA) for the `Mesh` or set our own root certificate and keys. The CA certificate and key will then be used to automatically provision a new TLS certificate, with an identity, for every data plane proxy, and those certificates will be rotated automatically at a configurable interval. In Kong Mesh, we can also use a third-party PKI (like HashiCorp Vault) to provision a CA in Kuma.
For example, on Kubernetes, we can enable a `builtin` certificate authority on the `default` mesh by applying the following resource via `kubectl` (on VMs, we can use Kuma’s CLI, `kumactl`):
```yaml
apiVersion: kuma.io/v1alpha1
kind: Mesh
metadata:
  name: default
spec:
  mtls:
    enabledBackend: ca-1
    backends:
      - name: ca-1
        type: builtin
        dpCert:
          rotation:
            expiration: 1d
        conf:
          caCert:
            RSAbits: 2048
            expiration: 10y
```
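With mTLS enabled, every request carries a verifiable identity, and we can express the permissions from the earlier Invoices/Items example as a `TrafficPermission` resource. The sketch below is illustrative: the `kuma.io/service` tag values are assumptions (on Kubernetes, Kuma derives them from the Service name, namespace, and port):

```yaml
apiVersion: kuma.io/v1alpha1
kind: TrafficPermission
mesh: default
metadata:
  name: invoices-to-items
spec:
  sources:
    - match:
        kuma.io/service: invoices
  destinations:
    - match:
        kuma.io/service: items
```

Requests between services with no matching `TrafficPermission` are denied, which is exactly the zero-trust posture we set out to achieve.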