Let’s simplify the developer experience and scale DevOps workflows without compromising the security of multi-Kubernetes environments.
Organizations are eagerly adopting containers and Kubernetes, investing in cloud-native to foster innovation and growth. According to the CNCF and Slashdata, nearly 5.6 million developers use Kubernetes. That’s 31% of all backend developers.
We all know that Kubernetes is a great container management platform. It’s flexible, pluggable, and can do almost anything: managing application workloads across multiple regions and clouds, networking, edge deployments. It seems Kubernetes can do it all.
But that flexibility comes at a big cost. All those options make the DevOps experience oppressively complex. Security is another challenge. A recent Red Hat survey reports that 95% of the security and data breaches in Kubernetes result from human error. Almost that many respondents experienced at least one security incident in their Kubernetes environments in the last year, sometimes leading to revenue or customer loss.
We need a self-service developer platform
Gartner’s How to Scale DevOps Workflows in Multicluster Kubernetes Environments says the solution is “to streamline DevOps workflows, I&O leaders must build platform teams, automate cluster life cycle management, enhance developer self-service and adopt GitOps practices.” Interestingly, this approach is consistent with findings in last year’s CNCF Kubernetes Benchmarking Study.
When Gartner says “automate,” that’s just another way of saying “developer self-service.” Enabling developer self-service in Kubernetes boosts developer productivity. Paired with a logical cloud-native architecture, it lets engineers self-provision, test, secure and deploy their apps without having to wait on ops to provision resources for them. Developer self-service means developers needn’t be experts in the entire deployment toolchain. When it comes to repetitive tasks like spinning up new features or preview environments, enabling developer self-service saves teams time. Without it, you risk a slower development lifecycle, higher change failure rates, bottlenecks, and key-person dependencies.
What we learned enabling developer self-service in a highly secured enterprise environment
I’ve been supporting a large financial services organization that moved to EKS. Understandably, they took extreme security measures to protect their infrastructure. They also realized that the only way to make life sane for developers in this environment while maintaining the pace of innovation they required was to provide a self-service environment for developers to abstract away the underlying Kubernetes and cloud infrastructure chaos.
Step 1: Take an opinionated approach to development infrastructure
The first step was to take an opinionated approach to the development environment. To get there, the organization:
- Used a single provider (AWS) to avoid getting into multi-cloud complexities out of the gate
- Mandated that apps would be written as cloud-native on Lambda only
- Minimized the number of Kubernetes clusters (EKS) to one for production and a few others for development, using namespaces to separate projects
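To make the namespace-per-project idea concrete, here is a minimal sketch that builds a Kubernetes Namespace manifest for a project. The `<project>-<stage>` naming convention and the label keys are assumptions for illustration, not details of the organization's setup. The name check follows the Kubernetes DNS label rule (lowercase alphanumerics or `-`, at most 63 characters).

```python
import json
import re

# DNS label rule used by Kubernetes for namespace names: lowercase
# alphanumerics or '-', starting and ending with an alphanumeric character.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def namespace_name(project: str, stage: str) -> str:
    """Build a name such as 'payments-dev' (hypothetical naming convention)."""
    name = f"{project}-{stage}".lower().replace("_", "-")
    if len(name) > 63 or not _DNS_LABEL.fullmatch(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return name


def namespace_manifest(project: str, stage: str) -> dict:
    """Return a Kubernetes Namespace manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": namespace_name(project, stage),
            # Label keys are illustrative assumptions.
            "labels": {"project": project.lower(), "stage": stage},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so json.dumps is enough here.
    print(json.dumps(namespace_manifest("payments", "dev"), indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`. In practice the platform would more likely create namespaces as part of its own provisioning workflow.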
Step 2: Establish a dedicated platform engineering team
Next, they made the platform engineering team (DevOps) responsible for delivering a self-service environment. That meant providing a self-service EKS cluster certified to meet their security regulations. Similarly, the team provides other certified services such as DBaaS and Kafka.
The diagram describes the actions and benefits of having a dedicated platform engineering team to deliver a self-service environment:
Step 3: Take a best-of-breed approach to deliver a self-service development platform (IDP)
The team first set out to build an IDP layer on top of Terraform, EKS, and the CI/CD pipeline. They quickly realized that the amount of work needed to build all the integration points and the abstraction layer was vastly larger than anticipated. So, they started looking for a shorter path to value using open source alternatives. That search led the team to Cloudify. Next, we’ll outline the architecture stack we created with open-source Cloudify to accomplish their goals.
The stack consists of the following elements:
- The platform stack
  - AWS – public cloud
  - EKS + Istio – central application platform for development and production
  - API gateway – provides the northbound interface
  - Apigee – southbound interface to backend services
  - ELK – log aggregation and monitoring
  - ServiceNow – approval and post-deployment governance
  - Cloudify – the engine that glues services together into a self-service environment
- The developer stack
  - Lambda – serverless engine
  - Helm – microservices deployment
  - Backstage – development portal
- The common stack
  - Jenkins – CI/CD
  - Bitbucket – development repo
  - Nexus – image repository
The following diagram shows how the parts work together and how ownership is divided between the development team and the platform engineering team:
Step A: Creating a landing zone
When a developer starts a new project, they need to be assigned an environment with all the roles and credentials for the resources they’ll need. In this case, that means a Git repo (Bitbucket) and a set of project secrets grouped into an environment repository.
The platform team is responsible for creating an environment repository assigned to the project. That provides a reference to all the secret configurations. Cloudify provides a hierarchical environment structure to define this, as well as the scope of each environment item. The scope can be global (shared with everyone), tenant-wide (shared with everyone in a specific business unit), or private (a specific project).
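The three scopes can be pictured as layered lookups. The sketch below is a toy model, not Cloudify's implementation or API. The class name, the method names, and the precedence order (private first, then tenant, then global) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class EnvironmentStore:
    """Toy model of a three-level environment hierarchy (hypothetical)."""

    # Items visible to everyone.
    global_items: Dict[str, str] = field(default_factory=dict)
    # Items keyed by tenant (business unit).
    tenant_items: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # Items keyed by (tenant, project).
    private_items: Dict[Tuple[str, str], Dict[str, str]] = field(default_factory=dict)

    def resolve(self, key: str, tenant: str, project: str) -> Optional[str]:
        """Look up a key, narrowest scope first (assumed precedence)."""
        layers = (
            self.private_items.get((tenant, project), {}),
            self.tenant_items.get(tenant, {}),
            self.global_items,
        )
        for layer in layers:
            if key in layer:
                return layer[key]
        return None


if __name__ == "__main__":
    store = EnvironmentStore(
        global_items={"log_level": "info"},
        tenant_items={"retail": {"db_secret_ref": "retail/shared-db"}},
        private_items={("retail", "payments"): {"db_secret_ref": "retail/payments-db"}},
    )
    print(store.resolve("db_secret_ref", "retail", "payments"))
    print(store.resolve("log_level", "retail", "payments"))
```

The values here are references to secrets rather than the secrets themselves, which matches the idea of the environment repository as a pointer to the secret configurations.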
A developer Git repository and build pipeline are created based on the project’s class (.Net, Java, Web, Mobile, Backend, etc.). Users can select from a few well-defined project templates that create the entire environment stack, as described in the diagram below.
This process is also responsible for creating a predefined build and promote pipeline that is pre-integrated into the developer CI/CD environment, including all the references and parameters needed to trigger a build or promote process.
Once the landing zone is ready, the user is notified via email with the relevant links and references to the project repo and build pipelines (in Jenkins). In this case, the team uses ServiceNow to handle the email notifications and manage the pending-approval and confirmation tasks. ServiceNow provides a simple way to customize the email template, create custom flows using the Flow Designer, and integrate with other tools such as Slack.
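As a rough illustration of the landing-zone step, the sketch below turns a project name and class into a plan listing what would be created. The project classes come from the description above. The template names, the naming scheme for repos and pipelines, and the `plan_landing_zone` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical template catalog keyed by project class.
TEMPLATES = {
    ".net": "dotnet-service-template",
    "java": "java-service-template",
    "web": "web-frontend-template",
    "mobile": "mobile-app-template",
    "backend": "backend-service-template",
}


@dataclass(frozen=True)
class LandingZone:
    project: str
    template: str
    repo: str                   # Bitbucket repository name
    environment: str            # environment repository with secret references
    pipelines: Tuple[str, ...]  # Jenkins build and promote job names


def plan_landing_zone(project: str, project_class: str) -> LandingZone:
    """Describe the resources a new project would receive."""
    key = project_class.lower()
    if key not in TEMPLATES:
        raise ValueError(f"no template for project class {project_class!r}")
    return LandingZone(
        project=project,
        template=TEMPLATES[key],
        repo=f"{project}-repo",
        environment=f"{project}-env",
        pipelines=(f"{project}-build", f"{project}-promote"),
    )


if __name__ == "__main__":
    print(plan_landing_zone("payments", "Java"))
```

A plan like this only names the pieces. Creating the repository, the pipelines, and the notification would be handled by the platform's own workflows.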
Step B: Creating a super-simple developer experience
This is perhaps the most important part of this project. The goal was to abstract the developer from the underlying Kubernetes/Istio infrastructure as well as all the Terraform modules that were used to create the cloud services. In this way, the developer can focus on developing the microservices, which get compiled and baked into a Docker image. The images can then be stored in a Nexus image repo.
Once the user is ready to deploy, all they need to do is either trigger the deploy pipeline from the CI/CD environment or use GitOps to do that implicitly upon push or merge.
The platform is now responsible for taking this image through the deployment process. This includes scanning the image, wrapping the image with the appropriate Helm template, and getting approval if needed (through ServiceNow). Once the request has been approved, the deployment process continues automatically. The new microservices are deployed under a namespace assigned to the project. The last phase is to wire these microservices into the network using Istio and an API gateway.
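The deployment flow just described can be sketched as a sequence of gated stages. This is a simplified model, not the team's pipeline. The `scan` and `approve` callables stand in for the image scanner and the ServiceNow approval and are hypothetical, and the `image.repository` / `image.tag` values layout is a common Helm chart convention rather than something stated in the source.

```python
from typing import Callable, List, Optional


class Rejected(Exception):
    """Raised when a stage refuses to let the deployment continue."""


def helm_values(image: str) -> dict:
    """Split an image reference into Helm-style values."""
    repo, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:
        # No tag present (any colon belonged to a registry port).
        repo, tag = image, "latest"
    return {"image": {"repository": repo, "tag": tag}}


def run_pipeline(
    image: str,
    namespace: str,
    scan: Callable[[str], bool],
    approve: Optional[Callable[[str, str], bool]] = None,
) -> List[str]:
    """Run the stages in order and return a log of what happened."""
    steps: List[str] = []
    if not scan(image):
        raise Rejected(f"image scan failed for {image}")
    steps.append("scanned")

    values = helm_values(image)
    steps.append(f"templated {values['image']['repository']}:{values['image']['tag']}")

    # Approval is optional; pass approve=None when no gate is required.
    if approve is not None:
        if not approve(image, namespace):
            raise Rejected(f"approval denied for {image}")
        steps.append("approved")

    steps.append(f"deployed to namespace {namespace}")
    steps.append("wired into mesh and gateway")
    return steps


if __name__ == "__main__":
    log = run_pipeline(
        "nexus.example.com/payments/api:1.4.2",
        "payments-dev",
        scan=lambda img: True,
        approve=lambda img, ns: True,
    )
    print("\n".join(log))
```

Modeling each gate as a callable keeps the order of the stages explicit: a failed scan or a denied approval stops the run before anything reaches the cluster.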
Step C: Turn all the infrastructure resources into a self-service, certified environment
The certified environment is delivered to developers via a catalog service. It’s basically a pre-configured stack of cloud resources designed to meet security and performance constraints. Cloudify provides a three-step approach to streamline the creation of service components:
- Importing and creating the new, self-service environment
Environments can be built from existing Terraform, AWS CloudFormation, Kubernetes clusters, or Azure ARM components. Users can import and publish these to the Cloudify environment blueprint catalog, and the software will generate a wrapper YAML blueprint that provides consistent I/O and relationship management among the resources. Users can also, of course, create composite environments from existing resources.
- Using a self-service environment
Developers can create new environments from a list of predefined certified environments via installation workflows. The installation trigger can be made through a GitOps action, CI/CD pipeline, a REST call, CLI or GUI.
- Updating an existing self-service environment
Cloudify provides a consistent update workflow across the infrastructure resources that covers updating Terraform modules, Helm charts, and Ansible playbooks. It also covers adding or removing a service from an existing stack and in-place upgrades of a resource.
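One way to reason about the update step is as a diff between the current stack and the desired one. The sketch below is a generic illustration, not Cloudify's update workflow. The `plan_update` function and the service names and versions are hypothetical.

```python
from typing import Dict, List


def plan_update(current: Dict[str, str], desired: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two stacks, each given as {service name: version}."""
    shared = current.keys() & desired.keys()
    return {
        # Services present only in the desired stack.
        "add": sorted(desired.keys() - current.keys()),
        # Services present only in the current stack.
        "remove": sorted(current.keys() - desired.keys()),
        # Services in both stacks whose version changed (in-place upgrade).
        "upgrade": sorted(name for name in shared if current[name] != desired[name]),
    }


if __name__ == "__main__":
    current = {"eks": "1.27", "kafka": "3.5", "legacy-cache": "1.0"}
    desired = {"eks": "1.28", "kafka": "3.5", "dbaas": "2.1"}
    print(plan_update(current, desired))
```

Unchanged services do not appear in the plan, so an update touches only what differs between the two stacks.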
The diagram below illustrates those steps. You can also watch a demo showing this in more detail.
The end result: Extremely simple developer experience without compromising on security
From a developer perspective, a platform looks like a smart pipeline that abstracts away infrastructure complexity and is resilient enough to handle continuous infrastructure updates and drifts. From a platform team perspective, a platform is a central place to govern and continuously update the infrastructure environments without breaking the developer pipeline. Cloudify is used as a platform that integrates these pieces.
It’s the developer experience, duh!
One of the biggest challenges in software engineering is finding the right balance between flexibility and simplicity. This is especially true in a highly secured and regulated environment. The main reason previous attempts to solve this challenge have failed is that success requires a holistic approach to the entire development process, followed by building the teams and infrastructure that will support it. In many organizations, silos complicate implementation. This results in compromises that often solve one area of complexity while introducing others elsewhere.
We were able to strike the right balance for this financial services organization. On one hand, developers get an extremely simple interface that abstracts away the unnecessary details of the underlying infrastructure. At the same time, the platform team has a full degree of flexibility to deliver almost any cloud service in a matter of minutes and in the process simplify one of the most complex DevOps tasks: handling the continuous update of those environments.
The lessons from this experience could help teams that are now establishing a platform engineering strategy avoid falling into the mistakes of the past. Open source, once again, serves as a great place to start for solving DevOps challenges.