To “SRE” Or Not To “SRE”? That is (No Longer) the Question

The DevOps world is changing rapidly, and rightfully so. In a world where nearly everything is cloud-based or cloud-native, scale is becoming one of the most critical parameters for business efficiency.

In fact, velocity is no longer measured by the number of lines of code a developer produces, but rather by the time it takes the team to release a feature. Focusing on feature releases rather than lines of code forces businesses to switch to a more sophisticated delivery mechanism. That’s where SRE comes into play.

So What is an SRE?

A “Site Reliability Engineer” (or SRE) is a supporting function for DevOps. This function is responsible for minimizing the number of incidents (caused by human error and misconfiguration) and for ensuring the reliability and scalability of code in production. An SRE, if you’d like, is a mediating function between DevOps and developers. Going back to the KPI of “feature time to deployment”, it is the SRE’s role to ensure that features are deployed as smoothly and as quickly as possible, using as much automation as possible.

Make Developers Great Again

Let’s focus on the developers and zoom in on Kubernetes for the sake of this discussion. 

According to a recent CNCF survey, nearly 5.6 million developers are using Kubernetes. This means that Kubernetes and containers are now in the mass adoption stage. We also see those same developers struggle with the complexity of Kubernetes. As a recent Red Hat survey reports, a staggering 95% of the security and data breaches in Kubernetes result from human error! 

Now, let’s dissect this from a different perspective: efficiency. According to Garden.io, cloud developers spend, on average, 11% of their time actually writing code, while 14-16 hours a week go to maintaining internal tooling, setting up dev environments, debugging pipelines, and waiting for builds or test results.

In short, as I hope these two examples illustrate, developers’ work is highly inefficient in today’s technological world. That’s where the SRE team should make an impact, but how? The only way, in my opinion, to handle the complexity of multiple tools, clouds, and technologies is a platform that unifies all of those elements under a single pane of glass.

Environment as a Service to the Rescue

By now, I hope I have established the problem that SREs are tasked with solving.

Now let’s discuss some possible solutions. 

One alternative for SREs to consider is to build an in-house platform that automates tools and clouds. While valid, we see a few critical issues with this approach. First and foremost, developing such a platform takes a long time (12 months or longer).

Second, there are some blind spots that aren’t necessarily visible from the get-go. We talked with a few SRE groups that realized, relatively late in the development process, that creating such a platform is only the first step. Updating existing environments, changing configuration parameters, etc., is an entirely different effort, which most in-house platforms are not capable of dealing with. This results in… manual labor and scripting all over again. 
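To make this “day two” problem concrete, here is a minimal Python sketch of the first thing an update mechanism has to do: work out what differs between a running environment and its updated template. It is an illustration only, not taken from any particular platform; the function name `plan_update` and the sample parameters are hypothetical.

```python
from typing import Any


def plan_update(current: dict[str, Any], desired: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return the changes needed to move `current` to `desired`."""
    plan: dict[str, dict[str, Any]] = {"add": {}, "change": {}, "remove": {}}
    for key, value in desired.items():
        if key not in current:
            plan["add"][key] = value
        elif current[key] != value:
            plan["change"][key] = {"from": current[key], "to": value}
    for key, value in current.items():
        if key not in desired:
            plan["remove"][key] = value
    return plan


# Hypothetical parameters of a running environment and its updated template.
running = {"replicas": 2, "region": "eu-west-1", "debug": True}
template = {"replicas": 4, "region": "eu-west-1", "log_level": "info"}

update_plan = plan_update(running, template)
print(update_plan)
```

Computing the plan is the easy part. Applying it safely to live environments (ordering, rollback, approvals) is the effort that tends to be underestimated.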

An alternative approach would be to leverage an existing platform, ideally open source and extensible enough to allow for specific tailoring and add-ons. One example of such a tool is Cloudify, an open source, multi-cloud automation platform that packages infrastructure, networking, and existing automation tools into certified environments, exposed to developers through a service catalog.

In a platform like Cloudify, all teams come together. DevOps teams use the platform to design and templatize cloud environments. Developers consume those environments either directly from the platform’s catalog or through the many integrations Cloudify provides with CI/CD and ITSM tools. SRE teams serve as the “connecting” function: they guide DevOps on environment sizing, TTL, and production requirements, and they collect requirements from developers about environment needs and issues.
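The division of labor described above can be sketched in a few lines of Python. This is a simplified model under my own assumptions, not Cloudify’s API: the class names `EnvironmentTemplate`, `Environment`, and `Catalog`, and the sizing and TTL values, are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class EnvironmentTemplate:
    """A catalog entry designed by DevOps, with sizing and TTL advised by SRE."""
    name: str
    cpu_cores: int
    memory_gb: int
    ttl: timedelta  # how long an environment may live before cleanup


@dataclass
class Environment:
    """An environment a developer requested from the catalog."""
    template: EnvironmentTemplate
    owner: str
    created_at: datetime

    def expires_at(self) -> datetime:
        return self.created_at + self.template.ttl

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at()


class Catalog:
    """Hypothetical self-service catalog: DevOps publishes, developers request."""

    def __init__(self) -> None:
        self._templates: dict[str, EnvironmentTemplate] = {}

    def publish(self, template: EnvironmentTemplate) -> None:
        self._templates[template.name] = template

    def request(self, name: str, owner: str, now: datetime) -> Environment:
        if name not in self._templates:
            raise KeyError(f"no template named {name!r} in the catalog")
        return Environment(self._templates[name], owner, now)


catalog = Catalog()
catalog.publish(EnvironmentTemplate("k8s-dev", cpu_cores=4, memory_gb=16, ttl=timedelta(hours=8)))

start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
env = catalog.request("k8s-dev", owner="alice", now=start)
print(env.expires_at())
print(env.is_expired(start + timedelta(hours=9)))
```

Keeping sizing and TTL on the template rather than on each request means the SRE team can adjust them in one place, and developers never have to think about them.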

To conclude, regardless of whether your business decides to develop an in-house automation platform or leverage an existing one, it is critically important to create an SRE team and to include that team in decisions that simplify the developer experience across the organization.
