Enforcing Policy for Self-Service Environments with Cloudify and OPA

The Open Policy Agent is emerging as a standard framework for policy decisions in cloud-native environments. Running OPA on Kubernetes is a common method to provide Kubernetes admission control. OPA has become so popular that even Terraform recently announced beta integration of OPA support in Terraform Cloud. Leveraging a standard policy framework enables an organization to speak a common policy language across teams, leverage a consistent policy syntax, and expand the reach of policy enforcement within an environment.

In this article, I will walk you through writing a basic OPA policy to evaluate requests for new Kubernetes clusters. You will see how the Open Policy Agent REST API can respond to requests and provide policy decisions. After implementing and testing the policy, you will see how Cloudify can orchestrate the actual deployment of the cluster if the policy decision passes.

You can follow along with this article by cloning the repository from our community GitHub.

Let’s get started!

Policy for Environment Guardrails

Policy is frequently discussed in terms of security value. However, policy can represent much more. For example, using a common policy framework enables both security teams and architecture teams to express constraints for an environment. A security team will be concerned with misconfigurations that can leave an organization vulnerable, such as exposed ports or publicly available resources.  Other teams, such as an architecture team, will be concerned that proper sizing, capacity, and scaling guidelines are followed to avoid availability problems.

Let’s consider a simple example: providing a product team with the ability to deploy their own Kubernetes cluster. Providing self-service for these requests enables better developer productivity. On the other hand, we still need to enforce guardrails around the environments that users are allowed to deploy.

For this example, we can describe policy around three dimensions of a self-service Kubernetes cluster:

  • Sizing: Development clusters should be on smaller instances and limited in size (between 1 and 3 instances), while production clusters should be on larger instances with a minimum amount of capacity (between 5 and 10 instances).
  • Overall capacity: We want to limit the maximum number of Kubernetes clusters that are deployed in our organization.
  • Product: Only certain internal products are allowed to deploy Kubernetes clusters. We want to maintain a list of products that are “Kubernetes-ready” and avoid proliferation of unnecessary clusters.

You can express all of these constraints using OPA and its Rego policy syntax. Integrating OPA and Cloudify allows powerful policy decisions to be enforced against complex, self-service environments. The final implementation will enable an end-to-end self-service workflow to deploy a Kubernetes cluster.

OPA Installation

Like many modern tools, OPA is written in Go and can be deployed as a statically-compiled binary. This makes it very simple to install and update. The official documentation describes the process for each operating system. On Linux, this simply involves downloading the binary and moving it into a location within your PATH:

# Download OPA
$ curl -L -o opa https://openpolicyagent.org/downloads/v0.45.0/opa_linux_amd64_static
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    87  100    87    0     0   1101      0 --:--:-- --:--:-- --:--:--  1101
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 35.4M  100 35.4M    0     0  10.7M      0  0:00:03  0:00:03 --:--:-- 16.6M
# Make the OPA binary executable
$ chmod +x opa
# Move OPA into a location on the PATH
$ sudo mv opa /bin/
[sudo] password for acritelli:
# Confirm that OPA works
$ /bin/opa version
Version: 0.45.0
Build Commit: 523c285bcc417b2ec8a26b0a248407b1e840d488
Build Timestamp: 2022-10-07T18:38:08Z
Build Hostname: 25afff3277c2
Go Version: go1.19.2
Platform: linux/amd64
WebAssembly: unavailable

Policy Implementation

Implementing the Kubernetes cluster policy that we described involves three pieces of information:

  • Policy: The actual rules that a request will be evaluated against.
  • Data: OPA allows you to import data (such as a JSON file) and use the imported data within policy evaluations. This allows you to write policies without hardcoding parameters. For example: our policy will use data to represent the appropriate number of Kubernetes nodes for different instance sizes.
  • Input: The actual request that is sent to OPA. This will be provided to the OPA API for evaluation against our policy.
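Conceptually, an OPA decision is a pure function of these pieces: the policy evaluates the input against the data. Before turning to Rego, the three checks can be sketched in plain Python. This is an illustrative model only, not part of OPA; the field names mirror the data and input documents used later in this article:

```python
# Illustrative sketch: an OPA decision is a pure function of (data, input).
# Field names mirror the data.json and request input used later in the article.

def evaluate(data, request):
    sizing = data["approved_sizing"].get(request["instance_size"], {})
    allow_sizing = (
        sizing.get("min_instances", 0)
        <= request["num_instances"]
        <= sizing.get("max_instances", -1)
    )
    allow_capacity = data["current_clusters"] + 1 <= data["max_clusters"]
    allow_project = request["project"] in data["approved_projects"]
    return {
        "allow": allow_sizing and allow_capacity and allow_project,
        "allow_sizing": allow_sizing,
        "allow_capacity": allow_capacity,
        "allow_project": allow_project,
    }

data = {
    "current_clusters": 14,
    "max_clusters": 15,
    "approved_projects": ["payment-processing", "user-backend", "notifications-backend"],
    "approved_sizing": {
        "t3.medium": {"min_instances": 1, "max_instances": 3},
        "t3.large": {"min_instances": 5, "max_instances": 10},
    },
}

print(evaluate(data, {"instance_size": "t3.large", "num_instances": 5, "project": "user-backend"}))
```

The Rego version later in the article expresses the same logic declaratively, with the data imported rather than passed as an argument.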

The filesystem layout for the policy/ directory contains the files needed for each piece of information, except for the input (which will be sent as part of the HTTP request to OPA). The compute/kubernetes/data.json file contains data that OPA will use in its policy decision, and the main.rego file contains the actual policy:

$ tree policy/
policy/
├── compute
│   └── kubernetes
│       └── data.json
└── main.rego

2 directories, 2 files

Before diving into the policy itself, let’s look at the data that supports its decisions. The JSON blob below contains several pieces of information that you can use in a policy decision about a new Kubernetes cluster:

  • current_clusters defines the current number of Kubernetes clusters in the environment. Because OPA can push and pull data, this can be updated by external automation, such as a script that regularly polls the environment to determine the number of clusters that are deployed.
  • max_clusters defines the maximum number of Kubernetes clusters that the organization wants deployed.
  • approved_projects provides a list of internal projects that are allowed to use Kubernetes clusters.
  • approved_sizing contains sizing requirements for clusters that use different sized nodes. For example: a t3.medium cluster should only be used for test projects. Therefore, the maximum number of instances is limited to 3.
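For example, the current_clusters value could be kept up to date by a script that writes through OPA’s Data API, where a PUT to /v1/data/&lt;path&gt; replaces the document at that path. The sketch below, using only the Python standard library, builds such a request; note that data activated from a bundle is read-only through this API, so this pattern assumes OPA loads its data directly:

```python
import json
import urllib.request

def build_update(base_url, cluster_count):
    """Build a PUT that overwrites compute/kubernetes/current_clusters."""
    url = f"{base_url}/v1/data/compute/kubernetes/current_clusters"
    return urllib.request.Request(
        url,
        data=json.dumps(cluster_count).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# A polling script would count deployed clusters and push the new total:
req = build_update("http://localhost:8181", 14)
print(req.get_method(), req.full_url)
```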

OPA can support very complex policy decisions, and the parameters used in this article are only for demonstration purposes. These values are provided in the compute/kubernetes/data.json file: 

 "current_clusters": 14,
 "max_clusters": 15,
 "approved_projects": [ "payment-processing", "user-backend", "notifications-backend" ],
 "approved_sizing": {
   "t3.medium": {
     "min_instances": 1,
     "max_instances": 3
   "t3.large": {
     "min_instances": 5,
     "max_instances": 10

Next, you can take a look at each section of the policy in main.rego. The first section includes a mandatory “package” statement and several imports. The first two imports allow you to use a set of new keywords from the latest version of the Rego query language. The final import statement pulls in data that the policy can use in its decision making. Notice that “compute.kubernetes” matches the filesystem path (compute/kubernetes) where the data.json file is located:

package kubernetes.capacity
import future.keywords.in
import future.keywords.contains
import data.compute.kubernetes

The next section of the policy defines three rules that map to the previously described organizational requirements:

  • allow_sizing determines if the number of instances in the cluster falls within a set of allowed parameters.
  • allow_capacity determines if adding an additional cluster would exceed the number of Kubernetes clusters permitted by the organization.
  • allow_project determines if the internal project is allowed to have a Kubernetes cluster.

The final “allow” rule combines all three policies together. For a cluster to be deployed, all three policy checks must pass successfully:

allow_sizing {
  requested_instances := input.num_instances
  requested_instances >= kubernetes.approved_sizing[input.instance_size].min_instances
  requested_instances <= kubernetes.approved_sizing[input.instance_size].max_instances
}

allow_capacity {
  (kubernetes.current_clusters + 1) <= kubernetes.max_clusters
}

allow_project {
  input.project in kubernetes.approved_projects
}

allow {
  allow_sizing
  allow_capacity
  allow_project
}

Simple allow or deny decisions are helpful, but it is also useful to return a message to the user if their request is denied. The final section of the policy returns a message if any of the individual rules fail. Each message rule contains two statements:

  1. A friendly message is assigned to the “msg” variable.
  2. The rule is negated, which allows OPA to return the message to the user if the rule evaluates to undefined (i.e., the evaluation fails).

reason contains msg {
  msg := sprintf("Project '%v' is not allowed to deploy Kubernetes clusters", [input.project])
  not allow_project
}

reason contains msg {
  msg := "Insufficient capacity available for additional Kubernetes clusters"
  not allow_capacity
}

reason contains msg {
  min_instances := kubernetes.approved_sizing[input.instance_size].min_instances
  max_instances := kubernetes.approved_sizing[input.instance_size].max_instances
  msg := sprintf("Must use between %v and %v instances for cluster with node size %v", [min_instances, max_instances, input.instance_size])
  not allow_sizing
}

Policy Testing

Once the policy has been defined, you can start the OPA server from within the policy directory. By default, it will listen for requests on port 8181:

$ cd policy
$ opa run -s -b ./
{"addrs":[":8181"],"diagnostic-addrs":[],"level":"info","msg":"Initializing server.","time":"2022-10-17T15:15:31-04:00"}

You can now test out requests using an HTTP client. The request must have a JSON-encoded body with the appropriate inputs specified. First, let’s test a request that should succeed. The request below is for a Kubernetes cluster with five t3.large instances for the user-backend project. The request is sent to the /v1/data/kubernetes/capacity endpoint, which matches the package kubernetes.capacity statement from the policy. Sending a cURL request with this input results in an allow: true evaluation:

# A sample request body
$ cat /tmp/test_input.json
{
  "input": {
    "instance_size": "t3.large",
    "num_instances": 5,
    "project": "user-backend"
  }
}
# Send a request to the server and pass the result to JQ for printing
$ curl -s -d @/tmp/test_input.json http://localhost:8181/v1/data/kubernetes/capacity | jq
{
  "result": {
    "allow": true,
    "allow_capacity": true,
    "allow_project": true,
    "allow_sizing": true,
    "reason": []
  }
}

Next, you can confirm that requests violating the policies are denied. The example request below is for a Kubernetes cluster with 15 t3.large instances for the user-backend project. A request for 15 instances violates the max_instances constraint within the policy definition, so the request fails and a message is returned to the requestor.

Notice that the response does not contain any type of “deny” key; instead, the “allow” key is simply missing. When an OPA rule is false, it evaluates to “undefined” and is omitted from the response. The absence of “allow” indicates that the request is denied.

# A sample request body
$ cat /tmp/test_input.json
{
  "input": {
    "instance_size": "t3.large",
    "num_instances": 15,
    "project": "user-backend"
  }
}
# Send a request to the server and pass the result to JQ for printing
$ curl -s -d @/tmp/test_input.json http://localhost:8181/v1/data/kubernetes/capacity | jq
{
  "result": {
    "allow_capacity": true,
    "allow_project": true,
    "reason": [
      "Must use between 5 and 10 instances for cluster with node size t3.large"
    ]
  }
}
Compliant, Self-Service Environments

OPA is a powerful tool for evaluating requests and making policy decisions. So far, however, your requests have simply been examples sent via cURL. How can you extend this example to enforce policy decisions for self-service environments?

The Cloudify REST plugin allows you to integrate calls to any HTTP-based API within environment-as-a-service workflows. By using the REST plugin to send requests to OPA, you can integrate policy enforcement into your self-service environment blueprints. Requests that meet policy constraints will be permitted, while other requests will fail.

The repository contains a blueprint that implements this workflow for a Terraform-based EKS cluster. You can install the blueprint in your Cloudify manager from the command-line to follow along with the examples:

$ cfy blueprint upload -b OPA-Example https://github.com/cloudify-community/opa-example/archive/refs/heads/main.zip
Publishing blueprint archive https://github.com/cloudify-community/opa-example/archive/refs/heads/main.zip...
Blueprint `OPA-Example` upload started.
2022-10-19 13:13:31.646  CFY <None> Starting 'upload_blueprint' workflow execution
2022-10-19 13:13:32.000  LOG <None> INFO: Blueprint archive uploaded. Extracting...
2022-10-19 13:13:32.075  LOG <None> INFO: Blueprint archive extracted. Parsing...
2022-10-19 13:13:34.147  LOG <None> INFO: Blueprint parsed. Updating DB with blueprint plan.
2022-10-19 13:13:34.322  CFY <None> 'upload_blueprint' workflow execution succeeded
Blueprint uploaded. The blueprint's id is OPA-Example

First, let’s attempt to deploy a Kubernetes cluster that violates our policy constraints from the Cloudify UI. You can select the catalog item and fill out the form to provide inputs to the deployment, purposely selecting a non-compliant cluster size of 15 instances. Once the deployment is triggered, it quickly fails after the call to OPA due to the policy violation.

Next, let’s try to deploy a policy-compliant Kubernetes cluster from the CLI. This time, the sizing configuration matches the requirements of the policy. First, you can define the inputs to the deployment. A cluster with two t3.medium instances will be allowed by the policy:

# Inputs to the deployment
$ cat /tmp/inputs.yaml
eks_cluster_name: opademo
opa_endpoint: 4082-64-201-235-203.ngrok.io
instance_size: t3.medium
num_instances: 2

Next, you can create the environment and trigger the install workflow to start the environment deployment. Unlike the first deployment that failed immediately due to a policy violation, this deployment will take several minutes to create. After the OPA policy check succeeds, the provisioning process will proceed with deploying the Terraform module:

# Create the environment
$ cfy deployment create -b OPA-Example -i /tmp/inputs.yaml demo-cluster
Creating new deployment from blueprint OPA-Example...
Deployment `demo-cluster` created. The deployment's id is demo-cluster
# Trigger an environment installation
$ cfy executions start -d demo-cluster install
Executing workflow `install` on deployment `demo-cluster` [timeout=900 seconds]
2022-10-19 15:08:37.393  CFY <demo-cluster> Starting 'install' workflow execution

You can also log into the UI and check the status of the deployment. Notice that all steps in the Execution Task Graph, including the policy evaluation, have completed successfully.

Once the deployment process finishes, you can obtain the deployment’s exposed capabilities. This deployment exposes the cluster ID and cluster endpoint:

$ cfy deployment capabilities --json demo-cluster | jq
{
  "cluster_id": {
    "value": "opademo",
    "description": null
  },
  "cluster_endpoint": {
    "value": "https://8B911D9830A67DD789F8210A2781F507.gr7.us-west-2.eks.amazonaws.com",
    "description": null
  }
}

Final Thoughts

Adoption of the Open Policy Agent continues to grow as organizations look for easier and more flexible ways to express policies that are decoupled from individual applications. In this article, you saw how an OPA policy can be used to enforce guardrails around self-service environment requests. You also saw how Cloudify is able to orchestrate the end-to-end workflow, from policy evaluation through infrastructure deployment, within a self-service catalog item. Enabling your developers to deploy their own resources while still placing policy around their requests improves product team velocity without sacrificing security or architectural constraints.

