How to Orchestrate Rolling Upgrades With A Custom Workflow Using Cloudify and TOSCA

One of Cloudify’s differentiators in the marketplace is its ability to automate complex post-deployment tasks. One such task is a rolling upgrade of software on load-balanced web servers. This is a multi-step process that typically involves a coordinated dance between the load balancer and the web servers, along with the installation of new content and possibly restarting services. This post describes a sample implementation of a workflow that automates a zero-downtime upgrade of such load-balanced resources.





The Manual Process

It is always a useful (probably essential) exercise in any kind of automation to precisely define the manual steps necessary to accomplish the end goal, and only then attempt automation. In order to have a more precise description of the process, it is helpful to target a specific technology stack rather than deal in generalizations. In this simplified walkthrough, we will use our old friend the Nodecellar demo, the version that includes an HAProxy front end.

To perform an upgrade, we’ll need to follow this basic algorithm:

  • Identify the nodes that we want to upgrade.
  • For each node
    • Remove the node from the load balancer
    • Upgrade the node, possibly restarting
    • Add the node to the load balancer

For this example we’ll add a few additional requirements (that don’t alter the basic algorithm):

  • Add a version identifier to the process. If the version isn’t newer than the already deployed software, ignore it.
  • During the upgrade process, have each node save the previous version to permit rollback.
  • Add a parameter to the upgrade, “percentage”, that only upgrades a portion of the candidate nodes (for canary-style deployments).
  • Add a parameter to the upgrade, “delay”, that will cause a pause between upgrades.
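The algorithm and the extra requirements above can be sketched in plain Python. This is only an illustration under stated assumptions, not the sample's code: the nodes are plain dictionaries, the load balancer is a set of node names, and the function name rolling_upgrade and its arguments are made up for this example.

```python
import time


def rolling_upgrade(nodes, balancer, new_version, percentage=100, delay=0):
    """Upgrade a share of `nodes` one at a time; return the names upgraded."""
    # Canary-style: only touch the first `percentage` percent of the candidates.
    count = len(nodes) * percentage // 100
    upgraded = []
    for index, node in enumerate(nodes[:count]):
        if new_version <= node["version"]:
            continue  # not newer than what is deployed: ignore the request
        balancer.discard(node["name"])      # take the node out of rotation
        node["previous"] = node["version"]  # remember the old version for rollback
        node["version"] = new_version       # the real upgrade/restart would go here
        balancer.add(node["name"])          # put the node back into rotation
        upgraded.append(node["name"])
        if delay and index < count - 1:
            time.sleep(delay)               # pause between upgrades
    return upgraded


# Hypothetical stand-ins: four web servers at version 1.0 behind one balancer.
nodes = [{"name": "web%d" % i, "version": 1.0} for i in range(4)]
balancer = {n["name"] for n in nodes}
print(rolling_upgrade(nodes, balancer, 2.0, percentage=50))  # ['web0', 'web1']
```

Only one node is out of the balancer at any moment, which is what keeps the service available during the upgrade.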

A Cloudify/TOSCA Oriented Design

TOSCA models deployments as graphs of nodes, with each node being a component that you might want to orchestrate: load balancers, web servers, network components (routers, networks, subnets), security constructs, applications, or really anything else you can think of. Node types can be created as needed; there is no fixed set of types. TOSCA also defines arbitrary relationships between the nodes, including dependency relationships. It is this graph of nodes that an “install” workflow traverses, orders, and uses to locate and execute related code (VM creation, for example). Our model needs a new workflow, which we’ll call webapp_upgrade. Workflows have total visibility over the models they operate on. In this case, we need to operate on a very simple model, as described above. The workflow will implement the algorithm previously described. Let’s reconsider it with a TOSCA frame of mind:

  • Identify the nodes we want to upgrade. TOSCA (and Cloudify naturally) defines a concept of interface. An interface defines operations that can be executed on nodes. In our case, we will be upgrading applications on web servers, so we’ll define a new type called nodejs.nodes.NodeJSApplicationModule. Our new workflow will look for these nodes. But rather than hard code the node type name in the workflow, we’ll define a new interface that the workflow can look for: webapp_upgrade. This interface will define two operations to support the workflow: upgrade and rollback.
  • For each node
    • Remove the node from the load balancer. The existence of a load balancer is a given for this workflow. This implies that there will be a connected_to relationship from the apps to the load balancer. So in order to remove the node from the load balancer, we’ll just execute the operation associated with that relationship that disconnects a balancer target. Normally this code is only executed by the install process, but we can reuse it to make the workflow less coupled to the implementation.
    • Upgrade the node, possibly restarting. This will be done by the code associated with the new interface we’ve described. Recall that the upgrade request may be ignored based on the version. Since we’ll be using the interface, the workflow will have no dependency on the details an implementer might choose, such as whether the web server needs to be restarted or how the version id scheme works.
    • Add the node to the load balancer. Here we’ll reuse the same concept from the “remove node” step. We’ll execute the operation that establishes the link between the node and the load balancer, thereby re-adding it.

A Cloudify Sample Blueprint Implementation Walkthrough

A sample implementation of the design is here. For the sake of simplicity, it is implemented entirely using the script plugin. Packaging the code as a Cloudify plugin would make it more reusable, if that were needed. The sample runs on OpenStack.

The Model

The model/blueprint is in the file openstack-blueprint.yaml. Among the node definitions in the node_templates section are the two nodes of most interest here: haproxy and nodejs_app. The haproxy node represents the load balancer we’ll need to update, and nodejs_app is the application to be updated. Note that nodejs_app has a relationship (app_connected_to_haproxy) that connects it to the haproxy node. This is the relationship that the workflow will manipulate. The nodejs_app node has a type of nodejs.nodes.NodeJSApplicationModule. This type is defined in the types/nodejs.yaml file. Here you can see it derives from a type named upgradeable_webapp, and finally upgradeable_webapp implements the interface webapp_upgrade. This is the interface discussed earlier that our new workflow will look for.

  upgradeable_webapp:
    derived_from: cloudify.nodes.Root
    interfaces:
      webapp_upgrade:
        upgrade: {}
        rollback: {}

Back in the nodejs.nodes.NodeJSApplicationModule, the implementation for the interface is defined. We’ll go through the implementation later.

webapp_upgrade:
  upgrade:
    implementation: "scripts/nodejs/upgrade.py"
    inputs:
      version:
        description: version upgrading to (float)
  rollback:
    implementation: "scripts/nodejs/rollback.py"

The Workflow Definition

The workflow is also defined in the openstack-blueprint.yaml file at the bottom:

workflows:
  webapp_upgrade:
    mapping: scripts/upgrade_workflow.py
    parameters:
      version:
        description: version of upgraded app.  A float for easy comparison
      url:
        description: location of app
      pause:
        description: number of seconds to pause between each instance upgrade
        default: 0
      percentage:
        description: optional to perform a/b upgrade
        default: 100
      lb_relationship:
        description: the relationship name for the load balancer connection
        default: cloudify.relationships.connected_to

The mapping key points to the implementation in the scripts/upgrade_workflow.py file, which contains the algorithm. The main loop examines all nodes and selects those that implement the webapp_upgrade interface. The lb_relationship parameter (default cloudify.relationships.connected_to) names the relationship type used to find each instance’s load balancer connection.
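To make the role of the defaults concrete, here is a small sketch of how the declared parameters might be combined with the values supplied when the workflow is started. It assumes that parameters declared without a default must be supplied by the caller. The helper resolve_parameters, the DECLARED dictionary, and the example URL are hypothetical and not part of Cloudify or the sample.

```python
# Mirrors the parameters section of the workflow definition above.
DECLARED = {
    "version": {},
    "url": {},
    "pause": {"default": 0},
    "percentage": {"default": 100},
    "lb_relationship": {"default": "cloudify.relationships.connected_to"},
}


def resolve_parameters(declared, supplied):
    """Apply declared defaults, then override them with supplied values."""
    missing = [name for name, spec in declared.items()
               if "default" not in spec and name not in supplied]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(sorted(missing)))
    resolved = {name: spec["default"]
                for name, spec in declared.items() if "default" in spec}
    resolved.update(supplied)
    return resolved


# Hypothetical invocation: only the two parameters without defaults are given.
p = resolve_parameters(DECLARED, {"version": 2.0,
                                  "url": "http://example.com/app-2.0.tgz"})
print(p["percentage"], p["pause"], p["lb_relationship"])
```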

The Interface Definition and Implementation

In order to make the upgrade process generic, we need a means of abstracting the upgrade and rollback processes on the target node(s). In TOSCA, this is done by defining an interface. The interface here has two operations (as mentioned earlier): upgrade and rollback.

upgradeable_webapp:
  derived_from: cloudify.nodes.Root
  interfaces:
    webapp_upgrade:
      upgrade: {}
      rollback: {}

The interface is defined inside a base type (in blueprint/types/upgrade.yaml) that concrete implementations (e.g. nodejs.nodes.NodeJSApplicationModule) derive from. The example implementation targets a very specific NodeJS scenario and uses a simplistic versioning scheme that relies on a file naming convention to identify versions. The upgrade operation is passed the version number and URL of the new webapp. If the webapp is already at (or beyond) the specified version, the upgrade request is ignored; otherwise it is honored. In addition to upgrading, the operation saves the current version for a potential future rollback. The rollback operation simply takes the previously saved version and restores it. Only one level of rollback is supported in the example.
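The upgrade and rollback semantics described above can be sketched as follows. This is not the sample's upgrade.py or rollback.py: the WebApp class is a hypothetical in-memory stand-in, and a content string takes the place of the files fetched from the URL.

```python
class WebApp:
    """Hypothetical stand-in for one deployed web application instance."""

    def __init__(self, version, content):
        self.version, self.content = float(version), content
        self.saved = None  # at most one previous (version, content) pair

    def upgrade(self, version, content):
        """Install `content` if `version` is newer; return True if applied."""
        if float(version) <= self.version:
            return False  # already at (or beyond) this version: ignore
        self.saved = (self.version, self.content)  # keep for rollback
        self.version, self.content = float(version), content
        return True

    def rollback(self):
        """Restore the saved version; return False if nothing was saved."""
        if self.saved is None:
            return False
        self.version, self.content = self.saved
        self.saved = None  # only one level of rollback is supported
        return True


app = WebApp(1.0, "v1 files")
print(app.upgrade(2.0, "v2 files"))  # True: 2.0 is newer than 1.0
print(app.upgrade(1.5, "v1.5 files"))  # False: older than the deployed 2.0
print(app.rollback(), app.version)  # True 1.0
print(app.rollback())  # False: nothing left to restore
```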

The Workflow Implementation

The workflow implementation is in the scripts/upgrade_workflow.py file. The algorithm has been covered above, but a few points of potential interest for aspiring workflow writers are worth covering.

Identifying Target Nodes for Upgrade

A key part of the algorithm is figuring out which instances will actually be upgraded (that is, have their upgrade operation called). In the get_targets function, a simple loop discovers the targets by walking the nodes and collecting the instances of those that qualify:

  targets = []
  for node in ctx.nodes:
    # a node qualifies if it exposes the webapp_upgrade interface
    if "webapp_upgrade.rollback" in node.operations:
      for instance in node.instances:
        targets.append(instance)
  return targets

Note that workflows have visibility of the entire deployment and all of its instances. Contrast this with plugins, which typically see only a specific node or a node relationship pair. The logic is a typical nested loop that uses the presence of the “rollback” operation to identify nodes that implement the interface.
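The discovery logic can be tried outside a Cloudify manager with simple stand-in objects. In this sketch the ctx object, the fake_node helper, and the instance ids are hypothetical; only the attribute names (nodes, operations, instances) follow the snippet above.

```python
from types import SimpleNamespace


def get_targets(ctx, operation="webapp_upgrade.rollback"):
    """Return every instance of every node that exposes `operation`."""
    targets = []
    for node in ctx.nodes:
        if operation in node.operations:
            targets.extend(node.instances)
    return targets


def fake_node(operations, instance_ids):
    """Build a hypothetical stand-in for a node in the workflow context."""
    return SimpleNamespace(
        operations=operations,
        instances=[SimpleNamespace(id=i) for i in instance_ids],
    )


# Stand-in context: one upgradeable app with two instances, one load balancer.
ctx = SimpleNamespace(nodes=[
    fake_node({"webapp_upgrade.upgrade": {}, "webapp_upgrade.rollback": {}},
              ["nodejs_app_1", "nodejs_app_2"]),
    fake_node({"cloudify.interfaces.lifecycle.start": {}}, ["haproxy_1"]),
])
print([i.id for i in get_targets(ctx)])  # ['nodejs_app_1', 'nodejs_app_2']
```

Keying on the operation name rather than the node type is what lets the same workflow serve any node type that implements the interface.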

Processing The Instances

Once the instances have been identified, the main algorithm kicks in. The first step is to remove target instances based on the supplied percentage parameter. Then each node is upgraded by first removing the node from the load balancer by exploiting the TOSCA relationship and the cloudify.interfaces.relationship_lifecycle.unlink interface operation:

    for rel in instance.relationships:
      # find the relationship that connects this instance to the load balancer
      if rel.relationship.is_derived_from(p['lb_relationship']):
        ctx.logger.info("unlinking {}".format(instance.id))
        ret = rel.execute_target_operation("cloudify.interfaces.relationship_lifecycle.unlink")
        break

As mentioned previously, all of the deployment is visible via the workflow context, and that includes relationships. Next the node is actually upgraded by calling the upgrade operation:

instance.execute_operation("webapp_upgrade.upgrade", kwargs={"version": p["version"], "url": p["url"]})

Note the version being passed from the workflow parameters to the actual operation. After the upgrade of the instance, it is restored to the load balancer by executing the establish operation from the relationship_lifecycle interface.
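Put together, the per-instance sequence is unlink, upgrade, establish. The sketch below replays that order against recording stand-ins so it can run without a Cloudify manager. FakeRelationship, FakeInstance, and process_instance are hypothetical names, and running establish as a target operation is an assumption that mirrors the unlink call. In a real workflow these calls may return task handles that have to be waited on before moving to the next step.

```python
LIFECYCLE = "cloudify.interfaces.relationship_lifecycle."


class FakeRelationship:
    """Hypothetical stand-in for a relationship instance; records operations."""

    def __init__(self, type_name, log):
        self.type_name, self.log = type_name, log
        self.relationship = self  # lets rel.relationship.is_derived_from() work

    def is_derived_from(self, type_name):
        return self.type_name == type_name

    def execute_target_operation(self, operation):
        self.log.append(operation)


class FakeInstance:
    """Hypothetical stand-in for a node instance; records operations."""

    def __init__(self, instance_id, log):
        self.id, self.log = instance_id, log
        self.relationships = [
            FakeRelationship("cloudify.relationships.connected_to", log)]

    def execute_operation(self, operation, kwargs=None):
        self.log.append(operation)


def process_instance(instance, params):
    """Unlink from the balancer, upgrade, then re-establish the link."""
    rel = next((r for r in instance.relationships
                if r.relationship.is_derived_from(params["lb_relationship"])),
               None)
    if rel is None:
        raise ValueError("no load balancer relationship on " + instance.id)
    rel.execute_target_operation(LIFECYCLE + "unlink")
    instance.execute_operation(
        "webapp_upgrade.upgrade",
        kwargs={"version": params["version"], "url": params["url"]})
    rel.execute_target_operation(LIFECYCLE + "establish")


log = []
params = {"version": 2.0, "url": "http://example.com/app-2.0.tgz",
          "lb_relationship": "cloudify.relationships.connected_to"}
process_instance(FakeInstance("nodejs_app_1", log), params)
print(log)
```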

Rolling Back

The logic for rolling back instances is interleaved in the upgrade logic because the process is virtually identical, except that rollback is called instead of upgrade.
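One way to share a single loop between the two cases is to pick the operation name up front. The sketch below assumes a boolean rollback flag, which is hypothetical: the post does not say how the sample decides between the two. The execute callback stands in for the per-instance unlink, operation, establish sequence.

```python
def operation_for(rollback):
    """Pick the interface operation for this run."""
    return "webapp_upgrade.rollback" if rollback else "webapp_upgrade.upgrade"


def process_all(instances, execute, rollback=False, **inputs):
    """Run the chosen operation on each instance through `execute`."""
    operation = operation_for(rollback)
    # The blueprint declares no inputs for rollback, so none are passed along.
    kwargs = {} if rollback else inputs
    for instance in instances:
        execute(instance, operation, kwargs)
    return operation


calls = []
process_all(["nodejs_app_1", "nodejs_app_2"],
            lambda inst, op, kw: calls.append((inst, op, kw)),
            version=2.0, url="http://example.com/app-2.0.tgz")
print(calls[0][1])  # webapp_upgrade.upgrade
```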

Conclusion

Many discussions about Cloudify revolve around how it can deploy onto multiple cloud platforms in a standards-based manner, and perform automated operations such as auto-healing and auto-scaling. The platform’s ability to execute complex custom workflows on previously deployed systems is often overlooked. This post demonstrated the implementation of a generic framework for rolling upgrades, and a specific simple implementation that updates a webapp running on NodeJS. The example is easily extensible to other load-balanced resources, and illustrates the power of Cloudify’s custom workflow capability. The code is available on GitHub. As always, comments are welcome.
