Closed loop orchestration (CLO) with Cloudify
This post was originally published on Sebastian’s blog, Orchestrated Things, on June 26, 2018.
“…the autonomy to IT systems?”
When talking to customers, the idea of closed loop orchestration pops up quite often. Some customers, especially in the telco space, put a lot of focus on it, mostly in the context of NFV. What is closed loop orchestration and why does it matter? The concept is not new. It has been used in service assurance practice for quite some time. In its simplest form, it combines an event with an action to change system state. There are many service assurance systems that collect events and take predefined actions based on them. Why do we bother, then? Simply because IT systems are growing more complex. Very often this simple combination of event and action is not enough. In a complex system, you need to collect many events, calculate a composite metric (very often using big data analytics) and trigger an action to change system state. This is where orchestration starts to play an important role.
The purpose of closed loop orchestration (CLO) is to change system state in response to an event (or a set of events). In order to change system state, we very often need to understand the system’s model, and this is the role of a proper orchestrator.
Let’s take a closer look at the CLO concept. First, we need orchestrated objects: the IT system elements whose state we need to observe and potentially change. They can be as simple as a single host running some application, or as complex as a cluster of applications. An orchestrated object can be a single virtual firewall or a complete virtual security perimeter that includes multiple virtual firewalls, load balancers and virtual edge routers. Why do we talk mostly about virtual elements here? Because VNFs are more dynamic by nature than PNFs, so CLO is more relevant to them. However, there may be cases where CLO is relevant to PNFs as well.
Next, we need something that collects metrics. What are these metrics? It depends on the system. They can be as simple as CPU utilization or the number of VPN users, but they can also be more complex; we then call them “composite metrics”, as they are calculated from multiple parameters. In a nutshell, a metric is the currency in which we measure the performance of a given system. It is more and more common for big data analytics to calculate those metrics based on historical data and complex heuristics.
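To make the idea of a composite metric a bit more concrete, here is a minimal Python sketch. It is only an illustration: the `Sample` type, the `composite_load` function, the weights and the `max_vpn_users` capacity are all assumptions made up for this example, not part of any real collector.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical measurement taken from an orchestrated object."""
    cpu_percent: float  # CPU utilization, 0-100
    vpn_users: int      # currently connected VPN users


def composite_load(samples, max_vpn_users=500, cpu_weight=0.6):
    """Combine CPU and VPN usage over a window into one 0..1 load score.

    The weighting and the capacity limit are illustrative assumptions.
    """
    if not samples:
        raise ValueError("need at least one sample")
    avg_cpu = sum(s.cpu_percent for s in samples) / len(samples) / 100.0
    avg_vpn = sum(s.vpn_users for s in samples) / len(samples) / max_vpn_users
    avg_vpn = min(avg_vpn, 1.0)  # cap at full capacity
    return cpu_weight * avg_cpu + (1.0 - cpu_weight) * avg_vpn


# Two samples from the same (imaginary) virtual firewall.
window = [Sample(cpu_percent=80.0, vpn_users=400),
          Sample(cpu_percent=60.0, vpn_users=300)]
score = composite_load(window)
```

A real system would likely replace the simple weighted average with whatever analytics it runs on historical data; the point is only that many raw parameters collapse into one number the policy can reason about.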
Once we have a metric, we need something that will decide what to do based on it. This element is called a policy. The policy engine observes or fetches metrics, processes them and enforces the action. An action represents a lifecycle action that will change the state of an orchestrated object. Good examples from the NFV world are “scale-out”/“scale-in”, “heal” or even “change placement”. The policy engine is not responsible for executing an action. It only enforces a given policy and tells the orchestrator what to do. It is the orchestrator that acts upon the orchestrated object and implements the given lifecycle action.
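The split of responsibilities described above, where the policy engine decides and the orchestrator executes, can be sketched roughly like this. Both classes, the thresholds and the instance counter are hypothetical and exist only for illustration; this is not how Cloudify or any particular policy engine is implemented.

```python
from typing import Optional


class Orchestrator:
    """Hypothetical orchestrator: knows the model and executes lifecycle actions."""

    def __init__(self, instances=2):
        self.instances = instances
        self.log = []  # lifecycle actions requested so far

    def execute(self, action):
        if action == "scale-out":
            self.instances += 1
        elif action == "scale-in":
            self.instances = max(1, self.instances - 1)  # keep at least one
        self.log.append(action)


class PolicyEngine:
    """Hypothetical policy engine: decides what to do, never does it itself."""

    def __init__(self, orchestrator, high=0.8, low=0.3):
        self.orchestrator = orchestrator
        self.high = high  # assumed scale-out threshold
        self.low = low    # assumed scale-in threshold

    def decide(self, metric) -> Optional[str]:
        if metric > self.high:
            return "scale-out"
        if metric < self.low:
            return "scale-in"
        return None

    def observe(self, metric):
        action = self.decide(metric)
        if action is not None:
            # The engine only tells the orchestrator what to do.
            self.orchestrator.execute(action)
        return action


orch = Orchestrator(instances=2)
engine = PolicyEngine(orch)
for metric in [0.9, 0.5, 0.2]:
    engine.observe(metric)
```

After the three metric values are observed, the orchestrator has scaled out once and scaled in once, so it is back at two instances. The loop is closed because each action changes the system that produces the next metric.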
We can debate whether metrics collection and the policy engine should be part of the orchestration system. There are cases where this is possible and expected, but there is no general guidance on that. I can imagine cases where they need to be decoupled to give depth and breadth of functionality. This is especially relevant for metrics collection. If our system is very complex and requires composite metrics calculated through big data analysis, then it is better for that system to be external to the orchestrator. The same goes for the policy engine. There are many on the market, and if someone is used to a particular policy engine, why push them to use a different one? What is important is OPENNESS. A good orchestrator should be capable of integrating with external metrics collection and policy engines.
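One way to picture this openness is to have the orchestrator depend only on small interfaces, so that any external metrics collector or policy engine can be plugged in. The sketch below uses Python’s `typing.Protocol` for that; `MetricSource`, `Policy`, `StaticSource`, `ThresholdPolicy` and `run_once` are hypothetical names invented for this example.

```python
from typing import Callable, Optional, Protocol


class MetricSource(Protocol):
    """Anything that can deliver a metric value (internal or external)."""
    def fetch(self) -> float: ...


class Policy(Protocol):
    """Anything that can turn a metric into a lifecycle action name."""
    def decide(self, metric: float) -> Optional[str]: ...


class StaticSource:
    """Stand-in for an external collector; replays canned values."""
    def __init__(self, values):
        self._values = iter(values)

    def fetch(self) -> float:
        return next(self._values)


class ThresholdPolicy:
    """Stand-in for an external policy engine with one assumed rule."""
    def __init__(self, limit):
        self.limit = limit

    def decide(self, metric: float) -> Optional[str]:
        return "heal" if metric > self.limit else None


def run_once(source: MetricSource, policy: Policy,
             execute: Callable[[str], None]) -> Optional[str]:
    """One pass of the loop: fetch a metric, decide, hand the action over."""
    action = policy.decide(source.fetch())
    if action is not None:
        execute(action)
    return action


executed = []  # stands in for the orchestrator's lifecycle executor
source = StaticSource([0.2, 0.95])
policy = ThresholdPolicy(limit=0.9)
results = [run_once(source, policy, executed.append) for _ in range(2)]
```

Because `run_once` only relies on `fetch` and `decide`, swapping `StaticSource` for a big data analytics backend, or `ThresholdPolicy` for a third-party policy engine, would not require changing the orchestration side.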
Cloudify has a good track record in closed loop orchestration. A few years ago, when this topic was not that common, Cloudify implemented CLO in Cloudify Manager. The system was based on the Diamond agent and metrics streaming to the Riemann policy engine. It was a very innovative approach back then. It made it possible to create dynamic systems where state is changed based on a given metric. A good example is the NodeCellar demo, with auto-scale and auto-heal scenarios based on this engine.
Currently, the CLO architecture is being reworked, and very soon we expect to see a modular architecture where Cloudify Manager can be easily integrated with external metrics collection and policy enforcement engines.
It looks like, with a good solution, we are able to give more and more autonomy to IT systems. When properly implemented, those systems can adjust themselves to the conditions of their environment, and human intervention can be minimized.