SD-WAN over Satellite – a Multi-Operator, Multi-Vendor, Multi-Access solution
Cloudify held an ‘SD-WAN over Satellite Access’ POC together with Intelsat, CMC Networks, 128T, Nokia Nuage, Kontron and GDX at the recent MEF 19 event. The POC received the 2019 MEF 3.0 service implementation award.
The goal was to show how a satellite connection can serve areas where existing Internet connectivity is poor or missing entirely. This is a real need in remote rural areas and at remote gas or mining sites.
Interoperability between SD-WAN equipment vendors, underlay networks, and telco, mobile and satellite operators would unlock the true potential and intent of SD-WAN as an overlay service, increase adoption, speed up revenue realization and improve the customer experience.
The implementation used the MEF LSO Sonata, Legato and Presto APIs to deliver a MEF 70 SD-WAN service as a single orchestrated service with multiple components.
Making it happen involved several challenges:
- A multi-operator solution supporting many carriers. For this we use the LSO Sonata APIs to agree on the business terms of the connection.
- A multi-vendor SD-WAN solution. Each carrier can use a different SD-WAN solution in its own domain.
- A multi-access solution supporting both terrestrial and satellite connectivity. Additional access types can be added as needed.
- Building an SD-WAN overlay on top of two different underlay access types, MPLS and satellite.
- Last but not least, getting all the vendors involved to work together.
The Proposed Solution
The solution includes a BSS portal where enterprise customers can order the service. The LSO Sonata API is used to define the business terms and SLAs required between carriers. Cloudify Orchestrator is then called to provision the service. On the terrestrial side, provisioning includes calling the 128T conductor to establish and configure a 128T SD-WAN router. On the satellite side, Cloudify calls Nokia Nuage to create an NSG (Network Services Gateway), which is configured to communicate over a satellite underlay access provided by Intelsat.
Both SD-WAN gateways (SD-WAN routers) are instantiated on demand at customer endpoints, with service termination in a uCPE whose hardware is provided by Kontron. Each uCPE SD-WAN router is configured to talk to another SD-WAN router of the same type located on AWS East. The terrestrial side and the satellite side are peered at AWS East, between the two SD-WAN routers (the 128T and the Nuage).
Once both sides, terrestrial and satellite, have been created (each requires a good deal of configuration of its own), service routes are defined and an IP connection is established between the two endpoints.
Termination reverses the process: the service routes are deleted first, then the gateways.
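To make the ordering concrete, here is a minimal Python sketch of the provisioning and termination sequence described above. It is a simplified, hypothetical model: the resource names and the `provision`/`terminate` functions are placeholders, not Cloudify, 128T or Nuage APIs, and a real plugin would call the vendor's API at each step.

```python
# Hypothetical resource names, listed in the order they are provisioned.
PROVISION_ORDER = [
    "128t_router",       # terrestrial side, created via the 128T conductor
    "nuage_nsg",         # satellite side, created via Nokia Nuage
    "aws_east_peering",  # 128T <-> Nuage peering at AWS East
    "service_routes",    # routes that connect the two endpoints
]


def provision(order, sonata_terms_agreed):
    """Create each resource in order, but only once the Sonata terms are agreed."""
    if not sonata_terms_agreed:
        raise ValueError("business terms and SLAs must be agreed before provisioning")
    created = []
    for resource in order:
        # Placeholder for the vendor-specific API call that creates the resource.
        created.append(resource)
    return created


def terminate(created):
    """Return the deletion order: reverse of creation, so routes go before gateways."""
    return list(reversed(created))


if __name__ == "__main__":
    created = provision(PROVISION_ORDER, sonata_terms_agreed=True)
    print(terminate(created))
    # ['service_routes', 'aws_east_peering', 'nuage_nsg', '128t_router']
```

Deriving the termination order by reversing the creation order guarantees that nothing is deleted while something that depends on it still exists.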
Cloudify also handles cases where one of the gateways fails and needs “healing”, and supports scaling when SLAs degrade. I expand on this in the Closed Feedback Loop section below.
MEF Reference APIs
The MEF reference APIs involved in this process are:
- Sonata API – Defines the business terms between operators for connections that span multiple operators: the business contract, pricing, SLAs and penalties in case of breaches.
- Legato API – Each operator's OSS/BSS system calls the Cloudify APIs via Legato. Legato tells the orchestrator which services need to be instantiated and with which properties.
- Presto API – Defines a common abstraction layer for talking to all SD-WAN vendors in a common language. Each SD-WAN vendor should provide a plugin that translates the Presto common language into its own primitives and commands.
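The Presto idea of one common language with a per-vendor translation maps naturally onto an abstract interface with one adapter per vendor. The Python sketch below is hypothetical: the class names, method names and returned identifiers are illustrative only and are not taken from the MEF Presto specification or from any vendor's actual plugin.

```python
from abc import ABC, abstractmethod


class PrestoSdwanPlugin(ABC):
    """Hypothetical common interface: one method per common-language operation."""

    @abstractmethod
    def create_gateway(self, site: str) -> str:
        """Create an SD-WAN gateway at a site and return its identifier."""

    @abstractmethod
    def delete_gateway(self, gateway_id: str) -> None:
        """Delete a previously created gateway."""


class Plugin128T(PrestoSdwanPlugin):
    """Hypothetical adapter; a real one would drive the 128T conductor."""

    def __init__(self):
        self.routers = set()

    def create_gateway(self, site):
        router_id = f"128t-router-{site}"
        self.routers.add(router_id)  # stand-in for vendor-specific calls
        return router_id

    def delete_gateway(self, gateway_id):
        self.routers.discard(gateway_id)


class PluginNuage(PrestoSdwanPlugin):
    """Hypothetical adapter; a real one would drive Nokia Nuage to manage NSGs."""

    def __init__(self):
        self.nsgs = set()

    def create_gateway(self, site):
        nsg_id = f"nuage-nsg-{site}"
        self.nsgs.add(nsg_id)  # stand-in for vendor-specific calls
        return nsg_id

    def delete_gateway(self, gateway_id):
        self.nsgs.discard(gateway_id)


# Registry of vendor plugins; adding a vendor means adding one entry.
PLUGINS = {"128T": Plugin128T(), "Nuage": PluginNuage()}


def create_gateway(vendor, site):
    """The orchestrator speaks only the common language; the plugin translates."""
    return PLUGINS[vendor].create_gateway(site)


def delete_gateway(vendor, gateway_id):
    PLUGINS[vendor].delete_gateway(gateway_id)


if __name__ == "__main__":
    print(create_gateway("128T", "terrestrial"))  # 128t-router-terrestrial
```

With this shape, the orchestration logic never changes when a vendor is added; only a new adapter and registry entry are needed.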
Cloudify is used as the orchestrator. It uses a TOSCA blueprint to define the service and its sub-components, provisions the network gateways/routers on both sides, and handles the life cycle management (LCM) operations required at each step.
Figure 1 shows a site map with the location of each provisioned uCPE and SD-WAN router. Clicking an icon opens the corresponding deployment, where we can see its exact workflow execution graph, as shown in Figure 2. In this example we see the execution progress of the workflow (green means completed) for the 128T SD-WAN provisioning part and its life cycle operations (LCM).
Life Cycle Management (LCM)
Any orchestrator worth its salt should support life cycle operations such as init, configure, start, stop and delete. These are needed to provision and configure the right service components. For example, for the 128T router you need to add a new device, create a router, a node and a device interface, and configure them. Then you have to create the service routes that connect the new router with its peer router.
The components depend on each other and must be created in a specific order. Cloudify's DSL (domain-specific language) is based on TOSCA, which supports nodes and relationships for expressing dependencies and managing life cycles.
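TOSCA relationships effectively form a dependency graph, and a valid install order is a topological sort of that graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+) on a simplified, hypothetical dependency map for the 128T example; the node names are illustrative and are not actual 128T or Cloudify type names.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each node lists the nodes that must exist
# before it (i.e., the targets of its relationships).
DEPENDENCIES = {
    "device": set(),
    "router": {"device"},
    "node": {"router"},
    "device_interface": {"node"},
    "peer_router": set(),
    "service_route": {"device_interface", "peer_router"},
}


def install_order(deps):
    """Relationship targets are created before their sources."""
    return list(TopologicalSorter(deps).static_order())


def uninstall_order(deps):
    """Tear down in the reverse of the install order."""
    return list(reversed(install_order(deps)))


if __name__ == "__main__":
    print(install_order(DEPENDENCIES))
    print(uninstall_order(DEPENDENCIES)[0])  # service_route
```

`TopologicalSorter` also raises `graphlib.CycleError` if the dependencies are circular, which is a useful sanity check on a topology definition.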
Moreover, because this solution spans multiple access types and multiple SD-WAN vendors, you can split it into multiple blueprints. The master blueprint defines a single service with multiple components. Each component is a blueprint of its own, one for the satellite connection and one for the terrestrial connection, with its own TOSCA nodes and life cycle operations, and can be executed separately. Parameters can be passed between blueprints, and status results of their execution are returned. The service master blueprint invokes the component blueprints. With this model, even complex topologies can be defined and updated on the fly.
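The master/component split can be sketched as a function that runs each component in turn, passes inputs down and collects status and outputs back. This is a hypothetical Python model, not the Cloudify API: `run_component` only stands in for executing a component blueprint, and the input and output keys are assumptions. Here the satellite component receives the terrestrial gateway ID purely to illustrate passing a parameter from one blueprint to another.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentResult:
    """Status and outputs returned by one (hypothetical) component blueprint."""
    name: str
    status: str
    outputs: dict = field(default_factory=dict)


def run_component(name, inputs):
    """Stand-in for executing one component blueprint's install workflow."""
    if "site" not in inputs:
        # A missing required input makes this component fail.
        return ComponentResult(name, "failed")
    gateway_id = f"{inputs['vendor']}-{inputs['site']}"
    return ComponentResult(name, "completed",
                           {"gateway_id": gateway_id, "peer": inputs.get("peer")})


def run_master(service_inputs):
    """Master blueprint: run each component in order and aggregate the results.

    Stops at the first failed component; each component's gateway ID is
    passed as the 'peer' input of the next one.
    """
    results = {}
    peer = None
    for name in ("terrestrial", "satellite"):
        result = run_component(name, dict(service_inputs[name], peer=peer))
        results[name] = result
        if result.status != "completed":
            return "failed", results
        peer = result.outputs["gateway_id"]
    return "completed", results


if __name__ == "__main__":
    status, results = run_master({
        "terrestrial": {"vendor": "128t", "site": "hq"},
        "satellite": {"vendor": "nuage", "site": "mine"},
    })
    print(status)  # completed
```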
TOSCA-Based Intent Orchestration
TOSCA nodes and relationships are well suited to defining topologies, even complex ones such as our SD-WAN overlay topology with a terrestrial side and a satellite side, as described above.
The basic building block of TOSCA is a node with life cycle events, for example an SD-WAN gateway/router. Nodes are linked by relationships, and this solution uses two types:
a) Contained in – For example, a device interface contained in the device node that hosts it.
b) Connected to – For example, an SD-WAN router connected to another SD-WAN router.
Relationships define the provisioning order and the life cycle operations on both the source and the target of the relationship. Run-time attributes can be shared over a relationship in real time; for example, the IP address assigned to the target SD-WAN router at instantiation can be passed to the source SD-WAN router while the connection is being established.
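The following Python sketch models how a "connected to" relationship can hand a run-time attribute from its target to its source. The class names, the `establish` method and the `ip`/`peer_ip` keys are hypothetical illustrations, not Cloudify or TOSCA type definitions; the address comes from the 192.0.2.0/24 documentation range.

```python
from dataclasses import dataclass, field


@dataclass
class NodeInstance:
    """A deployed node; runtime_properties are filled in at instantiation."""
    name: str
    runtime_properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A hypothetical relationship of kind 'contained_in' or 'connected_to'."""
    kind: str
    source: NodeInstance
    target: NodeInstance

    def __post_init__(self):
        if self.kind not in ("contained_in", "connected_to"):
            raise ValueError(f"unsupported relationship kind: {self.kind}")

    def establish(self):
        """Run once the target exists: share its run-time IP with the source."""
        if self.kind == "connected_to":
            self.source.runtime_properties["peer_ip"] = (
                self.target.runtime_properties["ip"]
            )


if __name__ == "__main__":
    target = NodeInstance("nuage_router", {"ip": "192.0.2.10"})
    source = NodeInstance("128t_router")
    Relationship("connected_to", source, target).establish()
    print(source.runtime_properties)  # {'peer_ip': '192.0.2.10'}
```

Because the value is read from the target only when the relationship operation runs, the source always receives the address actually assigned at instantiation rather than one fixed in advance.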
Figure 3 shows an example of a TOSCA node, including its properties and life cycle events. This TOSCA node represents the 128T device network interface.
One of TOSCA's most valuable capabilities is abstracting low-level details. TOSCA is a declarative language: you say WHAT you want to do, not HOW to do it. That is the basis for intent-based orchestration. In our case we can have multiple SD-WAN vendors, yet the life cycle operations are the same for all of them. This fits well with the MEF LSO Presto abstraction, which maps all SD-WAN vendors onto a common interface definition.
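One way to picture the WHAT-versus-HOW split is a small reconciler: the intent only lists which gateways should exist and from which vendor, and a generic planner derives the actions. This is a simplified, hypothetical Python sketch of the idea, not Cloudify's implementation; the dictionary format and action tuples are assumptions.

```python
def plan(desired, current):
    """Derive the actions that move the current state to the declared intent.

    Both arguments map a site name to the SD-WAN vendor of its gateway,
    e.g. {"hq": "128T"}. The caller declares WHAT should exist; this
    function works out HOW to get there.
    """
    actions = []
    for site, vendor in sorted(desired.items()):
        if site not in current:
            actions.append(("create", site, vendor))
        elif current[site] != vendor:
            # Vendor changed: replace the gateway.
            actions.append(("delete", site, current[site]))
            actions.append(("create", site, vendor))
    for site in sorted(current):
        if site not in desired:
            actions.append(("delete", site, current[site]))
    return actions


if __name__ == "__main__":
    print(plan({"hq": "128T", "mine": "Nuage"}, {"hq": "128T"}))
    # [('create', 'mine', 'Nuage')]
```

Running the planner again after the actions succeed yields an empty list, which is what makes a declarative intent safe to re-apply.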
Universal CPE (uCPE) systems provide an NFVI platform that runs virtualized network functions (VNFs). In our case, both SD-WAN gateways (SD-WAN routers) are instantiated on demand on the uCPE. Like other VNFs (e.g. vFW, vLB, vDPI), the SD-WAN gateway is a VNF with its own characteristics and requirements. VNFs with special requirements such as SR-IOV, DPDK, NUMA awareness or CPU pinning can use Intel EPA platforms, whose hardware acceleration capabilities enable faster execution.
The uCPE form factor is important, and vendors need to make an effort to minimize their VNF footprint. Many are moving to containers, i.e. containerized network functions (CNFs). Using Cloudify and Intel EPA / QuickAssist, vendors can match their CNF requirements with Kubernetes node capabilities for significant performance gains, as described here.
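Matching CNF requirements with node capabilities boils down to a subset check, similar in spirit to a Kubernetes `nodeSelector`, which only schedules a pod on nodes carrying all the listed labels. The Python sketch below is hypothetical: the node names and capability strings are illustrative rather than real Kubernetes label keys, and preferring the least-equipped eligible node is a design choice of this sketch, not something the POC describes.

```python
# Hypothetical nodes and capability tags (not real Kubernetes label keys).
NODES = {
    "ucpe-1": {"sriov", "dpdk", "numa", "cpu-pinning", "quickassist"},
    "ucpe-2": {"dpdk"},
    "cloud-1": set(),
}


def eligible_nodes(requirements, nodes):
    """Return the nodes whose capabilities include every required feature."""
    return sorted(name for name, caps in nodes.items() if requirements <= caps)


def place(requirements, nodes):
    """Pick the eligible node with the fewest capabilities, or None.

    Keeping richly equipped nodes free for CNFs that truly need them is
    an assumption of this sketch; ties are broken by node name.
    """
    candidates = eligible_nodes(requirements, nodes)
    if not candidates:
        return None
    return min(candidates, key=lambda name: (len(nodes[name]), name))


if __name__ == "__main__":
    print(place({"dpdk"}, NODES))  # ucpe-2
```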
Closed Feedback Loop
An important role of orchestration is the closed feedback loop. The orchestrator sends KPIs collected from monitored nodes and connections to a service assurance (SA) component. The SA, with or without help from the OSS/BSS system, decides whether to trigger a specific action, called a workflow in orchestration terminology, which the orchestrator then executes.
In our example we use Infovista as the SA component. If the SLA degrades below the contracted threshold, Infovista can instruct Cloudify to scale out and add capacity, or to change the connection configuration.
In the POC, Infovista can monitor the KPIs obtained at the SD-WAN overlay layer. If something goes wrong, Infovista can drill down to the underlay layer KPIs to better understand the root cause, and then instruct Cloudify how to remediate, which in orchestration terminology is known as triggering a healing or scaling workflow.
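A service assurance decision of this kind can be sketched as a small function that maps KPIs to a workflow name. Everything here is a hypothetical illustration, not Infovista's or Cloudify's logic: the KPI fields, the 1% underlay-loss threshold, and the rule that a lossy underlay calls for reconfiguring the connection while a healthy underlay calls for scaling out are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Kpis:
    """Hypothetical KPI sample for one SD-WAN connection."""
    gateway_up: bool
    overlay_throughput_mbps: float
    underlay_loss_pct: float


def decide(kpis, sla_min_throughput_mbps, max_underlay_loss_pct=1.0):
    """Return the workflow to trigger ('heal', 'reconfigure_connection',
    'scale_out') or None if the SLA is met."""
    if not kpis.gateway_up:
        return "heal"
    if kpis.overlay_throughput_mbps < sla_min_throughput_mbps:
        # Overlay SLA breached: drill down to the underlay to find the cause.
        if kpis.underlay_loss_pct > max_underlay_loss_pct:
            return "reconfigure_connection"  # assumed: underlay is the culprit
        return "scale_out"  # assumed: underlay healthy, so add capacity
    return None


if __name__ == "__main__":
    print(decide(Kpis(True, 5.0, 0.2), sla_min_throughput_mbps=10.0))  # scale_out
```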
The full POC video can be found here.
During the POC demonstration, attendees asked the following questions:
- I am a carrier; how can you add me to this project?
Today the POC shows how to establish a connection between two carriers. In the future we plan to add more carriers, so that the system could serve as the main hub where multiple carriers register and are added. We could become the virtual SD-WAN hub for multiple carriers and multiple SD-WAN vendors.
- I am an SD-WAN vendor; how can I be a part of this project?
Adding SD-WAN vendors is easy, especially if they support the LSO Presto interface, which defines a common language for talking to all SD-WAN vendors. Moreover, TOSCA orchestration provides an intent-based abstraction that lets multiple SD-WAN vendors participate, because the life cycle operations (LCM) are the same for all vendors.
- As a carrier, when dealing with other carriers I want to have a distributed ledger for my business terms, SLAs and SLA breaches.
Other MEF POCs (“Standardization of Blockchain Billing and Settlement Utilizing MEF 3.0 Framework” and “Automated Inter-Carrier Credit Ratings Using Blockchain”) have already implemented a distributed ledger for business terms, and we can adopt that solution here too.