Orchestrating Docker 1.12 Swarm With Cloudify

The recent release of Docker 1.12 introduced a substantially upgraded version of Swarm built into the engine. This new release
puts native Docker container orchestration in direct competition with Google’s Kubernetes, although Swarm doesn’t yet match all of Kubernetes’ capabilities. The addition of features such as load-balanced services, scaling, overlay networking, placement affinity and anti-affinity, security, and high availability makes for a compelling platform.


Orchestration Strategy for Docker Swarm

Having container orchestration native to the Docker platform makes the creation of
a cluster much easier than with Kubernetes. Swarm has a familiar master/worker
architecture, with the capability of high availability for the master. Swarm uses a
Raft implementation for leader election and consensus storage rather than an
external provider (e.g. etcd). It also includes built-in overlay networking. The
integrated nature of Swarm makes the orchestration simpler.

As with the Cloudify Kubernetes Cluster Blueprint, the main value delivered by the
orchestration of Swarm is:

  • creating a cluster in an infrastructure neutral way
  • auto-healing the cluster when cluster nodes fail
  • manual and/or auto-scaling the cluster when arbitrary metrics indicate the need

Given these goals, the orchestration can be fairly simple because they fall into
well-worn patterns that Cloudify supports directly. This initial attempt starts a
single manager and an arbitrary number of workers. The workers depend on the manager.
The workers are outfitted with Diamond metric collectors, and scale and heal policies
are defined in the blueprint. When workers are spun up, they get the security token
from the manager and join the swarm. When they are scaled down, the process is
reversed. A load-generating image is provided in the form of a Dockerfile to ease
validating the scaling behavior, and part of the orchestration is building that image
on each worker node.

Orchestration Details

The blueprint defines two node types corresponding to the different Swarm node
roles, manager and worker. The blueprint targets OpenStack, and each of the roles is
contained in a corresponding cloudify.openstack.nodes.Server type. The
worker host nodes are outfitted with standard Cloudify metric collectors. The worker
nodes are configured to depend on the manager node. The blueprint assumes that Docker
is preinstalled on the image. While Cloudify could easily automate the install of
Docker itself, the process is too time-consuming to make sense in an auto-scaling use
case.

The Manager

The manager stores its cluster join tokens in runtime properties, which workers use
later to join the swarm. The script that enables this is quite simple and is called
as the result of the manager configuration in the blueprint.

    type: cloudify.nodes.SoftwareComponent
          implementation: scripts/start-manager.sh
                IP: {get_attribute: [manager_host, ip]}
      - target: manager_host
        type: cloudify.relationships.contained_in

Note that the IP of the containing host is passed through the environment to the
start script, which performs the very minimal configuration/startup required by
Docker Swarm:

sudo docker swarm init --advertise-addr "$IP"
ctx instance runtime-properties master_token "$(sudo docker swarm join-token -q manager)"
ctx instance runtime-properties worker_token "$(sudo docker swarm join-token -q worker)"

To initialize a Swarm manager, only the init command is needed. The
script then stores away the access tokens for future reference in the node runtime
properties.
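
If the token lookup fails, the manager script above would store an empty value and the workers would only discover the problem when they try to join. The following is a minimal sketch of a guard around the token-storing step, not the blueprint's actual script. The function names, the DOCKER and CTX override variables, and the check for the SWMTKN-1- prefix (the form in which Docker 1.12 prints join tokens) are assumptions added here for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for start-manager.sh. DOCKER and CTX are illustrative
# overrides so the functions can be exercised without a Docker daemon.

# Succeed only if the argument looks like a swarm join token.
is_swarm_token() {
    case $1 in
        SWMTKN-1-?*) return 0 ;;
        *)           return 1 ;;
    esac
}

# store_token ROLE PROPERTY
# Fetch the join token for ROLE (manager or worker) and save it in the
# node instance runtime property PROPERTY; fail if no usable token came back.
store_token() {
    role=$1 property=$2
    token=$(${DOCKER:-sudo docker} swarm join-token -q "$role") || return 1
    if ! is_swarm_token "$token"; then
        echo "unexpected join token for role: $role" >&2
        return 1
    fi
    ${CTX:-ctx} instance runtime-properties "$property" "$token"
}
```

In the manager script this would be called as `store_token manager master_token` and `store_token worker worker_token` after `docker swarm init` succeeds.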

The Worker

The worker configuration is likewise simple and depends on the existence of the
manager. The first step is to put the “stress” image on each worker. This supports an
easy demonstration of scaling, and can also serve as a pattern for loading arbitrary
images. The blueprint indicates this for the worker node as follows:

    type: cloudify.nodes.SoftwareComponent
        configure: scripts/configure-worker.sh

The configure-worker script downloads an archive from the blueprint that contains a
Dockerfile and supporting artifacts, and builds the image from it:

# create image for generating cpu load
ctx download-resource containers/stress.tgz /tmp/stress.tgz
cd /tmp
tar xzf /tmp/stress.tgz
cd /tmp/stress
sudo docker build -t stress:latest .

The next step is to actually start the worker and join the swarm. This is
facilitated by Cloudify intrinsic functions that put required deployment info into
the environment of the start-worker.sh script:

          implementation: scripts/start-worker.sh
                IP: {get_attribute: [worker_host, ip]}
                MASTERIP: {get_attribute: [manager_host, ip]}
                TOKEN: {get_attribute: [manager, worker_token]}

start-worker.sh can then run the very simple join
command from Docker to join the cluster:

sudo docker swarm join --advertise-addr "$IP" --token "$TOKEN" "$MASTERIP"
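
Because the join depends entirely on values injected through the environment, a worker that starts before the manager has published its token would run the command with empty arguments. Here is a hedged sketch of how start-worker.sh could check its inputs first. IP, TOKEN and MASTERIP come from the blueprint; the function names and the DOCKER and SWARM_PORT variables are hypothetical additions. Port 2377 is Docker's default swarm management port, spelled out here only to make it explicit.

```shell
#!/bin/sh
# Hypothetical, more defensive variant of start-worker.sh.

# Fail with a message if any of the named variables is unset or empty.
require_env() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            return 1
        fi
    done
}

# Join the swarm using the values passed in by the blueprint.
join_worker() {
    require_env IP TOKEN MASTERIP || return 1
    ${DOCKER:-sudo docker} swarm join \
        --advertise-addr "$IP" \
        --token "$TOKEN" \
        "$MASTERIP:${SWARM_PORT:-2377}"
}
```

The script would end with a plain call to `join_worker`. Setting `DOCKER=echo` prints the command instead of running it, which is handy for a dry run.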

Scaling and Healing

Worker hosts are installed with standard metric collectors, provided by the Cloudify
Diamond Plugin, for metrics related to CPU, memory, and I/O. Autoscaling
configuration in Cloudify consists of defining a group, which associates a number of
blueprint nodes with a policy. The policy dictates under what circumstances the
scaling (or other workflow) is triggered. The actual workflow (scale/heal or other)
is associated in the scaling group definition. This means that the policy itself is
just raising a flag (as it were): it doesn’t command a certain workflow to execute,
or even send parameters to a certain workflow. Since each group is statically defined
in the blueprint, a separate group must be defined for each potential action. In the
case of this Swarm blueprint, that means separate groups for scale up, scale down,
and heal. Looking at the scale up group, you can see the policy association, and the
policy configuration, which sets the threshold for scaling, the metric to use, and
other parameters. Note that the actual policy implementation, in
policies/scale.clj, is not baked into Cloudify itself, but is a
general-purpose autoscaling detection policy that you can reuse in your own blueprints.

   members: [worker_host]
       type: scale_policy_type
         policy_operates_on_group: true
         scale_limit: 4
         scale_direction: '<'
         scale_threshold: 50
         service_selector: .*worker_host.*cpu.total.user
         cooldown_time: 120
           type: cloudify.policies.triggers.execute_workflow
             workflow: scale
               delta: 1
               scalable_entity_name: worker_host
               scale_compute: true

Note that scale_policy_type is defined in the imported
imports/scale.yaml file, which ultimately points at
policies/scale.clj. The triggers section defines the
workflow that will be executed (and its parameters) when the policy “raises its flag”
(actually it calls the process-policy-triggers function). Nothing new or
exotic here. The scale down group is almost identical, with slightly tweaked policy
and workflow parameters. The heal group uses the built-in host failure policy, which
then triggers the built-in heal workflow.
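
To make the "raising a flag" idea concrete, here is a rough sketch of the kind of decision a scale policy makes from the parameters shown above. It is not a translation of policies/scale.clj. In particular, reading scale_direction '<' as "threshold is below the metric and the instance count is below the limit" is an assumption of this sketch, as are the function name and argument order. Shell arithmetic is integer-only, so metric values are assumed to be whole numbers.

```shell
#!/bin/sh
# Illustrative only: a simplified stand-in for an autoscaling detection policy.
# should_scale METRIC THRESHOLD DIRECTION COUNT LIMIT NOW LAST COOLDOWN
# Returns 0 when the "flag" should be raised, 1 when not, 2 on bad input.
should_scale() {
    metric=$1 threshold=$2 direction=$3 count=$4 limit=$5
    now=$6 last=$7 cooldown=$8

    # Still inside the cooldown window since the previous trigger: do nothing.
    [ $((now - last)) -ge "$cooldown" ] || return 1

    case $direction in
        # Assumed meaning of '<': scale up while below the instance limit
        # and the threshold is below the observed metric.
        '<') [ "$count" -lt "$limit" ] && [ "$threshold" -lt "$metric" ] ;;
        # Assumed mirror image for scale down.
        '>') [ "$count" -gt "$limit" ] && [ "$threshold" -gt "$metric" ] ;;
        *)   return 2 ;;
    esac
}
```

With the values from the scale up group (scale_threshold 50, scale_limit 4, cooldown_time 120), `should_scale 75 50 '<' 1 4 1000 800 120` succeeds, while the same call with four existing instances, or with a last trigger only 50 seconds earlier, does not.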

Test Drive

In order to test drive the Swarm integration with auto healing and scaling, you’ll
need access to an OpenStack cloud with a Cloudify manager bootstrapped
there. Recall that you’ll need an Ubuntu 14+ image with Docker 1.12 installed. Then
clone the blueprint from the git repo and edit the
inputs/openstack.yaml file. Upload the blueprint to the manager and
create a deployment using the inputs. Alternatively, you can create the deployment
from the UI and enter the inputs manually. Run the install workflow on the deployment.
This will create a Swarm cluster with one manager and one worker.


From your OpenStack Horizon dashboard, terminate the worker instance. Now return
to the Cloudify Manager UI and note on the deployment view that the heal workflow has
started.


From the Cloudify UI in the deployments view, get the public IP of the Swarm
manager.

You’ll need access to the agent key for the Swarm manager to ssh there. First ssh to
the Cloudify manager from the CLI:
cfy ssh
Now ssh to the Swarm manager using the IP you got from the UI:
sudo ssh -i /root/.ssh/agent-key.pem ubuntu@<manager-ip>
Now that you’re on the Swarm manager host, you can run the pre-installed service
to generate load:
sudo docker service create --constraint 'node.role == worker' \
    --restart-condition none stress /start.sh

This will run the stress tool on an arbitrary
worker. In the current blueprint configuration, only the workers can auto-scale, and
only metrics from the workers are used to decide whether to scale. This means the
stress must be limited to worker nodes, which the Docker service placement constraint
mechanism nicely supplies. Return to the Cloudify deployment view and watch the scale
workflow start.

Keep watching for a couple of minutes and another instance will join the cluster.
Wait around for a few more minutes, and the deployment will scale down automatically
to accommodate the decreased load.
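
You can also confirm the scale-out from the Swarm manager host rather than from the UI. The sketch below polls the node list until the expected number of nodes has joined. It relies only on `docker node ls -q`, which prints one node ID per line; the function names, the DOCKER override, and the default retry settings are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helpers for watching the cluster grow; run on the Swarm manager.

# Print the number of nodes (manager plus workers) currently in the swarm.
count_nodes() {
    ${DOCKER:-sudo docker} node ls -q | wc -l | tr -d ' '
}

# wait_for_nodes EXPECTED [TRIES] [DELAY_SECONDS]
# Succeed once at least EXPECTED nodes are listed; give up after TRIES polls.
wait_for_nodes() {
    expected=$1 tries=${2:-30} delay=${3:-10}
    while [ "$tries" -gt 0 ]; do
        if [ "$(count_nodes)" -ge "$expected" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

After starting the stress service on the initial one-manager, one-worker cluster, `wait_for_nodes 3` returns once the second worker has joined.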


Docker Swarm has stepped up big time to become a real competitor in the container
management space. Cloudify can add value to a Swarm deployment by supplying
portability, healing, and scaling capabilities beyond the container scaling and
healing provided by Swarm itself. Cloudify can also orchestrate Swarm services in
concert with systems (possibly not containerized) that are external to the Swarm
infrastructure. The source code is available on GitHub. As
always, comments are most welcome.

