Scaling – it’s all the rage.
When having a conversation about cloud orchestration (all the cool kids are doing it these days), you just can’t go ten minutes without someone being the party pooper and chiming in with the good old: “Yeah, but can you auto-scale my application?”
In fact, I am one of those party poopers myself. You can’t really blame us; ever since cloud computing emerged, scaling large application topologies has become much more feasible, because now you can provision any kind of resource via API in a matter of minutes.
However, horizontal scaling is, and will always remain, a non-trivial problem to solve. First and foremost, the application you are trying to scale must be aware that it may be scaled, and must not rely on any state that may change when it scales.
Take a Web Server instance, for example. To be scalable, it cannot store session details that are relevant for subsequent requests, as those requests may be handled by other instances. Only once this requirement is met can we start talking about the other, more “generic” challenges:
- How and what kind of metrics to measure.
- How to trigger the scaling process.
- How to build the process itself.
In this post, I will be talking about these aspects and we will see how one can approach this problem in an OpenStack cloud environment.
Scaling On OpenStack via Heat
OpenStack Heat is an application orchestration engine designed for the OpenStack Cloud. It is integrated into the OpenStack distribution and can be used via the CLI or the Horizon GUI. Heat uses its own templating language, called HOT (Heat Orchestration Template), for defining application topologies.
This section assumes basic knowledge of Heat. If you don’t happen to have this knowledge, you can have a look at the following links to help you understand what it’s all about:
Autoscaling in Heat is done with the help of three main types:
An AutoScalingGroup is a resource type that is used to encapsulate the resource that we wish to scale, and some properties related to the scale process.
A ScalingPolicy is a resource type that is used to define the effect a scale process will have on the scaled resource.
An Alarm is a resource type that is used to define under which conditions the ScalingPolicy should be triggered.
This will all become much clearer once we look at an example, so let’s go ahead and do that.
Our use case will be auto-scaling a WordPress server that connects to a static and shared MariaDB instance.
The following are excerpts from the full auto-scaling example.
This is the definition of the MariaDB server:
We can see that it is simply a resource of type OS::Nova::Server with a user_data property for installing the MariaDB via yum.
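As a minimal sketch of what such a resource might look like (not the exact excerpt from the full example), the template below declares a single OS::Nova::Server whose user_data installs MariaDB via yum. The resource name `db`, the parameter names, and the package and service names are assumptions for illustration, and the image is assumed to be a yum-based distribution with systemd. Depending on the cloud, a `networks` property may also be needed.

```yaml
heat_template_version: 2013-05-23

description: Sketch of a standalone MariaDB server (names are illustrative)

parameters:
  image:
    type: string
    description: Image for the server (assumed to be yum-based)
  flavor:
    type: string
    description: Flavor for the database server
  key_name:
    type: string
    description: Name of an existing key pair for SSH access

resources:
  # "db" is a hypothetical resource name chosen for this sketch.
  db:
    type: OS::Nova::Server
    properties:
      image: {get_param: image}
      flavor: {get_param: flavor}
      key_name: {get_param: key_name}
      user_data_format: RAW
      user_data: |
        #!/bin/bash
        # Install and start MariaDB (package and service names are assumptions).
        yum -y install mariadb mariadb-server
        systemctl enable mariadb.service
        systemctl start mariadb.service
```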
Now let’s see how the auto-scaling part comes into play.
Any auto-scaling process implementation should always answer three basic questions:
- Which resource to scale?
- What does the scaling process do?
- When should the scaling process be triggered?
Q1: The Which
This is a resource of type OS::Heat::AutoScalingGroup, and it defines that we want to scale a resource of type OS::Nova::Server that installs httpd and deploys the WordPress application onto it. Note that the scaled resource could be defined outside the scaling group and then referenced using the get_resource intrinsic function.
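A scaling group along these lines could be sketched as the following fragment of a template's `resources` section. The resource name `web_server_group`, the size limits, the parameter names (assumed to be declared in the template's `parameters` section), and the install commands are assumptions for illustration. The `metering.stack` metadata entry is an assumption as well: it tags each server with the stack ID so that a Ceilometer alarm can later select only the samples belonging to this group.

```yaml
resources:
  # "web_server_group" is a hypothetical name for this sketch.
  web_server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1          # never scale below one web server
      max_size: 3          # upper bound, chosen arbitrarily here
      resource:
        # The resource definition that is instantiated for each group member.
        type: OS::Nova::Server
        properties:
          image: {get_param: image}
          flavor: {get_param: flavor}
          key_name: {get_param: key_name}
          # Tag the server so alarms can match samples from this stack.
          metadata: {"metering.stack": {get_param: "OS::stack_id"}}
          user_data_format: RAW
          user_data: |
            #!/bin/bash
            # Install httpd and WordPress (package names are assumptions).
            yum -y install httpd wordpress
            systemctl enable httpd.service
            systemctl start httpd.service
```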
Q2: The What
This is a resource of type OS::Heat::ScalingPolicy. Let’s have a closer look at its properties:
- auto_scaling_group_id: This is how we link this policy to a specific scaling group, which, in turn, defines the resource to scale.
- adjustment_type: This specifies that we are going to create a change relative to the current capacity (change_in_capacity). Other options here can be “exact_capacity” or “percent_change_in_capacity”.
- scaling_adjustment: This is the whole point, really. Each time this policy is triggered, we want to add one more WordPress instance.
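Put together, a policy with these three properties might look like the fragment below, which belongs in the `resources` section of the template. It is a sketch under stated assumptions: the group name `web_server_group` is hypothetical and must match a scaling group defined in the same template, and the `cooldown` value is an optional extra that is not taken from the original example.

```yaml
resources:
  web_server_scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      # Link the policy to the scaling group (hypothetical group name).
      auto_scaling_group_id: {get_resource: web_server_group}
      # Interpret scaling_adjustment as a delta on the current capacity.
      # Alternatives: exact_capacity, percent_change_in_capacity.
      adjustment_type: change_in_capacity
      # Add one instance each time the policy fires.
      scaling_adjustment: 1
      # Assumed value: seconds to wait before the policy can fire again.
      cooldown: 60
```

A matching scale-down policy would presumably be identical except for a `scaling_adjustment` of -1.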
Q3: The When
We see that this alarm is a resource of type OS::Ceilometer::Alarm that defines properties which basically tell the Heat engine to:
trigger the web_server_scaleup_policy once the average CPU utilization of the WordPress server is higher than 50% for at least 1 minute
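Expressed as a template fragment, such an alarm might be sketched as follows. The resource name `cpu_alarm_high` is hypothetical, and the `matching_metadata` line assumes that the scaled servers carry a `metering.stack` metadata entry set to the stack ID; without some filter of this kind, the alarm would average CPU samples from unrelated instances.

```yaml
resources:
  # "cpu_alarm_high" is a hypothetical name for this sketch.
  cpu_alarm_high:
    type: OS::Ceilometer::Alarm
    properties:
      description: Scale up if average CPU utilization > 50% for 1 minute
      meter_name: cpu_util        # CPU utilization meter
      statistic: avg              # average over each period
      period: 60                  # length of one evaluation window, in seconds
      evaluation_periods: 1       # consecutive windows that must breach
      threshold: 50
      comparison_operator: gt     # fire when avg(cpu_util) > 50
      # Action to run when the alarm fires: the policy's webhook URL.
      alarm_actions:
        - {get_attr: [web_server_scaleup_policy, alarm_url]}
      # Assumed filter: only samples from servers tagged with this stack's ID.
      matching_metadata: {"metadata.user_metadata.stack": {get_param: "OS::stack_id"}}
```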
Alright, it took us a while but we made it through defining a HOT template with auto-scaling capabilities. Now let’s address the scaling aspects we mentioned in the beginning of the post with regards to the example we just saw:
We see that metric measurements are done via Ceilometer, a built-in monitoring and metering system integrated into OpenStack. It provides various metrics on all kinds of OpenStack resources. In the current example, we used the cpu_util metric to check the CPU utilization of the WordPress server. There are a lot of different metrics to choose from, covering resources ranging from Compute instances to LBaaS.
However, there is something missing. In many cases, what we are really interested in are application- or middleware-specific metrics. That is, I want my WordPress server to scale when there are too many requests hitting the current endpoint. This type of information is in no way exposed via Ceilometer, which of course makes sense, since it has no knowledge of what exactly is deployed on the servers it is keeping track of. The semi-good news is that, technically speaking, you can push custom metrics to Ceilometer via the User Defined Data API. In practice, this is a non-trivial engineering effort that falls on the user. Ideally, you could configure in the template which custom metrics are pushed to Ceilometer, and have a built-in component on the server that actually does the work.
The scaling process is triggered automatically as soon as the alarm threshold is breached. Heat also provides a webhook for explicitly triggering scaling policies using the alarm_url attribute attached to the policy itself.
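One way to get hold of that webhook is to expose it as a stack output, as in the sketch below. The output name `scale_up_url` is an assumption; the policy name is the one used in this post.

```yaml
outputs:
  # "scale_up_url" is a hypothetical output name.
  scale_up_url:
    description: Webhook URL; an HTTP POST to it triggers the scale-up policy
    value: {get_attr: [web_server_scaleup_policy, alarm_url]}
```

Once the stack is created, sending an HTTP POST to the resulting URL should trigger the policy explicitly, independently of any alarm.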
So far, we haven’t really discussed what the scaling process actually does, i.e., does it just create a new instance of the resource and that’s it? What does it look like? Where is it defined?
This isn’t by accident. All of this information is hidden from users, so, by definition, the same process applies to any kind of resource. For example, if I wanted to scale the MariaDB instance instead, I would simply reference that resource inside the AutoScalingGroup instead of the inlined resource.
However, what if scaling a DB instance has different administrative implications on my system than scaling a Web Server instance? Sometimes you might want the ability to execute certain operations prior to launching the new instance. Perhaps there are some SLA issues that need to be enforced through third-party endpoints. Actually, this aspect is not specifically tied to automatic scaling; the same arguments can be applied to the creation of the stack, deletion, updating…and, well, you see my point.
OK, I think this was a mouthful and provided a lot to work with regarding auto-scaling in an OpenStack environment, but this is only half of it. In my next post, I’d like to compare this process with a TOSCA-based process that is relevant to any other cloud, or even to hybrid cloud environments with OpenStack. More to come.