Cloudify is a complex piece of software which handles most of an application’s orchestration tasks. Because of its versatility, there are many different use cases and points of failure that can be discovered during user adoption.
Recently, we started the process of building a service solution in order to deal with many of our struggles and make the development process easier and more robust. In this post, I will detail why any organization should consider building a similar internal service, and how we approached building ours.
Why Build an Internal Service
Shorter Development Cycles
Our main reason for creating this service was to improve our own development process. One way we approached this was to make the Cloudify team (our sales engineers, support, and devs) the first users of the service. This way new bugs, edge cases, and unexpected behaviors are discovered at a much earlier stage and are fed straight back into development before the release is closed.
You Are Your First Customer
Every piece of software has its own usability issues. When our own developers and internal teams use Cloudify as actual users, many UI/UX issues are flagged and taken into consideration well before they reach paying customers.
Monitor and Test Our Software in Production
When your software is running in a “real-life” production environment, your tests are far more accurate and able to simulate more customer cases. Additionally, you can now perform real monitoring in order to gain insights on what is happening to your software, which will help you answer customer concerns and fit your product to their cases.
Log Analysis for Common Problems and Better Debugging
Stable software that runs well and can be used consistently for long periods of time allows you to dive deep into software analysis and gain a better understanding of your system’s behavior.
How We Built Our Internal Service
Users of our internal “Cloudify Service” fall into two main groups:
- R&D – Cloudify developers and QA team. We use Cloudify Service as a test case for production environments. Working this way has both shortened and improved the development cycle, since each feature is used immediately in a “real-time” environment.
- Demos and POCs – Our second goal was to create a tool which would allow us to show the desired (most of the time the latest) version of Cloudify – with the brand new features and capabilities. This led to the creation of a single, stable environment for all of our POCs and Demos (instead of creating one for each case and each relevant team member) and enables users to share examples, use cases, and best practices.
While working on making this a stable service, we ran into several problems. The worst was that agents need a direct line of communication with a manager in order to access it. For our customers this is usually a no-brainer, as they run Cloudify in a closed environment where network access is restricted to their needs. Since we’re running Cloudify Service publicly, we couldn’t afford that and had to come up with a solution to the problem.
We decided to trust what many other services are doing and use the internet as a gateway between the user’s application environment and the service.
We created an agent called the “controller,” which is our way of decoupling the environment from the service. It performs all IaaS API calls for the relevant environment, and all SSH (and later, WinRM) connections are performed using the Fabric plugin.
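To make the decoupling idea more concrete, here is a minimal sketch of how a controller-style agent could work. It is not the actual Cloudify Service code: the `Task` and `Controller` classes, the task kinds, and the stub handlers are all hypothetical names invented for illustration, and two in-memory queues stand in for the internet link between the service and the user’s environment. The point it tries to show is that the controller pulls work and pushes results back, so the service never needs a direct connection into the environment.

```python
from dataclasses import dataclass, field
from queue import Queue, Empty
from typing import Callable, Dict, List


@dataclass
class Task:
    """A unit of work the service wants executed inside the user's environment."""
    task_id: str
    kind: str  # hypothetical task kinds, e.g. "iaas_call" or "remote_command"
    payload: dict = field(default_factory=dict)


class Controller:
    """Hypothetical controller: pulls tasks, runs them locally, reports results."""

    def __init__(self, inbox: Queue, outbox: Queue) -> None:
        self.inbox = inbox      # stand-in for "tasks fetched from the service"
        self.outbox = outbox    # stand-in for "results sent back to the service"
        self.handlers: Dict[str, Callable[[dict], dict]] = {}

    def register(self, kind: str, handler: Callable[[dict], dict]) -> None:
        self.handlers[kind] = handler

    def poll_once(self) -> int:
        """Drain all pending tasks and return how many were handled."""
        handled = 0
        while True:
            try:
                task = self.inbox.get_nowait()
            except Empty:
                return handled
            handler = self.handlers.get(task.kind)
            if handler is None:
                result = {"task_id": task.task_id, "status": "failed",
                          "error": f"unknown task kind: {task.kind}"}
            else:
                try:
                    result = {"task_id": task.task_id, "status": "ok",
                              "output": handler(task.payload)}
                except Exception as exc:  # report the failure instead of crashing
                    result = {"task_id": task.task_id, "status": "failed",
                              "error": str(exc)}
            self.outbox.put(result)
            handled += 1


# Stub handlers: a real controller would call the IaaS API or open an SSH session.
def fake_iaas_call(payload: dict) -> dict:
    return {"created": payload["name"]}


def fake_remote_command(payload: dict) -> dict:
    return {"host": payload["host"], "ran": payload["command"]}


def run_demo() -> List[dict]:
    inbox, outbox = Queue(), Queue()
    controller = Controller(inbox, outbox)
    controller.register("iaas_call", fake_iaas_call)
    controller.register("remote_command", fake_remote_command)

    inbox.put(Task("t1", "iaas_call", {"action": "create_server", "name": "web-1"}))
    inbox.put(Task("t2", "remote_command", {"host": "10.0.0.5", "command": "uptime"}))
    inbox.put(Task("t3", "unknown_kind"))
    controller.poll_once()

    results = []
    while not outbox.empty():
        results.append(outbox.get_nowait())
    return results
```

Calling `run_demo()` returns one result per task, with the unknown task kind reported as a failure rather than raising. In a real deployment the queues would be replaced by outbound requests from the controller to the service, and the stubs by real IaaS and SSH calls.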
Cloudify Service Explained
After a lot of thought and effort, below is the solution we came up with, as well as an explanation of the various components.
- Service components – As deployed today in AWS.
- Monitoring & logging
When dealing with complex software systems, one of the first things you want to do in order to understand things better is monitor your system and make it easier to handle the logs. Exposing Cloudify as a usable service allows us to start this process correctly, monitor our product in a “real” environment, analyze it, and better understand its performance and points of failure.
- Automating the process
Of course, when talking about a service, we need to keep in mind that when the usage increases, the system can fail and things can break. Hence, a quick recovery process is needed with as few manual tasks as possible. We created an agent to handle this process. Basically it’s a machine which “lives” in the same environment as our service and contains all the scripts and tasks to recover, heal, update, and scale it. We call it the “supervisor.” Funnily enough, most of the process for creating Cloudify Service is actually done by Cloudify.
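As a rough illustration of what a supervisor-style recovery pass could look like, here is a minimal sketch. It is not our actual supervisor: the `Component` class, the `supervise_once` function, and the retry limit are hypothetical, and `FakeService` is a stand-in for a real service with real health checks. It shows one pass in which every component is checked, unhealthy ones are healed a limited number of times, and anything still failing is flagged for manual attention.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    """A supervised part of the service (all names here are hypothetical)."""
    name: str
    is_healthy: Callable[[], bool]
    heal: Callable[[], None]


def supervise_once(components: List[Component], max_heal_attempts: int = 3) -> List[str]:
    """Run one supervision pass and return a log of what happened."""
    log = []
    for component in components:
        if component.is_healthy():
            log.append(f"{component.name}: healthy")
            continue
        for attempt in range(1, max_heal_attempts + 1):
            component.heal()
            if component.is_healthy():
                log.append(f"{component.name}: healed after {attempt} attempt(s)")
                break
        else:
            # The loop finished without a successful heal.
            log.append(f"{component.name}: needs manual intervention")
    return log


class FakeService:
    """Stand-in for a real service; comes back up after N restarts."""

    def __init__(self, up: bool, restarts_needed: int = 1) -> None:
        self.up = up
        self.restarts_needed = restarts_needed

    def is_healthy(self) -> bool:
        return self.up

    def restart(self) -> None:
        self.restarts_needed -= 1
        if self.restarts_needed <= 0:
            self.up = True


def run_demo() -> List[str]:
    web = FakeService(up=True)
    db = FakeService(up=False, restarts_needed=2)
    broker = FakeService(up=False, restarts_needed=10)
    return supervise_once([
        Component("web", web.is_healthy, web.restart),
        Component("db", db.is_healthy, db.restart),
        Component("broker", broker.is_healthy, broker.restart),
    ])
```

Capping the number of heal attempts keeps a permanently broken component from blocking the pass, which matters when the goal is to keep manual tasks to a minimum without hiding the failures that do need a person.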
The purpose of this post was to describe an internal process which the Cloudify R&D Team created. The goal is to share an approach for solving some of the problems we came across and encourage those facing similar problems to think about trying to do the same. As with all things in life, you will have your fair share of struggles on the way – we sure did (mainly security issues and conflicts with our original product code). But once you overcome them, things get much easier.