Developing data-intensive applications is a complex process, but it shouldn’t be complex for the wrong reasons. Part of the complexity comes from the fact that deploying platforms built on Big Data technologies is not a simple task, particularly when undertaken manually. Automating deployments is the key to reducing this complexity, but the automation should also make switching between cloud providers trivial. In the DICE project, we have been achieving both thanks to the flexibility of the TOSCA specification and the use of Cloudify as the core orchestration engine. Nati Shalom, CTO of Cloudify, discussed the importance of cloud portability and true hybrid cloud capabilities in more detail in a recent blog post, “Achieving Hybrid Cloud Without Compromising on the Least Common Denominator.”
A growing number of platforms and technologies for Big Data naturally increases the number of options for solving specific problems related to storing and processing large amounts of data. These opportunities usually come at a cost: the barrier of deploying and maintaining supporting services such as Storm, Spark or Hadoop. These services form the platform for the applications that solve the actual problems.
Many of the Big Data services work on any cloud platform as long as network connectivity, RAM, computation and storage resources are available. Cloudify’s platform-specific plugins make it possible to express these concepts in TOSCA blueprints, giving users the ability to work with any of the supported platforms without having to use that platform’s API directly.
The DICE project builds on these concepts by developing a technology library. We have been focusing on helping developers approach Big Data more easily, so that they can bring Continuous Integration processes into their development cycle. The library abstracts components normally deployed in Big Data clusters, while hiding the complexities of orchestrating and configuring these services.
The library itself is packaged as a set of Cloudify plugins, split into platform-specific node types for resources and node types for the Big Data services. For example, the following header would appear at the top of a blueprint for OpenStack:
This list of imports underlines both the ability and the need to express platform-specific aspects separately from service definitions, because this separation is what best enables deploying applications on any cloud. As the list suggests, the DICE technology library also relies on Chef as the configuration manager.
A full example of a Storm application TOSCA blueprint is available here. It demonstrates that a whole working blueprint can be expressed in a single, simple-to-understand text document. Such blueprints are also relatively easy to author. What is more, they are well suited to being generated from a deployment diagram using a tool such as DICER.
Migrating to another infrastructure is therefore a matter of changing the plugins to be imported at the top of the blueprint, and replacing the inputs declaration with the inputs relevant for the target infrastructure.
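The migration step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the blueprint is modelled as a plain dictionary, and the import file names, input names and node type (platform/openstack.yaml, platform/fco.yaml, services/storm.yaml, example.storm.Nimbus) are hypothetical placeholders, not the actual DICE plugin locations or types.

```python
import copy

# Hypothetical platform-specific import lists. The file names are
# placeholders for illustration, not the real DICE plugin URLs.
PLATFORM_IMPORTS = {
    "openstack": ["platform/openstack.yaml"],
    "fco": ["platform/fco.yaml"],
}


def migrate_blueprint(blueprint, source, target, target_inputs):
    """Return a copy of the blueprint retargeted from source to target.

    Only the platform-specific imports and the inputs declaration are
    replaced; service imports and node templates are left untouched.
    """
    migrated = copy.deepcopy(blueprint)
    source_imports = set(PLATFORM_IMPORTS[source])
    # Keep the platform-independent imports (e.g. Big Data service types).
    kept = [item for item in migrated["imports"] if item not in source_imports]
    migrated["imports"] = list(PLATFORM_IMPORTS[target]) + kept
    # Swap in the inputs relevant for the target infrastructure.
    migrated["inputs"] = copy.deepcopy(target_inputs)
    return migrated


# A toy blueprint with assumed, illustrative contents.
blueprint = {
    "imports": ["platform/openstack.yaml", "services/storm.yaml"],
    "inputs": {"openstack_flavor_id": {}, "openstack_image_id": {}},
    "node_templates": {"storm_nimbus": {"type": "example.storm.Nimbus"}},
}

fco_blueprint = migrate_blueprint(
    blueprint, "openstack", "fco", {"fco_image_id": {}}
)
```

In this sketch the node templates describing the Storm service are carried over unchanged, which is the point of keeping platform-specific definitions separate from service definitions.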
At this time, DICE supports OpenStack (Icehouse and newer, including Mitaka) and Flexiant’s Cloud Orchestrator. Examples of working blueprints that use the DICE technology library are available on GitHub.
The combination of a powerful orchestration solution, Cloudify, and its plugins with the flexibility of OASIS TOSCA enables a solution that is both easy to use for end users and portable across different cloud providers.