Docker: Containers, Microservices and Orchestrating the Whole Symphony
Originally posted at opensource.com
The microservices architecture is far from a new trend; it's now generally accepted as a better way to build apps. Until a few years ago, the common way to build apps was the monolithic approach – which was, from a functional perspective, basically one deployment unit that does everything. Monolithic apps work well for small teams and projects, but at larger scale, with many teams involved, they start to become problematic: changes are much harder to make as the code base grows and more people modify it.
This is basically the exact opposite of everything continuous delivery stands for, because the monolith is tightly coupled and requires an exponentially growing amount of coordination for continuous updates. Updates therefore become more painful and less frequent, contributing further to the fragility of the application.
So what are microservices really and how does this architecture improve delivery cycles?
Microservices were developed as a way to divide and conquer.
In a nutshell, the microservices approach dictates that instead of one giant code base that all developers touch – which often becomes perilous to manage – there are numerous smaller code bases, each managed by a small, agile team. The only dependency these code bases have on one another is their APIs. This means that as long as you maintain backward and forward compatibility (which, admittedly, is not trivial), each team can work in release cycles that are decoupled from those of other teams. There are some scenarios where these release cycles are coupled – when one service depends on another, or on a new feature in another service – but this is not the usual case.
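The compatibility point above is the crux of decoupled release cycles. A minimal sketch of what "tolerant" API consumption looks like in practice – the field names and JSON shape here are purely illustrative, not from any real service:

```python
import json

# A consumer that tolerates schema evolution: it reads only the fields it
# needs and supplies defaults for fields that older versions of a
# hypothetical billing service may omit. Unknown extra fields added by
# newer versions are simply ignored.
def parse_invoice(payload: str) -> dict:
    data = json.loads(payload)
    return {
        "invoice_id": data["invoice_id"],         # required in every version
        "total": data.get("total", 0.0),          # optional, with a default
        "currency": data.get("currency", "USD"),  # added later; default keeps old responses working
    }

# v1 response: no "currency" field yet -- the consumer still works.
v1 = '{"invoice_id": "inv-42", "total": 19.99}'
# v2 response: adds "currency" plus an extra field the consumer ignores.
v2 = '{"invoice_id": "inv-42", "total": 19.99, "currency": "EUR", "tax": 1.2}'
```

With this style on both sides of the API boundary, the billing team can ship v2 without waiting for every consumer to upgrade first.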
Leading the pack at the time were companies like Netflix and Amazon. They decided that instead of building one monolithic application to handle all aspects of their service, they would build small services that each handle a discrete business function. The boundaries between these units are functional APIs that expose the core capabilities of each service. For Amazon.com, these are the different aspects of the website – recommendations, shopping cart, invoicing, inventory management, and so forth. Instead of making them all part of one ginormous deployment unit, Amazon implemented each function as a self-contained service with a well-defined interface. The advantage is that you can have disparate teams, each responsible for all aspects of its service from A to Z. So if there's a team responsible for billing, it owns everything from writing the code and testing it to pushing to production, handling failures, and anything else that may happen with that service.
Needless to say, this is better for continuous delivery, as small units are easier to manage, test, and deploy.
Ok, so what’s the open source connection?
Netflix is just one example. As a leader of this architectural trend, Netflix released many of the tools that underpin its distributed, complex architecture as open source projects that anyone can fork and customize, and this in turn influenced a long line of technologies that came in their wake. One such technology is Kubernetes, which was designed specifically for microservices by extending Docker's capabilities. This lets companies choose the right tooling for the job and adopt it quickly, without complex licensing cycles, as well as adapt and extend it for their specific architectural and business needs.
Sounds too good to be true?
Of course, with the good there is always the bad. This creates a whole 'nother set of problems: understanding the system as a whole, knowing what depends on what, and – when one service fails – a much higher likelihood of a cascading failure that is far harder to trace. For more information on the tradeoffs that come with a microservices architecture, read this excellent piece from ThoughtWorks:
“Using services like this does have downsides. Remote calls are more expensive than in-process calls, and thus remote APIs need to be coarser-grained, which is often more awkward to use. If you need to change the allocation of responsibilities between components, such movements of behavior are harder to do when you’re crossing process boundaries.”
Why Is Docker Such a Good Fit for Microservices?
If we dive into the technology, Docker is excellent for microservices because it isolates each container to a single process or service. This intentional one-service-per-container packaging makes those services very simple to manage and update. It's therefore not surprising that the next wave on top of Docker has been the emergence of frameworks built for the sole purpose of managing more complex scenarios, such as: how to manage a single service in a cluster, or multiple instances of a service across hosts, or how to coordinate between multiple services at the deployment and management level.
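The one-process-per-container discipline shows up directly in how an image is built. A minimal sketch of a Dockerfile for a single microservice – the base image, file names, and service name are all illustrative:

```dockerfile
# One container, one process: the image packages a single service and
# nothing else. Names here are hypothetical.
FROM python:3.11-slim
WORKDIR /app

# Install only this service's own dependencies.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY billing_service.py .

# Exactly one foreground process -- the service itself. No init system,
# no cron, no sidecar daemons baked into the image.
CMD ["python", "billing_service.py"]
```

Because the image contains nothing but the service, updating the service means rebuilding and redeploying this one small unit, which is exactly the property the orchestration frameworks below build on.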
To this end, we've seen open source projects like Kubernetes, Maestro-NG, Mesos, and Fleet spring up to answer this growing need. Kubernetes, for instance, is really gaining traction right now: it is backed by Google, with leading players like Microsoft and Red Hat now joining in. The project was built for microservices, and it provides a few key capabilities.
With Kubernetes you can easily deploy and manage multiple Docker containers of the same type through an intelligent labeling (tagging) system. You basically describe the characteristics of the image you'd like to deploy – e.g., number of instances, CPU, RAM – and Kubernetes allocates them for you based on what is physically available on the set of hosts you're deploying to. You don't need to care where the containers are located physically, since the labels let you uniquely identify them. On top of the labels, Kubernetes adds further grouping: "pods" group tightly coupled containers that must run together, while a label selector is essentially a continuous query for containers that carry one or more labels. Kubernetes constantly updates the set of containers that satisfy the query and load balances across them, so there's a single endpoint for anyone accessing them from the outside.
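The pieces described above can be sketched as a pair of Kubernetes manifests – a replication controller that keeps a labeled set of containers running, and a service that turns the label query into a single endpoint. Names, image, and resource figures are illustrative:

```yaml
# Sketch: labels ("tags") identify interchangeable containers; a Service
# gives the labeled set one load-balanced endpoint. All names hypothetical.
apiVersion: v1
kind: ReplicationController
metadata:
  name: billing
spec:
  replicas: 3                  # "number of instances"
  selector:
    app: billing               # the label query this controller maintains
  template:
    metadata:
      labels:
        app: billing
    spec:
      containers:
      - name: billing
        image: example/billing:1.0
        resources:
          requests:
            cpu: "250m"        # the CPU/RAM characteristics you describe
            memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: billing
spec:
  selector:
    app: billing               # same label query; endpoints track it continuously
  ports:
  - port: 80
    targetPort: 8080
```

If a container dies or a replica is added, the service's endpoint set updates automatically – clients keep hitting the same address.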
(Courtesy of Codemotion on Slideshare: http://slidesha.re/12lEhb9)
This model works very well with microservices and Docker, as it caters to exactly the traits that matter most for the microservices architecture: easily deploying new services (that's where Docker packaging comes into the picture), scaling each microservice independently, making the failure of a single microservice instance transparent to the clients that access it, and enabling simple, ad-hoc, name-based discovery of service endpoints.
So What’s Missing?
This is all very useful when it comes to simple, stateless services that you can load balance across, where all instances are completely identical. A good example of such a service would be a web service of some sort, or a stateless web application.
Things get a bit more complicated when you have stateful services, or when the microservice itself is composed of multiple pieces – such as a database or a message queue. Kubernetes implicitly assumes that any state shared between service instances (e.g., a MongoDB cluster that stores user profiles) is managed outside of Kubernetes.
However, in many cases what you want to manage is composed of multiple tiers (web, database, messaging, etc.) that depend on one another, and this will often be tough, if not impossible, to do with Kubernetes. Kubernetes is designed for the situation where each container is literally self-contained and replicable, and many times this isn't the case. The same is true if you want to automate the deployment and management of application tiers that are not microservices – like a central data repository, e.g., a Hadoop or Cassandra cluster. Those two cannot be deployed on top of Kubernetes (although something simpler, such as Redis, can be).
Courtesy of: http://martinfowler.com/articles/microservices.html
In such a case you’d need an orchestrator capable of describing more complex topologies and deployments – and this is where TOSCA comes into the mix.
The idea with TOSCA is to deploy each piece with the approach that suits it best. If it's a simple stateless microservice, then the best method is Docker and Kubernetes (or the like). When you're dealing with more complex topologies that require intricate orchestration – e.g., a fully replicated and sharded MongoDB cluster, or a more complex microservice – that is the scenario for a TOSCA blueprint. Naturally, a TOSCA blueprint can also be used for the former case (i.e., spawning a few instances of a Docker image) if you want to stick with a single way of doing things.
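To make "blueprint" concrete, here is a hedged sketch of a TOSCA-style topology describing a two-tier application – a web service that depends on a database tier. Node type names loosely follow the TOSCA Simple Profile in YAML; the template names and output are hypothetical:

```yaml
# Sketch of a TOSCA blueprint: the orchestrator reads the dependency graph
# (web_service -> mongo_db -> mongo_host) and deploys in the right order.
tosca_definitions_version: tosca_simple_yaml_1_0

topology_template:
  node_templates:
    mongo_host:
      type: tosca.nodes.Compute        # a VM or bare host for the data tier
      capabilities:
        host:
          properties:
            mem_size: 4 GB
    mongo_db:
      type: tosca.nodes.DBMS
      requirements:
        - host: mongo_host             # the DBMS is hosted on the compute node
    web_service:
      type: tosca.nodes.WebApplication
      requirements:
        - database: mongo_db           # the cross-tier dependency TOSCA captures

  outputs:
    endpoint:
      description: The service endpoint exposed to clients
      value: { get_attribute: [ web_service, url ] }
```

The point is that the blueprint describes relationships between heterogeneous tiers, not just replicas of one identical container – which is exactly the gap left open in the previous section.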
An example of this kind of implementation is the Cloudify Docker orchestration plugin, which leverages TOSCA both for simple microservice deployment (using TOSCA output parameters as the means to expose a specific service's endpoint) and for more complex topologies, through support for orchestrating complete application stacks. With this approach, you also get the side benefit of support for post-deployment concerns – auto-scaling, monitoring, and auto-healing – made possible through Cloudify's TOSCA-based blueprints.