Pluggable Monitoring and Closed Loop Orchestration with Cloudify and Nagios
- August 30, 2018
- Posted by: Ido Berkovitch & Jeremy Hess
- Category: DevOps, Monitoring, Nagios
A few weeks ago, we introduced Cloudify 4.4. As part of our roadmap to make Cloudify more modular and let users plug in their own tools, you can now bring your own policy and monitoring tools to Cloudify.
This thinking is exactly in line with our core tenets of being open and inclusive. We recognize that organizations already have their own tools for various functions in their stacks, and we want Cloudify to make integrating those tools as easy as possible. This trend started a few releases ago with pluggable authentication via LDAP, which has since expanded to SAML and Kerberos.
Riemann and Diamond/psutil
Cloudify Manager now ships without the Riemann and Diamond (psutil for Windows) tools by default, but you can still add them with flags if you prefer that existing option. The previous system had some issues that made it imperative to move to a more robust tool for all-in-one monitoring and metrics.
[Figure: Current overview of monitoring]
Some of the aspects that made this setup less practical were:
- Maintenance and troubleshooting were quite difficult and unpleasant at times
- Metrics required an agent with Diamond or psutil, and agentless metrics required a lot of work
- Security was a concern as this setup executed user-supplied code on the manager
- Upgrade issues could cause the manager to stop getting metrics when a node was reinstalled
- Scaling to a large number of instances with metrics could slow down the manager significantly
Pluggable Monitoring and Closed Loop Orchestration
As noted above, this approach to monitoring and policies is much more modular and brings better closed-loop orchestration to Cloudify. It also lets users integrate whichever monitoring toolset they prefer or already use.
Here are some of the reasons for pluggable monitoring:
- External monitoring scales separately from the manager, so if monitoring is overloaded, the manager is not affected
- Because monitoring is now plugin-based, it is upgraded independently of the manager: manager upgrades are easier, and monitoring bugs can be fixed and new versions shipped much more quickly, without waiting for a new manager release
- Modularity and script-based monitoring mean much less tailoring is required for users with varied setups
- Security is also less of a concern, as users can now apply their own security policies and follow the practices their organization requires
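To make the decoupling more concrete, here is a minimal Python sketch of a pluggable monitoring interface. Everything in it is hypothetical and for illustration only: `Alert`, `MonitoringBackend`, `StaticBackend`, and `collect_failures` are not part of Cloudify's API, and the alert states are assumed. What it shows is that the orchestration side consumes only normalized alerts, so a monitoring tool can be swapped, upgraded, or scaled without touching the code that reacts to it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Alert:
    """A normalized alert reported by some external monitoring tool (hypothetical)."""
    node_instance_id: str
    check_name: str
    state: str  # assumed states: "OK", "WARNING", "CRITICAL"


class MonitoringBackend(Protocol):
    """Hypothetical plugin interface: anything that can report alerts."""

    def fetch_alerts(self) -> Iterable[Alert]:
        ...


class StaticBackend:
    """Stand-in backend with canned alerts; a real plugin would query its tool."""

    def __init__(self, alerts: List[Alert]) -> None:
        self._alerts = list(alerts)

    def fetch_alerts(self) -> Iterable[Alert]:
        return list(self._alerts)


def collect_failures(backends: Iterable[MonitoringBackend]) -> List[Alert]:
    """Gather CRITICAL alerts from every backend. The orchestrator depends only
    on the Alert shape, never on how a given tool produces it."""
    return [
        alert
        for backend in backends
        for alert in backend.fetch_alerts()
        if alert.state == "CRITICAL"
    ]


backend = StaticBackend([
    Alert("web_1", "http", "OK"),
    Alert("web_2", "http", "CRITICAL"),
])
print([a.node_instance_id for a in collect_failures([backend])])  # ['web_2']
```

Replacing `StaticBackend` with a class that queries a real monitoring tool would leave `collect_failures`, and anything downstream of it, unchanged.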
While there are many monitoring tools out there, we have already started integrating one of the foremost DevOps monitoring tools, Nagios, which will be available as a Cloudify plugin in the near future.
[Figure: New overview of monitoring]
Here are some of the features that make Nagios a great fit for Cloudify:
- SNMP polling
- SNMP traps for monitored nodes or external systems
- Aggregate checks (averaging across all instances of one node) enable scaling based on connections per second for an entire web cluster rather than for each individual server
- Workflows to run are defined using a custom dict, which can reference built-in workflows or anything else you can define as a workflow in Cloudify
- Almost stateless: for checks that calculate rates, it keeps only the last value it needs to compute the rate; otherwise it is completely stateless, so if you need to heal it, you can
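The aggregate check and the small amount of state it needs can be sketched in Python as follows. This is an illustration under stated assumptions, not the Nagios plugin's implementation: the `AggregateRateCheck` class, the `REACTIONS` dict layout, the node name `web`, and the thresholds are all hypothetical. The only state kept is the previous counter sample per instance, which is exactly what a rate calculation needs; the averaged rate is then mapped to a request for Cloudify's built-in `scale` workflow, with parameter names mirroring that workflow's.

```python
from typing import Dict, Optional, Tuple

# Hypothetical reaction dict (not the plugin's real format): which workflow to
# request, and with which parameters, when the aggregate rate crosses a
# threshold. "web" is an assumed node name.
REACTIONS = {
    "scale_up": {"workflow": "scale",
                 "parameters": {"scalable_entity_name": "web", "delta": 1}},
    "scale_down": {"workflow": "scale",
                   "parameters": {"scalable_entity_name": "web", "delta": -1}},
}


class AggregateRateCheck:
    """Mean per-second rate of a counter across all instances of one node."""

    def __init__(self) -> None:
        # The only state: the last (timestamp, counter) sample per instance.
        self._last: Dict[str, Tuple[float, float]] = {}

    def update(self, samples: Dict[str, Tuple[float, float]]) -> Optional[float]:
        """Take {instance_id: (timestamp_s, counter)} and return the mean rate,
        or None when no instance has a previous sample yet."""
        rates = []
        for instance_id, (ts, value) in samples.items():
            previous = self._last.get(instance_id)
            if previous is not None:
                prev_ts, prev_value = previous
                elapsed = ts - prev_ts
                # Skip non-advancing clocks and counter resets (e.g. a restart).
                if elapsed > 0 and value >= prev_value:
                    rates.append((value - prev_value) / elapsed)
            self._last[instance_id] = (ts, value)
        return sum(rates) / len(rates) if rates else None


def choose_reaction(rate: Optional[float], high: float, low: float) -> Optional[dict]:
    """Pick a reaction from REACTIONS based on the aggregate rate."""
    if rate is None:
        return None
    if rate > high:
        return REACTIONS["scale_up"]
    if rate < low:
        return REACTIONS["scale_down"]
    return None


check = AggregateRateCheck()
check.update({"web_1": (0.0, 100.0), "web_2": (0.0, 200.0)})  # first sample: no rate yet
rate = check.update({"web_1": (10.0, 600.0), "web_2": (10.0, 1700.0)})
print(rate)  # 100.0 (mean of 50 and 150 connections/s)
print(choose_reaction(rate, high=80.0, low=20.0))
```

Averaging across instances means the decision reflects load on the whole cluster, so one busy server does not by itself trigger scaling. If the check is restarted and loses its stored samples, it simply returns `None` for one round and then recovers, which is what keeps healing it cheap.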
If you thought Nagios was where we were stopping, you thought wrong. We are still working on additional monitoring features, including:
- Historical data storage
- Cloudify Manager widget for the UI
But wait, there’s more. We are also working on Prometheus and InfluxDB plugins, so stay tuned for future posts, and feel free to let us know what you would like to see next.