- September 27, 2018
- Posted by: Ilan Adler
- Category: Network Automation, Network Orchestration
The Network as Code
These days, terms like DevOps, NetDevOps, Infrastructure as Code, and Networking as Code are being used in marketing materials everywhere. But what a lot of people don’t realize is that these IT concepts are already a big part of network operations. They will likely become part of the future for any large-scale network currently in operation.
So, what do these buzzwords really mean and what does the future hold for network operations? With network services like routing, security, and load balancing all being offered in software, network engineers are being asked to learn to code. Where will they be in another five years?
To be honest, the way we manage network devices has not changed very much in the last 20 years. The only major change has been the evolution from the Telnet protocol to Secure Shell (SSH). The main method is still to use the command-line interface (CLI) to connect to each device one by one and to save configurations in various formats (such as SW1-V1-ospf-change-to-fix-2.txt). However, this method of version control has no standard and lacks accountability. When configurations are fragmented, and the only source of truth is the production network itself plus whatever changes individual engineers have made, problems are bound to follow.
For inspiration, one should look to system engineers and software developers. They have been doing the right thing for years by treating all of their servers as one system, and by automating processes and upgrades with version control. They have been using version control in their development code with the option to go back in time to any change. In current network operations, in contrast, each box is managed as a single device with no logical way to revert changes; most configurations are being performed manually by engineers late at night, where mistakes can easily occur.
Adopting DevOps Processes
The first step in implementing the “Network as Code” approach is to standardize configurations and get them into a templated format. Rolling out new sites or buildings then becomes a simple change of variables instead of rewriting a full configuration for each new device, which wastes time and resources on something that is already nearly complete. Network engineers pride themselves on having all the knowledge in their heads, being able to remember IP addresses for a range of remote sites, and being able to handwrite a configuration from scratch. However, this cowboy-style approach is a thing of the past.
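As a minimal sketch of what "a simple change of variables" can look like, the Python below renders one shared template with per-site values. The template text, site names, hostnames, and addresses are all hypothetical, and a real deployment would more likely use a dedicated templating tool; the standard library's string.Template is used here only to keep the example self-contained.

```python
from string import Template

# Hypothetical access-switch template; the configuration lines are illustrative only.
SWITCH_TEMPLATE = Template(
    "hostname $hostname\n"
    "interface Vlan$mgmt_vlan\n"
    " ip address $mgmt_ip $mgmt_mask\n"
    "ip default-gateway $gateway\n"
)

# Per-site variables: the only thing that changes when a new site is rolled out.
SITES = {
    "site-a": {
        "hostname": "SW1-SITE-A",
        "mgmt_vlan": 10,
        "mgmt_ip": "10.1.10.2",
        "mgmt_mask": "255.255.255.0",
        "gateway": "10.1.10.1",
    },
    "site-b": {
        "hostname": "SW1-SITE-B",
        "mgmt_vlan": 10,
        "mgmt_ip": "10.2.10.2",
        "mgmt_mask": "255.255.255.0",
        "gateway": "10.2.10.1",
    },
}


def render_config(site: str) -> str:
    """Fill the shared template with the variables for one site."""
    # substitute() raises KeyError if a variable is missing, so an
    # incomplete site definition fails loudly instead of silently.
    return SWITCH_TEMPLATE.substitute(SITES[site])


if __name__ == "__main__":
    print(render_config("site-a"))
```

Adding a new building then means adding one more entry to SITES rather than writing a configuration by hand.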
The network engineer was once considered irreplaceable, and the role a position of power, but this is no longer the case. Actually having all the network configurations in one place in a templated format should be seen as being fully in control of your environment. When an engineer changes jobs, knowledge of the network won’t leave with them. Instead, it will remain in the network code.
The next step is to implement a version control system. Every change on the network should be managed via a central version control system (the most popular nowadays is Git), with a single production build. Changes to the network can be requested and approved, and when all parties are happy and have signed off on the changes, they are then pushed to the production build. This process can shift operation teams away from weekly meetings in which changes are reviewed by managers who have little appreciation of the impact, and only want to get a general awareness of what is going on in case any issues arise.
When implementing “Network as Code,” all changes can be viewed by all relevant parties with a distinct chain of command and, once fully approved, pushed to the production build. Then, because the entire network is managed as code, pushing this change during the day is not seen as a big risk. That’s because there is no risk of the engineer making a mistake and configuring the wrong device or deleting a line of configuration—all changes will be automated.
Having a central version controlled location for all network configurations will also eliminate ad hoc changes made by enthusiastic engineers late at night to fix a problem. The only way to make a change now is via the continuous integration pipeline. All network changes need to be approved via the version control system and can only be pushed to the devices via the automation tool. Furthermore, unauthorized changes to the network will be removed by the next run of the continuous deployment tool. (More on this later.)
Organizations need to get away from the tendency to believe that the “network guy” knows everything. Treat the network as code, and manage updates to this code as a team. Use automation tools to do the hard work, and let the network engineers provide the intelligence.
Networks need to scale, and they must scale at pace, which frequently means in the cloud. Operations teams are being pushed harder and harder with only two ways to scale up: hiring more staff, or automating more. Hiring more staff just adds to the problem, because it increases the risk of errors. Expanding automation, meanwhile, will ensure that the core network team is fully in control, and can easily manage the network without fear of making mistakes.
Network engineers should not fear the Network as Code concept; they should embrace it. Getting everything into a templated, version-controlled state will free them from mundane daily tasks and allow them to focus more on design and performance.
Bridging the Gap to IaaS
IaaS (Infrastructure as a Service) means paying someone else to run your infrastructure. Using service providers like Amazon, Microsoft or Google to run your infrastructure means you need to be 100 percent on top of your networking requirements—putting it in the cloud does not mean someone else will manage it.
If you are already treating your network as code, moving to IaaS could be a straightforward process. However, if you have to determine your network topologies, validate configurations, and fix inconsistencies in the network, this is going to lengthen your migration window.
Modelling Network Services
Another brilliant benefit of the Network as Code approach is being able to model network services. With so many network devices now available as virtual options, fully simulating your live network in a virtual testing environment is becoming easier by the day.
Before any change is pushed to the production network, the change could be pushed to the simulated test network and the change could be validated. This would give any change managers added assurance that the network change is not going to cause any issues. Some current systems will push the change to production once the change has passed the checks on the test environment. Others still require one more manual approval before being pushed to production. How much of your network you can replicate within a virtual environment depends on what your network consists of. However, with more devices being virtualized all the time, this ultimate testing nirvana is becoming more of a reality each day.
Continuous Integration / Continuous Delivery
The CI/CD solution is software that monitors certain code repositories. Once changes have been detected within those repositories, the software then performs other tasks, such as pushing the changes onward. Jenkins, Buildbot, GitLab, and Travis CI are just a few of the tools available today to perform this function.
For example, consider a network engineer who has been asked to create a new VLAN across a group of switches. The engineer would go into the code repository for that site and request a change to the variables. (Remember, the full configuration is being handled by a template which is referencing variables for each site.) The change then generates an approval request for the network owner/admin. The change would be verified and, once approved, uploaded to the production code repository. The CI/CD solution would recognize a change to the configuration for site A, generate a new code build, and then push that build to the test environment for site A. That would most likely be done using a tool like Ansible or Puppet.
This would then generate some network tests to verify if the change had affected any connectivity. If all tests pass, then a final approval would be required by the network administrator to send this change into production. So far, no network engineers have had to manually configure the CLI, all but eliminating the risk of typing errors. The configuration change request was based on a template that has already been proved to work with previous changes.
Now for the final step in the process: If the network administrator observes any issues, changes can be reverted by rolling the production code back one step in the version control system, and redeploying. Each configuration change is a push of the production-verified code and not random changes made by various engineers with no accountability.
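To make "rolling back one step" concrete, here is a small in-memory stand-in for a version-controlled production build. It is a simplification and an assumption for illustration: a real setup would use Git itself, and the ConfigHistory class and its method names are invented. Like a Git revert, the rollback here records a new version rather than erasing history.

```python
from typing import List


class ConfigHistory:
    """Toy version history for a production configuration build."""

    def __init__(self, initial: str) -> None:
        self.versions: List[str] = [initial]

    @property
    def current(self) -> str:
        return self.versions[-1]

    def commit(self, config: str) -> None:
        """Record a new approved production build."""
        self.versions.append(config)

    def revert_last(self) -> str:
        """Restore the previous build by recording it as a new version.

        History is preserved, so the faulty change stays visible for
        later review. Calling this twice in a row restores the change.
        """
        if len(self.versions) < 2:
            raise ValueError("no earlier version to revert to")
        self.versions.append(self.versions[-2])
        return self.current


if __name__ == "__main__":
    history = ConfigHistory("vlan 10\n")
    history.commit("vlan 10\nvlan 20\n")
    # The returned configuration is what would be redeployed.
    print(history.revert_last())
```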
Automated Policy / Single Point of Truth
At this point, organizations should have a verified production code build that is version controlled and managed by the network automation system, rather than kept as various text files spread across different engineers’ laptops. It is now possible to begin enforcing that code.
With a large network, many changes get made to various devices. This means that over the course of months or years, consistency among devices can change. Then, when a global change is required across all devices, such large deviations are exposed. This would normally be uncovered during an upgrade in which all code is reviewed.
With a Continuous Integration/Continuous Delivery approach, policy can be enforced when the production configuration is actually applied to all devices at regular intervals—either hourly, daily or weekly. This ensures that all devices match the production-approved configuration, and any ad-hoc changes that might have been made to the network without approval are overwritten.
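A rough sketch of that enforcement step: compare what is running on a device with the approved configuration, report any drift, and treat the approved version as the one to reapply. The function names and sample configurations are hypothetical, and fetching the running configuration from a device is left out entirely.

```python
import difflib
from typing import List, Tuple


def config_drift(intended: str, running: str) -> List[str]:
    """Return the lines that differ between intended and running config.

    Lines prefixed "+ " exist only on the device (unapproved additions);
    lines prefixed "- " are approved lines missing from the device.
    """
    diff = difflib.ndiff(intended.splitlines(), running.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


def enforce(intended: str, running: str) -> Tuple[str, List[str]]:
    """Return the config to apply (always the approved one) plus the drift found."""
    return intended, config_drift(intended, running)


if __name__ == "__main__":
    approved = "hostname SW1\nvlan 10\nvlan 20"
    on_device = "hostname SW1\nvlan 10\nvlan 20\nvlan 999"
    to_apply, drift = enforce(approved, on_device)
    print(drift)
```

Run on a schedule, a check like this both overwrites ad hoc changes and leaves a record of what was overwritten.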
Organizations also need to adopt the practice of “a single source of truth,” or an SSOT. This means having a central repository that holds all the network device information like IP addresses, subnets, device location, etc. Typically, this is a spreadsheet that has grown out of control over the years and is not very useful for integrating into a network automation solution. Once a network has scaled beyond the spreadsheet, an SSOT solution becomes necessary. All device information can be referenced or pulled from the software either via an API or through direct integration. This practice is another piece of the Network as Code jigsaw and can be utilized in many ways.
For example, when deploying changes to a specific site, the configuration template only needs to reference Access-Switches-Site-A. The SSOT repository is contacted and asked to send back all devices that match these criteria. All access switches in site A are then added to the production code run.
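The lookup above might look something like the following, with a plain Python list standing in for the SSOT repository. The Device fields, device names, and addresses are assumptions made for illustration; a real SSOT product would be queried through its own API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    name: str
    role: str
    site: str
    mgmt_ip: str


# Hypothetical inventory; in practice this would come from the SSOT.
INVENTORY: List[Device] = [
    Device("asw1-site-a", "access-switch", "site-a", "10.1.10.2"),
    Device("asw2-site-a", "access-switch", "site-a", "10.1.10.3"),
    Device("core1-site-a", "core-switch", "site-a", "10.1.0.1"),
    Device("asw1-site-b", "access-switch", "site-b", "10.2.10.2"),
]


def select_devices(inventory: List[Device], role: str, site: str) -> List[Device]:
    """Return every device that matches both the role and the site."""
    return [d for d in inventory if d.role == role and d.site == site]


if __name__ == "__main__":
    for device in select_devices(INVENTORY, role="access-switch", site="site-a"):
        print(device.name, device.mgmt_ip)
```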
One other point to consider is dynamic devices. A typical network used to consist of metal boxes with names and IP addresses that never changed. With more and more network devices being software defined and deployed on demand, how does one keep track of names and IP addresses? A typical firewall device deployed in the cloud could be called LB89-resource-site-1-vpc2 and might be active only for a few weeks. This is where a dynamic inventory is useful. As devices are spun up in the cloud, they are tagged with their function and site. The SSOT repository picks up this information. Any further configuration changes are pushed to the devices based on their site and tags rather than the specific device itself.
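One way to picture a dynamic inventory is as a registry that devices join when they are spun up and leave when they are torn down, with changes targeted by tag rather than by name. The DynamicInventory class, its methods, the tag keys, and the address below are all hypothetical; cloud providers and SSOT tools each have their own tagging mechanisms.

```python
from typing import Dict, List


class DynamicInventory:
    """Toy registry of short-lived devices, keyed by name and selected by tags."""

    def __init__(self) -> None:
        self._devices: Dict[str, Dict[str, str]] = {}

    def register(self, name: str, mgmt_ip: str, tags: Dict[str, str]) -> None:
        """Called when a device is spun up and tagged."""
        self._devices[name] = {"mgmt_ip": mgmt_ip, **tags}

    def deregister(self, name: str) -> None:
        """Called when a device is torn down; unknown names are ignored."""
        self._devices.pop(name, None)

    def match(self, **tags: str) -> List[str]:
        """Return the names of devices whose tags include all given values."""
        return sorted(
            name
            for name, attrs in self._devices.items()
            if all(attrs.get(key) == value for key, value in tags.items())
        )


if __name__ == "__main__":
    inventory = DynamicInventory()
    inventory.register(
        "LB89-resource-site-1-vpc2",
        "10.20.0.5",
        {"function": "firewall", "site": "site-1"},
    )
    print(inventory.match(function="firewall", site="site-1"))
```

Because selection is by tag, nobody has to know or remember the generated device name for a configuration change to reach it.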
While it would be possible to perform all these tasks manually, at some point it just becomes unmanageable. Most network engineers do not want to spend their working day manually updating inventory files of dynamically spun up devices. Instead, treat the network as code.
While this all might seem overly idealistic, the reality is that it is happening today and network management is transitioning to a DevOps mentality. Network configurations are basically code anyway. Treating them like code with all the processes developers have been using for years is a very logical and sensible approach.
The Biggest Challenge of Network as Code
However, the ultimate network management solution described above is only possible if all devices are talking the same language. If a network consists solely of Cisco Nexus switches or solely of Juniper devices, then automating configuration changes is quite simple. There is a standard configuration and a standard way of connecting to the device and making the change. If, however, a network consists of multiple vendors, multiple devices, both on-premises and in the cloud (as do most networks), seamlessly automating the network and implementing the Network as Code approach becomes more of a challenge. Software packages such as NAPALM are trying to address this problem.
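To illustrate the kind of translation involved, the sketch below turns one vendor-neutral request ("create this VLAN") into commands in two different styles. It is a simplified assumption for illustration only: the function names and the vendor keys are invented, the command syntax shown is the common Cisco-style and Junos-style form for defining a VLAN, and none of this represents the actual API of NAPALM or any other package.

```python
from typing import Callable, Dict, List


def render_cisco_style(vlan_id: int, name: str) -> List[str]:
    # Cisco-style VLAN definition: a VLAN block with a name sub-command.
    return [f"vlan {vlan_id}", f" name {name}"]


def render_junos_style(vlan_id: int, name: str) -> List[str]:
    # Junos-style "set" command: the VLAN is keyed by name, with an ID attribute.
    return [f"set vlans {name} vlan-id {vlan_id}"]


# Hypothetical vendor keys mapped to their renderers.
RENDERERS: Dict[str, Callable[[int, str], List[str]]] = {
    "cisco": render_cisco_style,
    "juniper": render_junos_style,
}


def render_vlan_intent(vendor: str, vlan_id: int, name: str) -> List[str]:
    """Translate one vendor-neutral VLAN request into vendor-specific commands."""
    try:
        renderer = RENDERERS[vendor]
    except KeyError:
        raise ValueError(f"no renderer for vendor {vendor!r}") from None
    return renderer(vlan_id, name)


if __name__ == "__main__":
    for vendor in ("cisco", "juniper"):
        print(vendor, render_vlan_intent(vendor, 120, "USERS"))
```

The engineer states the change once; supporting another vendor means adding a renderer, not rewriting the change request.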
With such tools, an engineer defines a single network operation change without having to consider how to implement it on each vendor’s hardware. The software handles the differences in configuration while the engineer focuses on what the change is actually meant to do. This will be the challenge for network engineers of tomorrow: creating a seamless integration between a standard change request and its translation for multiple network vendors. That’s why now is the time to adopt these processes and to start treating your network as code.