An Oversight in Network Hardware Vendor-specific Automation Libraries and Frameworks: Myopicity


Introduction

Libraries that provide a simplified (from a programming point of view) interface for controlling a device are undeniably valuable; however, current libraries (and likely future ones) have several weaknesses that will need to be solved before automation can truly achieve its full potential. In this write-up, I'll focus on one of them, which I call "myopicity".

Myopicity - Single-Device Focused

Below is an example using Junos PyEZ (Juniper's Python library for interacting with Junos devices):

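from jnpr.junos import Device
dev = Device('router1', user='lamoni', password='automatem3').open()  # open a NETCONF session to a single device
hardwareInventory = dev.rpc.get_chassis_inventory()  # same data as "show chassis hardware"
dev.close()  # release the session when done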

Even if you don't know Python, it's easy to tell that this code connects to the device and grabs the "show chassis hardware" output. It's simple and elegant, as we'd hope; the real issue is that we very rarely need to grab data from just a single device. In my experience, most requests/assignments for data collection (or configuration pushing) come with a list of devices to operate on. That poses a problem because Python performs I/O in a blocking manner by default (PHP and Perl suffer from this as well... in fact, most languages block by default).

What is blocking I/O behavior?

"Blocking" I/O stops a program/script's progress while it waits to finish sending data somewhere (over a network, or to a hard disk) or waits to receive data (a response from a web server, or a large file being read from a hard disk).

This matters because the script above connects to a single device, sends a request for "show chassis hardware", and then pauses until it gets that data back. The issue becomes obvious when you run the code in a for loop over a list of hostnames: because each call blocks, the script connects to Device1, sends its request, and waits for the reply before it even contacts the next device in the list. The total runtime is therefore roughly the sum of every device's response time, which makes data collection painfully slow and leaves the collected data temporally misaligned (to be fair, in our example, temporal alignment of chassis inventory isn't of the utmost importance... if we were collecting interface statistics, that'd be a different story).
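To make the cost concrete, here is a minimal sketch that swaps the real PyEZ call for a hypothetical `fetch_inventory()` function that simply sleeps for an assumed round-trip time. The device names and latencies are made up for illustration; the point is that a sequential loop's runtime is roughly the sum of every device's wait.

```python
import time

# Hypothetical per-device round-trip times in seconds (illustrative values only).
LATENCY = {"router1": 0.2, "router2": 0.2, "router3": 0.2}

def fetch_inventory(host):
    """Blocking stand-in for dev.rpc.get_chassis_inventory(); not a real PyEZ call."""
    time.sleep(LATENCY[host])  # the whole script is stuck here until the "device" replies
    return f"{host}: inventory"

def collect_sequentially(hosts):
    results = []
    for host in hosts:
        # router2 is not even contacted until router1 has replied
        results.append(fetch_inventory(host))
    return results

start = time.perf_counter()
inventories = collect_sequentially(list(LATENCY))
elapsed = time.perf_counter() - start
print(f"{len(inventories)} devices in {elapsed:.2f}s")
```

With three devices at an assumed 0.2 s each, this takes a little over 0.6 s; with hundreds of devices and real multi-second RPCs, the wait grows linearly with the list.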

Possible Solutions

Non-blocking I/O

"Non-blocking" behavior is when a program/script keeps progressing through its code instead of pausing while it waits for data/output to be returned to it. If we implemented the above code in a non-blocking language (or with the non-blocking constructs that blocking languages often provide), the script would connect to a device and send its request for "show chassis hardware"; then, because the "ball is in the device's court" (i.e. the device needs to gather the chassis inventory to send back to us), it would move on to the next device in the list and send that request too. When the first device responds, the script goes back and handles that response.
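Python's standard `asyncio` module is one way to get this behavior. The sketch below reuses the same hypothetical, simulated `fetch_inventory()` idea, now written as a coroutine; the hosts and delays are assumptions. Note that PyEZ's `Device` methods are ordinary blocking calls, so this illustrates the pattern rather than serving as a drop-in replacement for the example above.

```python
import asyncio
import time

# Hypothetical per-device round-trip times in seconds (illustrative values only).
LATENCY = {"router1": 0.2, "router2": 0.2, "router3": 0.2}

async def fetch_inventory(host):
    """Hypothetical non-blocking request; awaiting hands control back to the event loop."""
    await asyncio.sleep(LATENCY[host])  # while this "device" works, other requests proceed
    return f"{host}: inventory"

async def collect_concurrently(hosts):
    # Send every request, then wait for all replies; results keep the order of `hosts`.
    return await asyncio.gather(*(fetch_inventory(h) for h in hosts))

start = time.perf_counter()
inventories = asyncio.run(collect_concurrently(list(LATENCY)))
elapsed = time.perf_counter() - start
print(f"{len(inventories)} devices in {elapsed:.2f}s")
```

Because the three waits overlap, the run takes roughly as long as the slowest device (about 0.2 s here) instead of the sum of all of them.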

You may wonder why languages work in a blocking fashion when non-blocking behavior seems obviously desirable, but non-blocking behavior can throw programmers for a loop (although part of that is because most languages block by default... if all languages were non-blocking, programmers would be used to it). "Odd" behavior can be observed when coding in a non-blocking style for the first time. I put "odd" in quotes because the behavior is expected, but it can seem odd to an uninitiated programmer: it's possible (and likely) that they'll see code on line 5 executing before code on line 4 has finished "executing" (assuming line 4 is waiting for a response from something, whether a request over the network or a file being read). Non-blocking code handles those responses with callbacks: functions attached to a request that process the data once it has finally been returned from wherever it was requested from (a server, a hard disk, etc.). When one request depends on the result of another, callbacks end up nested inside callbacks, and the code can get entangled in what's called "callback hell".
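Both effects can be seen in a small callback-style sketch built on Python's `asyncio` event loop. Here `fetch_inventory()` is a hypothetical request that returns immediately and fires its callback about 50 ms later, and the rule that router2 must wait for router1's data is an assumption made purely to force one level of nesting.

```python
import asyncio

events = []

def fetch_inventory(loop, host, on_done):
    """Hypothetical non-blocking request: returns at once; on_done(result) fires ~50 ms later."""
    loop.call_later(0.05, on_done, f"{host}: inventory")

def main():
    loop = asyncio.new_event_loop()
    finished = loop.create_future()

    def on_router1(result):
        events.append(result)

        # router2 depends on router1's reply, so its request and callback nest one level deeper.
        def on_router2(result2):
            events.append(result2)
            finished.set_result(None)

        fetch_inventory(loop, "router2", on_router2)

    fetch_inventory(loop, "router1", on_router1)  # "line 4": the request is sent...
    events.append("next line")                     # "line 5": ...and this runs before the reply arrives
    loop.run_until_complete(finished)
    loop.close()

main()
print(events)  # ['next line', 'router1: inventory', 'router2: inventory']
```

Every additional dependent step adds another level of nesting, which is how "callback hell" grows; Python's `async`/`await` syntax exists largely to let such chains be written top to bottom again.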

Multi-threading

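By running each device session in its own thread, we can collect data from (and push data to) several network devices at once: while one thread is blocked waiting on a device, the others keep working, and on multi-core CPUs the remaining work can be spread across cores (in CPython, that usually means processes rather than threads). This comes at the cost of complexity in our code; however, I believe the developers making these libraries can abstract most of the pain away from consumer programmers, keeping the "hard" part centralized in the library and out of our application logic.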
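The standard library's `concurrent.futures.ThreadPoolExecutor` hides most of the thread bookkeeping, which is roughly the kind of abstraction a library could offer its users. In this sketch, the blocking `fetch_inventory()` is again a hypothetical simulation; wrapping one PyEZ `Device` session per call would be one way to make it real, but that wiring is an assumption, not something shown in the original.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-device round-trip times in seconds (illustrative values only).
LATENCY = {"router1": 0.2, "router2": 0.2, "router3": 0.2, "router4": 0.2}

def fetch_inventory(host):
    """Blocking stand-in for a device RPC (assumption: a real version opens its own session)."""
    time.sleep(LATENCY[host])  # blocks only the worker thread running this call
    return f"{host}: inventory"

def collect_with_threads(hosts, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() hands hosts to the worker threads and yields results in input order.
        return list(pool.map(fetch_inventory, hosts))

start = time.perf_counter()
inventories = collect_with_threads(list(LATENCY))
elapsed = time.perf_counter() - start
print(f"{len(inventories)} devices in {elapsed:.2f}s")
```

The caller passes a list of hostnames and gets a list of results back; the pool and its size limit (`max_workers`, chosen arbitrarily here) stay out of sight, which is exactly the part a library could own.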

A Combination of Non-blocking and Multi-threading

Multi-threading alone in a blocking language doesn't eliminate the "wait for response" behavior; it only spreads the waiting across threads, each of which still sits idle until its device replies. Combining the two (several threads, each handling many devices with non-blocking I/O) should give us the fastest data acquisition/pushing possible. This, of course, comes at a cost in complexity, but again, there's no reason library developers can't abstract most of it away from consumer programmers.
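One way to sketch this combination in Python is a small thread pool in which every thread runs its own `asyncio` event loop over a batch of devices. The host names, batch split, thread count, and simulated delays are all assumptions for illustration; for CPU-heavy work such as parsing large replies, a `concurrent.futures.ProcessPoolExecutor` would be the more usual way to use multiple cores in CPython, but a thread pool keeps the sketch short.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical list of 12 devices, each with an assumed 0.2 s round trip.
HOSTS = [f"router{i}" for i in range(1, 13)]

async def fetch_inventory(host):
    """Hypothetical non-blocking device request, simulated with asyncio.sleep()."""
    await asyncio.sleep(0.2)
    return f"{host}: inventory"

async def collect_batch(hosts):
    # Inside one thread, the event loop overlaps the waits for every host in the batch.
    return await asyncio.gather(*(fetch_inventory(h) for h in hosts))

def run_batch(hosts):
    # Each worker thread creates and runs its own independent event loop.
    return asyncio.run(collect_batch(hosts))

def collect_hybrid(hosts, threads=3):
    if not hosts:
        return []
    size = -(-len(hosts) // threads)  # ceiling division: hosts per thread
    batches = [hosts[i:i + size] for i in range(0, len(hosts), size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields batches in order, so flattening preserves the original host order.
        return [result for batch in pool.map(run_batch, batches) for result in batch]

start = time.perf_counter()
inventories = collect_hybrid(HOSTS)
elapsed = time.perf_counter() - start
print(f"{len(inventories)} devices in {elapsed:.2f}s")
```

Twelve simulated devices across three threads finish in roughly the time of a single round trip, since each thread's loop overlaps its four waits.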

Message Queues

I was hesitant to mention this possible solution because, for the most part, it lives outside the library, but I'll mention it anyway. By using a message queue (e.g. Beanstalk), we can spare the user from worrying about multi-threading or non-blocking behavior at all. Most message queues rely on "workers" (separate processes that pull jobs off the queue) to achieve parallelism. From both the library developer's and the consumer programmer's point of view, this is the easiest solution; the cost is that it requires a completely separate piece of architecture/software. The library developer (or the consumer programmer... but preferably the library developer) would still need to build the methods for communicating with the queue server and handling its responses.
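Python's standard `queue.Queue` plus a few threads can stand in for the producer/worker pattern inside a single process. This is not Beanstalk's API: with a real queue server, the producer and the workers would be separate processes (possibly on separate machines) talking to the server through a client library. Here `fetch_inventory()`, the worker count, and the `None` stop sentinel are assumptions for illustration.

```python
import queue
import threading

def fetch_inventory(host):
    """Hypothetical stand-in for a device RPC."""
    return f"{host}: inventory"

def worker(jobs, results):
    # Each worker keeps pulling jobs until it receives the None stop sentinel.
    while True:
        host = jobs.get()
        if host is None:
            break
        results.put(fetch_inventory(host))

def collect_via_queue(hosts, num_workers=3):
    jobs, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for host in hosts:   # the "producer" only enqueues jobs...
        jobs.put(host)
    for _ in workers:    # ...followed by one stop signal per worker
        jobs.put(None)
    for w in workers:
        w.join()
    # Every host yields exactly one result; sort because workers finish in any order.
    return sorted(results.get_nowait() for _ in hosts)

print(collect_via_queue(["router1", "router2", "router3"]))
# ['router1: inventory', 'router2: inventory', 'router3: inventory']
```

The producer never touches threads or sockets directly; it only puts jobs on a queue, which is the separation that makes this approach attractive to consumer programmers.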


Conclusion

I believe it is the responsibility of library developers to settle on an approach to multi-device data acquisition, whether through a non-blocking implementation, a multi-threaded implementation, or a combination of both. As network management systems become more web-focused, it will be of the utmost importance to acquire data from (or push data to) multiple devices as fast as possible, so as not to break one of the golden rules of web development: "if a user performs an action and doesn't obtain results within a few seconds, they become frustrated".

Lamoni