Network outages are primarily due to human error. More specifically, these errors are the byproduct of improper changes or unforeseen consequences from changes made to the configurations that control how network devices connect and exchange frames, packets, and other data with other network devices. Network devices such as hardware network devices, switches, load balancers, firewall appliances, etc. can produce the outages or other error conditions based on a misconfiguration. Outages occur despite best efforts to validate the configurations and configuration changes before they are deployed to the network devices.
Current validation tools and methodologies are insufficient because they do not provide a comprehensive validation of a configuration change and the impact of the change across each and every network device of the network. Laboratory testing and canary testing are two examples of widely used validation methodologies that suffer these shortcomings.
Laboratory testing provides a safe environment with which to test configuration changes apart from the actual network. Test are conducted against a small sampling of networking hardware that is representative of the physical network devices deployed in the network. However, the network hardware used for the laboratory testing is not connected to the actual network. Accordingly, any laboratory testing validation is incomplete because it is conducted against a fractional reproduction of the actual network. This fractional reproduction cannot account for the actual topology, connectivity, or interoperation between the network devices in the actual network. The fractional reproduction also cannot identify the true end state of the network because of the missing connectivity and hardware. In other words, the full propagation and impact of a configuration change across the entire network cannot be identified from the partial validation provided by the laboratory testing.
Unlike laboratory testing, canary testing can be done against the network devices of the actual network so as to account for the network or device state and the impact of a configuration change to these states. Canary testing involves testing the configuration change against a small subset of the actual network. If no errors are observed in that small subset, the configuration change is applied and validated against a larger subset of the network. In any canary testing stage, the validation is of limited scope, because some errors and outages resulting from a configuration change may be outside the subset of network devices under test or observation. Canary testing can therefore provide a false validation. Canary testing therefore cannot be used to holistically or comprehensively validate the network end state as canary testing necessarily requires segmenting the network for partial or sampled validation.
Accordingly, there is a need to holistically validate network configuration changes without impacting the current steady state of the network. The holistic validation should identify a modified end state of a network resulting from one or more changes to configurations of the hardware or physical network devices without modifying the running configurations on the network devices.
The only true means by which to achieve holistic validation of the network end state today is to apply the configuration changes directly to the actual network and to detect and correct the errors as they happen. Implementing changes without knowing the full scope of risk for outages, blackholes, lost traffic, etc. in the network is, however, unacceptable as such errors result in lost productivity, lost revenue, and interruption to content and services.