Networks, the backbones of modern information technology systems, are growing increasingly complex. A typical large company may employ hundreds or thousands of devices and software components from different vendors to form its network infrastructure. Growth in complexity and size also brings more opportunities for failure, such as forwarding loops, configuration mistakes, reachability issues, or hardware failures.
Diagnosing network failures is difficult for several reasons. First, the forwarding state of each network device is defined by its forwarding tables and other configuration parameters, so the overall forwarding state is distributed throughout the network. Second, this distributed forwarding state is difficult to monitor, often requiring the network administrator to manually log in to a device and conduct a low-level test. Third, multiple administrators or users can edit the forwarding state at the same time, resulting in inconsistencies.
Conventional network diagnosis methods and tools are labor-intensive, time-consuming, and often protocol-dependent. For example, network administrators may use rudimentary tools (e.g., ping, traceroute, and Simple Network Management Protocol (SNMP)) to track down network failures. Such methods cover only a tiny fraction of the network state space. The diagnosis process becomes even more difficult with the current trend of increasing network size and complexity.
Additionally, conventional network diagnosis methods and tools are ad hoc in nature: they address only issues that have already manifested in the network. Even simple questions such as “Can host A talk to host B?” or “Can customer X listen to the communication?” are difficult to answer. Thus, conventional tools cannot foresee problems or prevent them before they arise.
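As a rough illustration of what answering a question like “Can host A talk to host B?” involves, the following minimal sketch walks a set of per-device forwarding tables hop by hop. The data model is an assumption made purely for illustration (a dictionary per device mapping a destination host to a next hop), and the names `trace`, `DELIVER`, `r1`, `hostB`, and so on are hypothetical; real forwarding tables match on prefixes and many other header fields.

```python
from typing import Dict

# Hypothetical model: each device's forwarding table maps a destination host
# to the next-hop device, or to DELIVER when the host is directly attached.
DELIVER = "deliver"
ForwardingTables = Dict[str, Dict[str, str]]


def trace(tables: ForwardingTables, first_hop: str, dst_host: str) -> str:
    """Follow forwarding entries for dst_host starting at first_hop.

    Returns "reachable", "loop", or "dropped".
    """
    visited = set()
    device = first_hop
    while True:
        if device in visited:
            return "loop"      # same device seen twice: forwarding loop
        visited.add(device)
        next_hop = tables.get(device, {}).get(dst_host)
        if next_hop is None:
            return "dropped"   # no matching entry: traffic is black-holed
        if next_hop == DELIVER:
            return "reachable"
        device = next_hop


# Illustrative three-router network (all names are made up).
tables = {
    "r1": {"hostB": "r2", "hostC": "r2"},
    "r2": {"hostB": DELIVER, "hostC": "r3"},
    "r3": {"hostC": "r2"},     # misconfiguration: r2 and r3 point at each other
}

print(trace(tables, "r1", "hostB"))  # reachable
print(trace(tables, "r1", "hostC"))  # loop
print(trace(tables, "r1", "hostD"))  # dropped
```

Even in this toy model, the answer depends on the combined state of every device along the path, which is why probing a single device at a time reveals so little.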
By contrast, large software companies can push out new software products quickly because they have large suites of tests that comprehensively exercise the behavior of their software products prior to deployment. Current network management and testing, however, lack sufficient testing capability to provide confidence that the network will work when deployed, especially when the network changes several times a day.
Accordingly, there is a need for methods and tools that manage and verify networks in a fast, large-scale, automated, and systematic way.