Network management and operations arguably continue to thwart modernization attempts by the networking community. There are several reasons for this. First, network management is inherently difficult because of the scale, distributed nature, and increasing complexity of modern communication networks. Second, network management tools and practices have not kept up with the ever-evolving and complex nature of the networks being managed. Third, and perhaps most importantly, current network management approaches fail to capture and utilize, in a systematic fashion, the significant domain expertise (from vendors, service providers, and protocol designers) that enables the continued operation of the network.
In a typical large Internet service provider setting, hundreds or thousands of network devices are distributed across vast geographic distances, and their configurations collectively determine the functionality provided by the network. The protocols and mechanisms that realize such network functionality often have complex dependencies that must be satisfied for correct operation. Such dependencies are often not precisely defined, or at least not expressed in a systematic manner. When these dependencies are violated through misconfigurations, software bugs, or equipment failures, network troubleshooting becomes an extremely difficult task.
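To make the notion of a systematically expressed dependency concrete, the following is a minimal sketch, not a description of any existing tool. It assumes a hypothetical table `DEPENDS_ON` that maps each feature to the features it requires on the same device, and a hypothetical `configs` mapping from device names to their enabled features. The feature names and the dependencies among them are illustrative only and are not taken from any vendor manual.

```python
from dataclasses import dataclass

# Hypothetical dependency table: feature -> features it requires on the same device.
# The entries are illustrative assumptions, not vendor-accurate rules.
DEPENDS_ON = {
    "l3vpn": {"mpls", "bgp"},
    "mpls": {"igp"},
    "bgp": {"igp"},
    "igp": set(),
}


@dataclass(frozen=True)
class Violation:
    device: str
    feature: str
    missing: str


def find_violations(configs):
    """Return each enabled feature whose direct prerequisite is not enabled."""
    violations = []
    for device, enabled in sorted(configs.items()):
        for feature in sorted(enabled):
            for required in sorted(DEPENDS_ON.get(feature, set())):
                if required not in enabled:
                    violations.append(Violation(device, feature, required))
    return violations


# Hypothetical per-device feature sets standing in for real configurations.
configs = {
    "router-a": {"igp", "mpls", "bgp", "l3vpn"},
    "router-b": {"igp", "bgp", "l3vpn"},  # mpls is not enabled here
}

violations = find_violations(configs)
for v in violations:
    print(f"{v.device}: {v.feature} requires {v.missing}")
```

In this sketch the check reports that `l3vpn` on `router-b` lacks `mpls`. Once dependencies are written down as data rather than prose, a violation can be located mechanically instead of being inferred during troubleshooting.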
Despite these evolving complexities, network management operations still largely rely on fairly rudimentary technologies. With few exceptions, network configuration management is still performed via archaic, low-level command-line interfaces (CLIs). Vendors describe protocol dependencies and network-wide capabilities in device manuals or other technical documents. Network engineers interpret these vendor documents and in turn produce service provider documentation, which describes in prose how such services might be realized. Similarly, planned-maintenance activities rely on the experience of human operators and their ability to interpret and follow procedures documented by domain experts to prevent undesired side effects. In short, current network management methodology depends on the knowledge of domain experts being captured in documents meant for human consumption; systems and procedures are then derived from this captured knowledge to ensure that the correct document is consulted and followed when performing network operations.
In cases where network operations have progressed beyond the capacity of human interpretation and manual execution of procedures, tools are used to automate the procedures that a human operator would have performed, or to reverse-engineer the protocol and network dependencies that prevail in an existing network. For example, sophisticated network configuration management tools capture the actions of human experts for subsequent automation. Existing fault and performance management practices involve, in part, reverse-engineering protocol actions and dependencies.