Networks, from the telephone system to the Internet, have continued to expand as businesses have embraced them for communicating internally among employees and externally with consumers or users, including providing services to them. Network technologies, in response, have continued to evolve to make network operation more efficient and to give different networks the means to communicate with, and pass information between, one another. In one aspect of network technology development, the International Organization for Standardization has developed the Open Systems Interconnection (OSI) architectural model, which provides a standardized means for communicating within a network entity or between network entities. The OSI stack is divided into seven well-known layers. The lowest three are a physical layer representing the physical (hardware) and electrical signal implementation (Layer 1), a data link layer representing the formats used for transmitting data over the network (Layer 2), and a network layer representing the address assignments and packet forwarding methods (Layer 3). The remaining layers, i.e., the upper layers, represent connection and recovery methods, security and authentication methods, and representation formats and data interpretation, which may include encryption or decryption information.
Network connectivity is enabled not only by the physical connectivity between the devices (Layer 1), but also by the routing protocols running on them. Typically, routing is an OSI Layer 3 function. Hence, physical or link (Layer 1 or Layer 2) connectivity between the devices is not enough to ensure that the devices can properly exchange information. Data packets flow only via paths defined by the routing protocol, even if the physical connectivity supplies the necessary physical paths. In addition, routing protocol failures may prevent connectivity between nodes, even if the nodes are physically connected.
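The distinction between physical connectivity and routed connectivity can be illustrated with a minimal sketch. The topology, the node names (R1, R2, R3), and the failed routing session are hypothetical and chosen only for illustration; the sketch models each layer as an undirected graph and compares reachability in the two.

```python
from collections import deque


def undirected(links):
    """Build an adjacency map from a list of undirected (a, b) links."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def reachable(adjacency, src, dst):
    """Breadth-first search: True if dst can be reached from src."""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical topology: three routers cabled in a chain (Layer 1).
physical_links = [("R1", "R2"), ("R2", "R3")]
# Hypothetical routing state: the R2-R3 routing session never formed (Layer 3).
routing_adjacencies = [("R1", "R2")]

physically_connected = reachable(undirected(physical_links), "R1", "R3")
routable = reachable(undirected(routing_adjacencies), "R1", "R3")

print(physically_connected, routable)  # True False
```

In this sketch R1 and R3 are joined by cables through R2, yet no packets can flow between them because the routing layer offers no path.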
Network protocol management presents significant challenges because protocols comprise a large number of configurable, distributed entities that must operate consistently with one another. Small typographical errors in a single router configuration can have wide-ranging effects. Conventionally, administrators must educate themselves on the proper configuration and operation of installed protocols and their use in networked systems. Armed with knowledge of the installed protocols, administrators typically must manually adapt or customize the network configurations and monitor operations to assure proper functionality and correct operation. When hardware and/or software elements (e.g., devices, cards, drivers, applications, new protocol entities, etc.) are added to, removed from, or reconfigured in the network, the changed network condition requires adjustment of the associated protocol entities. The challenge of configuring and managing the network is exacerbated when the network grows to hundreds or even thousands of elements or devices. Continued adjustment of the protocol entities requires significant skill, effort and time on the part of the network administrator. Even with diligent effort on the part of a skilled network administrator, an error introduced during the network setup or a subsequent reconfiguration or adjustment may render portions of the network, or even the entire network, inoperative for unacceptable lengths of time.
In addition, when an error occurs in the network, it may be caused by an error in the protocol configuration (i.e., a misconfiguration) or by a failure in the underlying hardware or software. In the former case, the alarms associated with the error are generated in the network layer, whereas in the latter case, the alarms associated with the error are generated in a lower layer and propagated through the network layer. For example, protocol failures may impact Service Level Agreements when protocol entities or devices fail to communicate with each other through the proper exchange of routing information, or fail to establish new and/or alternate paths. However, a physical connectivity failure may also manifest as one or more protocol failures. In this case, a failure of a node that is responsible for the exchange of routing information between two networks will generate failure alarms for the failed physical node and a failure in the associated protocol.
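The two cases described above can be sketched as a simple classification over alarms grouped by node. This is a deliberately simplified illustration, not a root-cause analysis method: the `Alarm` type, the node names, and the rule that any alarm below Layer 3 marks a lower-layer origin are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    node: str   # hypothetical device name
    layer: int  # OSI layer at which the alarm was raised (1, 2 or 3)


def classify(alarms):
    """Label each node by the lowest layer at which it raised an alarm."""
    layers_by_node = {}
    for alarm in alarms:
        layers_by_node.setdefault(alarm.node, set()).add(alarm.layer)

    labels = {}
    for node, layers in layers_by_node.items():
        if min(layers) < 3:
            # Lower-layer alarm present: any Layer 3 alarm is likely a symptom.
            labels[node] = "lower-layer failure"
        else:
            # Only network-layer alarms: the configuration is a candidate cause.
            labels[node] = "possible protocol misconfiguration"
    return labels


# Hypothetical alarms: R2 fails physically, R5 reports only a protocol alarm.
alarms = [Alarm("R2", 1), Alarm("R2", 3), Alarm("R5", 3)]
result = classify(alarms)
print(result)
```

Here the Layer 3 alarm on R2 is treated as a consequence of its Layer 1 alarm, while R5, which reports only at Layer 3, is flagged for configuration review. A realistic analysis would also need topology information to relate alarms raised on different nodes.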
To detect misconfigurations, or to distinguish misconfigurations from physical or other logical failures, management solutions must be able to analyze the configurations of all entities participating in the protocol, with an understanding of the different roles these entities (physical and logical) play in the protocol itself. Because protocol events or alarms may be caused by events that occurred in other components, or in other realms or domains of a system, there is a need to correlate events in those other realms with events in the routing protocol realm. Routing protocol failures therefore cannot be analyzed in isolation: a comprehensive analysis of protocol configuration and operation must be performed, and the reason for a failure must be determined by correlating it with Layer 1 and Layer 2 failures, in order to reach the root problem underlying the observed or detected alarms (symptoms).
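One way to picture the analysis of configurations across participating entities is a consistency check over each protocol adjacency. The following is a minimal sketch under stated assumptions: the parameter names `area_id` and `hello_interval`, the router names, and the rule that both ends of an adjacency must agree on every listed parameter are hypothetical and are not taken from any particular protocol.

```python
def find_mismatches(configs, adjacencies, keys):
    """Return (a, b, key, value_a, value_b) for each parameter that
    differs between the two ends of an intended adjacency."""
    mismatches = []
    for a, b in adjacencies:
        for key in keys:
            value_a = configs[a].get(key)
            value_b = configs[b].get(key)
            if value_a != value_b:
                mismatches.append((a, b, key, value_a, value_b))
    return mismatches


# Hypothetical per-router protocol settings.
configs = {
    "R1": {"area_id": 0, "hello_interval": 10},
    "R2": {"area_id": 0, "hello_interval": 10},
    "R3": {"area_id": 0, "hello_interval": 100},  # typographical error
}
# Hypothetical intended adjacencies between routers.
adjacencies = [("R1", "R2"), ("R2", "R3")]

mismatches = find_mismatches(configs, adjacencies, ["area_id", "hello_interval"])
print(mismatches)  # [('R2', 'R3', 'hello_interval', 10, 100)]
```

A single mistyped value on R3 is enough to flag the R2-R3 adjacency, which is the kind of error that is hard to spot by inspecting one router's configuration in isolation.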
Hence, there is a need in the industry for a method and apparatus that can automate the management of the configuration and operation of the network layer and further determine the root cause of alarms generated at different levels of the network.