Within a telecommunications network, both data and control information (i.e., external control information) are passed between network devices in the network. The external control information supports a variety of administrative tasks, for example, learning and calculating network topology for routing purposes, setting up connections between two or more devices, and sending and responding to error messages. External control information may also include control information sent from a network/element management system (NMS) to a network device, for example, for provisioning services and retrieving billing and statistical data. Within a network device including distributed processors, in addition to external control information, a considerable amount of internal control information is transferred between the distributed processors such that the network device with the distributed architecture appears to other network devices as one entity.
Transmitting the internal and external control information over the data path is referred to as in-band management. Typically, the control information is pulled off the data path by a processing function on a line card and sent over a switch fabric within the network device to a processor card within the network device. Thus, a portion of the network device's data path bandwidth is consumed in the transfer of control information.
In addition to bandwidth consumption, during high traffic periods, congestion control mechanisms may cause data and/or control information to be dropped. Dropping control information may cause one or more network devices to fail and may bring down the entire network. For example, if “keepalive” control information for a network device is dropped, then a timeout may occur and the other network devices may assume that the network device is down. This will cause the other network devices to reroute traffic around the “failed” network device. Rerouting traffic generates a considerable number of routing updates and status messages in an already congested/collapsing network. In addition, the rerouted traffic may overload one or more other network devices, causing them to go down or to drop other keepalive messages, which again causes a flurry of routing updates and status messages. Moreover, the network device that was assumed to have gone down may generate “I'm back” messages, causing still more routing updates and status messages. Thus, the chaos spreads in widening circles outward through the network, causing the network to quickly destabilize and collapse.
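The keepalive timeout described above can be illustrated with a minimal sketch. The function name, the device labels, and the timing values are hypothetical and chosen only for illustration; they do not correspond to any particular routing protocol.

```python
def neighbors_declared_down(last_keepalive, now, timeout):
    """Return the neighbors whose most recent keepalive is older than `timeout`.

    last_keepalive maps a (hypothetical) device name to the time, in seconds,
    at which its last keepalive was received.
    """
    return sorted(
        name for name, seen in last_keepalive.items() if now - seen > timeout
    )


# Hypothetical case: device "B" is healthy, but its keepalives were dropped
# by congestion control, so its last recorded keepalive is stale.
last_seen = {"A": 9.0, "B": 4.0, "C": 8.5}
print(neighbors_declared_down(last_seen, now=10.0, timeout=3.0))
```

In this sketch the receiving device cannot distinguish a failed neighbor from one whose keepalives were merely dropped, which is why a healthy device may be treated as down.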
Increasing the priority of control information may prevent the control information from being dropped. During storms of control information, however, data traffic may be starved.
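The starvation effect may be illustrated with a small sketch of a strict-priority scheduler. This is an assumed scheduling model used only for illustration; the queue contents and the number of transmission slots are hypothetical.

```python
from collections import deque


def strict_priority_schedule(control, data, slots):
    """Transmit up to `slots` packets, always serving control traffic first."""
    control_q, data_q = deque(control), deque(data)
    sent = []
    for _ in range(slots):
        if control_q:
            sent.append(control_q.popleft())
        elif data_q:
            sent.append(data_q.popleft())
        else:
            break  # both queues are empty
    return sent


# Hypothetical storm: five control packets arrive ahead of three data packets,
# and only five transmission slots are available.
print(strict_priority_schedule(["ctl"] * 5, ["data"] * 3, slots=5))
```

Under these assumptions, every available slot is consumed by control packets and no data packet is transmitted until the storm subsides.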
To address the issues of in-band management, a network device having a distributed architecture may include an internal out-of-band control plane. Each of the distributed processors is connected to the out-of-band control plane, and the processors use the out-of-band control plane to transmit control information. For example, the out-of-band control plane may be an internal I2C bus, PCI bus, Ethernet hub or proprietary bus. Since these control planes include a shared medium, the processors connected to them must share the available bandwidth. For example, an Ethernet hub may provide a maximum bandwidth of 100 Mb/sec, which is shared among all of the connected processors. Thus, the larger the number of distributed processors in a network device, the less bandwidth is available per processor. As a result, the scalability of these control planes is limited.
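The scaling limit can be made concrete with a short calculation. The sketch below assumes, as an idealization, that the shared bandwidth is divided equally among the processors and that contention and protocol overhead are ignored; the function name and the processor counts are hypothetical.

```python
def per_processor_share_mbps(total_mbps, num_processors):
    """Idealized equal share of a shared-medium control plane, in Mb/sec."""
    if num_processors <= 0:
        raise ValueError("num_processors must be positive")
    return total_mbps / num_processors


# The 100 Mb/sec hub from the example above, with hypothetical processor counts.
for count in (4, 10, 20):
    print(count, "processors:", per_processor_share_mbps(100, count), "Mb/sec each")
```

Even under this optimistic assumption, the share falls in inverse proportion to the number of processors, so the bandwidth available to each processor in practice would be expected to be lower still.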
In addition, adding an internal control plane decreases the network device's reliability and availability. Reliability is decreased when the new components for the control plane are added; that is, the more components a network device has, the higher the likelihood of a failure of one or more components. If the network device fails due to the lower reliability, then the network device's availability is reduced.
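The relationship between component count and failure likelihood may be sketched as follows. The sketch assumes that component failures are statistically independent, which the text above does not state, and the per-component failure probability is a hypothetical value.

```python
import math


def prob_any_failure(failure_probs):
    """Probability that at least one component fails, assuming independence.

    Computed as 1 minus the product of the per-component survival probabilities.
    """
    return 1.0 - math.prod(1.0 - p for p in failure_probs)


# Hypothetical device: each component fails with probability 0.01 over some
# interval. Adding control-plane components raises the count from 10 to 20.
before = prob_any_failure([0.01] * 10)
after = prob_any_failure([0.01] * 20)
print(before < after)
```

Under the independence assumption, each added component multiplies in another survival term smaller than one, so the likelihood that at least one component fails can only increase as components are added.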