1. Field of the Invention
The present invention relates to data communications, and, more particularly, to a method and apparatus for improved failure recovery in a network element.
2. Description of the Related Art
As more and more information is transferred over today's networks, businesses have come to rely heavily on their network infrastructure to provide their customers with timely service and information. Failures in such network infrastructures can be costly in terms of both lost revenue and idled employees. Thus, high reliability systems are becoming increasingly attractive to users of networking equipment.
Moreover, the need for routers, switches and other such network elements to provide ever-increasing packet processing rates with such high reliability, while maintaining a compact form factor, typically mandates the use of highly specialized ASICs (Application Specific Integrated Circuits). These ASICs operate at very high frequencies and consequently dissipate large amounts of heat, which complicates the issue of appropriate chassis design. Adding a requirement for high availability (redundancy) to the above requirements complicates the problem still further.
One example of a switch architecture provides port cards, catering to different types of physical interfaces, that feed their traffic into forwarding engines. Such forwarding engines support a form of distributed forwarding using specialized hardware that typically employs one or more ASICs. These forwarding engines are interconnected by means of a switching fabric. A routing processor (or multiple routing processors, for redundancy purposes) is also typically provided to manage exception handling and other tasks that cannot be managed by the forwarding engines.
The port cards are separated from the forwarding engine cards to allow multiple communications technologies to be accommodated using the same forwarding engine infrastructure. By separating a network element's port card functionality from its forwarding engine functionality, varying port card architectures can be used to support such multiple communications technologies (e.g., protocols, hardware interfaces, and so on), while the same basic forwarding engine architecture is employed throughout. This approach beneficially avoids the need for multiple versions of forwarding engines to support different line interfaces. When redundancy is considered, however, such an approach introduces the additional problem of handling the failure of a forwarding engine that provides forwarding functionality for a port card.
However, a balance must be struck between redundancy and the physical and commercial constraints placed on such designs. As noted, while high availability is desirable, the benefit of such availability must be weighed against the cost of the design, as well as its physical size and the thermal energy that must be dissipated. As redundant elements are added to a given architecture, that architecture's availability (reliability) improves, but its cost also rises, as do its size and the thermal energy it produces. Thus, the amount of redundancy should be minimized, while still providing the requisite level of availability.
Such reliability can be viewed in terms of both the availability of the network element and the effect of a failure and of the restoration performed in response thereto. As noted, such a system should provide reliable service under the given conditions. In the event of a failure, such a system should also provide continuity of service to the maximum degree possible (or, at least, to a commercially acceptable degree). In practice, this means that, should a failure occur, the data streams carried by the network element should experience minimal disruption, if indeed they experience any at all.
As is apparent from the preceding discussion, while providing high availability is certainly possible, providing such reliability in a cost-effective and commercially reasonable manner is challenging. As with most engineering problems, a solution that is not commercially reasonable, whether as a result of cost, complexity, physical requirements or the like, offers no real benefit to users (or to manufacturers). What is therefore needed is a way to provide for the reliable conveyance of data streams in an economically reasonable fashion. Moreover, such conveyance should be provided in a manner that, in the face of failures within the network elements carrying those data streams, causes minimal disruption to the services thus supported.