The present invention relates to intermediate nodes of a communications network and, in particular, to the infrastructure of an intermediate node, such as an aggregation router, used in a communications network, such as a computer network.
A computer network is a geographically distributed collection of interconnected communication links and segments for transporting data between nodes, such as computers. Many types of network segments are available, with the types ranging from local area networks (LAN) to wide area networks (WAN). For example, the LAN may typically connect personal computers and workstations over dedicated, private communications links, whereas the WAN may connect large numbers of nodes over long-distance communications links, such as common carrier telephone lines. The Internet is an example of a WAN that connects disparate networks throughout the world, providing global communication between nodes on various networks. The nodes typically communicate over the network by exchanging discrete frames, cells or packets of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Computer networks may be further interconnected by an intermediate network node, such as a switch or router, having a plurality of ports that may be coupled to the networks. To interconnect dispersed computer networks and/or provide Internet connectivity, many organizations rely on the infrastructure and facilities of Internet Service Providers (ISPs). ISPs typically own one or more backbone networks that are configured to provide high-speed connection to the Internet. To interconnect private networks that are geographically diverse, an organization may subscribe to one or more ISPs and couple each of its private networks to the ISP's equipment. Here, the router may be utilized to interconnect a plurality of private networks or subscribers to an IP “backbone” network. Routers typically operate at the network layer of a communications protocol stack, such as the internetwork layer of the Transmission Control Protocol/Internet Protocol (TCP/IP) communications architecture.
Simple networks may be constructed using general-purpose routers interconnected by links owned or leased by ISPs. As networks become more complex with greater numbers of elements, additional structure may be required. In a complex network, structure can be imposed on routers by assigning specific jobs to particular routers. A common approach for ISP networks is to divide assignments among access routers and backbone routers. An access router provides individual subscribers access to the network by way of large numbers of relatively low-speed ports connected to the subscribers. Backbone routers, on the other hand, provide transports to Internet backbones and are configured to provide high forwarding rates on fast interfaces. ISPs may impose further physical structure on their networks by organizing them into points of presence (POP). An ISP network usually consists of a number of POPs, each of which comprises a physical location wherein a set of access and backbone routers is located.
As Internet traffic increases, the demand for access routers to handle increased density and backbone routers to handle greater throughput becomes more important. In this context, increased density denotes a greater number of subscriber ports that can be terminated on a single router. Such requirements can be met most efficiently with platforms designed for specific applications. An example of such a specifically designed platform is an aggregation router. The aggregation router is an access router configured to provide high quality of service and guaranteed bandwidth for both data and voice traffic destined for the Internet. The aggregation router also provides a high degree of security for such traffic. These functions are considered “high-touch” features that necessitate substantial processing of the traffic by the router. More notably, the aggregation router is configured to accommodate increased density by aggregating a large number of leased lines from ISP subscribers onto a few trunk lines coupled to an Internet backbone.
The infrastructure of a typical router comprises functional components organized as a control plane and a data plane. The control plane includes the functional components needed to manage the traffic forwarding features of the router. These features include routing protocols, configuration information and other similar functions that determine the destinations of data packets based on information other than that contained within the packets. The data plane, on the other hand, includes functional components needed to perform forwarding operations for the packets.
For a single processor router, the control and data planes are typically implemented within the single processor. However, for some high performance routers, these planes are implemented within separate devices of the intermediate node. For example, the control plane may be implemented in a supervisor processor, such as a route processor, whereas the data plane may be implemented within a hardware-assist device, such as a co-processor or forwarding processor. In other words, the data plane is typically implemented in a specialized piece of hardware that is separate from the hardware that implements the control plane.
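The division of labor between the two planes may be illustrated by the following minimal sketch, which is offered only as an illustration and not as the implementation of any particular router. The class names ControlPlane and DataPlane, the interface names, and the use of a longest-prefix match for the lookup are all assumptions introduced here; the description above does not specify how the forwarding lookup is performed.

```python
import ipaddress


class ControlPlane:
    """Hypothetical supervisor-side component: owns the routing state."""

    def __init__(self):
        self.routes = {}  # ip_network -> outgoing interface name

    def add_route(self, prefix, interface):
        # Stands in for state learned from routing protocols or configuration.
        self.routes[ipaddress.ip_network(prefix)] = interface


class DataPlane:
    """Hypothetical forwarding-side component: only performs lookups."""

    def __init__(self):
        self.table = {}

    def load(self, routes):
        # The data plane keeps its own copy of what the control plane decided.
        self.table = dict(routes)

    def forward(self, destination):
        # Assumed lookup rule: pick the most specific matching prefix.
        addr = ipaddress.ip_address(destination)
        matches = [net for net in self.table if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.table[best]


control = ControlPlane()
control.add_route("10.0.0.0/8", "trunk0")
control.add_route("10.1.0.0/16", "serial3")

data = DataPlane()
data.load(control.routes)
```

In this sketch the data plane never decides where traffic should go; it only applies the table handed to it, which is why the two components can reside on separate devices.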
For implementations that require high availability, the data plane tends to be simple in terms of the organization and functions of its hardware and software. That is, the forwarding processor may be configured to operate reliably by reducing the complexity of its functional components. In contrast, the control plane tends to be more complex in terms of the quality and quantity of software operating on the supervisor processor. Failures are thus more likely to occur in the supervisor processor when executing such complicated code. In order to ensure high availability in an intermediate network node, it is desirable to configure the node such that if a failure arises with the control plane that requires restarting and reloading of software executing on the supervisor processor, the data plane continues to operate correctly. An example of such a high availability intermediate node is an asynchronous transfer mode (ATM) switch having a relatively simple switch fabric used to forward ATM cells from its input interfaces to output interfaces.
However, high-performance routers have evolved to the point where their data planes have become more complex in terms of software executing on their forwarding processors. This has increased the possibility of fatal errors arising in the forwarding processors that, in turn, halt forwarding of data traffic in the data planes. In a situation where a fatal error is detected in the data plane hardware or software, thereby requiring a reset and restart of the forwarding processor, the conventional approach is to restart the entire router, including the control plane. Yet restarting the entire router takes a relatively long period of time, e.g., on the order of minutes.
Specifically, restarting of the control plane requires reloading of an operating system executing on the supervisor processor, as well as reinitializing that operating system to a point where it acquires its necessary state. For example, re-initialization of the operating system includes acquiring lost dynamic state, such as routing protocol state information. A control plane restart is thus “visible” to neighboring routers as a topology change in the network that requires those neighbors having “knowledge” of the network to re-compute their routing databases when the restarted router is back online. In addition, the router must re-establish connections with its neighbors and exchange routing databases with those neighbors so as to “converge” its routing database. As noted, such activity consumes an excessive amount of time and the present invention is directed to a technique that addresses this problem.
The present invention comprises a system and technique for restarting a data plane of an intermediate node, such as an aggregation router, of a computer network without changing the state of a control plane in the router. The control plane includes a supervisor processor, such as a route processor, configured to manage traffic forwarding operations of the node. To that end, the route processor maintains a current state of the control plane pertaining to, e.g., routing protocols and interface states of line cards within the router. The aggregation router further comprises a data plane that includes hardware components, such as a forwarding engine, configured to perform forwarding operations for data forwarded by the router.
According to an aspect of the inventive technique, when the route processor detects a fatal error in the data plane, e.g., via an exception condition reported by data plane hardware, it restarts only the data plane without changing the state of the control plane. That is, the route processor resets the hardware components of the data plane, reloads software into those appropriate components and then resynchronizes the forwarding engine with state information stored in the control plane that is relevant to the data plane, e.g., the interface states of the line cards.
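The data-plane-only restart described above may be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: the classes ForwardingEngine and RouteProcessor, their method names, and the sample neighbor and line card values are hypothetical and serve only to show that the control plane state is left untouched while the forwarding engine is reset, reloaded and resynchronized.

```python
class ForwardingEngine:
    """Hypothetical model of the data plane hardware."""

    def __init__(self):
        self.microcode_loaded = False
        self.interface_states = {}
        self.faulted = False

    def reset(self):
        # A hardware reset clears everything held by the data plane.
        self.microcode_loaded = False
        self.interface_states = {}
        self.faulted = False

    def load_microcode(self):
        self.microcode_loaded = True


class RouteProcessor:
    """Hypothetical model of the control plane supervisor."""

    def __init__(self, engine):
        self.engine = engine
        # Control plane state that must survive a data plane restart.
        self.routing_state = {"neighbors": ["r1", "r2"]}
        self.interface_states = {"line_card_0": "up", "line_card_1": "down"}
        self.data_plane_restarts = 0

    def resync_data_plane(self):
        # Push only the state relevant to the data plane.
        self.engine.interface_states = dict(self.interface_states)

    def on_data_plane_exception(self):
        # Restart only the data plane: reset, reload, resynchronize.
        self.engine.reset()
        self.engine.load_microcode()
        self.resync_data_plane()
        self.data_plane_restarts += 1


engine = ForwardingEngine()
rp = RouteProcessor(engine)
routing_before = dict(rp.routing_state)

engine.faulted = True          # simulate a fatal data plane error
rp.on_data_plane_exception()   # control plane state is never modified
```

Note that on_data_plane_exception() never writes to routing_state; in this model that is what keeps the restart from appearing as a topology change.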
According to another aspect of the inventive technique, independent software modules, or clients, logically interact with “reset” software code of an operating system so that only the relevant portions of the code that control the data plane are executed. In response to detection of a fatal error by the control plane, driver software executing on the route processor notifies these clients, e.g., via registered call back functions, about the error. An exception handler routine is then invoked to resolve the error. Meanwhile, the clients terminate further attempts to access the data plane hardware while it is in an exception state.
After the error condition is resolved, the route processor resets the data plane hardware, reloads the software (i.e., micro-code) executing on the forwarding engine and resynchronizes the state stored on the control plane with relevant state needed by the data plane. The clients are then notified that the data plane hardware may once again be accessed and those clients proceed to download their specific configuration information into the forwarding engine. After the data plane is restarted, data traffic begins to flow through the forwarding engine.
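The interaction between the driver software and its clients during a restart may be sketched as below. The sketch is illustrative only: DataPlaneDriver, Client, the on_error and on_ready call backs, and the client names and configuration strings are all hypothetical, and the three hardware steps are reduced to log entries so that the ordering of events is visible.

```python
class Client:
    """Hypothetical software module that programs the forwarding engine."""

    def __init__(self, name, config):
        self.name = name
        self.config = config
        self.may_access = True

    def on_error(self):
        # Stop touching the data plane while it is in an exception state.
        self.may_access = False

    def on_ready(self, driver):
        # Access is allowed again; download this client's configuration.
        self.may_access = True
        driver.log.append(f"{self.name} downloads {self.config}")


class DataPlaneDriver:
    """Hypothetical driver running on the route processor."""

    def __init__(self):
        self.clients = []
        self.log = []

    def register(self, client):
        self.clients.append(client)

    def handle_fatal_error(self):
        # 1. Notify every registered client about the error.
        for client in self.clients:
            client.on_error()
        # 2. Restart the data plane (steps modeled as log entries only).
        self.log.append("reset hardware")
        self.log.append("reload microcode")
        self.log.append("resync control-plane state")
        # 3. Tell clients the hardware may be accessed again.
        for client in self.clients:
            client.on_ready(self)


driver = DataPlaneDriver()
qos = Client("qos", "queue-config")
acl = Client("acl", "filter-config")
driver.register(qos)
driver.register(acl)
driver.handle_fatal_error()
```

Registering call backs in this manner keeps the driver unaware of what each client configures; it need only announce when the hardware becomes unavailable and when it is available again.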
An advantage of the data plane restart invention is that state information maintained on the control plane is preserved. Thus, resetting and restarting of the data plane can be performed in a few seconds rather than the several minutes needed to reacquire the state information in order to restart the entire aggregation router, including the control plane. In addition, the router is still considered an active intermediate node by its neighboring routers in the network even though the data traffic forwarded to the router does not flow through the data plane during the restart. This aspect of the invention obviates the need to recompute and re-converge forwarding databases in the network.