1. Field of the Invention
This invention relates generally to the implementation of fault-tolerant behavior in network processor-based devices and networking systems, and more specifically to a system and methodology for maintaining disruption-free operation of the forwarding plane in the context of a faltering control plane.
2. Discussion of the Prior Art
In today's networked world, bandwidth is a critical resource. Increasing network traffic, driven by the Internet and other emerging applications, is straining the capacity of network infrastructures.
It is increasingly evident that networking devices play pivotal roles in mission-critical applications. However, network connectivity is taken for granted, and disruption in network connectivity services has severe implications for productivity. Consequently, networking devices have to be very robust.
It is further the case that networking devices are becoming increasingly complex due to: (1) the increasing number of protocols to be supported; (2) the growing complexity of existing protocols as they keep up with rapidly changing user applications; (3) increasing bandwidth requirements; and, (4) the requirement to support all of these complex features at wire speed. The burden on the manufacturers of networking devices is thus to build highly complex systems that are very robust.
More importantly, to be profitable, time to market is critical. That is, these systems need to be built as quickly as possible to capture an early market share. To meet this burden, manufacturers resort to a distributed system architecture: they build/assemble the system from several proven, well-tested components, regardless of whether these components might have been acquired from different vendors with different price/performance characteristics. Though these components perform very well individually, their combined system behavior might not be satisfactory. Temporary failure of one of the components could have a detrimental cascading effect on other components and bring the system down.
Thus, manufacturers are looking for components that tolerate temporary failure of other components and continue to offer reasonable service.
One networking device, referred to herein as a network processor or “NP”, has been defined as a programmable communications integrated circuit capable of performing one or more of the following functions: 1) Packet classification—identifying a packet based on known characteristics, such as address or protocol; 2) Packet modification—modifying the packet to comply with IP, ATM, or other protocols (for example, updating the time-to-live field in the header for IP); 3) Queue/policy management—reflecting the design strategy for packet queuing, de-queuing, and scheduling of packets for specific applications; and, 4) Packet forwarding—transmission and receipt of data over the switch fabric and forwarding or routing the packet to the appropriate address.
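The four functions enumerated above may be sketched, for purposes of illustration only, as follows. All names (classify, modify, FORWARDING_TABLE, and the packet fields) are hypothetical and are not drawn from any actual NP firmware or API.

```python
def classify(packet):
    """Packet classification: identify a packet by known characteristics,
    such as its protocol."""
    if packet.get("protocol") == "ospf":
        return "control"
    return "data"

def modify(packet):
    """Packet modification: e.g., decrement the IP time-to-live field."""
    packet = dict(packet)          # leave the caller's copy untouched
    packet["ttl"] -= 1
    return packet

# A minimal forwarding table mapping destination prefixes to egress ports.
FORWARDING_TABLE = {"10.0.0.0/8": "port2", "default": "port0"}

def forward(packet):
    """Packet forwarding: route the packet to the appropriate port."""
    return FORWARDING_TABLE.get(packet.get("dst_prefix"),
                                FORWARDING_TABLE["default"])
```

In a real NP these functions execute in hardware-assisted picocode at wire speed; the sketch only fixes the division of labor among them.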
NP-based networking devices are built from several components and in general have the architecture as depicted in FIG. 1. In the example networking system architecture 10, there are illustrated “n” Control Point (CP) processors 25 each of which may comprise a general purpose processor (GPP) having a physical or logical association with one or more of the Network Processors 12 in the system for enabling the customization and configuration of the Network Processor (NP) devices so that they may handle the forwarding of data packets and frames. As shown in FIG. 1, the control points 25 are connected to the network processor device 12 via a switch fabric 15. One NP device 12 is shown as supporting a number of external LAN or WAN interface ports 20 through which it receives and forwards data packets. It should be understood that the generic networking system architecture 10 depicted in FIG. 1 is for exemplary purposes and that other configurations are possible.
The generic networking system architecture 10 comprises two major software components: 1) the control point code base running on the GPP; and, 2) the programmable hardware-assist processors' picocode executing in each of the network processors. These two software components are responsible for initializing the system, maintaining the forwarding paths, and managing the system. From a software view, the system is distributed. The GPP (control point processor 25) and each picoprocessor run in parallel, with the CP communicating with each picoprocessor using a predefined application program interface (API) and control protocol. For purposes of description, as shown in FIG. 1, there are typically “m” protocols/software applications A1, . . . , Ak, . . . , Am, that run in the “n” control point processors CP1, . . . , CPn 25. Typically, the NP device 12 receives packets via the data interfaces 20, which packets may belong to two categories: 1) protocol/application control packets; or, 2) data packets. If a control packet is received, then the NP device 12 will analyze the contents of the frame and may determine that the packet is of interest to some application/protocol Aj running on control point CPk. Consequently, the NP device will forward the received control packet to CPk. The applications/protocols will process these control packets, possibly store some information in the storage device available in the CP processor itself, and also send messages to the NP to effect addition, deletion, and/or modification of entries in the forwarding table 18, which entries represent the topology of the network as viewed by the networking system. This is herein referred to as the control-plane operation of the networking device. If a data packet is received, then the NP device 12 will analyze the contents of the frame, consult the forwarding table 18, determine the outgoing data interface/port 20, and forward the frame via that interface.
This is referred to as the data-plane operation of the networking device. Thus, in a NP-based networking system, control-plane operations are performed by the control-point processor components whereas the data-plane operations are delegated to NP components. Further details regarding the general flow of a packet or frame received at an NP device may be found in commonly-owned, co-pending U.S. patent application Ser. No. 09/384,691 filed Aug. 27, 1999 and entitled “NETWORK PROCESSOR PROCESSING COMPLEX AND METHODS”, the whole contents and disclosure of which is incorporated by reference as if fully set forth herein.
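The control-plane/data-plane split in the NP receive path described above may be sketched as follows. The table contents and the control_interest mapping are hypothetical placeholders for the per-protocol dispatch that a real NP would perform in picocode.

```python
# Forwarding table 18: destination prefix -> outgoing interface/port.
forwarding_table = {"192.168.1.0/24": "port3"}

# Which control point runs the application interested in each protocol.
control_interest = {"ospf": "CP1"}

def receive(packet):
    """NP receive path: control packets go up to the interested CP
    application; data packets are forwarded via the table lookup."""
    if packet["type"] == "control":
        # Control-plane path: hand the frame to application Aj on CPk.
        return ("to_cp", control_interest[packet["protocol"]])
    # Data-plane path: consult forwarding table 18, pick the egress port.
    return ("to_port", forwarding_table.get(packet["dst_prefix"]))
```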
Traditionally, the relationship between the control plane and data plane is that of master and slave with the control plane acting as the master as it is responsible for populating and maintaining the forwarding table. If the NP fails and restarts, then the applications/protocols will populate the forwarding table once again, using the information that is stored in the CP processor 25.
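The master/slave recovery just described, in which the CP repopulates an NP's forwarding table after an NP restart from state retained in the CP, may be sketched as follows. Class and method names are illustrative only.

```python
class ControlPoint:
    """Master side: retains protocol-specific routing state so that a
    restarted NP (slave) can be repopulated."""

    def __init__(self):
        self.routing_table = {}            # state kept in the CP processor

    def learn(self, prefix, next_hop):
        self.routing_table[prefix] = next_hop

    def repopulate(self, np_forwarding_table):
        """Called after an NP failure/restart: rebuild the slave's
        forwarding table from the information stored in the CP."""
        np_forwarding_table.clear()
        np_forwarding_table.update(self.routing_table)
```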
Currently, as shown in FIG. 2, a control point CP-based application 26, for example, one that carries out Open Shortest Path First (OSPF) forwarding operations, is responsible for loading and updating new entries of the forwarding table 18 for the NP device 12 via API 30. Thus, packet forwarding tables 18 are updated using the OSPF protocol, for example, which enables routers to understand the internal network architecture, i.e., within the autonomous network. As known, OSPF calculates the shortest path from an IP Source Address (SA) to an IP Destination Address (DA). For example, when a subnet is moved/deleted within a network, OSPF will update the new shortest path to that changed/deleted subnet if required (i.e., if the associated next hop changes). This requires the forwarding tables 18 in all NP devices to be updated, which entails deleting table entries and inserting new ones.
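The update step described above, in which a forwarding table entry is replaced only when a route's next hop actually changes, may be sketched as follows. The function name and table representation are hypothetical; a real CP application would issue the delete and insert through API 30.

```python
def apply_route_change(forwarding_table, subnet, new_next_hop):
    """Update forwarding table 18 when OSPF recomputes a path.
    Returns True if an update was required, False otherwise."""
    old = forwarding_table.get(subnet)
    if old == new_next_hop:
        return False                   # next hop unchanged: no update
    if subnet in forwarding_table:
        del forwarding_table[subnet]   # delete the stale entry
    if new_next_hop is not None:
        forwarding_table[subnet] = new_next_hop   # insert the new entry
    return True
```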
It should be understood that many CP-based applications running on CP1–CPn may be downloading and updating new entries of the forwarding table 18 for the NP devices. Specifically, each respective control point CP-based application CP1–CPn gains knowledge of changing network configurations and generates/calculates respective protocol-specific information for populating forwarding table entries of NP devices 12.
Each CP-based application particularly maintains a protocol-specific routing table 28 including the packet routing information, and updates its table as new packet routing information is generated or becomes available, e.g., after a CP application failure. Via an application programming interface, this information is downloaded to one or more NP devices 12 so that entries in the NP forwarding table 18 may be updated.
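The download path just described, in which a CP application keeps its own protocol-specific table 28 and pushes entries to the forwarding tables 18 of one or more NP devices through an API, may be sketched as follows. The class names and the api_update method are hypothetical stand-ins for the actual programming interface.

```python
class NpDevice:
    """Slave side: an NP holding forwarding table 18."""
    def __init__(self):
        self.forwarding_table = {}

    def api_update(self, prefix, next_hop):
        # Stand-in for the per-NP update API (element 30 in FIG. 2).
        self.forwarding_table[prefix] = next_hop

class CpApplication:
    """Master side: a CP application with its routing table 28."""
    def __init__(self, np_devices):
        self.routing_table = {}
        self.np_devices = np_devices

    def route_learned(self, prefix, next_hop):
        self.routing_table[prefix] = next_hop
        for np in self.np_devices:     # download to every attached NP
            np.api_update(prefix, next_hop)
```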
Currently, there exists the problem of handling the failure and restarting of applications/protocols that run on the CP components. When these applications fail, most of the information that is stored in the control point may be lost. Traditionally, when applications/protocols restart, they purge the forwarding table and both the NP and CP applications start reconstructing the information synchronously. That is, whenever the control plane restarts, the forwarding plane is also forced to restart in order to simplify the task of synchronizing the information that is maintained in the NP and CP devices. Restarting the forwarding plane results in the disruption of network connectivity.
It would be highly desirable to provide a system and method that provides for a smooth transition in the updating of packet forwarding table entries by CP applications when a CP application fails, and particularly, one that avoids restarting the data forwarding plane from scratch when the control point application restarts.
It would further be highly desirable to provide a system and method that provides for a smooth transition in the updating of packet forwarding table entries by CP applications by enabling the “aging out”, i.e., deletion, of the entries inserted by an old CP application instance.
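One way the "aging out" notion above might be realized, sketched here purely as an illustrative assumption, is to stamp each forwarding entry with the identifier of the CP application instance that installed it; after a restart, entries stamped by any older instance can then be found and deleted without purging the whole table.

```python
def age_out(forwarding_table, current_instance):
    """Delete forwarding entries inserted by an old CP application
    instance.  Each value is a (next_hop, instance_id) pair; entries
    whose instance_id differs from the current instance are removed."""
    stale = [prefix for prefix, (_, inst) in forwarding_table.items()
             if inst != current_instance]
    for prefix in stale:
        del forwarding_table[prefix]
    return stale          # prefixes that were aged out
```

Because only the old instance's entries are removed, entries re-installed by the new instance survive, and the forwarding plane never restarts from scratch.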