The explosive growth of Internet traffic has been caused by the increased number of Internet users, various service demands from those users, the implementation of new services, such as voice-over-IP (VoIP) or streaming applications, and the development of mobile Internet. Conventional routers, which act as relaying nodes connected to subnetworks or other routers, have performed their roles well in situations in which the time required to process packets, determine their destinations, and forward the packets to the destinations is smaller than the transmission time on network paths. More recently, however, the packet transmission capabilities of high-bandwidth network paths and the increases in Internet traffic have combined to outpace the processing capacities of conventional routers. Thus, routers are increasingly blamed for major bottlenecks in the Internet.
Early routers were implemented on a computer host so that the CPU of the host performed all managerial tasks, such as packet forwarding via a shared bus and routing table computation. This plain architecture proved to be inefficient, due to the concentrated overhead on the CPU and congestion on the bus. As a result, router vendors developed distributed router architectures that provide more efficient packet processing than a centralized architecture. In a distributed router architecture, many of the functions previously performed by the centralized CPU are distributed to the line cards and the shared bus is replaced by a high-speed crossbar switch.
FIG. 1 illustrates distributed router 100 according to an exemplary embodiment of the prior art. Distributed router 100 interfaces with different types of networks, including optical networks (OC-192), asynchronous transfer mode (ATM) networks, and Gigabit Ethernet, among others. Distributed router 100 comprises line card modules (LCMs) 111–113, switch fabric 130, routing processor 140, and line card modules (LCMs) 151–153. LCM 111, LCM 112, and LCM 113 contain forwarding table (FT) 121, forwarding table (FT) 122, and forwarding table (FT) 123, respectively. Similarly, LCM 151, LCM 152, and LCM 153 contain forwarding table (FT) 161, forwarding table (FT) 162, and forwarding table (FT) 163, respectively.
Packets coming from adjacent router(s) or subnetworks are received by line card modules 111–113 and line card modules 151–153 and sent to switch fabric 130. Switch fabric 130 switches packets coming from or going to line card modules 111–113 and 151–153 and plays an essential role in relaying packets.
Routing processor 140 builds routing table 141 and maintains the current status of routing table 141 by updating changed routes immediately. Routing processor 140 maintains routing table 141 by running a routing protocol, such as Routing Information Protocol (RIP), Open Shortest Path First (OSPF), or Border Gateway Protocol (BGP). Forwarding tables 121–123 and 161–163 support an efficient lookup in each line card and are downloaded from routing table 141 of routing processor 140. If an incoming packet from a line card module cannot find its destination path from the forwarding table, the corresponding packet may be passed through switch fabric 130 toward a pre-defined default route, or may be silently discarded at the line card.
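The lookup behavior described above can be illustrated with a minimal sketch. It assumes that the forwarding table is a list of IP prefixes downloaded from the routing table and that the most specific (longest) matching prefix is used; the class name, method names, and next-hop labels are hypothetical and are not taken from FIG. 1.

```python
import ipaddress


class ForwardingTable:
    """Hypothetical per-line-card forwarding table (illustrative only)."""

    def __init__(self, default_next_hop=None):
        # default_next_hop models the pre-defined default route;
        # None means unmatched packets are silently discarded.
        self.default_next_hop = default_next_hop
        self.routes = []

    def download(self, routing_table):
        """Copy routes from the routing processor's table (prefix -> next hop)."""
        entries = [(ipaddress.ip_network(prefix), next_hop)
                   for prefix, next_hop in routing_table.items()]
        # Sort so that the longest (most specific) prefix is tried first.
        self.routes = sorted(entries, key=lambda e: e[0].prefixlen, reverse=True)

    def lookup(self, destination):
        """Return the next hop, the default route, or None (discard)."""
        address = ipaddress.ip_address(destination)
        for network, next_hop in self.routes:
            if address in network:
                return next_hop
        return self.default_next_hop


# Assumed example routes; the LCM labels are placeholders.
routing_table = {"10.0.0.0/8": "LCM 151", "10.1.0.0/16": "LCM 152"}

ft = ForwardingTable(default_next_hop="LCM 153")
ft.download(routing_table)
```

In this sketch, `ft.lookup("10.1.2.3")` matches both prefixes and returns the next hop of the longer one, while an address matching neither prefix falls back to the default route. A table constructed without a default route returns `None`, which corresponds to discarding the packet at the line card.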
The main reason router manufacturers favor this distributed architecture is the simplicity of using a single centralized routing processor to manage one routing table in a consistent way. On the other hand, although the separation of routing and forwarding functions enables high-speed packet processing, the introduction of QoS-capable routing service and the route delays caused by network instability demand even greater packet processing capacity, thereby resulting in additional overhead for the routing processor or instability in the router itself.
A large number of small routers can operate in concert (i.e., in parallel) if an efficient set of interoperability rules is established. The industry has avoided this coordination problem by using a single routing server to handle the routing problems. This approach, however, bounds both the scale of the router and its maximum performance to the processing capacity of available microprocessors.
Data packets that are inbound to a router may be switched through to one or more switch fabrics via two or more uplink paths within the input interface. The purpose of having multiple uplinks and switch fabric modules is to perform traffic load balancing and to provide redundant paths in case of link or switch fabric module failures. For example, packets received by LCM 111 in FIG. 1 may be transmitted to switch fabric 130 via one of N uplink paths. The actual uplink path may be selected by forwarding table 121. The selected path may be chosen, for example, by a round robin load balancing scheme in which the uplinks are sequentially selected for successive packets. However, by its very nature, such a scheme may alter the order in which packets from the same source are received at the output interface (e.g., LCM 151–LCM 153). This may lead to performance and conformance related problems. Even though Internet protocol (IP) does not assume any packet ordering, packets arriving out of order at the destination may create throughput problems, particularly for TCP/IP based applications. The problem is worsened if packet size is not taken into consideration, since packet size deviation can reduce the effectiveness of the load balancing scheme.
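The reordering and size-deviation effects can be seen in a toy model. The sketch below is illustrative only and rests on simplifying assumptions that are not part of the description above: all packets are queued at the input at time zero, every uplink has the same speed, each uplink sends its packets one after another, and the transit time of a packet equals its size in bytes. The function names are hypothetical.

```python
from itertools import cycle


def round_robin_arrival_order(packet_sizes, num_uplinks):
    """Return packet sequence numbers in the order they reach the output side."""
    uplink_free_at = [0] * num_uplinks   # time at which each uplink is idle again
    uplinks = cycle(range(num_uplinks))  # round robin: 0, 1, ..., N-1, 0, ...
    arrivals = []
    for seq, size in enumerate(packet_sizes):
        link = next(uplinks)
        # Assumption: transit time is proportional to packet size.
        done = uplink_free_at[link] + size
        uplink_free_at[link] = done
        arrivals.append((done, seq))
    arrivals.sort()  # earliest completion first; ties keep sequence order
    return [seq for _, seq in arrivals]


def uplink_load(packet_sizes, num_uplinks):
    """Return the total bytes that round robin places on each uplink."""
    load = [0] * num_uplinks
    for seq, size in enumerate(packet_sizes):
        load[seq % num_uplinks] += size
    return load


# A large packet followed by a small one is overtaken on the other uplink.
print(round_robin_arrival_order([1500, 64, 64, 1500], 2))  # [1, 0, 2, 3]

# Equal packet counts per uplink do not imply equal byte counts.
print(uplink_load([1500, 64, 1500, 64], 2))  # [3000, 128]
```

With equal-sized packets the same model preserves the original order, which suggests that, under these assumptions, the reordering is driven by packet size deviation rather than by the use of multiple uplinks alone.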
Therefore, there is a need in the art for an improved massively parallel router. In particular, there is a need for a massively parallel router having a distributed architecture that implements an effective load balancing scheme. More particularly, there is a need for a distributed architecture router that implements a load balancing scheme that minimizes out-of-order packet arrival and that minimizes the impact of packet size deviation.