Internet traffic has grown explosively due to the increasing number of Internet users, the varied service demands of those users, the introduction of new services such as voice-over-IP (VoIP) and streaming applications, and the development of mobile Internet. Conventional routers, which act as relaying nodes connected to subnetworks or other routers, have performed their roles well in situations in which the time required to process packets, determine their destinations, and forward the packets to those destinations is usually shorter than the transmission time on network paths. More recently, however, the packet transmission capabilities of high-bandwidth network paths and the increases in Internet traffic have combined to outpace the processing capacities of conventional routers. Thus, routers are increasingly blamed as major bottlenecks in the Internet.
Early routers were implemented on a computer host so that the CPU of the host performed all tasks, such as routing table computation and packet forwarding via a shared bus. This plain architecture proved to be inefficient, due to the overhead concentrated on the CPU and congestion on the shared bus. As a result, router vendors developed distributed router architectures that process packets more efficiently than a centralized architecture. In distributed router architectures, many of the functions previously performed by the centralized CPU are distributed to the line cards, and a high-speed crossbar switch replaces the shared bus.
Conventional IP routers have a single processor that handles routing updates for all of the router's interfaces. Conventional high-end routers may have multiple processors, but still centralize the routing protocols in a single entity called a route server. Both of these approaches have scalability problems. As the number of interfaces increases, the rate of route updates increases. Eventually, the update rate exceeds the processing capability of the processor performing the route updates.
Samsung Telecommunications America™ has defined a distributed architecture for the Galaxy™ IP router, where multiple routing engines distribute the workload of managing the interfaces and maintaining the routes. This requires that the management and protocol workload be distributed among various processors. In the Galaxy™ IP router, the workflow is distributed through a method in which each processor receives its work on its own input queue, completes its part of the routing problem, then passes the work to another processor for additional processing.
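The workflow-based distribution described above can be illustrated with a minimal sketch. It is not the Galaxy™ implementation; it only assumes that each processor can be modeled as a thread with its own input queue, and the stage names (`classify`, `lookup`, `forward`) and the helper `run_pipeline` are hypothetical names chosen for illustration.

```python
import queue
import threading

STOP = object()  # sentinel telling a stage to shut down (illustrative convention)


def stage(name, inbox, outbox):
    """Model one processor: take work from its own input queue,
    complete its part, then pass the work to the next queue."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown downstream
            break
        outbox.put(item + [name])  # record that this stage handled the item


def run_pipeline(stage_names, work_items):
    """Chain the stages so that stage i reads queue i and writes queue i + 1."""
    queues = [queue.Queue() for _ in range(len(stage_names) + 1)]
    threads = [
        threading.Thread(target=stage, args=(name, queues[i], queues[i + 1]))
        for i, name in enumerate(stage_names)
    ]
    for t in threads:
        t.start()
    for work in work_items:
        queues[0].put([work])
    queues[0].put(STOP)
    for t in threads:
        t.join()

    # Drain the final queue; it holds the fully processed items, then STOP.
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    return results


if __name__ == "__main__":
    # Hypothetical stage names; the source does not name the processing steps.
    for result in run_pipeline(["classify", "lookup", "forward"], ["pkt-1", "pkt-2"]):
        print(result)
```

Each stage owns exactly one input queue regardless of which upstream stage supplies the work, which is the property the paragraph above attributes to the workflow method.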
However, the previously proposed methods of workflow-based distribution applied to only two processors on a point-to-point link and used a push method, whereby the sending processor pushed the data to the receiving processor. By contrast, current configurations of massively parallel routers, such as the Galaxy™ IP router, implement at least five processors in each routing node. The increase to more than two processors is a major change that requires many other factors to be considered.
Prior art routers do not scale easily to multiple processors. These routers do not include mechanisms to avoid collisions between multiple communication transactions among multiple processors and multiple processes. The prior art routers also require an input queue for each data producer, which causes memory requirements to grow to unreasonably high levels. Furthermore, it is unacceptable to rebuild the code just to add more components to the system, since this requires an interruption of user data traffic to start the new load.
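To give a rough sense of why one input queue per data producer scales poorly, the following sketch counts queues under a stated assumption that is not taken from the source: every processor in a routing node may send work to every other processor. The function names are hypothetical, and actual memory use would also depend on queue depth and entry size, which the source does not specify.

```python
def queues_with_one_per_producer(num_processors):
    """Each processor keeps a separate input queue for every other processor."""
    return num_processors * (num_processors - 1)


def queues_with_one_per_consumer(num_processors):
    """Each processor keeps a single input queue shared by all producers."""
    return num_processors


if __name__ == "__main__":
    # Two processors (point-to-point case), five (per routing node), and a larger node.
    for n in (2, 5, 16):
        print(n, queues_with_one_per_producer(n), queues_with_one_per_consumer(n))
    # prints:
    # 2 2 2
    # 5 20 5
    # 16 240 16
```

Under this assumption the per-producer scheme grows quadratically with the number of processors, while a single queue per consumer grows linearly.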
Therefore, there is a need in the art for an improved massively parallel router. In particular, there is a need for a massively parallel, distributed architecture router that implements multiple processors in each routing node and implements a mechanism to avoid collisions between multiple communication transactions among multiple processors and multiple processes. More particularly, there is a need for a massively parallel, distributed architecture router that implements multiple processors in each routing node without requiring an input queue for each data producer.