1. Field of the Invention
The present invention relates to systolic arrays and, in particular, to a reassembly assist for a systolic array of a router used in a computer network.
2. Background Information
A class of data networking equipment, referred to as aggregation or edge routers, has emerged that aggregates thousands of physical or logical links from end users onto one or more higher speed “backbone” links (such as OC-12, Gigabit Ethernet, OC-48 and higher) of a computer network. As the name implies, these routers reside at the edge of the network and are “gatekeepers” for packets transmitted over the high-speed core of the network. As such, they are required to perform a list of advanced “high-touch” features on the packets to protect the network from unauthorized use and to deal with a wide range of unique link interfaces, protocols and encapsulation methods that the end user links require. In order to provide these complex features and to provide flexibility for newly defined features, these routers are normally implemented using specialized, programmable processors.
The sheer processing power and memory bandwidth required to access data structures (e.g., tables) in order to process packets dictate the use of multiple processors within an edge router. A common approach is to organize these processors into one or more parallel one-dimensional (1-D) systolic arrays wherein each array is assigned an incoming packet to process. Each processor within a single systolic array “pipeline” is assigned a piece of the task of processing the packet and therefore only needs access to a memory associated with that task. However, the corresponding processors in other parallel 1-D arrays that are performing the same tasks may also share that same memory. Furthermore, each processor has a limited amount of time (allotted phase) to process its packet; exceeding that phase impedes the flow of packets through the system and reduces packet throughput.
Aggregation routers of this class often provide support for multi-link protocols. A multi-link protocol is used to aggregate multiple physical links into a single logical link. Multi-link protocols enable parallel lower speed links to be combined such that they act together as one higher speed link. The combining of links is called bundling and the group of multiple physical links that are combined into a single logical link is called a bundle. Bundling provides the benefits of the combined bandwidth of the aggregated physical links, but with substantially less latency than any one individual physical link. In addition, greater service resilience is provided because the bundle continues to function even when individual links within the bundle fail. Two examples of multi-link protocols are the multi-link point-to-point protocol (MLPPP) defined in RFC 1990, The PPP Multi-link Protocol, and the multi-link frame relay protocol (MFR) defined in FRF.15, the End-to-End Multi-link Frame Relay Implementation Agreement.
In a multi-link protocol, a packet waiting to be sent on a bundle is often sent on the next available physical link in the bundle. If the packet is large, the packet is typically broken up into a series of fragments (i.e., data fragments) that are then sent in parallel over different physical links in the bundle. On the receiving end, the fragments are reassembled to form the original packet. By fragmenting packets and sending them in this manner, greater bandwidth can be realized because large packets are sent in parallel and can utilize the capacity of the entire bundle rather than that of an individual link. However, one drawback with this method is that fragments may arrive out of order on the receiving end, thereby complicating the process of reassembling the packet.
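The fragmentation step described above can be illustrated with a minimal sketch. It is not taken from any particular multi-link protocol: the `Fragment` fields, the `fragment_packet` function, and the round-robin link choice are all assumptions made for illustration (the text describes sending on the next available link, which round-robin only approximates).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    # All field names are hypothetical, chosen for this illustration only.
    seq: int      # order of the fragment within the original packet
    offset: int   # byte displacement of the fragment in the original packet
    data: bytes   # fragment payload
    last: bool    # True for the final fragment of the packet


def fragment_packet(packet: bytes, frag_size: int, num_links: int) -> List[List[Fragment]]:
    """Split a packet into fragments and spread them across the links of a bundle.

    Assumption: links are chosen round-robin as a stand-in for
    "next available physical link".
    """
    if frag_size <= 0 or num_links <= 0:
        raise ValueError("frag_size and num_links must be positive")
    links: List[List[Fragment]] = [[] for _ in range(num_links)]
    for seq, offset in enumerate(range(0, len(packet), frag_size)):
        chunk = packet[offset:offset + frag_size]
        last = offset + frag_size >= len(packet)
        links[seq % num_links].append(Fragment(seq, offset, chunk, last))
    return links
```

Because each link delivers its fragments independently, a receiver draining the per-link queues at different rates may see the fragments in an order other than their `seq` order, which is the out-of-order arrival noted above.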
The reassembly process typically involves tracking fragments and temporarily storing them as necessary until all the fragments have been received. When all the fragments have been received, the packet is typically reassembled by moving each fragment into a packet buffer at a displacement in the buffer that correlates to the displacement of the fragment in the original packet.
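As a rough illustration of this conventional approach, the sketch below stores fragments until the whole packet is present and only then copies each one into a packet buffer at its displacement. The `Reassembler` class and its method names are hypothetical, and the sketch assumes that fragments do not overlap and that the final fragment is flagged as such.

```python
from typing import Dict, Optional


class Reassembler:
    """Hypothetical store-then-copy reassembler for a single packet at a time."""

    def __init__(self) -> None:
        self.pending: Dict[int, bytes] = {}   # displacement -> fragment data
        self.total_len: Optional[int] = None  # known once the last fragment arrives

    def receive(self, offset: int, data: bytes, last: bool) -> Optional[bytes]:
        """Record one fragment; return the packet once every fragment is present."""
        self.pending[offset] = data
        if last:
            self.total_len = offset + len(data)
        if self.total_len is None:
            return None
        # Assumption: fragments never overlap, so byte counts suffice to detect completion.
        if sum(len(d) for d in self.pending.values()) != self.total_len:
            return None
        # All fragments received: copy each into the buffer at its displacement.
        buf = bytearray(self.total_len)
        for off, d in self.pending.items():
            buf[off:off + len(d)] = d
        self.pending.clear()
        self.total_len = None
        return bytes(buf)
```

Note that the copy loop runs once per stored fragment and only on the call that completes the packet, so the work done on that final call grows with the number of fragments.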
The amount of time required to reassemble the packet varies with the number of fragments that need to be reassembled. In a high-throughput systolic array configuration, this variation in time is unacceptable because, depending on the number of fragments, the processor may not be able to complete the reassembly process within its allotted time phase, thereby possibly stalling other processors in the array. Even if the time phase for the processor were extended to accommodate the worst case, i.e., the largest-sized packet, doing so would introduce inefficiencies and waste valuable processor resources. In addition, having to reassemble a packet when the last fragment is received means the processor has to dedicate a greater amount of time to processing the last fragment than it dedicates to processing earlier fragments. Once again, this could be remedied by extending the processor phase to accommodate the largest-sized packet; however, as indicated before, doing so wastes valuable resources. It is therefore highly desirable to have a mechanism that makes the packet reassembly process deterministic.