1. Field of the Invention
The present invention relates to computer networking and in particular to maintaining entity order in a system containing multiple entities.
2. Background Information
Computer architecture generally defines the functional operation, including the flow of information and control, among individual hardware units of a computer. One such hardware unit is the processor or processing engine, which contains arithmetic and logic processing circuits organized as a set of data paths. In some implementations, the data path circuits may be configured as a central processing unit (CPU) having operations that are defined by a set of instructions. The instructions are typically stored in an instruction memory and specify a set of hardware functions that are available on the CPU.
A high-performance computer may be realized by using a number of CPUs or processors to perform certain tasks in parallel. For a purely parallel multiprocessor architecture, each processor may have shared or private access to resources, such as an external memory coupled to the processors. Access to the external memory is generally handled by a memory controller, which accepts requests from the various processors to access the external memory and processes them in an order that typically is controlled by the memory controller. Certain complex multiprocessor systems may employ many memory controllers where each controller is attached to a separate external memory subsystem.
One place where a parallel multiprocessor architecture may be advantageously employed is data communications and, in particular, the forwarding engine of an intermediate network station or node, such as a router or switch. An intermediate node interconnects communication links and sub-networks of a computer network through a series of ports to enable the exchange of data between two or more software entities executing on hardware platforms, such as end nodes. The nodes typically communicate by exchanging discrete packets or frames of data according to predefined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP) or the Internetwork Packet Exchange (IPX) protocol. The intermediate node often uses the forwarding engine to process packets acquired on its various ports in accordance with these protocols. This processing may include placing a packet in a packet memory, where the forwarding engine may access data associated with the packet and perform various functions on the data, such as modifying the contents of the packet.
In some intermediate nodes, the multiprocessor forwarding engine is organized as a systolic array comprising “m” rows and “n” columns of entities, such as processors or threads. Here, the entities of each row may be configured to process packets in a pipeline fashion, wherein the entity in each column of the row acts as a stage in the pipeline and performs a particular function on the packets. For example, an 8×2 systolic array of processors comprises 16 processors organized as 8 rows of 2 columns each, wherein the two processors of each row form a 2-stage pipeline.
Usually, the systolic array processes packets by assigning each packet to a particular row of entities, whereupon each entity in that row performs the function of its column on the packet, as described above. For example, in the 8×2 array described above, the intermediate node acquires a packet and assigns the packet to a particular row of processors in the array. The processor in the first column of the row may be configured to apply a destination address contained in the packet to a look-up table to determine the destination of the packet. The processor in the second column may be configured to place the packet on an output queue associated with the destination.
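The row-and-column flow described above can be modeled in a few lines of Python. This is a minimal sketch, not the forwarding engine itself: the function names, the look-up table contents, and the round-robin interpretation of the "next available row" are hypothetical assumptions, and the two stages run as ordinary sequential function calls rather than on separate processors.

```python
from collections import defaultdict

ROWS = 8  # the 8x2 array from the text: 8 rows, each a 2-stage pipeline

# Hypothetical forwarding table: destination address -> output queue name.
LOOKUP_TABLE = {"10.0.0.1": "port1", "10.0.0.2": "port2"}


def column0_lookup(packet):
    """First column: apply the destination address to the look-up table."""
    packet["out_queue"] = LOOKUP_TABLE.get(packet["dst"], "default")
    return packet


def column1_enqueue(packet, queues):
    """Second column: place the packet on the output queue for its destination."""
    queues[packet["out_queue"]].append(packet["id"])


def forward(packets):
    queues = defaultdict(list)
    for seq, packet in enumerate(packets):
        packet["row"] = seq % ROWS  # assumption: "next available row" is round-robin
        column1_enqueue(column0_lookup(packet), queues)
    return dict(queues)


pkts = [{"id": 0, "dst": "10.0.0.2"}, {"id": 1, "dst": "10.0.0.1"},
        {"id": 2, "dst": "10.0.0.2"}]
print(forward(pkts))  # {'port2': [0, 2], 'port1': [1]}
```

Because every packet here passes through both stages before the next one starts, this model preserves arrival order trivially; the ordering problem arises only when rows run concurrently and take unequal time.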
In a typical systolic array configuration, each entity in a particular column is configured to execute the same code within a fixed amount of time but with a shifted phase. As packets are acquired, they are placed in a shared resource, such as an external memory, and assigned to the next available row of entities, as described above. In this configuration, packets tend to be processed by the intermediate node on a first-in first-out basis such that packets that arrive ahead of later packets exit the forwarding engine ahead of the later packets. However, due to loss of entity order caused by various events associated with accessing the shared resource, it may be possible for packets that arrive later to exit ahead of packets that arrived earlier.
For example, in a systolic array of processors, the processors of a row handling an earlier acquired packet may stall due to memory events associated with the shared resource, such as memory refresh cycles or being denied access to locked memory locations. Processing the earlier packet may then take longer than processing a later acquired packet on a different row of processors and, consequently, the later acquired packet may exit the forwarding engine ahead of the earlier acquired packet.
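The ordering hazard can be illustrated with a simple timing model. It assumes, hypothetically, that every row runs the same code in a fixed base time and that a stall on the shared resource adds extra time to one row only. The function name exit_order and the time units are illustrative; breaking ties by arrival order is also an assumption.

```python
def exit_order(arrivals, base_time, stalls):
    """Return packet ids in the order they leave the forwarding engine.

    arrivals[i]: time packet i is assigned to a row (arbitrary units).
    base_time:   processing time of every row (all rows run the same code).
    stalls:      {packet id: extra time its row spends stalled on shared memory}.
    """
    exit_times = [a + base_time + stalls.get(i, 0) for i, a in enumerate(arrivals)]
    # Sort by exit time; ties go to the earlier arrival (an assumption).
    return sorted(range(len(arrivals)), key=lambda i: (exit_times[i], i))


arrivals = [0, 1, 2, 3]
print(exit_order(arrivals, base_time=10, stalls={}))      # [0, 1, 2, 3]
print(exit_order(arrivals, base_time=10, stalls={0: 5}))  # [1, 2, 3, 0]
```

With no stalls the packets leave first-in first-out; a 5-unit stall on the row handling packet 0 lets the three later packets overtake it.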
One way to maintain entity order and consequently packet processing order is to employ a synchronization mechanism that synchronizes the entities in the systolic array at certain points during their processing of packets. A prior art synchronization mechanism that may be used involves a special instruction called a “boundary synchronize” (BSYNC) instruction. The BSYNC instruction causes the entities in a particular column to wait (stall) until all the processors in the column have executed the instruction. In a typical arrangement, code executed by the column of entities contains BSYNC instructions at various strategic points in the code. When an entity, such as a processor, executes the BSYNC instruction, the processor's hardware causes the processor to stall until all the other processors in the same column have executed their BSYNC instructions at which time all the processors continue execution with their next instruction. The BSYNC instruction acts to synchronize the entities at certain code boundaries, and can be used to prevent entities that are processing later acquired packets from “getting ahead” of entities processing earlier acquired packets.
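The effect of the BSYNC instruction can be approximated in software with a barrier. The sketch below uses Python's threading.Barrier as a stand-in for the hardware instruction: each thread plays one entity of a column, and the per-phase delays are hypothetical values standing in for uneven processing or memory stalls. It is an analogy to the mechanism described above, not an implementation of the BSYNC hardware, and it assumes every entity executes the same number of phases (as entities running the same code would).

```python
import threading
import time


def run_column(delays):
    """Run one column of entities with a barrier after every code phase.

    delays[i][p]: seconds entity i spends on phase p (hypothetical values that
    stand in for uneven work or memory stalls). All entities must have the same
    number of phases. Returns (phase, entity) pairs in the order entities
    finished each phase.
    """
    n = len(delays)
    bsync = threading.Barrier(n)  # software stand-in for the BSYNC instruction
    log, log_lock = [], threading.Lock()

    def entity(i):
        for phase, delay in enumerate(delays[i]):
            time.sleep(delay)
            with log_lock:
                log.append((phase, i))
            bsync.wait()  # stall until every entity in the column arrives

    threads = [threading.Thread(target=entity, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run_column([[0.05, 0.0], [0.0, 0.02], [0.01, 0.0]])
print([phase for phase, _ in log])  # [0, 0, 0, 1, 1, 1]
```

No entity can record phase 1 work until every entity has passed the barrier, so the log always lists all phase 0 entries before any phase 1 entry, regardless of the delays; the order of entities within a phase still varies.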
While the above-described technique may be effective at ensuring that entity and packet-processing order is maintained, the technique has its drawbacks. For example, each entity must execute an instruction in order to synchronize and maintain order. This wastes valuable time that may be better utilized processing packets. Moreover, an entity may become stalled while waiting for slower entities to execute the synchronize instruction. This too wastes valuable time and may act to further diminish the entity's capacity to process packets.
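The cost described above can be put into rough numbers. The sketch assumes, for illustration only, that executing BSYNC costs a fixed number of cycles (one by default, a hypothetical figure) and that every entity is released at the cycle the slowest entity reaches the instruction. The function name bsync_cost is likewise hypothetical.

```python
def bsync_cost(arrival_cycles, instr_cycles=1):
    """Cycles lost by each entity at one BSYNC point.

    arrival_cycles[i]: cycle at which entity i reaches its BSYNC instruction.
    instr_cycles: assumed cost of executing BSYNC itself (hypothetical).
    Every entity is released only when the slowest one arrives.
    """
    release = max(arrival_cycles)
    return [instr_cycles + (release - t) for t in arrival_cycles]


costs = bsync_cost([100, 104, 130, 101])
print(costs, sum(costs))  # [31, 27, 1, 30] 89
```

In this example 85 of the 89 lost cycles are spent waiting for the one slow entity and the remaining 4 go to executing the instruction itself, which is the combination of overheads identified above.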