A systolic array provides a common approach for increasing the processing capacity of a computer system when a problem can be partitioned into discrete units of work. In the case of a one-dimensional systolic array comprising a single “row” of processing elements or processors, each processor in the array is responsible for executing a distinct set of instructions on input data before passing it to a next element of the array. To maximize throughput, the problem is divided such that each processor requires approximately the same amount of time to complete its portion of the work. In this way, new input data can be “pipelined” into the array at intervals equal to the processing time of each processor, with as many units of input data being processed in parallel as there are processors in the array. Performance can be improved by adding more elements to the array as long as the problem can continue to be divided into smaller units of work. Once this dividing limit has been reached, processing capacity may be further increased by configuring multiple rows in parallel, with new input data allocated to the first processor of a next row of the array in sequence.
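The pipelining behavior described above can be illustrated with a minimal simulation sketch. It assumes every processor takes exactly one time step per unit of work and that a unit of data moves to the next processor at the end of each step; the function name `run_systolic_row` and the stage functions are hypothetical and serve only as illustration, not as the design of any particular engine.

```python
from collections import deque

_EMPTY = object()  # marks a processor that has no data in a given time step


def run_systolic_row(stages, inputs):
    """Simulate a single row of processors (assumed one time step per stage).

    stages: list of one-argument functions, one per processor in the row.
    inputs: iterable of input data units, fed one per time step.
    Returns (outputs, cycles), where cycles is the number of time steps used.
    """
    if not stages:
        raise ValueError("at least one processor is required")
    regs = [_EMPTY] * len(stages)  # regs[i] is the data held by processor i
    pending = deque(inputs)
    outputs = []
    cycles = 0
    while pending or any(r is not _EMPTY for r in regs):
        if pending:
            regs[0] = pending.popleft()  # pipeline a new unit into the row
        # All processors work in parallel on the data they currently hold.
        results = [_EMPTY if r is _EMPTY else stage(r)
                   for stage, r in zip(stages, regs)]
        if results[-1] is not _EMPTY:
            outputs.append(results[-1])  # the last processor emits its result
        # Each result moves to the next processor; the first one becomes free.
        regs = [_EMPTY] + results[:-1]
        cycles += 1
    return outputs, cycles


if __name__ == "__main__":
    row = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    print(run_systolic_row(row, [1, 2, 3, 4]))
```

Under these assumptions, a row of N processors completes M input units in N + M - 1 time steps, rather than the N × M steps a single processor executing all N portions would need. In the example, three processors complete four inputs in six steps.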
In the case of a parallel processor systolic array, data typically flows from one processor to the next in a row over a data plane path of the array. An example of such a systolic array is the processing engine disclosed in U.S. patent application Ser. No. 09/106,478 titled Programmable Arrayed Processing Engine Architecture for a Network Switch, by Darren Kerr et al., which application is hereby incorporated by reference as though fully set forth herein. The processing engine generally comprises an array of processors embedded between an input buffer and an output buffer of an intermediate network station, such as a network switch. The processors are symmetrically arrayed as rows and columns, wherein the processors of each row are configured as stages of a pipeline that sequentially execute operations on data, such as Internet protocol (IP) packets, passed serially among the processors.
Although the data plane path is intended primarily for packet data flow, that path may be used for communication among processors of the engine, particularly between an upstream processor and a downstream processor of a pipeline. Here, packet flow over the data plane path is unidirectional between those processors. By inserting control information in a packet, the upstream processor may communicate with the downstream processor in a manner consistent with packet data flow through the processing engine. However, the data plane path typically contains substantial buffering and, as a result, utilization of that path for processor communication may incur significant latency.
Use of the data plane path for communication among other processors of the engine may be more difficult and time-consuming. In particular, for a downstream processor to communicate with an upstream processor, the control packet must make an additional pass through the processor pipeline. In addition to incurring long latency, this approach may waste substantial processing resources. Moreover, use of a data plane communication path may be impossible for processors in different pipelines.
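The cost asymmetry between the communication cases discussed above can be sketched with a simplified hop count. The model below is an assumption made for illustration only: stages in a row are numbered from 0, a packet advances one stage per hop, a packet leaving the last stage re-enters at stage 0 in a single hop, and no data plane path exists between rows. The function name `data_plane_hops` and its parameters are hypothetical, and the buffering delay noted above is not modeled.

```python
from typing import Optional


def data_plane_hops(src_row: int, src_stage: int,
                    dst_row: int, dst_stage: int,
                    stages_per_row: int) -> Optional[int]:
    """Count stage-to-stage hops for a control packet (simplified model).

    Returns None when the processors are in different rows, because this
    model assumes no data plane path between different pipelines.
    """
    if src_row != dst_row:
        return None
    if dst_stage > src_stage:
        # Upstream to downstream: the packet follows normal packet flow.
        return dst_stage - src_stage
    # Downstream to upstream: the packet finishes the current pass,
    # re-enters at stage 0 (assumed one hop), then travels to the target.
    return (stages_per_row - src_stage) + dst_stage


if __name__ == "__main__":
    print(data_plane_hops(0, 0, 0, 1, 4))  # stage 0 to stage 1
    print(data_plane_hops(0, 1, 0, 0, 4))  # stage 1 back to stage 0
    print(data_plane_hops(0, 0, 1, 2, 4))  # different rows
```

In a four-stage row under these assumptions, a message from stage 0 to stage 1 takes one hop, whereas the reverse direction takes three, and each of those hops would also occupy a processor that could otherwise be handling packet data.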