FIG. 1 of the accompanying drawings illustrates a data processing device 1 having a processing element 10. The processing element 10 receives data 11 to be processed in accordance with a received instruction 12. The processing element 10 also receives a clock signal input 13 for synchronising operation and execution of the received instructions. Following execution of the instruction 12 on the data 11 by the processing element 10, a result 14 is output. The processing element 10 can be arranged to provide any appropriate function or functions.
FIG. 2 of the accompanying drawings illustrates an exemplary processing element 10 which includes a plurality of function units 16 (16A to 16F) which have respective individual functions. For example, a function unit may provide a memory read function, a memory write function, an add function, a divide function, or a multiply function. The plurality of function units 16 can be arranged to provide a desired range of functions. It will be readily appreciated that each function unit may have any appropriate function, and that any appropriate combination of functions may be provided.
A data input 17 delivers data to be processed to the processing element 10, and a multiplexer 18 routes the data to the correct function unit dependent upon the contents of the data being received. An enable signal and a clock signal (not shown in FIG. 2 for the sake of clarity) are provided to the function units. When the enable signal is provided to a function unit, the function unit executes its function on received data on the next clock cycle or cycles. The number of cycles taken for execution of a particular function is dependent upon that function, as is well known.
Following execution of the function, a function unit 16 provides processed data as an output 20 (20A to 20F). These outputs 20 (20A to 20F) are provided as inputs to a multiplexer 21 which operates to select one of the outputs 20 for output from the processing element 10 as an output 22.
In a previously considered processing element, instructions are executed serially in order of receipt, so that only one function unit in the plurality of function units is operating at any one time. This order of execution is determined by the program being executed on the processing device 1. In such an arrangement, only one output 20 is active at any one time, and the multiplexer 21 selects that output 20 as the output from the processing element 10.
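By way of illustration only, the serial arrangement of FIG. 2 can be modelled in software as follows. This is a minimal sketch under stated assumptions: the particular function units, the use of an instruction name to steer the input multiplexer, and the names `FUNCTION_UNITS`, `processing_element` and `run_serial` are hypothetical and are not taken from the drawings.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical function units standing in for 16A to 16F of FIG. 2.
FUNCTION_UNITS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a // b,
}


def processing_element(instruction: str, data: Tuple[int, int]) -> int:
    """Model one serial step: input mux, one function unit, output mux."""
    # Outputs 20 of all function units; None means "not active".
    outputs: Dict[str, Optional[int]] = {name: None for name in FUNCTION_UNITS}
    # Input multiplexer 18: only the selected unit receives the data.
    outputs[instruction] = FUNCTION_UNITS[instruction](*data)
    # Output multiplexer 21: exactly one output is active, so select it.
    active = [value for value in outputs.values() if value is not None]
    assert len(active) == 1
    return active[0]


def run_serial(program: List[Tuple[str, Tuple[int, int]]]) -> List[int]:
    """Execute instructions one at a time, in order of receipt."""
    return [processing_element(op, operands) for op, operands in program]


results = run_serial([("add", (2, 3)), ("mul", (4, 5)), ("div", (9, 3))])
print(results)  # [5, 20, 3]
```

In this model each instruction occupies the processing element on its own, which corresponds to only one function unit operating at any one time.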
In order to provide enhanced processing capabilities, to reduce the need for external memory write and read operations (which add to the delay and latency of processing), and to increase the number of instructions executing in parallel in one cycle, a processed data feedback architecture has been proposed for the processing element. FIG. 3 of the accompanying drawings illustrates schematically such an architecture. The processing element 10′ of FIG. 3 includes an input multiplexer 24 for supplying data to be processed to the function units 16. In contrast with the FIG. 2 example, the input multiplexer 24 of the processing element 10′ is connected to receive the outputs 22A to 22F of the function units 16A to 16F. In this manner, the input multiplexer 24 is able to feed back the result of one function unit to one of the function units for further processing, in dependence upon the program being executed. A series of instructions can therefore be executed without the need for external memory input/output processes, and the number of instructions executing in parallel in one cycle can be increased. Such a technique enables a series of instructions to be processed more quickly and with lower delay.
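The feedback path of FIG. 3 can be sketched in the same illustrative style. The sketch assumes, purely for the purposes of the example, that an operand is either a literal value or the name of a function unit whose most recent output is to be fed back; the names `UNITS`, `last_output` and `run_with_feedback` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, Union

# An operand is a literal, or the name of a unit whose last output is fed back.
Operand = Union[int, str]

# Hypothetical function units standing in for 16A to 16F of FIG. 3.
UNITS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def run_with_feedback(program: List[Tuple[str, Tuple[Operand, Operand]]]) -> int:
    """Run a series of instructions, feeding unit outputs back as inputs."""
    # Latest output of each function unit, as presented to input multiplexer 24.
    last_output: Dict[str, int] = {}
    result = 0
    for unit, operands in program:
        # Input multiplexer 24: choose between new data and a fed-back output.
        resolved = [
            last_output[op] if isinstance(op, str) else op for op in operands
        ]
        result = UNITS[unit](*resolved)
        last_output[unit] = result  # held internally, not written to memory
    return result


# (2 + 3) * 4: the multiply unit consumes the add unit's output directly.
value = run_with_feedback([("add", (2, 3)), ("mul", ("add", 4))])
print(value)  # 20
```

The intermediate sum never leaves the processing element in this model, which is the property that removes the external memory write and read between the two instructions.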
However, when a program contains multiple sequences of instructions, execution of the instructions in a single series can lead to unnecessarily extended delays. In order to overcome this issue, in a paper entitled “Cheap Out-of-Order Execution using Delayed Issue” (0-7695-0801-4/00), J. P. Grossman of the Dept of EECS, MIT, presents a technique in which instruction sequences that are independent of one another are interleaved. In such a technique, instructions are executed such that multiple function units operate in parallel, with the requirement that instructions in a given sequence are executed in the correct order. Grossman achieves this by delaying the issuance of instructions to function units and controlling the order in which these instructions are executed. Grossman also discusses applying such a technique to looped instruction sequences. In such a manner it is possible to reduce the overall execution time of the independent instruction sequences.
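The benefit of interleaving independent sequences can be illustrated with a simple greedy model. This sketch is not Grossman's delayed-issue mechanism; it assumes single-cycle function units, that each instruction depends on its predecessor in the same sequence, and that a function unit accepts at most one instruction per cycle. The name `interleave` and the unit names are hypothetical.

```python
from typing import List, Tuple


def interleave(sequences: List[List[str]]) -> List[List[Tuple[int, str]]]:
    """Return, per cycle, the (sequence index, unit) pairs issued."""
    positions = [0] * len(sequences)  # next instruction in each sequence
    cycles: List[List[Tuple[int, str]]] = []
    while any(p < len(s) for p, s in zip(positions, sequences)):
        busy = set()  # function units already claimed in this cycle
        issued: List[Tuple[int, str]] = []
        for index, sequence in enumerate(sequences):
            if positions[index] == len(sequence):
                continue  # this sequence has finished
            unit = sequence[positions[index]]
            if unit not in busy:
                # Issue in order within the sequence, in parallel across them.
                busy.add(unit)
                issued.append((index, unit))
                positions[index] += 1
        cycles.append(issued)
    return cycles


seq_a = ["read", "add", "write"]
seq_b = ["read", "mul", "write"]
schedule = interleave([seq_a, seq_b])
serial_cycles = len(seq_a) + len(seq_b)
print(len(schedule), serial_cycles)  # 4 6
```

Under these assumptions the two three-instruction sequences complete in four cycles rather than six, with the second sequence stalling only while the read unit is occupied.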
However, such a technique can still result in unnecessary delays in processing sequences of instructions, particularly if those sequences include looped instructions. The problem is particularly acute in data processing applications where low latency is desirable, if not essential. One example of such an application is in the wireless telecommunications field in which streams of data packets must be processed with low latency whilst maintaining data packet order and low rates of packet dropping.