Parallel processing generally refers to increasing the speed of execution of a program by dividing the program into multiple segments that can be executed simultaneously across multiple processors. Depending on the type of application, different parallel processor architectures provide varying results and require different program segmentation. For example, a program may be divided into components that, whilst being executed concurrently, are essentially executed independently of one another. This type of parallel processing is known as Multiple Instruction stream, Multiple Data stream (MIMD). Such an approach provides flexibility, but at the expense of increased complexity due to race conditions, in which timing discrepancies and data dependencies between processing units of the MIMD processor may cause the executing program components to lose their correct sequence, resulting in interruption of the execution. An alternative class of parallel processor is known as Single Instruction stream, Multiple Data stream (SIMD). This type of parallel processing unit is particularly useful when performing the same operation across a large amount of data (for example, in image processing), where an operation may be applied uniformly across all, or a substantial portion, of the pixels of an image.
FIG. 1 illustrates a SIMD unit 100 according to an existing configuration. The SIMD unit 100 comprises several processing elements (PEs) 110a-110n, which operate in parallel. Also shown are respective memory banks 120a-120n, which are memory stacks in the illustrated example. Each memory bank comprises a number of memory addresses, address 0, address 1, . . . , address M. The PEs 110 may be any execution engine, such as an arithmetic processor that performs commands such as addition, subtraction, multiplication, and division. Alternatively, the PEs 110 may equally be logical and bit manipulation units, which perform operations such as AND, OR, EXCLUSIVE-OR, etc. Each processing element 110 can receive multiple data inputs from, and write data to, a respective memory bank 120 via respective read and write operations.
FIG. 2 shows an example of a SIMD instruction 200 that can be executed by the SIMD unit 100 of FIG. 1. The instruction 200 comprises several components 210-250. The composition and sequencing of the components 210-250 vary depending on the implementation of the SIMD unit 100. In the example, the instruction 200 comprises a command instruction (CMD) component 210, data source components 220, 230 (SRC0 and SRC1, respectively), a destination address component 240 (DST) and a miscellaneous control component 250 (MISC).
The CMD component 210 indicates the type of command to be executed, and the SRC0 component 220 and SRC1 component 230 provide the source addresses, in the respective memory banks, of the data on which the CMD command is to be executed. The DST component 240 gives the destination address, in the respective memory banks, to which the result of performing the CMD command on the data sources SRC0 and SRC1 is to be written. The MISC component 250 provides further instruction variances for the PEs to perform, such as whether the absolute value of the CMD command result is taken or the result is shifted before the write process, or whether the source data is the source component (SRC0 or SRC1) itself rather than data read from a memory bank.
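By way of illustration only, the instruction 200 described above may be modelled in software as a simple record. The following is a minimal sketch, not an actual encoding: the class names `Cmd`, `Misc` and `SimdInstruction`, the particular commands, and the particular MISC settings are all assumptions introduced for this example, since the composition of the instruction depends on the implementation of the SIMD unit.

```python
from dataclasses import dataclass
from enum import Enum, Flag


class Cmd(Enum):
    """Hypothetical command set for the CMD component (210)."""
    ADD = "add"
    SUB = "sub"
    MUL = "mul"


class Misc(Flag):
    """Hypothetical MISC settings (250); names and bit values are assumed."""
    NONE = 0
    ABS_RESULT = 1      # take the absolute value of the result before writing
    SRC0_IMMEDIATE = 2  # SRC0 is the operand itself, not a bank address
    SRC1_IMMEDIATE = 4  # SRC1 is the operand itself, not a bank address


@dataclass(frozen=True)
class SimdInstruction:
    cmd: Cmd                 # CMD: type of command to execute
    src0: int                # SRC0: source address (or immediate operand)
    src1: int                # SRC1: source address (or immediate operand)
    dst: int                 # DST: destination address in each memory bank
    misc: Misc = Misc.NONE   # MISC: further instruction variances


# Example: write |bank[0] - 5| to address 2 of each PE's memory bank.
instr = SimdInstruction(
    Cmd.SUB, src0=0, src1=5, dst=2,
    misc=Misc.ABS_RESULT | Misc.SRC1_IMMEDIATE,
)
```

In this sketch the MISC settings are combined as flags, so a single instruction can request several variances at once; a real SIMD unit may encode them quite differently.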
FIG. 3 illustrates a method 300 for executing an instruction such as that shown in FIG. 2, on a SIMD unit such as the arrangement 100 of FIG. 1. In step 310, a SIMD instruction (200) is received by the SIMD unit. The instruction may be parsed to separate the instruction components. Alternatively, the instruction may be parsed before being provided to the SIMD unit. In step 320, the PEs receive the instruction command. In step 330, the PEs retrieve data from their respective memory banks. Each PE reads the source data at the addresses, in its respective memory bank, indicated by the SRC0 and SRC1 components of the instruction.
In step 340, the PEs execute the instruction. That is, the command instruction is executed by each of the PEs. The MISC control information is passed to the PEs, and different MISC control settings select different instruction variances. All the PEs must execute the same operation in any execution cycle. In step 350, the result is written to the destination address. Each of the PEs writes to the memory address indicated by the DST component of the instruction (200 of FIG. 2), in its respective memory bank. Disadvantageously, in the above SIMD unit configuration of FIGS. 1 to 3, each PE is restricted to reading from and writing to only the memory bank associated with that PE. Further, each PE must execute exactly the same instruction variance on the retrieved data.
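The method 300 may be sketched in software as follows. This is an illustrative model only, under stated assumptions: the names `Instruction`, `COMMANDS` and `execute`, the three-command set, and the single `take_abs` setting standing in for the MISC component are hypothetical, and a sequential loop stands in for PEs that operate in parallel in hardware.

```python
import operator
from typing import Callable, Dict, List, NamedTuple


class Instruction(NamedTuple):
    cmd: str                # CMD component
    src0: int               # SRC0 address in each memory bank
    src1: int               # SRC1 address in each memory bank
    dst: int                # DST address in each memory bank
    take_abs: bool = False  # hypothetical MISC setting


# Hypothetical command table; every PE applies the same entry.
COMMANDS: Dict[str, Callable[[int, int], int]] = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
}


def execute(instr: Instruction, banks: List[List[int]]) -> None:
    """Apply one instruction to every PE's memory bank (steps 310-350)."""
    op = COMMANDS[instr.cmd]            # step 320: PEs receive the command
    for bank in banks:                  # one iteration per PE
        a = bank[instr.src0]            # step 330: read from own bank only
        b = bank[instr.src1]
        result = op(a, b)               # step 340: same operation in every PE
        if instr.take_abs:              # same MISC variance in every PE
            result = abs(result)
        bank[instr.dst] = result        # step 350: write to own bank only


# Three PEs, each with a three-address memory bank.
banks = [[1, 5, 0], [7, 2, 0], [3, 3, 0]]
execute(Instruction("SUB", src0=0, src1=1, dst=2, take_abs=True), banks)
```

Both restrictions noted above are visible in the sketch: the loop body only ever indexes the bank belonging to the current PE, and the same command and the same `take_abs` setting are applied to every PE.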
A need therefore exists for a SIMD unit that provides greater flexibility than the above arrangements.