The complexity of computer systems spans the range from relatively simple systems having a single central processing unit (“CPU”) to systems having many processors that may operate somewhat independently of each other. One conventional multiple processor computer system is known as a single instruction, multiple data (“SIMD”) processor. In a SIMD processing system, multiple processors or processor elements (“PEs”) simultaneously perform the same operation on different items of data. As a result, SIMD processing systems are particularly useful for processing graphic images, since graphic image processing typically involves performing a large number of identical operations on data items that may differ from one another.
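The SIMD principle can be illustrated with a minimal Python sketch, assuming a toy model in which each list element stands for the data item held by one PE and the hypothetical function `simd_step` stands for the single broadcast operation. The loop only models the semantics; real PEs would execute the operation simultaneously.

```python
from typing import Callable, List


def simd_step(op: Callable[[int], int], data: List[int]) -> List[int]:
    """Apply one broadcast operation to every PE's data item.

    Each list index represents one processing element (PE). All PEs
    execute the same `op`, each on its own item of data.
    """
    return [op(x) for x in data]


# Example: brighten eight pixel values by 10, clamped to the 8-bit maximum.
pixels = [0, 50, 100, 150, 200, 250, 30, 240]
print(simd_step(lambda p: min(p + 10, 255), pixels))
```

Every PE receives the same instruction; only the data differs, which is why repetitive image operations map well onto this model.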
The PEs in a SIMD processing system are generally coupled to a Central Control Unit that controls the operation of the PEs. The Central Control Unit generally transfers instructions defining the operations that will be performed by the PEs from a single program memory (not shown) into respective register files. The Central Control Unit also loads into the register file for each PE the data items that are to be operated on by that PE. Each PE can access its register file to read data, perform the operation on the data, and write the results of the operation to the register file. The Central Control Unit can then read the results of all of the operations performed by the PEs by reading from all of the register files. Thus, each register file can be accessed either by the Central Control Unit or by its respective PE.
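As a rough illustration of this access pattern, the Python sketch below models one register file per PE. The class `RegisterFiles`, its method names, and the register count are hypothetical; the point is only that the Central Control Unit can reach every register file, while each PE works within its own.

```python
from typing import List


class RegisterFiles:
    """Toy model: one register file per PE (names and sizes hypothetical)."""

    def __init__(self, num_pes: int, regs_per_pe: int) -> None:
        self.files: List[List[int]] = [[0] * regs_per_pe for _ in range(num_pes)]

    # Central Control Unit side: may address any PE's register file.
    def ccu_write(self, pe: int, reg: int, value: int) -> None:
        self.files[pe][reg] = value

    def ccu_read_all(self, reg: int) -> List[int]:
        return [f[reg] for f in self.files]

    # PE side: each PE is handed only its own register file.
    def pe_file(self, pe: int) -> List[int]:
        return self.files[pe]


rf = RegisterFiles(num_pes=4, regs_per_pe=2)
for pe, value in enumerate([3, 5, 7, 9]):
    rf.ccu_write(pe, 0, value)        # CCU loads each PE's operand
for pe in range(4):
    f = rf.pe_file(pe)
    f[1] = f[0] * f[0]                # every PE runs the same operation
print(rf.ccu_read_all(1))             # CCU collects all results
```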
Although separate register files can be provided for each PE, register files for multiple PEs can alternatively be implemented by a memory device, such as a static random access memory (“SRAM”) device or a dynamic random access memory (“DRAM”) device, that is shared by the PEs. In particular, a memory device having an array formed by rows and columns can be organized so that each PE receives data from a respective group of columns in the array. The Central Control Unit can write data to and read data from any location in the memory array, and each PE can write data to and read data from its respective group of columns in the memory array.
A typical SIMD processing system 10 is shown in FIG. 1. The processing system 10, which is being commercially developed under the code name “Yukon,” includes a Central Control Unit 14 coupled to an address bus 16 and a data bus 18. The address bus 16 and data bus 18 are coupled to 32 SRAM devices 20₁-20₃₂. Each SRAM device 20 includes an array of memory cells (not shown), row and column decoders (not shown) for selecting rows and columns in the array based on respective row and column addresses, a data path (not shown) coupling the data bus 18 to the array, and a variety of other components. In the SRAM devices 20 proposed for use in the Yukon SIMD processing system 10, the array in each SRAM device 20 includes 8 rows of memory cells arranged in 64 columns, and each column can store one byte (8 bits) of data in each row. Thus, each SRAM device 20 includes 8 × 64 × 8 = 4,096 memory cells. In practice, the Central Control Unit 14 includes a PE Control Unit (not shown) providing control and address signals to the SRAM devices 20, and a Data Control Unit (not shown) controlling the flow of data to and from the SRAMs 20.
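The array geometry can be checked with a short calculation. The sketch assumes, as the 4,096-cell total implies, that each memory cell holds one bit. The row-major `byte_offset` helper is a hypothetical illustration, not the decoding actually used in the Yukon SRAM devices 20.

```python
ROWS = 8              # rows per SRAM device 20
COLUMNS = 64          # columns per row
BITS_PER_COLUMN = 8   # one byte per column in each row
NUM_SRAMS = 32        # SRAM devices 20_1 through 20_32

bytes_per_sram = ROWS * COLUMNS                      # bytes in one array
cells_per_sram = bytes_per_sram * BITS_PER_COLUMN    # one-bit memory cells


def byte_offset(row: int, column: int) -> int:
    """Hypothetical row-major byte offset of (row, column) in one SRAM."""
    if not (0 <= row < ROWS and 0 <= column < COLUMNS):
        raise ValueError("row or column out of range")
    return row * COLUMNS + column


print(cells_per_sram, bytes_per_sram, NUM_SRAMS * cells_per_sram)
print(byte_offset(7, 63))   # last byte in the array
```

This prints `4096 512 131072` and then `511`: 512 bytes per device, and 131,072 cells across all 32 devices.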
The SIMD processing system 10 also includes 256 PEs designated PE1-PE256, with each group of eight PEs sharing a respective SRAM device 20. For example, PE1-PE8 share the SRAM device 20₁. The PEs are coupled to their respective SRAMs 20 by respective data buses 40₁-40₂₅₆ so that each PE can receive data from memory cells in a respective group of 8 columns. For example, PE1 can access data stored in columns 0-7 of the SRAM 20₁, PE8 can access data stored in columns 56-63 of the SRAM 20₁, and PE256 can access data stored in columns 56-63 of the SRAM 20₃₂. The SRAMs 20 are therefore dual-ported SRAMs, since the Central Control Unit 14 accesses the SRAMs 20 through data ports that are different from the data ports through which the PEs access them.
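The PE-to-column mapping follows directly from these figures: eight PEs per SRAM device 20 and eight columns per PE. In the minimal sketch below, the hypothetical helper `pe_location` returns the SRAM index and column range for a PE numbered 1 through 256.

```python
from typing import Tuple

NUM_PES = 256
PES_PER_SRAM = 8
COLUMNS_PER_PE = 8


def pe_location(pe: int) -> Tuple[int, range]:
    """Return (SRAM index 1..32, column range) for PE number 1..256."""
    if not 1 <= pe <= NUM_PES:
        raise ValueError("PE number must be in 1..256")
    index = pe - 1                                # zero-based PE index
    sram = index // PES_PER_SRAM + 1              # which SRAM device 20
    first = (index % PES_PER_SRAM) * COLUMNS_PER_PE
    return sram, range(first, first + COLUMNS_PER_PE)


for pe in (1, 8, 256):
    sram, cols = pe_location(pe)
    print(f"PE{pe}: SRAM 20_{sram}, columns {cols.start}-{cols.stop - 1}")
```

The output reproduces the examples in the text: PE1 maps to columns 0-7 of SRAM 20_1, PE8 to columns 56-63 of SRAM 20_1, and PE256 to columns 56-63 of SRAM 20_32.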
The Central Control Unit 14 also includes a number of control signal lines that are coupled to the SRAMs 20 and the PEs, but these lines have been omitted from FIG. 1 for brevity and so as not to obscure other details of the SIMD computer system 10 shown in FIG. 1. In the Yukon system, these control lines control the operation of all of the PEs so that all of the PEs perform the same function. However, in other SIMD computer systems, the PEs may access the same or different instructions from a program memory (not shown).
In operation, the Central Control Unit 14 writes data to specific locations in each of the SRAMs 20. Since the computer system 10 is a SIMD system, the PEs generally perform the same function, although the data stored in the SRAMs 20 for each PE often varies. The Central Control Unit 14 applies row addresses to the SRAMs 20 to make available to the PEs the data on which the PEs operate. Each PE then produces a respective result, and this result is made available to its SRAM 20. The Central Control Unit 14 addresses the SRAMs 20 to write the result data from each of the PEs to memory cells in the respective group of columns coupled to that PE. Finally, the Central Control Unit 14 reads the result data from the SRAMs 20. Thus, the SRAMs 20 provide both scratch pad storage for the PEs and a communications path between the PEs and the Central Control Unit 14.
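This operating sequence can be sketched for a single SRAM device 20 shared by eight PEs. The model rests on simplifying assumptions: memory is a list of 8 rows of 64 byte values, each PE reduces its 8-column group to one value with the same operation (here `max`, chosen arbitrarily), and the hypothetical `run_cycle` places each result in the first column of the PE's group. One input row address and one output row address serve every PE.

```python
from typing import Callable, List

ROWS, COLUMNS, COLUMNS_PER_PE = 8, 64, 8
NUM_PES = COLUMNS // COLUMNS_PER_PE   # 8 PEs share one SRAM device

Memory = List[List[int]]


def run_cycle(memory: Memory, op: Callable[[List[int]], int],
              in_row: int, out_row: int) -> None:
    # CCU broadcasts one row address; every PE sees the same row.
    row = memory[in_row]
    # Every PE applies the same operation to its own column group.
    results = [op(row[pe * COLUMNS_PER_PE:(pe + 1) * COLUMNS_PER_PE])
               for pe in range(NUM_PES)]
    # CCU broadcasts one output row address; each result is written to the
    # first column of that PE's group, masked to one byte.
    for pe, value in enumerate(results):
        memory[out_row][pe * COLUMNS_PER_PE] = value & 0xFF


memory: Memory = [[0] * COLUMNS for _ in range(ROWS)]
memory[0] = list(range(COLUMNS))          # CCU writes operands into row 0
run_cycle(memory, max, in_row=0, out_row=1)
results = memory[1][::COLUMNS_PER_PE]     # CCU reads one result per PE
print(results)
```

Each PE's group in row 0 holds consecutive values, so the largest in each group is printed: `[7, 15, 23, 31, 39, 47, 55, 63]`.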
Although the SIMD system 10 shown in FIG. 1 can significantly increase the speed at which certain repetitious operations can be performed, it requires almost complete parallelism of the operations performed by the PEs. In particular, although different data can be stored in the SRAMs 20, all of the PEs must receive data from the same locations in the SRAMs 20. Similarly, result data from all of the PEs must be stored in the same row location and the same column location within each respective group of columns. A PE cannot receive data from or transfer data to different locations in the SRAMs 20 depending upon the results of an operation performed by the PE. These limitations on the PEs' ability to access different locations in the SRAMs 20 depending upon the results of an operation can seriously limit the usefulness and versatility of SIMD computer systems like the system 10 shown in FIG. 1.
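The limitation can be made concrete by contrasting the broadcast addressing of FIG. 1 with a hypothetical per-PE addressing mode that the system 10 does not provide. The sketch assumes toy sizes (4 rows, 4 PEs, 8 columns each) and toy values that are not limited to one byte. Each PE's previous result (`flags`) determines which row it would like to read next; the function names are illustrative only.

```python
from typing import List

COLUMNS_PER_PE = 8
Memory = List[List[int]]


def broadcast_read(memory: Memory, row: int, num_pes: int) -> List[int]:
    """FIG. 1 style: one row address shared by every PE."""
    return [memory[row][pe * COLUMNS_PER_PE] for pe in range(num_pes)]


def per_pe_read(memory: Memory, rows: List[int]) -> List[int]:
    """Hypothetical mode: each PE supplies its own row address."""
    return [memory[r][pe * COLUMNS_PER_PE] for pe, r in enumerate(rows)]


# 4 rows x 32 columns: 4 PEs with 8 columns each (toy values).
memory: Memory = [[100 * r + c for c in range(32)] for r in range(4)]
flags = [1, 0, 0, 1]                         # per-PE results of a prior step
wanted = [2 if f else 3 for f in flags]      # data-dependent row choice
print(broadcast_read(memory, 2, 4))          # every PE must read row 2
print(per_pe_read(memory, wanted))           # each PE reads its own row
```

Under broadcast addressing only the first and fourth PEs receive the row they wanted; the second and third would need row 3, which a single shared row address cannot supply in the same cycle.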
There is therefore a need for a SIMD computer system that allows individual PEs to access data and instructions from different locations in a register file or memory device depending upon the results of operations performed by the PEs.