1. Field of the Invention
The present invention relates to a parallel processing device for a computer or the like, which processes plural instructions in parallel in order to increase the speed of the information processing operation.
2. Description of the Prior Art
A number of methods have been developed to increase the speed of information processing devices such as computers. As a result, an instruction that once required several clock cycles can now be processed in approximately one clock cycle. In other words, the CPI (cycles per instruction) value, which used to be two to five, has been approaching one. Meanwhile, parallel processing devices that simultaneously process plural instructions have been considered as a way to increase the speed of the information processing device still further, that is, to reduce the CPI value to less than unity. The VLIW (Very Long Instruction Word) system ("Configuration Theory of Parallel Computers," Shinji Tomita, Shookoodoo Co., Ltd., Japan, pp. 131-142, November, 1986) is known as an example of this type of parallel processing device. The VLIW parallel computer according to "Configuration Theory of Parallel Computers" is outlined below, referring to FIG. 4.
A basic instruction has a fixed length of 32 bits, and four basic instructions, forming one word, are stored in a space with a length of 128 bits. When the instructions are processed, the full length of one word is read at once and the four basic instructions are processed in parallel through four processing pipelines, whereby the above CPI value ideally becomes 0.25. This prior device has four internal buses 201, each of which has a width of 32 bits. Data unit 202 is connected to internal bus 201 with four 32-bit buses, and this data unit 202 also contains a data cache. Instruction unit 203 contains an instruction cache. Bus interface 204 is connected to data unit 202 with a 128-bit internal data bus, and is also connected to instruction unit 203 with a 128-bit instruction bus. Further, bus interface 204 is connected to external devices with a 32-bit address bus, a 128-bit data bus, and a 128-bit control bus. Numeral 205 is an instruction decoder, and numeral 206 is an instruction register. Instruction decoder 205 receives a 128-bit instruction from instruction unit 203, decodes it, and stores it in instruction register 206 as a micro instruction. Instruction register 206 retains a micro instruction equivalent to four instructions, and outputs this micro instruction to control No. 1 processing pipeline 208 to No. 4 processing pipeline 211. Numeral 207 is a multiple-port register. This multiple-port register 207 is connected to internal bus 201 with four 32-bit buses to take in, through internal bus 201, the data to be processed, and outputs the data to the respective processing pipelines 208 to 211 through four 32-bit buses. Each of the processing pipelines 208 to 211 spends several clock cycles performing a data processing operation, such as a fixed-point arithmetic operation, a logic operation, or a floating-point arithmetic operation, according to the above-mentioned micro instruction.
Thus, in practice, the four processing pipelines 208 to 211 combined perform four processing operations every clock cycle. The output side of each of the processing pipelines 208 to 211 is connected to internal bus 201 through a 32 bit bus.
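The word format and the ideal CPI value described above can be illustrated with a minimal sketch. The function names, and the assumption that the first basic instruction occupies the most significant 32 bits of the word, are hypothetical and are not taken from the cited reference.

```python
WORD_BITS = 128   # one VLIW word, as described above
SLOT_BITS = 32    # one basic instruction
SLOTS = WORD_BITS // SLOT_BITS  # four basic instructions per word


def split_word(word: int) -> list[int]:
    """Split a 128-bit word into four 32-bit basic instructions.

    Assumption: slot 0 is held in the most significant 32 bits.
    """
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 128 bits")
    mask = (1 << SLOT_BITS) - 1
    return [
        (word >> (SLOT_BITS * (SLOTS - 1 - i))) & mask
        for i in range(SLOTS)
    ]


def ideal_cpi(words_issued: int, cycles: int) -> float:
    """Cycles per basic instruction when every slot does useful work."""
    return cycles / (words_issued * SLOTS)


if __name__ == "__main__":
    word = 0x00000001_00000002_00000003_00000004
    print(split_word(word))
    print(ideal_cpi(words_issued=1, cycles=1))
```

If one word is issued every clock cycle and all four slots are occupied, the sketch gives the ideal CPI value of 0.25 stated above. Any empty slot or pipeline stall raises the value in practice.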
Next, the operation of the VLIW parallel computer having the above configuration is explained. Instruction unit 203 reads in a 128-bit instruction from the external memory (not illustrated) through bus interface 204. Then, the instruction that has been read in is decoded by instruction decoder 205 and written in instruction register 206 as a micro instruction. The micro instruction written in instruction register 206 is sent out to each of the appropriate processing pipelines 208 to 211 to control them. Each of the processing pipelines 208 to 211 reads in the data in multiple-port register 207 as needed and writes the processed data in multiple-port register 207 through internal bus 201, and processing pipelines 208 to 211 again read in these data to perform plural processing operations. Also, processing pipelines 208 to 211 write the processed data in data unit 202 through internal bus 201, and data unit 202 in turn writes the data in multiple-port register 207 through internal bus 201, so that plural processing operations are performed. Then, data unit 202 exchanges data with external devices through bus interface 204. Since the decoding of instructions, the reading of micro instructions from instruction register 206, and the processing operations in processing pipelines 208 to 211 are all performed in a pipelined manner, it becomes possible to execute four instructions per clock cycle. However, since four basic instructions are processed as one word in the above-mentioned VLIW parallel computer, the data bus must be as wide as 128 bits, whereas the data bus of an ordinary computer is 16 bits or 32 bits wide. Therefore, when this type of parallel processing device is packaged as a whole, the number of pins extending outward increases, which complicates packaging, and the number of peripheral circuits also increases.
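The packaging problem can be made concrete with rough arithmetic based on the external bus widths given above for bus interface 204. The following is only a sketch: it assumes one pin per bus bit and ignores power, ground, clock, and any multiplexing, none of which is specified in the description. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalBuses:
    """External bus widths of a device, in bits."""
    address_bits: int
    data_bits: int
    control_bits: int

    def signal_pins(self) -> int:
        # Assumption: one package pin per bus bit, no multiplexing.
        return self.address_bits + self.data_bits + self.control_bits


# Widths stated above for bus interface 204 of the prior VLIW device.
VLIW = ExternalBuses(address_bits=32, data_bits=128, control_bits=128)


def data_pin_ratio(wide_bits: int, ordinary_bits: int) -> float:
    """How many times more data pins the wide bus needs."""
    return wide_bits / ordinary_bits


if __name__ == "__main__":
    print(VLIW.signal_pins())
    print(data_pin_ratio(VLIW.data_bits, 32))
    print(data_pin_ratio(VLIW.data_bits, 16))
```

Under these assumptions the three external buses alone account for 288 signal pins, and the data bus needs four times as many pins as a 32-bit data bus, or eight times as many as a 16-bit one.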