1. Field of the Invention
The present invention relates to parallel processors for use in processing information at high speeds.
2. Description of the Prior Art
A conventional parallel processor of this type, described by Carl Dobbs, Paul Reed, and Tommy Ng in "Supercomputing on Chip," VLSI SYSTEM DESIGN, Vol. IX, No. 5, May 1988, pp. 24-33, is shown in FIG. 5. The parallel processor comprises an integer unit 501 for performing integer addition and subtraction and bit-field processing; a floating point unit 502 for multiplying floating point numbers or integers; a floating point unit 503 for performing other floating point arithmetic operations and integer division; an optional special function unit 504; a data unit 505 for reading data from and writing data to a memory; a register file 506 used by the function units to perform arithmetic operations; a score board 507 for detecting and avoiding contention for registers; a command unit 508 for fetching and decoding commands and transferring them to the function units; a bus 509 interconnecting the respective units and the register file; and a program counter 510 that controls the address of the next command to be executed.
In operation, the command unit 508, which is divided into three pipeline stages (fetch, decode, and transfer), completes a fetch in one clock cycle and passes the fetched command to the decode stage. The decode stage decodes part of the command and requests the score board 507 to prefetch, from the register file 506, the operands needed by the function unit corresponding to the command. The score board 507 has one score board bit for each register of the register file 506. A register's score board bit is set while the register is being loaded or while an operation on its data is in progress, and is cleared when that operation is completed. In response to the request, the score board 507 checks the relevant score board bits; if a bit is set, it waits until the bit is cleared and then grants the command unit 508 permission to use the register. Once the operands needed to execute the command have been fetched, the command is transferred to the function unit, which executes it over several stages using the prefetched operands.
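The set/check/clear handshake of the score board can be illustrated with a minimal in-order issue model in Python. Everything in it is an assumption made for illustration rather than a description of the circuit of FIG. 5: the `Scoreboard` class, the `run` function, the `(dest, sources, latency)` command format, and the choice to check the destination register as well as the operand registers. Only the one-bit-per-register logic comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Scoreboard:
    """Hypothetical model: one score board bit per register of the register file."""
    num_registers: int
    bits: list = field(init=False)

    def __post_init__(self):
        self.bits = [False] * self.num_registers

    def reserve(self, reg: int) -> None:
        # Set the bit while an operation that will write `reg` is in progress.
        self.bits[reg] = True

    def release(self, reg: int) -> None:
        # Clear the bit when that operation completes.
        self.bits[reg] = False

    def use_permitted(self, regs) -> bool:
        # Permission is granted only if none of the requested bits is set.
        return not any(self.bits[r] for r in regs)


def run(program, num_registers=8):
    """Issue commands in order, at most one per cycle, stalling on set bits.

    program: list of (dest, sources, latency) tuples (hypothetical format).
    Returns the cycle in which each command was issued.
    """
    sb = Scoreboard(num_registers)
    in_flight = []  # (completion_cycle, dest) for operations still executing
    issue_cycles = []
    cycle = 0
    pc = 0
    while pc < len(program):
        # Operations finishing this cycle clear their destination bits.
        for _, dest in [op for op in in_flight if op[0] <= cycle]:
            sb.release(dest)
        in_flight = [op for op in in_flight if op[0] > cycle]

        dest, sources, latency = program[pc]
        # Assumption: check the destination too, to avoid two in-flight writes.
        if sb.use_permitted(list(sources) + [dest]):
            sb.reserve(dest)
            in_flight.append((cycle + latency, dest))
            issue_cycles.append(cycle)
            pc += 1
        # Otherwise the command waits (stalls) and is retried next cycle.
        cycle += 1
    return issue_cycles
```

For example, `run([(1, (2, 3), 3), (4, (1, 5), 1), (6, (7,), 1)])` returns `[0, 3, 4]`: the second command reads register 1 while the first is still writing it, so it waits until cycle 3, and the independent third command is also delayed because issue is in order.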
In the above conventional parallel processor, however, only one command can be decoded per clock cycle, so no more than one operation result can be obtained per clock cycle, which limits the processing speed.
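The limit can be made concrete with an idealized bound. The sketch below assumes, hypothetically, that the commands are independent, that every function unit is fully pipelined and accepts one command per cycle, and that commands are spread evenly over the units. Under those assumptions the number of commands dispatched per cycle, and hence results per cycle, is capped by the decode width regardless of how many function units are present. The function name `min_issue_cycles` is illustrative only.

```python
import math


def min_issue_cycles(num_commands: int, decode_width: int, num_units: int) -> int:
    """Idealized lower bound on the cycles needed to dispatch num_commands.

    Hypothetical model: no register conflicts, every unit accepts one command
    per cycle, and commands are evenly spread over the units.
    """
    if decode_width < 1 or num_units < 1:
        raise ValueError("decode_width and num_units must be positive")
    # Dispatch per cycle is limited by the decoder and by the available units.
    dispatch_per_cycle = min(decode_width, num_units)
    return math.ceil(num_commands / dispatch_per_cycle)
```

With five units (501 to 505), `min_issue_cycles(100, 1, 5)` is 100, the same as with a single unit; only a wider decoder lowers the bound, e.g. `min_issue_cycles(100, 2, 5)` is 50.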