1. Field of the Invention
The present invention relates to a parallel processing method and apparatus for use in processing information at high speeds.
2. Description of the Prior Art
A conventional parallel processor of this type, such as described by Carl Dobbs, Paul Reed, and Tommy Ng in "Supercomputing on Chip," VLSI SYSTEMS DESIGN, Vol. IX, No. 5, May 1988, pp. 24-33, is shown in FIG. 13. The parallel processor consists of an integer unit 501 for performing addition and subtraction of integers and a bit field process; a floating point unit 502 for multiplication of floating point numbers or integers; a floating point unit 503 for performing other floating point arithmetic operations and division of integers; an optional special function unit 504; a data unit 505 for reading and writing data in a memory; a register file 506 used by the above function units to perform an arithmetic operation; a scoreboard 507 for performing detection and avoidance of a contention for a register; a command unit 508 for fetching, decoding, and transferring a command to a function unit; a bus 509 for interconnecting the respective units to the register file; and a program counter 510 for controlling an address of the next command to be executed.
In operation, the command unit 508, which is divided into three pipelined stages, namely, fetch, decode, and transfer stages, completes a fetch in one clock cycle and passes the fetched command to the decode stage. The decode stage decodes part of the command and requests the scoreboard 507 to prefetch the necessary operand from the register file 506 for the function unit corresponding to the command. The scoreboard 507 has scoreboard bits corresponding to the respective registers of the register file 506. Each scoreboard bit is set while the corresponding register is installed or during a data operation and is cleared when the data operation is completed. In response to a request, the scoreboard 507 checks the scoreboard bit and, if the scoreboard bit is set, waits until it is cleared. Then, it issues a use permit to the command unit 508. When the operand necessary for execution of the command has been fetched, the command is transferred to the function unit. Each function unit has several stages in which it executes the command using the prefetched operand.
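The behavior of the scoreboard bits described above can be illustrated by the following minimal sketch. It is not taken from the cited reference. The names Scoreboard, may_use, and cycles_until_permit, and the representation of an in-flight data operation as a count of remaining clock cycles, are assumptions made solely for illustration.

```python
class Scoreboard:
    """One bit per register: True while a data operation on it is in flight."""

    def __init__(self, num_registers):
        self.bits = [False] * num_registers

    def set(self, reg):
        self.bits[reg] = True

    def clear(self, reg):
        self.bits[reg] = False

    def may_use(self, regs):
        # A use permit can be issued only if no requested register is marked.
        return not any(self.bits[r] for r in regs)


def cycles_until_permit(scoreboard, operands, pending):
    """Count the clock cycles a request waits before a use permit is issued.

    pending maps a register number to the remaining cycles of the data
    operation that targets it (a hypothetical model of an in-flight operation).
    """
    pending = dict(pending)
    for reg in pending:
        scoreboard.set(reg)          # bit is set during the data operation
    waited = 0
    while not scoreboard.may_use(operands):
        waited += 1                  # one clock cycle elapses
        for reg in list(pending):
            pending[reg] -= 1
            if pending[reg] <= 0:
                scoreboard.clear(reg)  # bit is cleared on completion
                del pending[reg]
    return waited


if __name__ == "__main__":
    sb = Scoreboard(8)
    # A command reading registers 1 and 2 while register 2 is still being written.
    print(cycles_until_permit(sb, [1, 2], {2: 3}))
```

In this sketch a command whose operand register is still the target of an unfinished operation is held back until the corresponding bit is cleared, whereas a command whose operand registers are all unmarked receives its use permit without waiting.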
In the above conventional parallel processor, however, only one command can be decoded in a clock cycle, so that no more than one operation result can be obtained per clock cycle, which limits the processing speed.