1. Field of the Invention
This invention relates to data processing systems and, more specifically, to the parallel execution of instructions out of sequence in a data processing system with multiple execution units.
2. Description of the Related Art
Data processing systems have historically required instructions to be executed sequentially. It is, of course, advantageous to execute instructions in a data processing system as quickly as possible. One method, in the prior art, for speeding the execution of instructions has been to use one execution unit and increase its throughput. A second method has been to use multiple execution units and execute instructions in parallel as much as possible. When executing instructions in parallel in multiple execution units, it is necessary to provide a method to handle data dependencies between instructions.
A method for handling data dependencies in a nonparallel data processing system is disclosed in U.S. Pat. No. 4,630,195, entitled "Data Processing System with CPU Register to Register Data Transfers Overlapped with Data Transfer to and from Main Storage". This method uses tags on registers to determine whether a register to be used by an instruction is the subject of a pending I/O instruction. If the register is free, the instruction can execute without waiting for the I/O instruction to complete. This method, however, does not involve a parallel processing scheme.
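The register-tag idea described above can be illustrated with a minimal sketch. It is not taken from the cited patent; the class name `TaggedRegisterFile` and its methods are hypothetical, and the sketch assumes one boolean tag per register that is set while an I/O transfer targeting that register is pending.

```python
class TaggedRegisterFile:
    """Hypothetical register file in which each register carries a 'pending' tag."""

    def __init__(self, num_registers):
        self.values = [0] * num_registers
        # Tag is True while a pending I/O transfer targets the register.
        self.pending = [False] * num_registers

    def start_io_load(self, reg):
        # Mark the register as the subject of a pending I/O instruction.
        self.pending[reg] = True

    def complete_io_load(self, reg, value):
        # The transfer has finished: store the data and clear the tag.
        self.values[reg] = value
        self.pending[reg] = False

    def can_execute(self, registers_used):
        # An instruction may proceed only if none of its registers is tagged.
        return not any(self.pending[r] for r in registers_used)


regs = TaggedRegisterFile(4)
regs.start_io_load(2)                 # register 2 awaits data from storage
free_ok = regs.can_execute([0, 1])    # uses only free registers
blocked = regs.can_execute([1, 2])    # touches the tagged register
regs.complete_io_load(2, 99)
after_ok = regs.can_execute([1, 2])   # tag cleared, instruction may proceed
```

In this sketch an instruction that touches only untagged registers proceeds at once, while one that touches register 2 waits until the transfer completes.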
One method to handle data dependencies in a parallel data processing system is to express programs according to data flow. An example of this method is disclosed in an article in IEEE Transactions on Computers, Vol. C-26, No. 2, Feb. 2, 1977, pages 138-146, entitled "A Data Flow Multiprocessor". Instructions are separated into modules according to the operands each requires. If an instruction is dependent upon a second instruction, it is placed in the same module as the second instruction. Each module is self-contained and produces no side effects because all data dependencies are included. Thus, multiple processors are free to concurrently execute modules. The drawback of the method is that it requires a unique data flow language that is quite different from the languages used in conventional systems.
A second method for handling instructions in a parallel data processing system is to synchronize the processors. An article in the IBM Technical Disclosure Bulletin, Vol. 32, No. 7, December 1989, pp. 109-113, entitled "Device for Synchronizing Multiple Processors" discloses a device for synchronizing multiple processes. The device is capable of barrier synchronization and of serialization of multiple requests from processes. A barrier is a point in a sequence of instructions that all processes must reach before any process can pass it. A serialization operation is an operation that assigns a unique integer to each of multiple simultaneous requests to indicate priority or to assign each process a unique set of system resources. This method suffers from the fact that it is too rigid and does not allow instructions to execute out of their original sequence.
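The two operations named above, barrier synchronization and serialization, can be sketched in software as follows. This is an illustration only and does not represent the hardware device of the cited article; it assumes Python's standard `threading.Barrier` for the barrier and a lock-protected counter for serialization, and the names `take_ticket` and `worker` are hypothetical.

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
ticket_lock = threading.Lock()
next_ticket = 0
tickets = {}          # worker id -> unique integer from serialization
before_barrier = []   # workers that have reached the barrier
after_barrier = []    # arrivals observed by each worker once past the barrier


def take_ticket():
    """Serialization: hand each simultaneous request a unique integer."""
    global next_ticket
    with ticket_lock:
        ticket = next_ticket
        next_ticket += 1
    return ticket


def worker(worker_id):
    tickets[worker_id] = take_ticket()
    before_barrier.append(worker_id)
    barrier.wait()  # no worker passes until all have arrived
    after_barrier.append(len(before_barrier))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every worker that passes the barrier observes that all four workers have already arrived, which also shows the rigidity noted above: the fastest worker is held until the slowest one reaches the barrier.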
U.S. Pat. No. 4,763,294, entitled "Method and Apparatus for Floating Point Operations", discloses an apparatus for synchronizing a fixed point processor and a floating point processor depending upon the type of instruction. Each floating point instruction is a member of either a first group of instructions requiring interlock or a second group not requiring interlock. In either case, the fixed point unit controls the dispatch of floating point instructions and must wait for the floating point processor to be idle. Thus, the fixed point processor always sees the instructions in their original sequence.
U.S. Pat. No. 4,916,606, entitled "Pipelined Parallel Data Processing Apparatus for Directly Transferring Operand Data between Preceding and Succeeding Instructions", discloses an apparatus that detects when an instruction will use the result of a previous instruction and provides the data directly to the succeeding instruction. This speeds execution by obviating the step of retrieving data but does not allow parallel processing of instructions out of sequence.
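The direct transfer of a result to a succeeding instruction can be sketched as follows. This is a simplified software model, not the apparatus of the cited patent; it assumes add-only instructions of the hypothetical form (destination, source A, source B) and forwards only the result of the immediately preceding instruction.

```python
def run(program, registers, forwarding=True):
    """Execute add-instructions in order; return the number of register-file reads."""
    reads = 0
    last_dest, last_result = None, None
    for dest, src_a, src_b in program:
        operands = []
        for src in (src_a, src_b):
            if forwarding and src == last_dest:
                # Operand is the previous result: take it directly, no retrieval.
                operands.append(last_result)
            else:
                operands.append(registers[src])
                reads += 1
        result = operands[0] + operands[1]
        registers[dest] = result
        last_dest, last_result = dest, result
    return reads


# Each instruction after the first depends on the result of its predecessor.
program = [(2, 0, 1), (3, 2, 1), (0, 3, 3)]
regs_fwd = {0: 5, 1: 7, 2: 0, 3: 0}
regs_plain = dict(regs_fwd)
reads_fwd = run(program, regs_fwd, forwarding=True)
reads_plain = run(program, regs_plain, forwarding=False)
```

Both runs leave the registers in the same final state, but the forwarding run performs fewer register-file reads. The instructions are nonetheless executed strictly in their original order.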
U.S. Pat. No. 4,956,800, entitled "Arithmetic Operation Processing Apparatus of the Parallel Processing Type and Compiler which is Used in this Apparatus", discloses an apparatus for performing arithmetic operation processes at a high speed by enabling the execution sequence and the input/output sequence to proceed in parallel. This speeds execution, but the instructions are still executed in sequence.
None of the references detailed above describes a mechanism that provides parallel execution of instructions out of sequence in independent execution units, where an execution unit is delayed only when the correct execution of an instruction in one execution unit is dependent upon the completed execution of an instruction in a second execution unit.