Many microprocessors execute instructions in pipeline stages. Typically, to issue an instruction for execution, operands for the instruction are retrieved from general purpose registers located in a register file and forwarded to an execution unit. Results from the execution unit are stored back in the register file to be used by subsequent instructions. Generally, instructions are issued for execution “in-order,” i.e., in the order the instructions are fetched. If an instruction is simple, an execution unit may need only one cycle to execute it. As such, simple instructions can be issued and completed in the same order.
However, complex instructions may need different numbers of cycles to complete. For example, a multiply and accumulate (MAC) instruction may require three cycles to complete, whereas a simple instruction, such as an arithmetic logic unit (ALU) instruction, may only require one cycle to complete. Therefore, if a MAC instruction requiring three cycles is issued followed by an ALU instruction requiring one cycle, the ALU instruction will complete execution prior to the MAC instruction. In this manner, instructions are executed and completed “out-of-order.”
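The MAC/ALU example above can be illustrated with a minimal sketch. It assumes that one instruction is issued per cycle in program order and that the latencies are the three and one cycles given in the example; the names `LATENCY` and `completion_order` are hypothetical and serve only as illustration.

```python
# Hypothetical latencies in cycles, taken from the example in the text.
LATENCY = {"MAC": 3, "ALU": 1}

def completion_order(program):
    """Issue one instruction per cycle, in program order, and return the
    instruction labels sorted by the cycle in which each one completes."""
    finished = []
    for issue_cycle, (label, kind) in enumerate(program):
        done_cycle = issue_cycle + LATENCY[kind]
        finished.append((done_cycle, issue_cycle, label))
    # Sort by completion cycle; ties are broken by issue order.
    return [label for _, _, label in sorted(finished)]

# i0 (MAC) issues at cycle 0 and completes at cycle 3;
# i1 (ALU) issues at cycle 1 and completes at cycle 2.
program = [("i0", "MAC"), ("i1", "ALU")]
print(completion_order(program))  # ['i1', 'i0']
```

Although the instructions are issued in-order, the ALU instruction completes first, which is the out-of-order completion described above.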
Executing instructions out-of-order causes microprocessors to deal with a number of complexities that affect processing performance. For instance, if an operand needed for a current instruction depends on a result from a previously executed instruction that has not been stored back in the register file, a data dependency hazard condition exists. Under these circumstances, if the register file is accessed without the desired result being previously stored in the register file, the current instruction will use an incorrect operand and the instruction will be improperly executed. Different schemes have been implemented to handle data dependency hazards when executing instructions “out-of-order.”
One scheme uses a reorder buffer (“ROB”) that allows instructions to be executed out-of-order. A ROB contains a plurality of locations, each holding an entry allocated for an issued instruction. Each entry contains a field to hold the result from an executed instruction prior to being retired to the register file. A result is retired if it is a valid result for an executed instruction and it is in the process of being stored or is stored in the register file. The location of its entry is then allocated for a new result from a subsequently executed instruction. Entries in the ROB maintain the order in which instructions are issued; however, results from executed instructions can be stored in the corresponding entries of the ROB upon completion. In this manner, using a ROB allows results to be stored as soon as they are generated, which allows for out-of-order execution of instructions. As the results for instructions are retired, entries are freed and reallocated in the ROB for additional issued instructions.
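The behavior of such a ROB can be sketched in software as follows. This is a simplified model under stated assumptions, not a description of any particular hardware: the class `ReorderBuffer`, its method names, and the dictionary-based register file are hypothetical, and retirement strictly from the oldest entry is assumed as the common convention.

```python
from collections import deque

class ReorderBuffer:
    """Simplified ROB model: entries are kept in issue order, results may
    arrive in any order, and entries retire from the oldest end."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()  # oldest issued instruction on the left
        self.next_tag = 0

    def allocate(self, dest_reg):
        """Allocate an entry for a newly issued instruction.
        Returns a tag, or None if the ROB is full (issue would stall)."""
        if len(self.entries) >= self.size:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append(
            {"tag": tag, "dest": dest_reg, "value": None, "valid": False}
        )
        return tag

    def write_result(self, tag, value):
        """Store a result in its entry as soon as execution completes."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"] = value
                entry["valid"] = True
                return
        raise KeyError(tag)

    def retire(self, register_file):
        """Retire valid results from the oldest entry onward (assumption:
        in-order retirement), freeing their entries for reuse."""
        retired = []
        while self.entries and self.entries[0]["valid"]:
            entry = self.entries.popleft()
            register_file[entry["dest"]] = entry["value"]
            retired.append(entry["tag"])
        return retired

rob = ReorderBuffer(size=4)
regs = {}
mac = rob.allocate("r1")    # issued first, completes later
alu = rob.allocate("r2")    # issued second, completes first
rob.write_result(alu, 7)    # result stored out-of-order
print(rob.retire(regs))     # [] because the older MAC entry is not valid yet
rob.write_result(mac, 42)
print(rob.retire(regs))     # [0, 1]
```

In this sketch the ALU result is held in its entry until the older MAC result is available, after which both are written to the register file in issue order.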
To handle a data dependency hazard, a microprocessor with a ROB can read an operand either from the register file, if the operand is stored there, or from the ROB, if the operand was generated but not yet retired to the register file, thereby “bypassing” the register file. For example, if a second ALU instruction requires the result of a previously executed first ALU instruction and that result is stored in the ROB, the microprocessor can obtain the operand for the second ALU instruction directly from the ROB instead of waiting for it to be stored in the register file first. As a result, waiting for data to be available in the register file can be avoided. In this process, a check of each entry in the ROB is required to determine whether the entry contains a result that can be used as an operand for the current instruction.
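The operand check described above can be sketched as a lookup over the ROB entries. This is an illustrative model only: the function `read_operand`, the entry fields `dest`, `value`, and `valid`, and the youngest-first search order are assumptions made for the sketch, and real hardware would typically compare all entries in parallel rather than in a loop.

```python
def read_operand(reg, rob_entries, register_file):
    """Return (value, source) for a source register.

    rob_entries is a list ordered oldest to youngest; each entry is a dict
    with hypothetical fields "dest", "value", and "valid".
    """
    # Check the youngest matching entry first so the most recent
    # producer of the register is used.
    for entry in reversed(rob_entries):
        if entry["dest"] == reg:
            if entry["valid"]:
                return entry["value"], "rob"  # bypass the register file
            return None, "stall"              # result not yet generated
    # No pending producer in the ROB: the register file is up to date.
    return register_file[reg], "regfile"

register_file = {"r1": 5, "r2": 0, "r3": 9}
rob_entries = [
    {"dest": "r2", "value": 11, "valid": True},     # first ALU result, not retired
    {"dest": "r1", "value": None, "valid": False},  # MAC still executing
]
print(read_operand("r2", rob_entries, register_file))  # (11, 'rob')
print(read_operand("r1", rob_entries, register_file))  # (None, 'stall')
print(read_operand("r3", rob_entries, register_file))  # (9, 'regfile')
```

Because every entry may hold the needed result, the number of comparisons in this sketch grows with the number of ROB entries.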
Prior microprocessor systems that use a ROB to perform out-of-order execution require a large ROB, e.g., having 32 to 64 entries. There are disadvantages to using such a large ROB for a microprocessor. In particular, if a ROB contains a large number of entries, processing overhead is extensive for instructions that are dependent on a result from a previous instruction. This is because the microprocessor must provide a data bypass for each entry in the large ROB. Furthermore, access to a large ROB for reading and writing data requires an extensive amount of power. For many electronic devices that can process complex instructions, e.g., cellular phones or hand-held computing devices, minimizing power use is an important factor for extending battery life. In addition, executing instructions out-of-order becomes even more complex if more than one instruction is issued and executed at a time.
There exists, therefore, a need for a simpler and more efficient data processing system that executes instructions “out-of-order,” without using a large ROB.