1. Field of the Invention
The invention relates to microprocessors.
2. Related Art
In the design of processors, and particularly microprocessors, one important goal is speed; it is desirable that the microprocessor perform as many instructions as possible per unit time. Therefore, it has become known in the art of microprocessor design to provide for performing multiple instructions at once, and to provide for performing instructions out of their original order as specified by the programmer. However, while instructions are sometimes performed “out of order”, it is necessary to cause the result of the out-of-order operation to be the same as if the instructions were performed in the original order.
All microprocessors, including those that execute out of order, include a register file that stores the contents of each register manipulated by the program. In a conventional, in-order implementation, the result of executing an instruction is written to the register file immediately upon execution of the instruction. Out-of-order execution, however, could leave the register file contents inconsistent at any given instant. For example, consider an instruction A that is followed by an instruction B in a program. If execution of instruction A causes an exception, then program execution will be automatically re-directed to an exception handler program. At the entry to the exception handler program, it is typically expected that execution has ceased just prior to the execution of instruction A; therefore, the register file is not expected to have been updated by executing instruction A or any following instruction, including instruction B.
In an out-of-order implementation, instruction B may actually be executed before instruction A. However, in order to obey the expected in-order behavior described above, the updating of the register file by instruction B must be postponed. Since the result of executing instruction B cannot be written to the register file immediately, it is written first to a different memory, variously known in the art as a reorder buffer or a result shelf.
Every instruction in an out-of-order implementation goes through a final step of retirement. This step consists of reading the result of the instruction execution out of the result shelf and writing that result into the register file. All instructions must be retired in the order specified by the program. Thus, an instruction B is not retired until instruction A (and all intervening instructions) have been (1) executed, (2) determined not to cause exceptions and (3) retired to the register file.
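The retirement rule described above can be illustrated with a minimal sketch. The class and method names (ShelfEntry, ResultShelf, issue, retire) are hypothetical and chosen only for illustration, and the choice to discard every pending entry when the oldest instruction has caused an exception is an assumption of this sketch, not a description of any particular hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShelfEntry:
    """One result shelf slot (hypothetical layout)."""
    dest: str                    # destination register name
    value: Optional[int] = None  # result, filled in when the instruction executes
    executed: bool = False
    exception: bool = False


class ResultShelf:
    """Holds results in program order until they may be retired."""

    def __init__(self):
        self.entries = []        # index 0 is the oldest unretired instruction
        self.register_file = {}

    def issue(self, dest):
        """Allocate a shelf entry in program order."""
        entry = ShelfEntry(dest)
        self.entries.append(entry)
        return entry

    def retire(self):
        """Retire from the oldest entry onward; return how many were retired.

        Retirement stops at the first entry that has not executed. If the
        oldest entry caused an exception, all pending entries are discarded
        (an assumption of this sketch) and the register file is not updated.
        """
        retired = 0
        while self.entries and self.entries[0].executed:
            head = self.entries[0]
            if head.exception:
                self.entries.clear()
                break
            self.register_file[head.dest] = head.value
            self.entries.pop(0)
            retired += 1
        return retired
```

In this sketch, if instruction B executes before instruction A, a call to retire() leaves the register file unchanged until A has also executed, after which both results are written in program order.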
Out-of-order execution is driven by dependencies between instructions. When an instruction C is first decoded, the instructions on which it depends are identified as the instructions that most recently wrote to each of the operand registers that instruction C reads. Instruction C can be executed when all instructions on which it depends for operand values have been executed. The most recent instruction that wrote to a register that is an operand of C is known as the locker of that operand. When C is ready for execution, each of its operands may be found either in the register file (if the locker instruction has retired) or in the result shelf in the location where the operand locker's result was first written.
A major challenge in designing an out-of-order microprocessor is determining if an instruction's operand needs to be read from the result shelf, and if so, from where in the result shelf. A first examination of an instruction C includes determining whether the locker of each operand of C has retired. If the locker has not retired, then some identification of that locker is stored with C until such time as C is executed. This identification is then used to find the operand value in the result shelf.
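The locker bookkeeping described above might be sketched as follows. All names here (LockerTable, decode, execute, retire, is_ready, read_operand) are hypothetical, the use of integer tags to identify result shelf locations is an assumption, and a value of None is used to mark a result that has not yet been produced.

```python
class LockerTable:
    """Tracks, per register, the most recent unretired writer (its locker)."""

    def __init__(self, register_file):
        self.register_file = dict(register_file)
        self.shelf = {}      # tag -> [dest register, value]; value is None until executed
        self.lockers = {}    # register -> tag of its most recent unretired writer
        self.next_tag = 0

    def decode(self, dest, sources):
        """Record the locker of each source operand, then lock the destination."""
        operand_lockers = {src: self.lockers.get(src) for src in sources}
        tag = self.next_tag
        self.next_tag += 1
        self.shelf[tag] = [dest, None]
        self.lockers[dest] = tag
        return tag, operand_lockers

    def execute(self, tag, value):
        """Write an instruction's result to its result shelf location."""
        self.shelf[tag][1] = value

    def retire(self, tag):
        """Move a result from the shelf to the register file."""
        dest, value = self.shelf.pop(tag)
        self.register_file[dest] = value
        if self.lockers.get(dest) == tag:
            del self.lockers[dest]

    def is_ready(self, operand_lockers):
        """An instruction is ready when every locker has executed or retired."""
        return all(
            tag is None or tag not in self.shelf or self.shelf[tag][1] is not None
            for tag in operand_lockers.values()
        )

    def read_operand(self, reg, locker_tag):
        """Read from the shelf if the locker is unretired, else from the register file."""
        if locker_tag is not None and locker_tag in self.shelf:
            return self.shelf[locker_tag][1]
        return self.register_file[reg]
```

In this sketch only one tag is stored per operand, which is sufficient as long as each operand register was written in its entirety by a single preceding instruction.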
Some microprocessor architectures and instruction sets aggravate the problem of managing lockers. This is particularly true of systems which provide instructions and parcels that write to only a portion of a register (notably the Intel x86 architecture and instruction set). Thus, while in the usual case, each operand register read by an instruction C was written in its entirety by a single preceding locker instruction, it may be the case that a register contains results written by two or three preceding instructions, each of which wrote to a different portion of that register.
One known solution is to break up each register into multiple logical registers, and to record a separate locker for each portion of the operand register which can be written to with each instruction operand. Thus, a first instruction D which writes to a first portion of some register would set a locker separate from that set by a second instruction E, which writes to a second portion of the register. This informs a subsequent instruction F (F reading the entire register) of the locations in the result shelf for the values for the individual portions. While this method achieves the purpose of allowing such instructions to be executed as soon as all their dependencies have been satisfied, and therefore can speed up operation of the microprocessor, it has the drawback that it requires the storage of a much larger number of lockers per operand, with consequent use of more resources (such as circuit area) devoted to such lockers.
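The per-portion approach can be illustrated with a short sketch. The three-way split of a 32-bit register below (low byte, high byte, upper 16 bits) is an assumption modeled loosely on the x86 sub-registers; the names PORTIONS and read_full_register are hypothetical, and each shelf entry is assumed to hold its portion's value right-aligned.

```python
# Portion name -> (bit offset, mask); an assumed layout for a 32-bit register.
PORTIONS = {
    "low8": (0, 0xFF),
    "high8": (8, 0xFF),
    "upper16": (16, 0xFFFF),
}


def read_full_register(portion_lockers, shelf, register_file_value):
    """Assemble a full register value using one locker per portion.

    portion_lockers maps a portion name to a result shelf tag, or to None
    when the last writer of that portion has already retired.
    shelf maps a tag to that portion's right-aligned result value.
    """
    value = 0
    for name, (shift, mask) in PORTIONS.items():
        tag = portion_lockers.get(name)
        if tag is None:
            # Portion is up to date in the register file.
            part = (register_file_value >> shift) & mask
        else:
            # Portion must be fetched from its own result shelf location.
            part = shelf[tag] & mask
        value |= part << shift
    return value
```

Here an instruction reading the whole register carries three lockers for that one operand, which reflects the storage cost noted above.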
A second solution for correct execution of example instruction F is to delay its execution until both instructions D and E have retired. There is no concern for fetching different portions from different result shelf locations because both portions of the register file entry for the operand register have been updated with the result values of D and E. However, this solution results in reduced performance, due to the delay in executing instruction F.
Accordingly, it would be desirable to provide a method and system so that an instruction F can be executed without waiting for instructions D and E to retire, while requiring only one locker to be stored with each operand. This advantage is achieved in an embodiment of the invention in which such an instruction F is recognized, and an intermediate “stitching” parcel is inserted to couple the results of instructions D and E into a complete register's worth of data. The intermediate stitching parcel has two operands, each the result of a single preceding instruction, D and E, respectively. The operand of F is now dependent on the result of only one preceding instruction, the “stitching” parcel. The stitching parcel can execute as soon as D and E have executed, and F only needs to wait for the stitching parcel to execute.
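The stitching idea can be sketched in a few lines. This is an illustration under stated assumptions rather than the claimed implementation: the register is taken to be 16 bits wide with D writing the low byte and E writing the high byte, and the names Parcel, stitch and insert_stitch are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Parcel:
    """A decoded parcel and the parcels whose results it reads."""
    name: str
    sources: tuple   # one locker (producer name) per operand


def stitch(low_result, high_result):
    """Combine two partial results into one register's worth of data.

    Assumes a 16-bit register: low_result supplies bits 0-7 and
    high_result supplies bits 8-15.
    """
    return ((high_result & 0xFF) << 8) | (low_result & 0xFF)


def insert_stitch(low_locker, high_locker, reader):
    """Return the parcels to issue for a reader of the whole register.

    If the two portions were written by different instructions, a stitching
    parcel with those two lockers as operands is placed ahead of the reader,
    and the reader then depends only on the stitching parcel.
    """
    if low_locker == high_locker:
        # Whole register written by one instruction: no stitching needed.
        return [Parcel(reader, (low_locker,))]
    stitcher = Parcel(f"stitch({low_locker},{high_locker})",
                      (low_locker, high_locker))
    return [stitcher, Parcel(reader, (stitcher.name,))]
```

For example, insert_stitch("D", "E", "F") yields a stitching parcel reading D and E, followed by F with a single locker naming that stitching parcel. Each operand of each parcel still carries exactly one locker.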
The invention provides a method and system for performing instructions in a microprocessor having a set of registers, in which instructions that operate on portions of a register are recognized, and “stitching” instructions are inserted into the instruction stream to couple the instructions operating on the portions of the register. The stitching parcels are serialized along with other instruction parcels, so that instructions which read from or write to portions of a register can proceed independently and out of their original order, while keeping the results of that out-of-order operation the same as if all instructions were performed in the original order. In a preferred embodiment, the choice of stitching parcels is optimized to the Intel x86 architecture and instruction set.