Internet audio and video streaming, as well as image processing and video content creation, are continuously driving system architects to design ever faster microprocessors. Several techniques are used to improve the efficiency, and therefore the performance, of modern processors. One such technique is “Dynamic Execution”. In summary, Dynamic Execution determines the most efficient order in which to execute program instructions, irrespective of the order in which those instructions are received.
Dynamic Execution utilizes front-end logic that fetches the next instructions within a program and prepares the instructions for subsequent execution in the machine pipeline. This front-end logic utilizes highly accurate branch prediction logic that uses the past history of program execution to speculate where the program is going to execute next. The predicted instruction address from this front-end branch prediction logic is used to fetch instruction bytes from a level two (L2) cache. Once fetched, these instruction bytes are decoded into basic operations called uOPs (micro-operations) that the execution core can execute.
These micro-operations are provided to out-of-order (OOO) logic, along with a sequence number assigned to each micro-operation. The OOO logic has several buffers that it uses to sort and reorder the flow of instructions to optimize performance as instructions go down the pipeline and are scheduled for execution. This OOO logic allows program instructions to proceed around delayed instructions as long as they do not depend on those delayed instructions. As a result, micro-operations do not stall behind delayed instructions; instead, they execute out of order whenever that is more efficient.
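The dependency rule described above can be illustrated with a minimal sketch. It is not the scheduling logic of any particular processor: the function name `select_ready`, the tuple layout `(seq, dest, srcs)`, and the register names are all assumptions made for illustration. A uOP waits only if it is itself delayed or reads a register produced by a waiting uOP; every other uOP may proceed around it.

```python
def select_ready(uops, delayed):
    """Split uOPs into those that can issue now and those that must wait.

    uops    -- list of (seq, dest, srcs) tuples in program order (hypothetical layout)
    delayed -- set of sequence numbers that are stalled, e.g. on a cache miss
    """
    blocked_regs = set()      # registers whose newest producer is still waiting
    ready, waiting = [], []
    for seq, dest, srcs in uops:
        if seq in delayed or any(s in blocked_regs for s in srcs):
            waiting.append(seq)
            blocked_regs.add(dest)       # consumers of dest must wait as well
        else:
            ready.append(seq)
            blocked_regs.discard(dest)   # a ready uOP supplies a fresh value of dest
    return ready, waiting


# uOP 0 is delayed; uOP 1 depends on it; uOPs 2 and 3 are independent of both.
program = [
    (0, "r1", ("r0",)),
    (1, "r2", ("r1",)),
    (2, "r3", ("r0",)),
    (3, "r4", ("r3",)),
]
ready, waiting = select_ready(program, delayed={0})
```

In this example, uOPs 2 and 3 issue ahead of uOPs 0 and 1, which is the out-of-order behavior the paragraph describes.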
The Dynamic Execution logic generally includes retirement logic that reorders the instructions, executed in an out-of-order fashion (dynamic manner), back into the original program order. As a result, the OOO logic generates a pool of active micro-operations that can be executed more efficiently than in conventional systems. However, in order to implement out-of-order execution, register allocation and renaming logic is required to allocate physical registers to logical destination registers and to rename logical source registers into physical registers in order to utilize physical register files. In addition, the allocation and renaming logic is required for execution of legacy instructions with improved efficiency.
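The allocation and renaming step can be sketched as follows. This is a simplified model under stated assumptions, not the renaming logic of an actual design: the function name `rename`, the register names `r0`..`r3` and `p0`..`p7`, and the free-list policy are hypothetical, and physical registers are never recycled here, whereas real hardware reclaims them at retirement.

```python
def rename(instructions, num_logical=4, num_physical=8):
    """Rename logical registers to physical registers.

    instructions -- list of (dest, srcs) pairs using logical names "r0".."r3"
    Returns a list of (physical_dest, physical_srcs) pairs.
    """
    # Assumed initial state: logical rN maps to physical pN.
    table = {f"r{i}": f"p{i}" for i in range(num_logical)}
    # The remaining physical registers form the free list.
    free = [f"p{i}" for i in range(num_logical, num_physical)]

    renamed = []
    for dest, srcs in instructions:
        # Sources are renamed first, so they see the mapping from earlier writers.
        phys_srcs = tuple(table[s] for s in srcs)
        # The destination gets a newly allocated physical register.
        phys_dest = free.pop(0)
        table[dest] = phys_dest
        renamed.append((phys_dest, phys_srcs))
    return renamed


program = [
    ("r1", ("r2", "r3")),   # r1 = f(r2, r3)
    ("r2", ("r1", "r0")),   # reads the new r1
    ("r1", ("r0", "r0")),   # overwrites r1 again; gets its own physical register
]
renamed = rename(program)
```

Because the two writes to `r1` land in different physical registers, the third instruction no longer has to wait for the second one to finish reading the earlier value of `r1`.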
Dynamic Execution is implemented within microprocessors that support 128-bit streaming single instruction multiple data (SIMD) extensions (SSE) and streaming SIMD extensions 2 (SSE2) instruction set architectures (ISAs). Generally, the 128-bit SSE and SSE2 ISAs may be implemented by splitting each 128-bit instruction into two micro-operations (uOPs) that generate the lower and upper 64-bit chunks of the 128-bit register. These two halves of the architectural register are treated internally as two independent registers.
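As an illustration of the split described above, the sketch below models a packed 64-bit add (the behavior of the SSE2 instruction PADDQ on a 128-bit register) as two independent 64-bit operations. The helper names `split128`, `join128`, `add64_uop`, and `paddq_as_two_uops` are hypothetical, and 128-bit registers are represented as plain Python integers for simplicity.

```python
MASK64 = (1 << 64) - 1


def split128(value):
    """Return the (low, high) 64-bit halves of a 128-bit value."""
    return value & MASK64, (value >> 64) & MASK64


def join128(lo, hi):
    """Combine two 64-bit halves into a 128-bit value."""
    return (hi << 64) | lo


def add64_uop(a, b):
    """One 64-bit uOP: add with wraparound, no carry into the other half."""
    return (a + b) & MASK64


def paddq_as_two_uops(dst, src):
    """Model a 128-bit packed add as a low-half uOP and a high-half uOP."""
    dst_lo, dst_hi = split128(dst)
    src_lo, src_hi = split128(src)
    lo = add64_uop(dst_lo, src_lo)   # uOP 1 produces the lower 64-bit chunk
    hi = add64_uop(dst_hi, src_hi)   # uOP 2 produces the upper 64-bit chunk
    return join128(lo, hi)


result = paddq_as_two_uops(join128(1, 2), join128(10, 20))
```

Each uOP reads and writes only its own half, which is why the two halves can be treated as independent registers for an operation of this kind.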
Unfortunately, for some instructions from the SSE/SSE2 ISAs, a problem arises when the source and destination are the same and a copy of the original source is required to service the second destination. This problem arises because the two-uOP implementation does not preserve the atomicity of the original instruction. Consequently, in a 64-bit implementation, various 128-bit instructions are implemented using three uOPs instead of two to prevent corruption of the data by the out-of-order execution flow. Each additional uOP, in turn, requires additional resources within the OOO logic as well as the uOP execution units.
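The hazard can be made concrete with a small sketch. It assumes a hypothetical 128-bit operation that swaps the two 64-bit halves of a register whose source and destination are the same; the register names, the `tmp` temporary, and the `run` helper are illustrative only and do not correspond to a specific instruction discussed in the text. Each uOP is modeled as a simple move that sees the results of the uOPs before it.

```python
def run(uops, regs):
    """Apply (dest, src) move uOPs in order; later uOPs see earlier results."""
    regs = dict(regs)          # work on a copy so the initial state is preserved
    for dest, src in uops:
        regs[dest] = regs[src]
    return regs


initial = {"xmm0.lo": 0x1111, "xmm0.hi": 0x2222, "tmp": 0}

# Two-uOP version: the second uOP reads xmm0.lo after it was overwritten.
two_uops = [
    ("xmm0.lo", "xmm0.hi"),
    ("xmm0.hi", "xmm0.lo"),
]

# Three-uOP version: an extra uOP first copies the original low half.
three_uops = [
    ("tmp", "xmm0.lo"),
    ("xmm0.lo", "xmm0.hi"),
    ("xmm0.hi", "tmp"),
]

naive = run(two_uops, initial)
safe = run(three_uops, initial)
```

In the two-uOP version both halves end up holding the original high half, so the original low half is lost. The three-uOP version produces the intended swap, at the cost of one more uOP occupying OOO and execution resources.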