Internet audio and video streaming, as well as image processing and video content creation, continuously drive system architects to design ever faster microprocessors. Several techniques are used to improve the efficiency of modern-day processors. One such technique is “Dynamic Execution”. In summary, Dynamic Execution determines the most efficient order in which to execute program instructions, irrespective of the order in which the program instructions are received.
Dynamic Execution uses front-end logic that fetches the next instructions within a program and prepares the instructions for subsequent execution in the machine pipeline. This front-end logic utilizes highly accurate branch prediction logic that uses the past history of program execution to speculate where the program is going to execute next. The predicted instruction address from this front-end branch prediction logic is used to fetch instruction bytes from a level two (L2) cache. Once fetched, these instruction bytes are decoded into basic operations called micro-operations (uOPs) that the execution core can execute.
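As an illustration of how past execution history can drive a prediction, the following minimal sketch models a table of two-bit saturating counters. This is a common textbook scheme chosen for illustration, not necessarily the prediction logic used in the processors described here; the class name, table size, and indexing by address modulo table size are all assumptions.

```python
class TwoBitPredictor:
    """Hypothetical branch predictor built from two-bit saturating counters."""

    def __init__(self, table_size=1024):
        self.table_size = table_size
        # Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
        self.counters = [1] * table_size

    def _index(self, address):
        # Assumed indexing scheme: low-order bits of the branch address.
        return address % self.table_size

    def predict(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        # Move the counter toward the observed outcome, saturating at 0 and 3.
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


predictor = TwoBitPredictor()
branch = 0x400                      # hypothetical branch address
first_guess = predictor.predict(branch)
predictor.update(branch, taken=True)
predictor.update(branch, taken=True)
predictor.update(branch, taken=False)
# A single not-taken outcome does not flip a strongly taken counter.
guess_after_history = predictor.predict(branch)
```

Because the counter needs two consecutive mispredictions to change direction, a loop branch that is almost always taken keeps being predicted as taken after its single exit.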
These uOPs are then provided to out-of-order (OOO) execution logic, along with a sequence number assigned to each uOP. The OOO execution logic has several buffers that it uses to sort and reorder the flow of instructions to optimize performance as instructions go down the pipeline and get scheduled for execution. OOO execution allows program instructions to proceed around delayed instructions as long as they do not depend on those delayed instructions. As a result, independent uOPs do not stall behind delayed instructions and instead execute in an out-of-order fashion.
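The idea that independent uOPs proceed around a delayed uOP can be sketched with a small scheduling model. The `Uop` record, the register names, the latencies, and the unlimited issue width are all assumptions made for illustration, and each register is assumed to be written by at most one uOP.

```python
from collections import namedtuple

# Hypothetical uOP record: sequence number, destination register,
# source registers, and execution latency in cycles.
Uop = namedtuple("Uop", "seq dest srcs latency")


def issue_order(uops, ready_regs):
    """Return sequence numbers in the order the uOPs begin execution."""
    ready_at = {reg: 0 for reg in ready_regs}  # register -> cycle value is ready
    waiting = sorted(uops, key=lambda u: u.seq)
    max_cycles = sum(u.latency for u in uops) + 1
    order = []
    cycle = 0
    while waiting:
        if cycle > max_cycles:
            raise ValueError("a source register is never produced")
        still_waiting = []
        for u in waiting:
            # A uOP may issue once every source value is available.
            if all(ready_at.get(s, float("inf")) <= cycle for s in u.srcs):
                order.append(u.seq)
                ready_at[u.dest] = cycle + u.latency
            else:
                still_waiting.append(u)
        waiting = still_waiting
        cycle += 1
    return order


program = [
    Uop(0, "r1", ("r0",), 5),  # delayed uOP, e.g. a slow load
    Uop(1, "r2", ("r1",), 1),  # depends on uOP 0, so it must wait
    Uop(2, "r4", ("r3",), 1),  # independent, so it proceeds around uOP 1
]
order = issue_order(program, {"r0", "r3"})  # [0, 2, 1]
```

Here uOP 2 starts before the older uOP 1 because it does not depend on the delayed result in `r1`.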
The OOO execution logic generally includes retirement logic that reorders the instructions, which were executed in an out-of-order (dynamic) fashion, back into the original program order. As a result, OOO execution logic generates a pool of active uOPs that can be executed more efficiently than in conventional systems. However, implementing out-of-order execution requires register renaming logic that maps logical registers onto physical register files. In addition, the renaming logic is required for execution of legacy instructions with improved efficiency.
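Register renaming can be sketched as a lookup table plus a free list. The function below is a simplified model under stated assumptions: uOPs are hypothetical `(dest, src1, src2)` triples of logical register numbers, logical register i initially lives in physical register i, and physical registers are never returned to the free list (real hardware would reclaim them at retirement).

```python
def rename(uops, num_logical, num_physical):
    """Map (dest, src1, src2) logical register triples to physical registers."""
    table = list(range(num_logical))               # logical -> current physical
    free = list(range(num_logical, num_physical))  # unused physical registers
    renamed = []
    for dest, src1, src2 in uops:
        # Sources are looked up before the destination mapping is updated.
        p1, p2 = table[src1], table[src2]
        if not free:
            raise RuntimeError("no free physical register; a real core would stall")
        pd = free.pop(0)
        table[dest] = pd
        renamed.append((pd, p1, p2))
    return renamed


program = [
    (1, 2, 3),  # r1 = r2 op r3
    (4, 1, 1),  # r4 = r1 op r1, reads the r1 written above
    (1, 5, 6),  # r1 = r5 op r6, reuses logical r1 for an unrelated value
]
renamed = rename(program, num_logical=8, num_physical=12)
# [(8, 2, 3), (9, 8, 8), (10, 5, 6)]
```

The first and third uOPs both write logical register 1 but receive different physical registers (8 and 10), so the third no longer has to wait for the second to finish reading the earlier value.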
The described dynamic execution may be implemented within microprocessors that support 128-bit streaming single instruction multiple data (SIMD) extensions (SSE) and streaming SIMD extensions 2 (SSE2) instruction set architectures (ISA). Generally, the 128-bit SSE and SSE2 ISAs (“128-bit ISA”) are implemented by splitting each 128-bit instruction into two uOPs that generate the lower and upper 64-bit chunks of the 128-bit register. These two halves of the architectural register are treated internally as two independent registers.
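The split into lower and upper halves can be modeled as follows. This is a minimal sketch that treats a 128-bit register as a Python integer and uses a packed add of two 64-bit lanes (in the spirit of SSE2's PADDQ) as the example operation; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def split128(value):
    """Return the (low, high) 64-bit halves of a 128-bit integer."""
    return value & MASK64, (value >> 64) & MASK64


def join128(low, high):
    """Combine two 64-bit halves into one 128-bit integer."""
    return (high << 64) | low


def add_packed_qwords(a, b):
    """Model a 128-bit packed add as two independent 64-bit uOPs."""
    a_lo, a_hi = split128(a)
    b_lo, b_hi = split128(b)
    lo = (a_lo + b_lo) & MASK64  # uOP 1: produces the lower 64-bit chunk
    hi = (a_hi + b_hi) & MASK64  # uOP 2: produces the upper 64-bit chunk
    return join128(lo, hi)


a = join128(MASK64, 1)
b = join128(1, 2)
result = add_packed_qwords(a, b)
# The low lane wraps to 0 without carrying into the high lane, which is 3.
```

Because neither uOP reads the other's result, the two halves behave like independent registers and can be scheduled separately.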
Unfortunately, in a 64-bit implementation of the 128-bit ISA, redundant uOPs are generated during decoding of some macro-instructions. The redundant uOPs use valuable resources such as register allocation slots, as well as execution unit and uOP retirement bandwidth. Therefore, there remains a need to overcome one or more of the limitations in the above-described existing art.