The present invention relates in general to data processing systems, and in particular, to the execution of store instructions in a processor.
In order to increase the operating speed of microprocessors, architectures have been designed and implemented that allow for the out-of-order execution of instructions within the microprocessor. Traditionally, however, load and store instructions have not been executed out of order because of the very nature of their purpose. For example, suppose that a store instruction precedes a load instruction in program order, and that both instructions refer to the same memory space. If the processor executes the two instructions out of order, so that the load instruction is executed prior to the store instruction, the load instruction may load incorrect, or old, data, since the store instruction had not completed before the load instruction was executed.
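The hazard described above can be illustrated with a minimal sketch. It is not taken from the invention itself: the dictionary standing in for memory, the address 0x100, the values 1 and 42, and the names run, store, and load are all assumptions chosen for illustration.

```python
# Illustrative model only: memory is a dict and each instruction is a function.
# The address 0x100 and the values 1 and 42 are arbitrary assumptions.
ADDRESS = 0x100


def run(order):
    """Execute a store and a load in the given order; return the loaded value."""
    memory = {ADDRESS: 1}   # old data already present at the shared address
    registers = {}

    def store():            # first in program order
        memory[ADDRESS] = 42

    def load():             # second in program order
        registers["r1"] = memory[ADDRESS]

    ops = {"store": store, "load": load}
    for name in order:
        ops[name]()
    return registers["r1"]


in_order = run(["store", "load"])       # load sees the newly stored data
out_of_order = run(["load", "store"])   # load sees the old data
```

In this sketch the out-of-order run returns the old value because the load reads the shared address before the store has written it.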
Furthermore, even if such store and load instructions are permitted to execute out of order, a store operation may still be stalled waiting for necessary data to become available. Therefore, there is a need in the art to improve the performance of executing store instructions in a processor.
The present invention addresses the foregoing need by dividing the execution of a store instruction between two separate execution units. If the store instruction is a floating point store instruction, it is sent to the load store unit, which generates the address portion of the store instruction, and to the floating point execution unit, which executes the store data portion of the store instruction. If the store instruction is a fixed point store instruction, it is divided (cracked) into an address generation internal op code and a store data internal op code. The store data internal op code is executed within the fixed point execution unit, while the address generation internal op code is executed within the load store unit. As a result, execution of a store instruction is divided into two parallel tasks, which can be executed concurrently and independently of each other. Once all older instructions have completed, the divided or cracked store instruction is then completed.
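The cracking of a fixed point store instruction can be sketched in software as follows. This is a simplified model under stated assumptions, not the hardware implementation: the tuple encoding of the internal op codes, the names StoreEntry, crack_store, execute, and complete, the register names, and the older_done flag are all hypothetical. The floating point case, in which the instruction is sent to two units without being cracked, is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    """Hypothetical bookkeeping for one cracked store instruction."""
    address: Optional[int] = None   # filled in by the address generation op
    data: Optional[int] = None      # filled in by the store data op

    def ready(self):
        return self.address is not None and self.data is not None


def crack_store(base, offset, src):
    """Divide a fixed point store into two internal ops (assumed encoding)."""
    agen = ("agen", base, offset)   # destined for the load store unit
    stdata = ("stdata", src)        # destined for the fixed point execution unit
    return agen, stdata


def execute(op, regs, entry):
    """Execute one internal op; the two ops may run in either order."""
    if op[0] == "agen":
        entry.address = regs[op[1]] + op[2]
    else:
        entry.data = regs[op[1]]


def complete(entry, older_done, memory):
    """Complete the store only when both halves have executed and all
    older instructions have completed."""
    if entry.ready() and older_done:
        memory[entry.address] = entry.data
        return True
    return False


regs = {"r1": 0x1000, "r2": 7}
memory = {}
entry = StoreEntry()
agen, stdata = crack_store(base="r1", offset=8, src="r2")

execute(stdata, regs, entry)                      # data half executes first
before_agen = complete(entry, True, memory)       # address not yet generated
execute(agen, regs, entry)                        # address half executes later
before_older = complete(entry, False, memory)     # older instructions pending
after_older = complete(entry, True, memory)       # store is now completed
```

In this model neither half waits on the other: the store data op can execute as soon as its source register is available, and the address generation op can execute as soon as its base register is available. Only completion requires both halves and all older instructions to be done.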
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.