Due to the physical design of processor architectures, two or more clock cycles may elapse between when the issuing engine issues an operation and when the issuing engine receives feedback on whether that operation has been executed or retired. Problems can occur if no mechanism is in place to cover the interim between the issuing of the operation and the feedback to the processor on the execution or retirement of that operation. For example, data corruption can occur if a first operation results in an irreversible data change or state change external to the processor, and a second operation executes after the first operation but was expected to use the original data or state.
Also, the continued growth of the microprocessor industry has led to the development of competing processor architectures. Several prior processor designs try to maintain compatibility between different machines operating according to different instruction set architectures (ISAs). However, a problem exists in the industry in designing a microprocessor architecture that provides architectural compatibility with prior instruction sets while introducing new instruction set architectures, such as reduced instruction set computer (RISC) designs.
One of the difficulties in implementing such a machine is how to superimpose the older, for example, 32-bit instruction semantics on a new, 64-bit architecture having a completely different set of semantics while minimizing the use of special hardware in the execution core of the machine.
A previous processor used an additional piece of hardware called a memory order buffer to handle memory ordering semantics. The processor included an out-of-order engine in which operations were issued to the execution core of the processor before all of the control dependencies for those operations had been resolved. These operations are known as speculative operations. In the event that a particular operation's control dependencies are resolved to be false, the results of the operation are ignored. However, some operations, such as STORE operations, cannot be performed speculatively because they update architectural state external to the processor. This processor used the memory order buffer to resolve this potential data corruption conflict.
For example, a STORE is not issued to the execution engine, but is instead placed into the memory order buffer, which holds the STORE addresses and associated data. The STORE is then issued when all the control dependencies have been resolved for that particular operation. To provide correct data for speculative LOADs, the execute engine snoops the speculative store buffer for speculative STOREs to the LOAD address. If a match is found, data is provided from the speculative store buffer. If a STORE address is unknown, the LOAD must wait until the STORE address computation result is available.
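The buffering and snooping behavior described above can be illustrated with a minimal software sketch. It is not a description of the actual hardware: the names `PendingStore`, `StoreBuffer`, `add_store`, `load`, and `commit_oldest` are hypothetical, memory is modeled as a plain dictionary, and the model conservatively makes a LOAD wait whenever any buffered STORE still has an unknown address.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PendingStore:
    address: Optional[int]  # None until address generation completes
    data: int


class StoreBuffer:
    """Toy model of a speculative store buffer (hypothetical names)."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory
        self.pending: List[PendingStore] = []  # program order, oldest first

    def add_store(self, address: Optional[int], data: int) -> PendingStore:
        # A STORE is buffered rather than written to memory.
        entry = PendingStore(address, data)
        self.pending.append(entry)
        return entry

    def load(self, address: int) -> Tuple[str, Optional[int]]:
        # Assumption: any unknown STORE address blocks the LOAD.
        if any(s.address is None for s in self.pending):
            return ("wait", None)
        # Snoop youngest-first so the most recent matching STORE wins.
        for store in reversed(self.pending):
            if store.address == address:
                return ("forwarded", store.data)
        return ("memory", self.memory.get(address, 0))

    def commit_oldest(self) -> None:
        # Called once the oldest STORE's control dependencies are resolved.
        store = self.pending.pop(0)
        if store.address is None:
            raise ValueError("cannot commit a STORE with an unknown address")
        self.memory[store.address] = store.data
```

In this sketch a LOAD sees one of three outcomes: it waits, it receives forwarded data from the buffer, or it reads memory. Memory itself changes only when `commit_oldest` runs, which mirrors the point that a STORE updates external state only after it is no longer speculative.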
Thus, the memory order buffer (MOB) is typically closely coupled with the processor. The memory complex continually receives requests from and sends responses to the MOB. The issue engine (e.g., for issuing instructions) also should couple with the MOB in order to indicate when a STORE is eligible for retirement and hence must be considered a committed STORE. The specific problem with this approach is that, in an out-of-order machine handling different architectural semantics, the issue engine is typically remote from the execute engine; therefore, any access of the machine's architectural state requires many clock cycles. The issue engine is thus unable to rely on architectural state or instruction results when making issuing decisions.
This problem is best illustrated by LOAD operations. Determining whether a LOAD should be blocked due to an unknown STORE address might typically require waiting 7-8 clocks after the address generation micro-operations (uops) have been issued from the issue engine. Again, this delay is due to the physical distance between the scheduling logic and the processor's execution units.
Other prior art processors add a piece of hardware to maintain a list of speculative LOAD addresses and issue STOREs non-speculatively and in order. If an address conflict occurs, the LOAD causes a machine flush and re-execution when it comes time for retirement.
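As a rough illustration of this second approach, the following sketch tracks speculative LOAD addresses and marks a LOAD for flush when a STORE to the same address executes. The class and method names are hypothetical. The sketch also assumes that every LOAD still in the list is younger than the executing STORE, which follows if STOREs execute in order and LOADs leave the list at retirement.

```python
from typing import Dict, List


class SpeculativeLoadTracker:
    """Toy model of a speculative LOAD address list (hypothetical names)."""

    def __init__(self) -> None:
        # load_id -> [address, conflicted flag]
        self.loads: Dict[int, List] = {}

    def record_load(self, load_id: int, address: int) -> None:
        # A LOAD executes speculatively; remember which address it read.
        self.loads[load_id] = [address, False]

    def store_executed(self, address: int) -> None:
        # A STORE executes non-speculatively and in order. Any tracked
        # LOAD to the same address read stale data and is marked.
        for entry in self.loads.values():
            if entry[0] == address:
                entry[1] = True

    def retire_load(self, load_id: int) -> str:
        # The conflict is acted on only when the LOAD reaches retirement.
        _address, conflicted = self.loads.pop(load_id)
        return "flush_and_reexecute" if conflicted else "retire"
```

A real machine would also discard the remaining list entries on a flush; that step is omitted here to keep the example short.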
Yet another approach is embodied in the HAL out-of-order implementation of the SPARC™ V9 architecture. This machine sequentializes the address generation component of the memory hierarchy. The address generation component guarantees that older STORE addresses are generated before any younger STORE address. Data is then forwarded between the older STOREs and the younger LOADs.