Contemporary computing systems seek to take advantage of superscalar architectures to improve processing performance. Superscalar architectures are characterized by multiple, concurrently operable execution units integrated through a plurality of registers and control mechanisms. This allows the architecture to execute multiple instructions in an out-of-order sequence, thus utilizing parallelism to increase the throughput of the system.
Although superscalar architectures provide benefits in improving processor performance, there are numerous difficulties involved in developing practical systems. An overview of some of the difficulties encountered, as well as various strategies for addressing them, is provided in, for example, Johnson, et al., Superscalar Microprocessor Design, Prentice Hall (1991).
One problem in particular is that the control mechanism must manage dependencies among the data being concurrently processed by the multiple execution units. These dependencies arise in various ways. For example, if a load instruction is dependent on a previously issued store instruction, and the load completes before the store, then the data loaded into the architectural registers of the processor by the load instruction would be invalid unless the load-hit-store occurrence is detected and corrected by flushing the load instruction and subsequent instructions, then re-executing the instructions. Load and store instructions are sometimes referred to, generally, as storage reference instructions. If a store instruction logically (i.e., in program order) follows a load, the dependencies only relate to the registers of the store instruction. If a load logically precedes a store, it must load its data before the store occurs. For both "load-hit-store" and "store-hit-load" as described in this disclosure, the load instruction follows the store instruction in program order. For a load-hit-store, the store executes before the load; however, the store has not completed, i.e., has not written its data to the cache, by the time the load executes. For a store-hit-load, the load executes before the store. When it is detected that the store writes to the same location as the load, the load must be re-executed along with any instructions dependent on the load.
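The two cases distinguished above can be illustrated with a minimal software sketch. It is not the disclosed hardware mechanism; the function name `classify_hazard`, the cycle-number parameters, and the `overlaps` flag are hypothetical and introduced only for illustration. The sketch assumes the store precedes the load in program order, that each event is tagged with the cycle in which it occurs, and that store data is visible to a load executing in the write cycle or later.

```python
from enum import Enum


class Hazard(Enum):
    NONE = "none"
    LOAD_HIT_STORE = "load-hit-store"
    STORE_HIT_LOAD = "store-hit-load"


def classify_hazard(store_exec_cycle, store_write_cycle, load_exec_cycle, overlaps):
    """Classify the hazard between an older store and a younger load.

    Assumptions (illustrative only): the store precedes the load in
    program order, and `overlaps` says whether the two accesses touch
    at least one common byte.
    """
    if not overlaps:
        # Different bytes: no storage dependency between the two.
        return Hazard.NONE
    if load_exec_cycle < store_exec_cycle:
        # The load ran first and read stale data; it must be re-executed.
        return Hazard.STORE_HIT_LOAD
    if load_exec_cycle < store_write_cycle:
        # The store executed first but had not yet written the cache.
        return Hazard.LOAD_HIT_STORE
    # The store's data was already in the cache when the load executed.
    return Hazard.NONE
```

For instance, under these assumptions a store that executes in cycle 5 and writes the cache in cycle 9 produces a load-hit-store for an overlapping load executing in cycle 7, and a store-hit-load for one executing in cycle 3.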
Therefore, when performing out-of-order loads and stores, it is necessary to determine whether the addresses and byte lengths of the loads and stores result in an overlap of at least one byte. Moreover, it is desirable that the determination of any address overlap be made as early as possible in the instruction execute stage in order to maximize the processing speed of the processor. Further, it is desirable that the determination be made with a minimal amount of hardware in order to conserve resources on the processor and reduce design complexity.
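The overlap condition itself, stated in software terms, reduces to a comparison of two half-open byte ranges. The following is a minimal sketch of that condition only, not of any hardware implementation described in this disclosure; the function name `ranges_overlap` and its parameters are hypothetical, and addresses are assumed to be non-wrapping integers.

```python
def ranges_overlap(addr_a, len_a, addr_b, len_b):
    """Return True if two accesses share at least one byte.

    Each access covers the half-open range [addr, addr + len).
    Assumes non-negative integer addresses that do not wrap around.
    """
    if len_a <= 0 or len_b <= 0:
        raise ValueError("access length must be at least one byte")
    # Two ranges overlap exactly when each starts before the other ends.
    return addr_a < addr_b + len_b and addr_b < addr_a + len_a
```

For example, a 4-byte store at 0x1000 and a 4-byte load at 0x1003 share the byte at 0x1003, whereas the same store and a load at 0x1004 are adjacent and do not overlap.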
Accordingly, it is an object of the present invention to provide techniques for addressing the above-mentioned difficulties. Still further objects and advantages of the invention will be apparent to those of skill in the art in view of the following disclosure.