Providing ever faster microprocessors is one of the major goals of current processor design. Many different techniques have been employed to improve processor performance. One technique which greatly improves processor performance is the use of cache memory. As used herein, cache memory refers to a set of memory locations which are formed on the microprocessor itself and which consequently have a much faster access time than other types of memory, such as RAM or magnetic disk, which are located separately from the microprocessor chip. By storing a copy of frequently used data in the cache, the processor is able to access the cache when it needs this data, rather than having to go "off chip" to obtain the information, greatly enhancing the processor's performance.
Superscalar processors achieve still further performance advantages over conventional scalar processors because they provide multiple execution units and allow instructions to execute out of program order. In this way, one slow-executing instruction will not hold up subsequent instructions which could execute using other resources on the processor while the stalled instruction is pending.
However, certain problems arise when superscalar processors attempt to take full advantage of cache memory. One problem arises in the processing of certain complex types of load and store instructions. Numerous techniques exist for processing simple load or store instructions which have a one-to-one correspondence between the instruction and the transfer from the data cache to the architected registers, or vice versa. One example of a simple cache accessing instruction is a load instruction which loads data from a location in cache memory into a single rename register, where the width of the register in the rename file is the same as the width of the data transferred from the cache. Thus, there is one load instruction for each load into the register file. However, it becomes much more difficult to implement a data processor which allows a complex load instruction, such as a load multiple or a load string, to load data from the data cache into a series of rename or architectural registers. This is because the one-to-one correspondence between the instruction and the physical transfer of data no longer exists, making it much more difficult to track the instruction's progress through the processor and to ensure that it is satisfactorily completed.
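The loss of one-to-one correspondence described above can be illustrated with a minimal sketch. The names `RegisterWrite`, `simple_load`, and `load_multiple`, the 4-byte word size, and the consecutive-register layout are all assumptions made for illustration only; they are not taken from any particular instruction set or from the processor described herein.

```python
from dataclasses import dataclass

WORD_BYTES = 4  # assumed word size for this sketch


@dataclass(frozen=True)
class RegisterWrite:
    """One physical transfer from the data cache into one register."""
    register: int  # destination register number
    address: int   # cache address supplying the data


def simple_load(dest_reg, address):
    """A simple load: one instruction yields exactly one register write."""
    return [RegisterWrite(dest_reg, address)]


def load_multiple(first_reg, count, address):
    """A complex load: one instruction yields `count` register writes,
    filling consecutive registers from consecutive words (assumed layout)."""
    return [
        RegisterWrite(first_reg + i, address + WORD_BYTES * i)
        for i in range(count)
    ]


simple = simple_load(dest_reg=3, address=0x100)
multi = load_multiple(first_reg=28, count=4, address=0x200)
print(len(simple), "write for the simple load")
print(len(multi), "writes for the load multiple")
```

In this sketch the simple load produces a single transfer that can be tracked alongside its instruction, whereas the load multiple produces several transfers that must all complete before the one instruction can be considered complete.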
One technique for processing complex load or store instructions in a superscalar processor is to halt further dispatch of instructions as soon as a complex load or store instruction is detected by the dispatch unit. This technique is described in U.S. Pat. No. 5,664,215 to Burgess, incorporated herein by reference. In this technique, all pending instructions in the processor are first allowed to complete. The complex load or store instruction is then dispatched and executed by the processor in scalar fashion. When the complex instruction is complete, the processor resumes dispatch of instructions. Although this ensures that the complex load or store instruction will be accurately completed, it does so at the expense of processor performance. Accordingly, it is one object of the invention to provide a data processor which provides high performance speculative processing of complex load and store operations. Still further objects and advantages of the present invention will become apparent in view of the following disclosure.
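The serializing technique described above, in which dispatch is halted, pending instructions are drained, and the complex instruction is executed alone, can be sketched as follows. This is a simplified model under stated assumptions, not the implementation of the referenced patent: the function `serialize_complex`, the event strings, and the use of plain strings as instructions are hypothetical and chosen only for illustration.

```python
from collections import deque


def serialize_complex(program, is_complex):
    """Return an event log for a dispatch unit that serializes
    complex load/store instructions (illustrative model only)."""
    pending = deque(program)  # instructions not yet dispatched
    in_flight = []            # dispatched but not yet completed
    log = []
    while pending:
        instr = pending.popleft()
        if is_complex(instr):
            # Halt dispatch and let every pending instruction complete.
            log.append("halt dispatch")
            for older in in_flight:
                log.append(f"complete {older}")
            in_flight.clear()
            # Execute the complex instruction alone, in scalar fashion.
            log.append(f"dispatch {instr}")
            log.append(f"complete {instr}")
            log.append("resume dispatch")
        else:
            # Simple instructions dispatch freely and may overlap.
            log.append(f"dispatch {instr}")
            in_flight.append(instr)
    # Drain whatever is still in flight at the end of the program.
    for older in in_flight:
        log.append(f"complete {older}")
    return log


# "load_multiple" is an illustrative instruction name, not a real mnemonic.
events = serialize_complex(
    ["add", "load_multiple", "sub"],
    is_complex=lambda instr: instr == "load_multiple",
)
for event in events:
    print(event)
```

The log makes the performance cost visible: no instruction following the complex instruction is dispatched until the complex instruction has completed, so the other execution resources sit idle in the meantime.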