1. Field of the Invention
The invention relates generally to data processing systems and, more particularly, to systems that process instructions out of program order.
2. Background Information
Out-of-order processing systems execute program instructions in an order that allows for more efficient use of processor time. The system must, however, manage the processing to ensure that the program produces the same result that it would if the instructions were executed in program order. One such system is discussed in U.S. Pat. No. 6,098,166, which is incorporated herein by reference.
There are several steps involved in processing an instruction. Generally, to speed operations, the system is set up as a pipeline so that the processing of a given instruction may be started while other instructions are still being processed through the pipeline. One type of instruction of interest is a LOAD instruction that instructs a processor to retrieve data from memory and load the data into a designated general purpose register, or GPR. Operand instructions then make use of, that is, process, the data that is held in the register, and provide the results to the same or another GPR.
In high-speed multi-processor systems, each processor is typically associated with a cache memory. During a LOAD operation the processor looks to the cache memory for the data and, only as necessary, looks for the data in other layers of memory that take longer to access. The processor first retrieves data from locations in the cache memory that correspond to virtual address information in the LOAD instruction, and writes the retrieved data to the designated GPR. The processor also enters an associated address table and a translation buffer to retrieve a corresponding cache address tag and address translation information, respectively. The processor then translates the address tag in a known manner, to determine if the retrieved data is the requested data. If the processor verifies that the retrieved data is the requested data, i.e., confirms a “cache hit,” the LOAD operation is essentially complete. Otherwise, the processor confirms a “cache miss,” and the processor must then retrieve the requested data from one of the other layers of memory.
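The LOAD sequence described above can be illustrated with a minimal software sketch. It is not taken from the referenced patent. The function name `speculative_load`, the list-based cache, the dictionary-based translation table and the use of the full physical address as the tag are all assumptions made for illustration only.

```python
def speculative_load(cache, translation, memory, virt_addr, gpr, dest):
    """Model a LOAD that writes cache data to the GPR before verifying it.

    cache:       list of (tag, data) pairs, indexed by low virtual-address bits
    translation: dict mapping virtual address -> physical address (hypothetical)
    memory:      dict mapping physical address -> data (the slower memory layer)
    Returns True on a confirmed cache hit, False on a confirmed cache miss.
    """
    index = virt_addr % len(cache)
    tag, data = cache[index]
    gpr[dest] = data                    # data is written before verification

    phys_addr = translation[virt_addr]  # address translation
    hit = (tag == phys_addr)            # compare the stored tag with the request

    if not hit:
        data = memory[phys_addr]        # fetch from the slower layer of memory
        cache[index] = (phys_addr, data)
        gpr[dest] = data                # overwrite the invalid speculative value
    return hit


# Example set-up: an empty 4-line cache, one mapping, one memory word.
cache = [(None, 0)] * 4
translation = {5: 13}
memory = {13: 99}
gpr = [0] * 8
first = speculative_load(cache, translation, memory, 5, gpr, 2)
second = speculative_load(cache, translation, memory, 5, gpr, 2)
```

In this sketch the first call confirms a miss and fills the cache line, and the second call to the same address confirms a hit. Between the write to `gpr[dest]` and the tag comparison, the register briefly holds unverified data.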
With a cache hit the data is available as soon as the data is retrieved from the cache and loaded into the designated GPR. To take full advantage of the reduced latency associated with the cache memory, the system should start processing instructions that make use of the data as soon as the data is loaded into the GPR. The risk, however, is that a cache miss has actually occurred and the data is invalid. Accordingly, the system must keep track of which instructions used the data before the processor has verified the data. As necessary, the system reissues the instructions if a cache miss is confirmed. A system for keeping track of the instructions that issued before the data is verified is discussed in the above-referenced patent. The number of instructions that must be reissued adds both to the delay and complexity associated with the reissue operations. In the patent, the period between the time the data is written to the designated GPR and the time the data is verified is referred to as a “speculative time window.”
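The bookkeeping described above can be sketched as follows. This is an illustrative model only and does not reproduce the mechanism of the referenced patent. The class name `SpeculativeWindow` and its methods are hypothetical, and the sketch tracks only direct users of the loaded register, not instructions that depend on those users.

```python
class SpeculativeWindow:
    """Track instructions issued on data that has not yet been verified."""

    def __init__(self):
        # destination GPR -> ids of instructions issued while it was unverified
        self.pending = {}

    def load_written(self, reg):
        """A LOAD has written unverified data to `reg`; open its window."""
        self.pending[reg] = []

    def issue(self, instr_id, src_regs):
        """Record an instruction that reads any register still in a window."""
        for reg in src_regs:
            if reg in self.pending:
                self.pending[reg].append(instr_id)

    def verify(self, reg, hit):
        """Close the window for `reg`; return the instructions to reissue."""
        issued = self.pending.pop(reg, [])
        return [] if hit else issued


window = SpeculativeWindow()
window.load_written(3)
window.issue("add", [3, 4])   # uses the unverified data in GPR 3
window.issue("mul", [5, 6])   # independent of the LOAD
to_reissue = window.verify(3, hit=False)
```

The length of the returned list corresponds to the reissue cost noted above: the more instructions issued inside the window, the more work is repeated when a cache miss is confirmed.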
The system can be improved by including therein a mechanism that predicts the likelihood of a cache hit based on the immediate processing history. An analysis of such systems has shown, for example, that cache misses tend to occur in groups. The occurrence of one or more cache misses thus warrants the prediction that the next LOAD operation will also involve a cache miss. Accordingly, based on the prediction of a cache miss, the system waits for data verification after the next LOAD operation before issuing a next instruction that uses the data. In this way the system avoids having to reissue the instructions that are dependent on the LOAD operation. If the LOAD operation turns out to include a cache hit, however, the system has unnecessarily delayed processing the dependent instructions. Further, other processing operations may be delayed if some or all of the instructions queued for issue require the results of the dependent instructions.
The system changes its prediction of a cache miss after one or more cache hits occur. If the prediction is inaccurate, the system re-issues the dependent instructions, changes its prediction, and so forth.
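One way to picture such a history-based predictor is the sketch below. The text does not say how many misses trigger a miss prediction or how many hits clear it, so the class name `MissPredictor`, the single-miss trigger and the `hits_to_clear` parameter are assumptions chosen for illustration.

```python
class MissPredictor:
    """Predict a cache miss for the next LOAD from recent hit/miss history."""

    def __init__(self, hits_to_clear=2):
        # Assumed policy: one miss switches the prediction to "miss";
        # `hits_to_clear` consecutive hits switch it back to "hit".
        self.hits_to_clear = hits_to_clear
        self.hits_since_miss = hits_to_clear   # start by predicting hits

    def predict_miss(self):
        return self.hits_since_miss < self.hits_to_clear

    def update(self, was_hit):
        """Feed back the verified outcome of the most recent LOAD."""
        if was_hit:
            self.hits_since_miss = min(self.hits_since_miss + 1,
                                       self.hits_to_clear)
        else:
            self.hits_since_miss = 0


predictor = MissPredictor(hits_to_clear=2)
history = []
for outcome in [True, False, True, True]:   # hit, miss, hit, hit
    history.append(predictor.predict_miss())
    predictor.update(outcome)
```

While `predict_miss()` returns true, the system would hold back the instructions that use the loaded data until verification completes. The cost of a wrong miss prediction is the unnecessary delay described above.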
While the prediction mechanism provides an improvement to system operations, it does not, for example, aid the system in selectively issuing instructions to make more efficient overall use of the processors and/or in reducing the number of instructions that are involved in a given reissue operation.
The inventive system includes in an instruction a “data verified,” or DV, bit that indicates if this instruction or a dependent instruction may be associated with the retrieved data as soon as the data is available or should instead be associated with the data after verification. If the DV bit is in a first state, e.g., not set, the system may issue instructions that use associated data as soon as the data is available. If the DV bit is in a second state, e.g., set, the system does not issue the instructions that use the data until the data is verified.
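The issue rule implied by the two DV bit states can be written as a short sketch. The function name `may_issue` and its boolean arguments are hypothetical and stand in for whatever signals an actual issue stage would use.

```python
def may_issue(dv_bit_set, data_available, data_verified):
    """Decide whether an instruction that uses loaded data may issue.

    dv_bit_set:     True if the DV bit is in its second state (set)
    data_available: True once the data has been written to the GPR
    data_verified:  True once a cache hit has been confirmed for the data
    """
    if dv_bit_set:
        # Second state: wait until the data has been verified.
        return data_available and data_verified
    # First state: issue as soon as the data is available.
    return data_available


early = may_issue(dv_bit_set=False, data_available=True, data_verified=False)
held = may_issue(dv_bit_set=True, data_available=True, data_verified=False)
```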
The system sets the DV bit based on an analysis of the instruction set and/or accumulated profile data from previous use or uses of the software. To analyze the instruction set, or a relevant portion thereof, the system determines which instructions depend on the data provided by a given LOAD instruction and/or where in the program the dependent instructions occur with respect to the LOAD instruction. In one embodiment the DV bit is set if the data are first required by an instruction that is part of a relatively long string of dependent instructions. This avoids potentially having to reissue all or part of the long string of instructions if a cache miss occurs.
The analysis of the instruction set may reveal that the first instruction that uses the data is widely separated from the LOAD instruction in the instruction set, and thus, not likely to be issued before the data is verified. The DV bit is then not set, even if the instruction is part of the long string of instructions. Similarly, the DV bit is not set if relatively few instructions use the data, because a reissue of the few instructions should not adversely affect processor operations if the data are later determined to be invalid.
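The considerations in the two preceding paragraphs can be combined into a simple decision sketch. The text gives no numeric thresholds, so the function name `should_set_dv` and the parameters `long_chain` and `verify_latency`, including their default values, are assumptions made purely for illustration.

```python
def should_set_dv(chain_length, separation, long_chain=8, verify_latency=3):
    """Decide whether to set the DV bit for a LOAD and its first user.

    chain_length:   number of instructions in the dependent string
    separation:     instructions between the LOAD and the first user
    long_chain:     assumed threshold for a "relatively long" string
    verify_latency: assumed separation beyond which the data will already
                    be verified before the first user is likely to issue
    """
    if separation >= verify_latency:
        # Widely separated: the first user is unlikely to issue before
        # verification, so the bit is not set even for a long string.
        return False
    # Few users: a reissue is cheap, so leave the bit unset.
    # Long string: set the bit to avoid reissuing the whole string.
    return chain_length >= long_chain


long_and_close = should_set_dv(chain_length=12, separation=1)
long_but_far = should_set_dv(chain_length=12, separation=10)
short_and_close = should_set_dv(chain_length=2, separation=1)
```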
As discussed in more detail below, the system may use the DV bits to aid in compiling the program, such that the dependent instruction and the LOAD instruction associated with a set DV bit end up relatively widely separated in the instruction set. The system can then issue the intermediate, independent instructions, i.e., those that do not depend on the data, while the data verification function is performed.
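A minimal sketch of such a reordering is given below. It is not the compiler method of the invention. It assumes that each register is written by only one instruction (so that only read-after-write dependences matter) and it ignores dependences through memory. The function name `separate_users` and the tuple layout are hypothetical.

```python
def separate_users(instrs, load_index):
    """Move independent instructions between a LOAD and its dependents.

    instrs:     list of (name, dest_reg, source_regs) tuples in program order
    load_index: position of the LOAD whose DV bit is set
    Assumes each register has a single writer and no memory dependences.
    """
    load_dest = instrs[load_index][1]
    tainted = {load_dest}          # registers derived from the LOAD result
    independent, dependent = [], []
    for instr in instrs[load_index + 1:]:
        _name, dest, srcs = instr
        if any(src in tainted for src in srcs):
            tainted.add(dest)      # results of dependents are also dependent
            dependent.append(instr)
        else:
            independent.append(instr)
    # Independent work is placed first so it can issue during verification.
    return instrs[:load_index + 1] + independent + dependent


program = [
    ("load", "r1", []),
    ("add", "r2", ["r1"]),         # first user of the loaded data
    ("mul", "r3", ["r4"]),         # independent of the LOAD
    ("sub", "r5", ["r2", "r3"]),   # depends on the LOAD through r2
]
reordered = [name for name, _dest, _srcs in separate_users(program, 0)]
```

Here the independent `mul` moves ahead of `add`, which widens the gap between the LOAD and its first user by one instruction.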
The system may set the DV bit in instructions associated with cache misses that are predicted based on profile data from past uses of the software and/or set the DV bit based on a combination of the profile data and the separation of the LOAD instruction and the various dependent instructions, and so forth.
The system may include the DV bit in the LOAD instruction or in the operands that use the data. As discussed below, the system or a user sets the bit appropriately to indicate whether a given operand can or cannot be processed before the associated data is verified. For ease of understanding, the operands that use the data provided by a LOAD instruction are hereinafter referred to as “user instructions.”