This invention relates in general to the field of prefetch instructions in microprocessors, and more particularly to a microprocessor that selectively performs prefetch instructions depending upon the current level of processor bus activity.
Most modern computer systems include a microprocessor that performs the computation necessary to execute software programs. The computer system also includes other devices connected to the microprocessor such as memory. The memory stores the software program instructions to be executed by the microprocessor. The memory also stores data that the program instructions manipulate to achieve the desired function of the program.
The devices in the computer system that are external to the microprocessor, such as the memory, are directly or indirectly connected to the microprocessor by a processor bus. The processor bus is a collection of signals that enable the microprocessor to transfer data in relatively large chunks, such as 64 or 128 bits, at a time. When the microprocessor executes program instructions that perform computations on the data stored in the memory, the microprocessor must fetch the data from memory into the microprocessor using the processor bus. Similarly, the microprocessor writes results of the computations back to the memory using the processor bus.
The time required to fetch data from memory or to write data to memory is typically between ten and one hundred times greater than the time required by the microprocessor to perform the computation on the data. Consequently, the microprocessor must sit idle while the data is fetched from memory, which is inefficient.
To minimize this problem, modern microprocessors include a cache memory. The cache memory, or cache, is a memory internal to the microprocessor (typically much smaller than the system memory) that stores a subset of the data in the system memory. When the microprocessor executes an instruction that references data, the microprocessor first checks to see if the data is present in the cache and is valid. If so, the instruction can be executed immediately since the data is already present in the cache. That is, the microprocessor does not have to wait while the data is fetched from the memory into the cache using the processor bus. The condition where the microprocessor detects that the data is present in the cache and valid is commonly referred to as a cache hit.
Many cache hits occur because software programs commonly operate on a relatively small set of data for a period of time, then operate on another relatively small data set for another period of time, and so forth. This phenomenon is commonly referred to as the locality of reference principle. If the program exhibits behavior that substantially conforms to the principle of locality of reference and the cache size is larger than the data set size during a given period of time, the likelihood of cache hits is high during that period.
However, some software programs do not exhibit behavior that substantially conforms to the principle of locality of reference and/or the data sets they operate upon are larger than the cache size. These programs may require manipulation of a large, linear data set present in a memory external to the microprocessor, such as a video frame buffer or system memory. Examples of such programs are multimedia-related audio or video programs that process video data or audio wave file data. Typically, the cache hit rate is low for such programs.
To address this problem, some modern microprocessors include a prefetch instruction in their instruction sets. The prefetch instruction instructs the microprocessor to fetch a cache line specified by the prefetch instruction into the cache. A cache line is the smallest unit of data that can be transferred between the cache and other memories in the system, and a common cache line size is 32 or 64 bytes. The software programmer places prefetch instructions at strategic locations in the program to prefetch the needed data into the cache. Consequently, the probability is increased that the data is already in the cache when the microprocessor is ready to execute the instructions that perform computations with the data.
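As an illustration of how a programmer might place prefetches in a loop over a large, linear data set, the following is a minimal sketch in C. It assumes a GCC- or Clang-compatible compiler, whose `__builtin_prefetch` builtin emits the target's prefetch instruction where one exists. The function name `sum_with_prefetch` and the distance `PREFETCH_DISTANCE` are hypothetical choices made for this example and are not taken from the specification.

```c
#include <stddef.h>

/* Hypothetical tuning value: how many elements ahead of the current
 * one to prefetch. A suitable distance depends on the machine. */
#define PREFETCH_DISTANCE 16

/* Sums an array while hinting that data a few iterations ahead
 * should be brought into the cache before it is needed. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the array. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE]);
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint: the result of the function is the same whether or not the processor acts on it.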
In some microprocessors, the cache is actually made up of multiple caches. The multiple caches are arranged in a hierarchy of multiple levels. For example, a microprocessor may have two caches, referred to as a first-level (L1) cache and a second-level (L2) cache. The L1 cache is closer to the computation elements of the microprocessor than the L2 cache. That is, the L1 cache is capable of providing data to the computation elements faster than the L2 cache. The L2 cache is commonly larger than the L1 cache, although not necessarily.
One effect of a multi-level cache arrangement upon a prefetch instruction is that the cache line specified by the prefetch instruction may hit in the L2 cache but not in the L1 cache. In this situation, the microprocessor can transfer the cache line from the L2 cache to the L1 cache instead of fetching the line from memory using the processor bus since the transfer from the L2 to the L1 is much faster than fetching the cache line over the processor bus. That is, the L1 cache allocates a cache line, i.e., a storage location for a cache line, and the L2 cache provides the cache line to the L1 cache for storage therein. The pseudo-code below illustrates a conventional method for executing a prefetch instruction in a microprocessor with a two-level internal cache hierarchy. In the code, a no-op denotes “no operation” and means that the microprocessor takes no action on the prefetch instruction and simply retires the instruction without fetching the specified cache line.
if (line hits in L1)
    no-op; /* do nothing */
else if (line hits in L2)
    supply requested line from L2 to L1;
else
    fetch line from processor bus to L1;
Microprocessors include a bus interface unit (BIU) that interfaces the processor bus with the rest of the microprocessor. When functional blocks within the microprocessor want to perform a transaction on the processor bus, they issue a request to the BIU to perform the bus transaction. For example, a functional block within the microprocessor may issue a request to the BIU to perform a transaction on the processor bus to fetch a cache line from memory. It is common for multiple bus transaction requests to be pending, or queued up, in the BIU. This is particularly true in modern microprocessors because they execute multiple instructions in parallel through different stages of a pipeline, in a manner similar to an automobile assembly line.
A consequence of the fact that multiple requests may be queued up in the BIU is that a request in the queue must wait for all the other requests in front of it to complete before the BIU can perform that request. Consequently, if a bus transaction request is submitted to the BIU for a prefetch of a cache line, the possibility exists that the prefetch request may cause a subsequent request associated with a more important non-prefetch instruction to wait longer to be performed on the bus than it would otherwise have had to, thereby possibly degrading overall performance.
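The queueing effect described above can be sketched with a small model in C. The model assumes that the BIU services requests strictly in order and that each request occupies the bus for a known number of clocks. Both are simplifying assumptions, and the function name `clocks_until_start` is hypothetical.

```c
#include <stddef.h>

/* Returns how many bus clocks the request at index `position` waits
 * before the BIU can start it, given the bus clocks consumed by each
 * request ahead of it in the queue (index 0 is the head). */
unsigned clocks_until_start(const unsigned *clocks_per_request,
                            size_t position)
{
    unsigned wait = 0;
    for (size_t i = 0; i < position; i++)
        wait += clocks_per_request[i];
    return wait;
}
```

Under this model, if two 20-clock requests are already queued, a demand load queued behind them waits 40 clocks. If a 20-clock prefetch is queued ahead of the load as well, the load waits 60 clocks.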
Commonly, a prefetch instruction is by definition a hint to prefetch the cache line rather than an absolute command to do so. That is, the microprocessor may choose to no-op the prefetch instruction in certain circumstances. However, conventional microprocessors do not consider the likelihood that performing a prefetch that generates additional processor bus activity will degrade performance. Therefore, what is needed is a microprocessor that selectively performs prefetch instructions based on this consideration.
The present invention provides a microprocessor and method that compares a current level of bus activity with a predetermined threshold value as a prediction of future bus activity and selectively performs prefetch instructions based on the prediction. Accordingly, in attainment of the aforementioned object, it is a feature of the present invention to provide a microprocessor for selectively performing a prefetch instruction. The microprocessor includes a bus interface unit (BIU), which performs bus transactions on a bus coupling the microprocessor to a memory. The microprocessor also includes a predictor, coupled to the BIU, that generates a prediction of whether prefetching a cache line specified by the prefetch instruction will delay subsequent bus transactions on the bus. The microprocessor also includes control logic, coupled to the predictor, which selectively does not prefetch the cache line if the prediction indicates prefetching the cache line will delay the subsequent bus transactions.
In another aspect, it is a feature of the present invention to provide a microprocessor for selectively performing a prefetch instruction. The microprocessor includes a bus interface unit (BIU), which indicates a current level of bus requests for the BIU to perform on a bus coupling the microprocessor to a memory. The microprocessor also includes a register, coupled to the BIU, which stores a bus request threshold. The microprocessor also includes a comparator, coupled to the register, which generates a prediction of whether the BIU will perform a substantially high level of bus requests on the bus shortly after the prefetch instruction based on a comparison of the bus request threshold and the current level of bus requests. The microprocessor also includes control logic, coupled to the comparator, which prefetches a cache line specified by the prefetch instruction according to a first method if the prediction indicates the BIU will perform a substantially high level of bus requests on the bus in close temporal proximity to the prefetch instruction, and which prefetches the cache line according to a second method otherwise.
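The register and comparator arrangement in this aspect might be modeled as follows. This is a sketch only: the type and function names are hypothetical, and the strict greater-than comparison is an assumption, since this aspect says only that the prediction is based on a comparison of the threshold with the current level of bus requests. The two prefetch methods are left abstract because this aspect does not define them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the register that stores the bus request threshold. */
typedef struct {
    uint32_t threshold;
} threshold_register;

/* The two methods are left abstract, as in the text above. */
typedef enum {
    PREFETCH_METHOD_FIRST,   /* used when heavy bus activity is predicted */
    PREFETCH_METHOD_SECOND   /* used otherwise */
} prefetch_method;

/* Comparator: predicts a high level of bus requests when the current
 * level exceeds the threshold (assumed strict greater-than). */
bool predict_bus_busy(const threshold_register *reg,
                      uint32_t current_requests)
{
    return current_requests > reg->threshold;
}

/* Control logic: selects a prefetch method from the prediction. */
prefetch_method select_prefetch_method(const threshold_register *reg,
                                       uint32_t current_requests)
{
    return predict_bus_busy(reg, current_requests)
               ? PREFETCH_METHOD_FIRST
               : PREFETCH_METHOD_SECOND;
}
```

Because the threshold lives in a register, it can be changed without altering the comparator or the control logic.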
In another aspect, it is a feature of the present invention to provide a microprocessor for selectively performing a prefetch instruction specifying a cache line, the microprocessor having a first-level cache and a second-level cache, and a bus interface unit (BIU) for interfacing the caches to a bus coupling the microprocessor to a memory. The microprocessor includes a threshold register, which stores a threshold, and a comparator, coupled to the threshold register, which generates a true value on an output if a number of requests outstanding in the BIU to be performed on the bus is greater than the threshold. If the output is true and the cache line is present in the second-level cache, then the microprocessor transfers the cache line from the second-level cache to the first-level cache only if the cache line in the second-level cache has a status other than shared.
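A minimal sketch of this rule in C follows. It assumes a MESI-style set of line statuses, which the text above implies by naming a shared status but does not spell out. All identifiers are hypothetical. Cases this aspect does not address are reported as not applicable rather than guessed.

```c
#include <stdbool.h>

/* Assumed MESI-style status of a line that is present in the L2 cache. */
typedef enum { LINE_MODIFIED, LINE_EXCLUSIVE, LINE_SHARED } l2_line_status;

typedef enum {
    RULE_NOT_APPLICABLE,  /* comparator output false, or line not in L2 */
    RULE_TRANSFER,        /* move the line from L2 to L1 */
    RULE_NO_TRANSFER      /* leave the line in L2 only */
} rule_result;

/* Applies the rule for a prefetch whose line is present in the L2
 * cache while the comparator output is true. */
rule_result busy_l2_hit_rule(unsigned outstanding, unsigned threshold,
                             bool hits_l2, l2_line_status status)
{
    bool comparator_output = outstanding > threshold;

    if (!comparator_output || !hits_l2)
        return RULE_NOT_APPLICABLE;
    return (status != LINE_SHARED) ? RULE_TRANSFER : RULE_NO_TRANSFER;
}
```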
In another aspect, it is a feature of the present invention to provide a microprocessor having first and second cache memories. The microprocessor includes a threshold register, which stores a bus transaction queue depth threshold, and a comparator, coupled to the threshold register, which generates a result. The result is true if the microprocessor has more transactions to perform on a bus coupled to the microprocessor than the bus transaction queue depth threshold. The microprocessor also includes an instruction decoder, which decodes a prefetch instruction specifying a cache line. The microprocessor also includes control logic, coupled to receive the result. If the cache line misses in the first and second cache memories and the result is true, then the control logic forgoes requesting a transaction on the bus to fetch the cache line.
In another aspect, it is a feature of the present invention to provide a method for a processor having level one (L1) and level two (L2) caches to selectively prefetch a cache line specified by a prefetch instruction. The method includes determining whether the cache line hits in the L1 and L2 caches, determining a status of the cache line if the cache line hits in the L2 cache, and determining whether more transactions than a predetermined threshold value are queued by the processor to be transacted on a bus coupled thereto. The method also includes fetching the cache line from system memory if the cache line misses in the L1 and L2 caches and if no more than the threshold number of transactions are queued.
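The treatment of a line that misses in both caches, as described in the two preceding aspects, can be sketched as a single decision function. The identifiers below are hypothetical, and the sketch deliberately covers only the miss-in-both-caches case.

```c
#include <stdbool.h>

typedef enum {
    MISS_RULE_NOT_APPLICABLE,  /* line hits in L1 or L2 */
    MISS_FETCH_FROM_MEMORY,    /* request a bus transaction for the line */
    MISS_FORGO_FETCH           /* request no bus transaction */
} miss_action;

/* Decides what to do with a prefetch whose line misses in both caches,
 * given the number of transactions currently queued for the bus. */
miss_action prefetch_on_miss(bool hits_l1, bool hits_l2,
                             unsigned queued, unsigned threshold)
{
    if (hits_l1 || hits_l2)
        return MISS_RULE_NOT_APPLICABLE;
    if (queued > threshold)
        return MISS_FORGO_FETCH;       /* more than the threshold queued */
    return MISS_FETCH_FROM_MEMORY;     /* no more than the threshold queued */
}
```

With a threshold of 4, a missing line is fetched while 4 or fewer transactions are queued, and the fetch is forgone once 5 or more are queued.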
An advantage of the present invention is that it potentially makes more efficient use of the processor bus and cache by not allocating prefetch-specified lines to the detriment of subsequent more urgent allocations. The addition of a programmable threshold register used to accomplish the selective prefetching is nominal in terms of both chip real estate and timing, particularly relative to the benefits accrued.
Other features and advantages of the present invention will become apparent upon study of the remaining portions of the specification and drawings.