Modern processing systems frequently use high speed data caches to decrease the time required to transfer data from main memory to associated processing devices. In these systems, a block of data or "cache line" is prefetched from main memory and loaded (encached) in a small high speed data memory or "data cache." A cache line can have a length of one or more operands as desired. The particular data block retrieved from main memory is one which the processor has determined most likely includes particular data that the processor will need for an upcoming operation. If the necessary data is found in the encached data block at the time of the operation, the processor need only access the data from the faster cache memory rather than the larger and slower main memory. The resulting reduction in data access time helps reduce the execution time of the programs being run by the processor.
One of the most significant factors determining the effectiveness of a particular cache design is the percentage of loads which are satisfied by the cache. This percentage is typically referred to as the cache hit rate. As the cycle time of processors becomes faster, and the number of processor cycles needed to access main memory increases, the cache hit rate becomes the dominant factor in determining cache effectiveness. Thus, because of the utility of data caches in improving system operation speed, substantial efforts have been made to improve cache effectiveness through improved cache hit rates.
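The effect of hit rate on access time can be sketched with a simple weighted average. The function name and the cycle counts below are illustrative assumptions, not figures from this description.

```python
def average_access_cycles(hit_rate, hit_cycles, miss_cycles):
    """Average cycles per load, weighting cache hits and main-memory misses.

    hit_cycles and miss_cycles are hypothetical latencies chosen for illustration.
    """
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


if __name__ == "__main__":
    # As the miss penalty grows, the same hit rate yields a larger average cost,
    # so small changes in hit rate matter more.
    for miss_cycles in (10, 100, 400):
        for hit_rate in (0.90, 0.99):
            cycles = average_access_cycles(hit_rate, 1, miss_cycles)
            print(f"miss={miss_cycles:>3} cycles, hit rate={hit_rate:.2f}: "
                  f"{cycles:.2f} cycles per load")
```

Under this simplified model, the miss term dominates the average once main memory access takes many processor cycles, which is why the hit rate becomes the dominant factor.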
One common method for improving cache hit rates has been the use of long cache lines. In this method, when a request for data by the processor is not satisfied by the encached data, the processor issues a read to main memory for a block of data that contains the desired datum. The entire retrieved block, which typically consists of data from a number of adjacent main memory locations, is then stored in the cache. By encaching larger data blocks from a number of locations (i.e., longer cache lines), the chance that subsequent requests for nearby data will be satisfied by the cache is improved. Unfortunately, the use of long cache lines significantly increases the amount of data which must be transferred from main memory, thereby consuming more memory bandwidth. Further, long cache lines are ineffective when requests for data do not exhibit spatial locality (i.e., the data needed by the processor are not all stored in nearby locations in the memory address space and therefore cannot always be accessed in a single cache line).
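The trade-off of long cache lines can be illustrated with a small simulation. This is a minimal sketch assuming a fully associative cache with least-recently-used replacement; the function name, cache size, and address patterns are hypothetical choices made for illustration.

```python
from collections import OrderedDict


def simulate(addresses, line_words, num_lines):
    """Return (hit_rate, words_transferred) for an assumed fully associative LRU cache."""
    cache = OrderedDict()  # line tag -> None, ordered least to most recently used
    hits = 0
    words_transferred = 0
    for addr in addresses:
        tag = addr // line_words  # which cache line this word address falls in
        if tag in cache:
            hits += 1
            cache.move_to_end(tag)
        else:
            # A miss transfers the whole line from main memory.
            words_transferred += line_words
            cache[tag] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses), words_transferred


sequential = list(range(64))             # requests with spatial locality
strided = [i * 64 for i in range(64)]    # widely spaced requests

if __name__ == "__main__":
    for name, trace in (("sequential", sequential), ("strided", strided)):
        for line_words in (1, 8):
            rate, words = simulate(trace, line_words, num_lines=8)
            print(f"{name:>10}, line of {line_words} word(s): "
                  f"hit rate {rate:.3f}, {words} words transferred")
```

In this model the longer line raises the hit rate for the sequential trace at no extra transfer cost, while for the strided trace it multiplies the words transferred without producing a single hit.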
Simple sequential hardware prefetching is another common method for improving cache hit rates. In this method, when a request is issued for the last entry in a cache line stored in the cache memory, the next sequential cache line is read from main memory and encached. While this method is very effective at improving the cache hit rate for sequences of memory requests for data with spatial locality, it does not improve the hit rate for request sequences which call for data stored in widely spaced memory locations.
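The sequential prefetch rule described above can be sketched as follows. The model is an assumption for illustration only: a fully associative LRU cache in which a prefetched line is available immediately, with hypothetical names and sizes.

```python
from collections import OrderedDict


def simulate_sequential_prefetch(addresses, line_words, num_lines):
    """Return the hit rate when the next line is fetched on access to a line's last entry."""
    cache = OrderedDict()  # line tag -> None, ordered least to most recently used
    hits = 0

    def install(tag):
        # Load a line, evicting the least recently used line if the cache is full.
        cache[tag] = None
        cache.move_to_end(tag)
        if len(cache) > num_lines:
            cache.popitem(last=False)

    for addr in addresses:
        tag = addr // line_words
        if tag in cache:
            hits += 1
            cache.move_to_end(tag)
        else:
            install(tag)
        # Prefetch trigger: the request touched the last entry of its cache line.
        if addr % line_words == line_words - 1 and tag + 1 not in cache:
            install(tag + 1)
    return hits / len(addresses)


sequential = list(range(64))             # requests with spatial locality
strided = [i * 64 for i in range(64)]    # widely spaced requests

if __name__ == "__main__":
    print("sequential:", simulate_sequential_prefetch(sequential, 8, 8))
    print("strided:   ", simulate_sequential_prefetch(strided, 8, 8))
```

With these traces, only the first sequential request misses, whereas the strided requests never touch the last entry of a line and never land in a prefetched line, so the prefetcher provides no benefit.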
Software prefetching is another method for improving cache hit rates. In this method, a cache load instruction is defined which transfers a block of data from memory to the cache. The address of this block of data is computed by the processor using a sequence of instructions, and therefore is not restricted to being spatially adjacent in the processor address space to a previous load address. While software prefetching is effective at improving the cache hit rate for loads of non-sequential sequences of data, it is subject to several drawbacks. First, performance overhead is required to execute the additional instructions needed to compute the prefetch address and execute the prefetch load instruction. This additional overhead can offset any performance gain achieved by the improved cache hit rate. Although the overhead can be reduced by providing the processor with the ability to execute multiple instructions each cycle, this capability is expensive in terms of processor hardware. Second, the software prefetching technique consumes general purpose registers within the processor. This increases the chance of register spills, and any such spills further slow system operation.
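The benefit and the instruction overhead of software prefetching can be modeled together in a short sketch. Everything here is an illustrative assumption: the `Cache` class, an unbounded cache capacity, a prefetch that always completes before the following load, and a cost of two extra instructions per prefetch (one to compute the address, one to issue the prefetch load).

```python
class Cache:
    """Toy cache model with unbounded capacity (an assumption to keep the sketch short)."""

    def __init__(self, line_words):
        self.line_words = line_words
        self.lines = set()
        self.hits = 0
        self.misses = 0

    def prefetch(self, addr):
        # Models the cache load instruction: bring in the line holding addr.
        self.lines.add(addr // self.line_words)

    def load(self, addr):
        tag = addr // self.line_words
        if tag in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            self.lines.add(tag)


def run(addresses, use_prefetch, line_words=8, prefetch_cost=2):
    """Return (hits, misses, extra_instructions) for one pass over the trace."""
    cache = Cache(line_words)
    extra_instructions = 0
    for i, addr in enumerate(addresses):
        if use_prefetch and i + 1 < len(addresses):
            # Software computes the next address itself, so it need not be adjacent.
            cache.prefetch(addresses[i + 1])
            extra_instructions += prefetch_cost
        cache.load(addr)
    return cache.hits, cache.misses, extra_instructions


scattered = [i * 1000 for i in range(32)]  # widely spaced, one request per line

if __name__ == "__main__":
    print("without prefetch:", run(scattered, use_prefetch=False))
    print("with prefetch:   ", run(scattered, use_prefetch=True))
```

In this idealized model, prefetching turns all but the first load into a hit, but it also adds instructions on every iteration; whether the trade is favorable depends on the miss penalty relative to that overhead. Register pressure, the second drawback noted above, is not modeled.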
Thus, the need has arisen for apparatus, systems and methods for improving data cache hit rates. Such apparatus, systems and methods should eliminate the problems associated with current systems which rely on spatial locality when prefetching cache lines. In particular, such apparatus, systems and methods should allow the retrieval of cache lines located at widely separated or non-sequential locations in the processor address space. Such apparatus, systems and methods should employ a minimum of hardware, should minimize the amount of processing overhead required, and should provide access to widely spaced and non-sequential cache lines.