The present invention relates generally to data processing systems and more specifically to prefetching data in a data processing system.
Microprocessors designed for desktop applications such as personal computers (PCs) have been optimized for processing multimedia applications such as video programs. When processing video data, the microprocessor must create frames of decompressed data quickly enough to display the video on the PC screen in real time. However, it is sometimes difficult for the processor to process the data quickly enough because of long memory access latencies. Several mechanisms have been developed to reduce the effect of these long memory access latencies.
One class of prefetch instructions designed to reduce the effect of long memory access latencies is the data stream touch (DST) instruction. DST instructions are classified as asynchronous because a single instruction can specify a very large amount of memory to be prefetched, in increments of cache blocks, by a DST controller, or engine. The DST engine runs independently of normal load and store instructions. That is, the DST engine runs in the background while the processor continues normally with the execution of other instructions. DST instructions are useful where memory accesses are predictable and can be used to speed up many applications, such as multimedia applications.
A DST instruction, as included in an application, specifies a unit size, a number of blocks, and a stride value. When a DST engine receives a DST instruction, the DST engine retrieves data to be written to the cache memory, beginning at a starting address, according to the unit size, the stride value, and the number of blocks. The data is retrieved in the background quickly enough to stay ahead of the microprocessor unit (MPU). However, if an application that makes use of the DST instruction is executed in a data processing system having a longer cache line length than assumed by the programmer, then the DST instruction may generate redundant accesses to the cache if the stride value is less than the longer cache line length. The redundant accesses can cause reduced performance and extra power consumption. Therefore, there is a need to reduce the possibility of redundant prefetch accesses in systems that run applications having DST instructions.
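By way of illustration only, the redundant-access problem described above can be sketched with a small software model. The sketch assumes a naive engine that issues one cache access for every cache line a block overlaps and does not compare an access against the previous one. The function names `line_accesses` and `count_redundant`, the parameter names, and the example values are hypothetical and are not taken from any particular DST implementation.

```python
def line_accesses(start, unit_size, stride, num_blocks, line_size):
    """Return the cache-line indices a naive engine would access, in order.

    Assumption: block i begins at start + i * stride and spans unit_size
    bytes, and the engine issues one access per cache line the block touches.
    """
    accesses = []
    for i in range(num_blocks):
        base = start + i * stride
        first_line = base // line_size
        last_line = (base + unit_size - 1) // line_size
        accesses.extend(range(first_line, last_line + 1))
    return accesses


def count_redundant(accesses):
    """Count accesses that target a cache line already accessed earlier."""
    return len(accesses) - len(set(accesses))


# Eight 32-byte blocks with a 32-byte stride, as a programmer might write
# when assuming a 32-byte cache line.
assumed = line_accesses(0, 32, 32, 8, line_size=32)
# The same stream on a hypothetical system with a 128-byte cache line.
longer = line_accesses(0, 32, 32, 8, line_size=128)

print(count_redundant(assumed))  # 0: every access fetches a new line
print(count_redundant(longer))   # 6: eight accesses cover only two lines
```

In this model, the stride of 32 bytes is less than the 128-byte line length, so several consecutive blocks fall within the same cache line and all but the first access to each line is redundant.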