1. Technical Field
This disclosure relates generally to processors, and, more specifically, to prefetching by processors.
2. Description of the Related Art
In various computer architectures, processing cores can typically perform operations on operands many times faster than such operands can be accessed from the memory hierarchy associated with the cores. To mitigate the effect of memory read latency, certain processor instruction set architectures (ISAs) include instructions that cause data to be retrieved from memory and stored locally in a cache if the cache does not already hold the data. For example, the “PLD” instruction in the ARM V7 ISA will cause data to be prefetched from memory and stored in the cache if the cache does not include a copy of data for that memory address. If the data is in the cache, however, execution of the PLD instruction will not cause a memory access for the data; instead the instruction is turned into a “no operation” (NOP).
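For illustration only, the following C sketch shows how software typically issues such a prefetch hint. It assumes a GCC or Clang toolchain: `__builtin_prefetch` is a compiler builtin that, on ARMv7 targets, is commonly lowered to a PLD instruction. The function `sum_with_prefetch` and the prefetch distance are hypothetical choices made for this example and are not part of this disclosure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance, in elements; a tuning assumption only. */
#define PREFETCH_DISTANCE 16

/*
 * Hypothetical example: sums an array while hinting that the element
 * PREFETCH_DISTANCE positions ahead will be read soon. The prefetch is
 * only a hint and never changes the computed result; if the data is
 * already cached, no memory access is needed for it.
 */
uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0 (prefetch for read), locality = 3 (high temporal locality). */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

The prefetch is issued a fixed distance ahead so that the memory access can overlap with the work on earlier elements; the appropriate distance depends on memory latency and loop cost.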
In many systems that include a data cache, data flowing between processing blocks via shared memory is not checked against the data cache, and thus is not coherent. Accordingly, the shared memory is typically allocated from a pool of non-cacheable memory. The non-cacheability of this data, however, makes instructions such as the PLD instruction ineffective: because data from non-cacheable memory is never stored in the cache, a prefetch cannot bring that data any closer to the core. The reduced effectiveness of such instructions is problematic, particularly in certain image processing applications that operate on a large number of pixels in the local neighborhood of a given pixel.