This disclosure relates to prefetching in data processing circuitry.
Many examples of data processing circuitry make use of prefetching. For example, where a set of data elements being accessed by a processor is too large to fit into a cache memory such as an on-chip cache, the data elements are prefetched to the cache memory so as to be ready for use by the processor at the appropriate time. In another example, even if the set of data elements can fit in the cache, prefetching can be used to load the required data elements into the cache so that they are ready for use in a cache memory which the processor can potentially access more quickly. This implies that a prediction has to be made, ahead of the time at which a data element will be required, so that the correct data element can be prefetched.
One example technique for determining which data element to prefetch is a so-called offset technique, sometimes referred to as a “best offset” prefetching technique. Examples are disclosed in “Best Offset Hardware Prefetching”, Michaud et al, International Symposium on High-Performance Computer Architecture, March 2016, hal-01254863, the contents of which are hereby incorporated by reference. In such techniques, a detection may be made of a frequently occurring offset (in terms of a difference in memory address) between successively accessed data elements. If the latency of fetching is such that a next-required data item might not be fetched in time, an offset equivalent to a multiple of the difference can be used so that, for example, in response to accessing a particular data item, prefetching is initiated for a next-but-one data item in the sequence. In other examples, if there are two or more interleaved patterns of access, either multiple offsets can be detected or an offset can be used which is a multiple of both offsets (or, where there are more than two, of each such offset). The detected offset is applied as a prefetch offset, so that in response to an access to a data element at a particular address X, the prefetch circuitry will initiate prefetching of the data element at address [X+(current best offset)].
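By way of illustration only, the offset scheme described above may be sketched in software as follows. This is a simplified model and not the algorithm of the cited paper nor any particular circuitry: it assumes that the most frequent address difference in a short access trace is taken as the offset, and the names most_frequent_offset, prefetch_address, common_multiple_offset and the parameter distance are hypothetical, introduced here for the example.

```python
from collections import Counter
from math import gcd


def most_frequent_offset(addresses):
    """Return the most common non-zero difference between successive
    addresses in the trace, or None if the trace yields no differences.
    (Assumption: a simple histogram stands in for offset detection.)"""
    deltas = [b - a for a, b in zip(addresses, addresses[1:]) if b != a]
    if not deltas:
        return None
    return Counter(deltas).most_common(1)[0][0]


def prefetch_address(x, offset, distance=1):
    """Address to prefetch in response to an access at address x.
    distance > 1 models using a multiple of the detected offset when
    fetch latency is too long for the next item to arrive in time."""
    return x + distance * offset


def common_multiple_offset(offset_a, offset_b):
    """Least common multiple of two positive offsets, as one possible
    single offset serving two interleaved access patterns."""
    return offset_a * offset_b // gcd(offset_a, offset_b)


# Hypothetical trace: mostly a 0x40 stride, with one unrelated jump.
trace = [0x1000, 0x1040, 0x1080, 0x10C0, 0x2000, 0x2040]
offset = most_frequent_offset(trace)

# Next item, then the next-but-one item, after the last access.
print(hex(offset))
print(hex(prefetch_address(trace[-1], offset)))
print(hex(prefetch_address(trace[-1], offset, distance=2)))
```

In this sketch, setting distance to 2 corresponds to prefetching the next-but-one data item, and common_multiple_offset gives one candidate offset that is a multiple of each of two interleaved strides.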
A processor operating with accurate and timely prefetching will generally provide a higher performance than one without. However, inaccurate or incorrect prefetching can be a net drain on performance, in that it is generally considered better to operate without prefetching than to prefetch the wrong data. This is because incorrect prefetching uses significant memory access resources and can also “pollute” the cache by populating it with incorrect data and possibly evicting correct or useful cached data.