Data processors process ever larger amounts of data, which requires significant storage capacity. Large data stores such as main memory take significant time to access, so to improve performance smaller, faster local data stores such as caches are provided.
These caches are fast to access and improve processor speed; however, they are costly to implement in power and area, so it is important that they store items the processor is likely to require. If they do not store the required data, they simply add area and drain power without providing benefit. In effect, the hit rate in these data stores strongly affects processor power consumption and performance.
Various techniques have been developed to try to ensure that caches store appropriate data and instructions. These include techniques in which data or instructions that will be required, or that are predicted to be required, are loaded into the cache in advance using spare loading cycles, so that when an access is requested during execution the data is already stored locally in a cache.
One such technique is the preloading of data in response to instructions from a programmer. When programming, a programmer may recognise that a block of data may be required by subsequent code, and preload instructions can be written into the code so that the data is present in the cache when it is required. These preload instructions translate into preload requests that the processor sends to the load store unit or, in some cases, to a separate preload unit. An address translation is performed and the address is looked up in the cache; if it is not present, a linefill request is sent to the load store unit to fill a cache line with the data, so that when the processor later requires the data it is already in the cache.
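The translate-then-lookup-then-linefill flow described above can be sketched in software. This is a minimal illustrative model, not any particular implementation: the names `Cache`, `translate`, and `handle_preload`, the page-table dictionary, and the 64-byte line size are all assumptions chosen for the sketch.

```python
class Cache:
    """Toy cache model keyed by aligned line address (illustrative only)."""
    def __init__(self, line_size=64):
        self.line_size = line_size
        self.lines = {}  # line address -> data

    def line_addr(self, addr):
        # Align the address down to the start of its cache line.
        return addr - (addr % self.line_size)

    def contains(self, addr):
        return self.line_addr(addr) in self.lines

    def fill(self, addr, data):
        self.lines[self.line_addr(addr)] = data


def translate(virtual_addr, page_table, page_size=4096):
    """Toy virtual-to-physical translation via a page-table dictionary."""
    page, offset = divmod(virtual_addr, page_size)
    return page_table[page] * page_size + offset


def handle_preload(virtual_addr, cache, page_table, linefill):
    """Translate the address, look it up in the cache and, on a miss,
    issue a linefill request so the data is resident before it is used.
    Returns True if a linefill was issued, False on a hit."""
    phys = translate(virtual_addr, page_table)
    if cache.contains(phys):
        return False                  # already cached: nothing to do
    cache.fill(phys, linefill(phys))  # linefill stands in for the load store unit
    return True
```

A second preload of the same line (or of another address within that line) finds the data already present and issues no further linefill, mirroring the hit check the text describes.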
Data may also be prefetched using a prefetch unit. This is a more aggressive technique than preloading: data or instruction accesses are monitored within the load store circuitry, patterns are identified, and future accesses are predicted from those patterns. A disadvantage of prefetching is that the patterns are often identified deep within the load store unit. The load store unit is a complex device that processes data requests and, where required, upholds the ordering of those requests so that data hazards are avoided. Generating additional prefetch requests within the load store circuitry requires the ordering control circuitry to monitor these requests and to ensure that hazards do not arise because of them. Prefetching also consumes the load store unit's resources, which are valuable, so their use for prefetching may impact the performance of the unit and, in turn, of the processing apparatus. Furthermore, the complexity of the load store unit makes it difficult to validate, and adding further data requests to it is a potential source of hazards and may require additional validation.
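One common way such a prefetch unit identifies patterns is stride detection: tracking, per load instruction, the difference between consecutive access addresses and predicting the next address once the same stride repeats. The sketch below is a hypothetical illustration of that idea; the class name, table layout, and confirmation rule are assumptions, not details taken from the source.

```python
class StridePrefetcher:
    """Toy stride detector: per instruction address (pc), remember the
    last data address and the stride between consecutive accesses; once
    the same stride is seen twice in a row, predict the next address."""
    def __init__(self):
        self.table = {}  # pc -> (last_addr, last_stride)

    def observe(self, pc, addr):
        """Record an access; return a predicted next address, or None."""
        entry = self.table.get(pc)
        if entry is None:
            # First sighting of this instruction: no stride known yet.
            self.table[pc] = (addr, None)
            return None
        last_addr, last_stride = entry
        stride = addr - last_addr
        self.table[pc] = (addr, stride)
        if stride == last_stride and stride != 0:
            return addr + stride  # candidate prefetch address
        return None
```

For a load walking an array with a constant 64-byte stride, the first two observations build up the stride and the third confirms it, after which each access yields a prediction one stride ahead. Doing this outside the load store unit, as the text motivates, would keep the predicted requests away from its ordering control circuitry.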
It would therefore be desirable to be able to load data that may be required into a cache in advance, without adding too much additional hardware and without requiring complicated validation procedures.