1. Field of the Invention
The present invention relates to a data processing apparatus and method for performing a cache lookup in an energy efficient manner.
2. Description of the Prior Art
When a processing unit of a data processing apparatus, for example a processor core, is performing operations, it typically requires access to multiple data values. These data values may be instructions defining the operations to be performed, or the actual data manipulated by those operations. In the following description, both instructions and data will be referred to as data values.
Accessing data values in memory can significantly impact the performance of the data processing apparatus, since the number of clock cycles taken to access memory is relatively large when compared with the processing speed of the processing unit. Accordingly, it is known to provide one or more caches for providing temporary storage of data values for access by the processing unit when performing operations. A cache resides between the processing unit and the memory and can store a subset of the data values in memory to allow quick access to those data values by the processing unit.
Whilst in some systems only a single cache may be provided, it is known in other systems to provide a plurality of levels of cache. Accordingly, by way of example, a processing unit such as a processor core may have a level one cache associated therewith which may be a unified cache for storing both instructions and data, or may consist of a separate instruction cache and a separate data cache. These caches are typically relatively small, but provide fast access by the processing unit to the data values held therein. If a data value required is not in the level one cache, then a lookup can be performed in another cache provided at a different cache level. Hence, for example, a unified level two cache can be provided, which will typically be larger than the level one cache and hence able to store more data values than the level one cache. Such a level two cache may be provided specifically in association with a particular processing unit, or alternatively may be shared between multiple processing units.
If a required data value is not present in either the level one cache or the level two cache, then that data value will be retrieved from memory, unless some further levels of cache are provided, in which case those further levels of cache will be accessed first to determine if the data value is present in those caches, and only if it is not will the access then be performed in memory.
A cache will typically include a data Random Access Memory (RAM) having a plurality of cache lines each of which will typically store multiple data values, and the cache will typically further include a tag RAM for storing a tag value in association with each of the cache lines. When a processing unit wishes to access a data value, it will issue an access request specifying an address, that address including a tag portion which is compared with a selected tag value in the tag RAM. A match between that tag portion and the selected tag value indicates a hit condition, i.e. indicates that the data value the subject of the memory access request is in the cache. Thereafter, in the event of a hit condition, the required data value can be accessed in the data RAM.
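The address decomposition and tag comparison described above can be sketched as follows. The field widths used here (a 32-bit address split into a 5-bit offset for 32-byte cache lines, a 7-bit index selecting one of 128 sets, and a 20-bit tag) are purely illustrative assumptions and are not taken from the text; any direct-mapped geometry would serve equally well for the illustration.

```c
#include <stdint.h>

/* Illustrative field widths: 32-byte cache lines (5 offset bits)
   and 128 sets (7 index bits); the remaining 20 bits form the tag. */
#define OFFSET_BITS 5
#define INDEX_BITS  7
#define NUM_SETS    (1u << INDEX_BITS)

/* The index portion of the address selects which tag value to read. */
static uint32_t addr_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & (NUM_SETS - 1);
}

/* The tag portion is compared against the selected tag value. */
static uint32_t addr_tag(uint32_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);
}

/* tag_ram[set] holds the tag value stored for that cache line;
   valid[set] indicates whether the line holds any data at all.
   A match indicates a hit condition: the data value the subject of
   the access request is present in the cache. */
static int lookup_hit(const uint32_t *tag_ram, const uint8_t *valid,
                      uint32_t addr) {
    uint32_t set = addr_index(addr);
    return valid[set] && tag_ram[set] == addr_tag(addr);
}
```

On a hit, the same set index would then be used to read the required data values from the corresponding cache line in the data RAM.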
The data RAM can be arranged as a plurality of storage blocks. For example, a common type of cache is an n-way set associative cache, and the data RAM of such a cache will typically have a storage block associated with each way of the cache. For speed reasons, it is often the case that the tag RAM and data RAM are accessed at the same time, such that whilst it is being determined whether the tag portion of an address matches a selected tag value within the tag RAM, the data values from a selected cache line can be accessed in preparation for a cache hit determination, such that if a cache hit condition is detected, the data values can then readily be accessed without further delay. Whilst such an approach enables high speed access, it increases the power consumption of the cache, since each separate storage block needs to be accessed. Indeed, it should be noted that the tag RAM may also include multiple storage blocks, and hence for example may include a separate storage block for each way. Accordingly, considering the example of a four way set associative cache, it will be appreciated that a cache lookup procedure as described above will involve accessing at least one tag storage block and four data storage blocks (and possibly up to four tag storage blocks and four data storage blocks if a tag storage block is provided per way).
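The power cost of the parallel lookup can be made concrete with a sketch of a four way set associative lookup in which, as described above, a tag storage block and a data storage block are probed for every way at the same time. The geometry and the activation accounting are illustrative assumptions for the purpose of the example.

```c
#include <stdint.h>

#define WAYS 4

/* One set of a four way set associative cache, with a tag storage
   block provided per way (the worst case described in the text). */
typedef struct {
    uint32_t tag[WAYS];
    uint8_t  valid[WAYS];
} cache_set;

/* Probes the tag and data storage block of every way in parallel,
   as is done for speed reasons. Returns the hitting way (0..WAYS-1)
   or -1 on a miss, and accumulates the number of storage blocks
   activated: two per way (one tag block, one data block), i.e.
   2 * WAYS = 8 activations for every lookup regardless of outcome. */
static int parallel_lookup(const cache_set *set, uint32_t tag,
                           unsigned *blocks_activated) {
    int hit_way = -1;
    for (int w = 0; w < WAYS; w++) {
        *blocks_activated += 2;  /* tag block + data block for way w */
        if (set->valid[w] && set->tag[w] == tag)
            hit_way = w;
    }
    return hit_way;
}
```

The fixed cost of eight block activations per access, even on a miss, is precisely what the power reduction schemes discussed below seek to avoid.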
One known way to seek to reduce the power consumption of the cache is to seek to detect sequential accesses directed to the same cache line and upon detecting such sequential accesses to reduce the number of storage blocks activated in the cache to service the cache lookup. By spotting that an access is sequential to, and in the same line as, a previous access, it is then possible to avoid doing a full lookup in the cache, and in the best case it may merely be necessary to activate a single data RAM storage block, thereby significantly decreasing power consumption for such accesses.
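The sequential detection can be sketched as follows. The 32-byte line size, the state kept between accesses, and the interface are illustrative assumptions; the essential point is that when the detector fires, the tag lookup can be skipped and only the single data block of the previously hitting way need be activated.

```c
#include <stdint.h>

#define LINE_BYTES 32u  /* illustrative cache line size */

/* State remembered from the previous access. */
typedef struct {
    uint32_t last_addr;
    int      last_way;   /* way that hit last time, -1 if unknown */
} seq_state;

/* Returns 1 if this access immediately follows the previous one and
   falls within the same cache line, so the full lookup (all tag and
   data blocks) can be replaced by activating the single data block
   of way s->last_way. Returns 0 otherwise, in which case the caller
   performs a full lookup and refreshes last_way from its result. */
static int same_line_sequential(seq_state *s, uint32_t addr,
                                uint32_t access_bytes) {
    int seq = s->last_way >= 0 &&
              addr == s->last_addr + access_bytes &&
              (addr / LINE_BYTES) == (s->last_addr / LINE_BYTES);
    s->last_addr = addr;
    if (!seq)
        s->last_way = -1;  /* full lookup needed for this access */
    return seq;
}
```

Note that an access crossing into the next cache line deliberately fails the check, since the data for the new line may reside in a different way, or not in the cache at all.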
It has been found that such an approach works particularly well with instruction caches where there is a high proportion of sequential accesses.
However, there are a number of situations which can reduce the benefits achievable by such a scheme by removing the sequentiality of accesses as observed by the cache. As an example, a processing unit may be arranged to execute a plurality of execution threads, and each execution thread will form a separate source of access requests to the cache. Typically, the processing unit may alternate between executing each execution thread, and as a result even if each independent thread is issuing access requests that are sequential, they will not be observed as such by the cache, which will typically handle an access request from one thread, followed by an access request from another thread. As a result, in such circumstances, the benefits achievable by the earlier described approach are significantly reduced.
Another example where the same issue arises is if the cache being accessed is a unified cache for storing instructions and data, and the processing unit is alternating between instruction access requests and data access requests (which may be from the same thread or a different thread). As another example, the cache may be a system level cache which is accessible by multiple processing units, and the access requests issued by one processing unit may be interleaved with access requests issued by a different processing unit. In such situations, even if one processing unit is issuing sequential access requests, they will not be observed as such by the system level cache.
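The loss of observed sequentiality described in the preceding two paragraphs can be illustrated with a small simulation; the line size, word-sized accesses, and single-address detector here are illustrative assumptions. Two sources each issue perfectly sequential word accesses, yet when the two streams are interleaved at the cache, no access is ever sequential to the one immediately before it, and the detector never fires.

```c
#include <stdint.h>

#define LINE_BYTES 32u  /* illustrative cache line size */

/* A simple same-line sequential detector: the cache remembers only
   the single most recent address it observed, with no notion of
   which source issued it. */
static uint32_t last_addr;
static int have_last;

static int seen_as_sequential(uint32_t addr) {
    int seq = have_last &&
              addr == last_addr + 4 &&
              (addr / LINE_BYTES) == (last_addr / LINE_BYTES);
    last_addr = addr;
    have_last = 1;
    return seq;
}

/* Counts how many accesses in a stream the detector sees as
   sequential, i.e. how often the power-saving shortcut applies. */
static unsigned count_sequential(const uint32_t *stream, unsigned n) {
    unsigned hits = 0;
    have_last = 0;
    for (unsigned i = 0; i < n; i++)
        hits += seen_as_sequential(stream[i]);
    return hits;
}
```

Running one source's eight word accesses alone, seven of the eight are detected as sequential; interleaving the same accesses with a second source's equally sequential stream yields zero detections, illustrating why the benefits of the scheme are significantly reduced in such circumstances.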
Accordingly, it would be desirable to provide an energy efficient technique for accessing a cache in situations where a data processing apparatus having at least one processing unit provides a plurality of sources from which access requests are issued to the cache.