1. Field of the Invention
The present invention relates to cache circuitry, a data processing apparatus including such cache circuitry, and a method for handling write access requests within such cache circuitry.
2. Description of the Prior Art
A data processing apparatus will typically include one or more data processing units which are operable to perform data processing operations on data values stored in memory. Since accesses to memory are relatively slow, and hence are likely to adversely impact the performance of the processing unit, it is known to provide one or more caches within the data processing apparatus for storing a subset of the data values so that they can be made available to the processing unit more quickly than if instead they had to be accessed directly in memory.
When a processing unit wishes to access a data value, it will typically issue an access request specifying an address in memory of the data value required to be accessed. Assuming the address specified by the access request corresponds to a cacheable memory region, a cache receiving that access request will typically be arranged to perform a lookup procedure to determine from the specified address, or at least from a portion thereof, whether the data value the subject of the access request is stored within one of the cache lines of the cache (this being referred to as a hit condition), and if so to allow the data value to be accessed in the cache. For a write access, this will involve updating the relevant data value within the identified cache line, whereas for a read access this will involve returning to the processing unit the data value as read from the identified cache line.
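Purely by way of illustration, the lookup procedure described above can be modelled in Python. The sketch below assumes a 4-way set-associative cache with 64 sets and 32-byte lines (all hypothetical parameters chosen for the example). It splits the address into tag, index and offset fields and signals a hit by returning the matching line or a miss by returning `None`. On a hit, a write updates the line and a read returns the stored byte.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

LINE_SIZE = 32  # bytes per cache line (hypothetical)
NUM_SETS = 64   # number of sets (hypothetical)
NUM_WAYS = 4    # associativity (hypothetical)

@dataclass
class CacheLine:
    valid: bool = False
    tag: int = 0
    data: bytearray = field(default_factory=lambda: bytearray(LINE_SIZE))

class Cache:
    def __init__(self) -> None:
        self.sets: List[List[CacheLine]] = [
            [CacheLine() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)
        ]

    @staticmethod
    def split(addr: int) -> Tuple[int, int, int]:
        """Derive (tag, index, offset) from portions of the address."""
        offset = addr % LINE_SIZE
        index = (addr // LINE_SIZE) % NUM_SETS
        tag = addr // (LINE_SIZE * NUM_SETS)
        return tag, index, offset

    def lookup(self, addr: int) -> Optional[CacheLine]:
        """Return the matching line on a hit condition, or None on a miss."""
        tag, index, _ = self.split(addr)
        for line in self.sets[index]:
            if line.valid and line.tag == tag:
                return line
        return None

    def read(self, addr: int) -> Optional[int]:
        line = self.lookup(addr)
        if line is None:
            return None  # miss: a linefill would be required
        return line.data[self.split(addr)[2]]

    def write(self, addr: int, value: int) -> bool:
        line = self.lookup(addr)
        if line is None:
            return False  # miss: handled by the linefill path instead
        line.data[self.split(addr)[2]] = value
        return True
```

In this model a line must already be valid for an access to hit; installing lines on a miss is left to the linefill path.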
If, on receipt of an access request, the cache determines that the data value the subject of the access request is not present in the cache (referred to as a miss condition), then the cache may be arranged to perform a linefill operation in order to retrieve into the cache a cache line's worth of data from memory, including the data value the subject of the access request, so that the data value can then be accessed directly from the cache. As part of such a linefill procedure, it will be necessary to select a cache line in which this new content is to be stored. If that selected cache line is currently storing data relating to a write through region of memory, any updates to that cache line's data will also have been made to memory, and accordingly there is no need to output the current contents of that cache line to memory before overwriting it with the new content retrieved as part of the linefill procedure. However, if the current contents of that cache line relate to a write back region of memory, it will additionally be necessary as part of the linefill procedure to evict the current cache line's contents to memory to ensure that memory is updated to reflect any changes that have been made to the current content of the cache line.
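The linefill behaviour can likewise be sketched under simplifying assumptions: a single 2-way set, a round-robin victim pointer, and a per-line `write_back` flag recording whether the line belongs to a write back region (all hypothetical names and choices made for illustration). Only a victim from a write back region is copied out to memory before being overwritten; a write through victim is simply replaced.

```python
from dataclasses import dataclass, field
from typing import Dict, List

LINE_SIZE = 16  # bytes per line (hypothetical)
NUM_WAYS = 2    # one 2-way set keeps the sketch short (hypothetical)

@dataclass
class Line:
    valid: bool = False
    line_addr: int = 0        # address of the first byte held in the line
    write_back: bool = False  # True if the line maps to a write back region
    data: bytearray = field(default_factory=lambda: bytearray(LINE_SIZE))

class SingleSetCache:
    def __init__(self, memory: Dict[int, bytearray]) -> None:
        self.memory = memory
        self.ways: List[Line] = [Line() for _ in range(NUM_WAYS)]
        self.victim_ptr = 0  # round-robin replacement (assumed policy)
        self.evictions = 0

    def linefill(self, addr: int, write_back: bool) -> Line:
        line_addr = addr - addr % LINE_SIZE
        victim = self.ways[self.victim_ptr]
        self.victim_ptr = (self.victim_ptr + 1) % NUM_WAYS
        # A write through victim's updates already reached memory, so it can
        # be overwritten directly; a write back victim must be evicted first.
        if victim.valid and victim.write_back:
            self.memory[victim.line_addr] = bytearray(victim.data)
            self.evictions += 1
        # Retrieve a cache line's worth of data from memory into the victim.
        victim.valid = True
        victim.line_addr = line_addr
        victim.write_back = write_back
        victim.data = bytearray(self.memory.get(line_addr, bytes(LINE_SIZE)))
        return victim
```

For example, filling three lines into the two ways evicts the first line only if it came from a write back region.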
Typically, a cache will have control circuitry arranged to receive each access request issued by the processing unit and to process that access request as outlined above. Often, a number of slots are provided within the control circuitry to allow a number of access requests to be pending at any particular point in time. In particular, each slot is arranged to store attributes associated with a pending access request. Before the above-mentioned lookup procedure is performed in respect of the cache, a number of checks need to be performed to ensure that the access specified by the access request is allowed to proceed. For example, certain areas of memory may only be accessible by the processing unit when operating in a particular mode of operation. Details of each pending access request will typically be kept within the allocated slot for that access request while such checks are performed. Since such checks typically take several clock cycles, the provision of multiple slots allows the cache circuitry to continue receiving one access request per clock cycle while the checks for earlier access requests are still in progress.
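The throughput benefit of multiple slots can be illustrated with a toy cycle-by-cycle model. It assumes, hypothetically, that the processing unit offers one request per cycle, that the checks take a fixed `check_cycles` cycles, and that a request leaves its slot as soon as those checks complete. The `Slot` record and `simulate` function are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Slot:
    request_id: int
    cycles_left: int  # cycles remaining until the checks finish

def simulate(num_slots: int, check_cycles: int, num_requests: int) -> int:
    """Return the number of cycles taken to accept `num_requests` requests."""
    slots: List[Optional[Slot]] = [None] * num_slots
    next_req = 0
    cycle = 0
    while next_req < num_requests:
        # Free any slot whose checks have completed (assumed: the request
        # then leaves the slot immediately).
        for i, slot in enumerate(slots):
            if slot is not None and slot.cycles_left == 0:
                slots[i] = None
        # Accept at most one new request per cycle, if a slot is free.
        free = next((i for i, slot in enumerate(slots) if slot is None), None)
        if free is not None:
            slots[free] = Slot(next_req, check_cycles)
            next_req += 1
        # One cycle of checking elapses for every occupied slot.
        for slot in slots:
            if slot is not None:
                slot.cycles_left -= 1
        cycle += 1
    return cycle
```

With at least as many slots as check cycles, one request is accepted per cycle: `simulate(3, 3, 6)` accepts six requests in six cycles, whereas a single slot (`simulate(1, 3, 6)`) needs sixteen cycles because each request blocks the slot for the full check period.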
However, it is desirable to keep the number of slots small, since the complexity of the control circuitry increases with each additional slot. For example, each slot requires associated circuitry to perform the above-mentioned checks in relation to the contents of that slot, and additional circuitry to handle other aspects of the associated access request, for example constructing burst accesses in the event that the access request is a burst access request specifying multiple accesses. Further, hazard detection circuitry within the control circuitry becomes more complex as the number of slots increases, since that circuitry needs to check the contents of all of the slots to ensure that hazards such as read after write hazards are prevented. Similarly, the arbitration circuitry required to arbitrate between independent requests made by the various slots for use of cache resources becomes more complex as more slots are provided.
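The way hazard detection scales with the number of slots can be seen in a minimal sketch of a read after write check, in which an incoming read is compared against every slot holding a pending write. The `PendingAccess` record and the byte-range overlap test are assumptions made for illustration. In hardware, each iteration of the loop would correspond to a separate address comparator, so the logic grows with each slot added.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PendingAccess:
    is_write: bool
    addr: int  # start address in bytes
    size: int  # access size in bytes

def overlaps(a_addr: int, a_size: int, b_addr: int, b_size: int) -> bool:
    """True if [a_addr, a_addr + a_size) and [b_addr, b_addr + b_size) intersect."""
    return a_addr < b_addr + b_size and b_addr < a_addr + a_size

def raw_hazard(slots: List[Optional[PendingAccess]],
               read_addr: int, read_size: int) -> bool:
    """Return True if any slot holds a pending write overlapping the read.

    Every slot is examined, so the checking logic grows with the slot count.
    """
    for slot in slots:
        if slot is None or not slot.is_write:
            continue  # empty slots and pending reads cannot cause a RAW hazard
        if overlaps(read_addr, read_size, slot.addr, slot.size):
            return True
    return False
```

For instance, a 2-byte read at `0x102` conflicts with a pending 4-byte write at `0x100`, whereas a read at `0x104` does not.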
In emerging processor designs, use of speculative accesses is becoming more common. Such speculative accesses can include speculative write accesses. A speculative write access request cannot be committed to the memory system until the processor subsequently confirms whether that speculative access should proceed or should fail, and hence the data cannot be stored in the cache array of the data cache and/or output to lower levels of the memory system until a signal has been received from the processing unit confirming that that speculative write access should occur. Accordingly, when a speculative write access request is allocated to one of the slots, it is likely to need to stay within that slot for a longer period of time than a standard write access request or a read access request would require. This will adversely affect the ability of the cache to receive subsequent access requests from the processing unit, which will in turn adversely affect the performance of the processing unit.
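The effect of speculative writes on slot availability can be sketched with a simple occupancy model. It assumes, hypothetically, in-order acceptance of at most one request per cycle, a fixed `check_cycles` for every request, and a fixed additional `confirm_cycles` during which a speculative write (`"SW"`) remains in its slot awaiting confirmation from the processing unit. The function name and request encoding are illustrative only; it returns the cycle following acceptance of the last request.

```python
from typing import List

def cycles_to_accept(requests: List[str], num_slots: int,
                     check_cycles: int, confirm_cycles: int) -> int:
    """requests: "R" (read), "W" (write) or "SW" (speculative write)."""
    free_at = [0] * num_slots  # cycle at which each slot next becomes free
    cycle = 0
    for kind in requests:
        # Use the slot that frees earliest; wait for it if none is free yet.
        slot = min(range(num_slots), key=lambda i: free_at[i])
        accept = max(cycle, free_at[slot])
        # A speculative write also holds its slot until it is confirmed.
        hold = check_cycles + (confirm_cycles if kind == "SW" else 0)
        free_at[slot] = accept + hold
        cycle = accept + 1  # at most one request accepted per cycle
    return cycle
```

With three slots, three check cycles and a ten-cycle confirmation delay, six ordinary writes are accepted within six cycles, whereas three speculative writes followed by three reads take sixteen cycles, because the reads cannot be accepted until the speculative writes release their slots.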
One way to address this would be to add further slots to the control circuitry of the cache. However, for the above-mentioned reasons, this is undesirable due to the increase in complexity that results from each additional slot. This additional complexity can lead to timing issues, for example due to the need to provide larger arbitration circuitry in respect of other resources within the cache such as a bus interface unit, a store buffer, etc.
One known mechanism for seeking to increase the number of write access requests that can be retained for subsequent processing without adversely impacting the performance of the processing unit is to provide a store queue on a path between the processing unit and the cache. Such a store queue typically acts as a first-in-first-out (FIFO) buffer to allow write access requests issued by the processing unit to be temporarily buffered prior to forwarding to the slots within the control circuitry of the cache. However, the use of such a store queue introduces additional complexity, particularly in respect of hazard detection. In particular, for a read access request pending in one of the slots of the control circuitry of the cache, an additional interface would need to be provided to enable the contents of the store queue to be analysed to ensure hazards such as read after write hazards were prevented. Additionally, another interface would need to be provided between the processing unit and the store queue to enable write access requests to be forwarded to the store queue rather than directly to the cache.
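A store queue of the kind described can be sketched as a bounded FIFO. The sketch is hypothetical: `StoreQueue`, `Write` and `read_has_raw_hazard` are illustrative names, and writes are modelled as byte ranges. It shows why a read pending in a slot must search the queue as well as the slots: a write still buffered in the queue would otherwise be missed by the hazard check.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional

@dataclass
class Write:
    addr: int  # start address in bytes
    size: int  # access size in bytes

def overlaps(a_addr: int, a_size: int, b_addr: int, b_size: int) -> bool:
    """True if the two byte ranges intersect."""
    return a_addr < b_addr + b_size and b_addr < a_addr + a_size

class StoreQueue:
    """FIFO buffer of writes between the processing unit and the cache slots."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.entries: Deque[Write] = deque()

    def push(self, w: Write) -> bool:
        if len(self.entries) >= self.depth:
            return False  # queue full: the processing unit must stall
        self.entries.append(w)
        return True

    def pop_to_slot(self) -> Optional[Write]:
        # Oldest write is forwarded first (first-in-first-out).
        return self.entries.popleft() if self.entries else None

    def has_conflict(self, addr: int, size: int) -> bool:
        # The extra search interface needed for read after write checks.
        return any(overlaps(addr, size, w.addr, w.size) for w in self.entries)

def read_has_raw_hazard(read_addr: int, read_size: int,
                        slot_writes: List[Write], sq: StoreQueue) -> bool:
    """A read must be checked against pending writes in slots AND in the queue."""
    in_slots = any(overlaps(read_addr, read_size, w.addr, w.size)
                   for w in slot_writes)
    return in_slots or sq.has_conflict(read_addr, read_size)
```

In this model, once a write is popped into a slot, the hazard is detected through the slot check rather than the queue search, so both paths must be present for the check to be complete.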
Accordingly, it would be desirable to provide an improved technique for handling write access requests within a cache, in particular when some of those write access requests may be speculative write access requests.