1. Field of the Invention
The present invention relates to an apparatus and method for accessing data values in a cache and in particular for accessing data values in an ‘n’ way set associative cache.
2. Description of the Prior Art
A cache may be arranged to store data and/or instructions so that they are subsequently readily accessible by a processor. Hereafter, the term “data value” will be used to refer to both instructions and data. The cache will store the data value associated with a memory address until it is overwritten by a data value for a new memory address required by the processor. The data value is stored in cache using either physical or virtual memory addresses. Should the data value in the cache have been altered then it is usual to ensure that the altered data value is re-written to the memory, either at the time the data is altered or when the data value in the cache is overwritten.
A number of different configurations have been developed for organising the contents of a cache. One such configuration is the so-called ‘low associative’ cache. In an example 16 Kbyte low associative cache such as the 4-way set associative cache, generally 90, illustrated in FIG. 1, each of the 4 cache ways 50, 60, 70, 80 contains a number of cache lines 55. A data value (in the following examples, a word) associated with a particular address can be stored in a particular cache line of any of the 4 cache ways (i.e. each set has 4 cache lines, as illustrated generally by reference numeral 95). Each cache way stores 4 Kbytes (16 Kbyte cache/4 cache ways). If each cache line stores eight 32-bit words then there are 32 bytes/cache line (8 words×4 bytes/word) and 128 cache lines in each cache way ((4 Kbytes/cache way)/(32 bytes/cache line)). Hence, in this illustrative example, the total number of sets would be equal to 128, i.e. ‘M’ would be 127.
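By way of illustration only, the arithmetic of the preceding example may be sketched as follows. The function name `cache_geometry` and the dictionary keys are hypothetical and are introduced here purely for illustration; the numerical inputs are those of the 16 Kbyte, 4-way example above.

```python
def cache_geometry(cache_bytes, num_ways, words_per_line, bytes_per_word):
    """Derive the per-way size, line size and set count of a set associative cache."""
    bytes_per_way = cache_bytes // num_ways             # 16 Kbytes / 4 ways
    bytes_per_line = words_per_line * bytes_per_word    # 8 words x 4 bytes/word
    lines_per_way = bytes_per_way // bytes_per_line     # one line per set in each way
    return {
        "bytes_per_way": bytes_per_way,
        "bytes_per_line": bytes_per_line,
        "num_sets": lines_per_way,
        "highest_set_index": lines_per_way - 1,         # 'M' when sets are numbered from 0
    }


geometry = cache_geometry(cache_bytes=16 * 1024, num_ways=4,
                          words_per_line=8, bytes_per_word=4)
print(geometry)
```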
The contents of a memory address 47 associated with each data value are also illustrated in FIG. 1. The memory address 47 consists of a TAG portion 10, and SET, WORD and BYTE portions 20, 30 and 40, respectively. The SET portion 20 of the memory address 47 is used to identify a particular set within the cache 90. The WORD portion 30 identifies a particular word within the cache line 55, identified by the SET portion 20, that is the subject of the access by the processor, whilst the BYTE portion 40 allows a particular byte within the word to be specified, if required.
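A minimal sketch of how such an address might be divided into its portions is given below. The field widths are assumptions derived from the example geometry above (4 bytes per word, 8 words per cache line, 128 sets), with the TAG portion taken to be all remaining upper bits of the address; the function name `split_address` is hypothetical.

```python
BYTE_BITS = 2   # assumed: 4 bytes per word
WORD_BITS = 3   # assumed: 8 words per cache line
SET_BITS = 7    # assumed: 128 sets


def split_address(address):
    """Return the (TAG, SET, WORD, BYTE) portions of a memory address."""
    byte = address & ((1 << BYTE_BITS) - 1)
    word = (address >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    set_index = (address >> (BYTE_BITS + WORD_BITS)) & ((1 << SET_BITS) - 1)
    tag = address >> (BYTE_BITS + WORD_BITS + SET_BITS)  # all remaining upper bits
    return tag, set_index, word, byte


tag, set_index, word, byte = split_address(0x12345678)
print(hex(tag), set_index, word, byte)
```

Under these assumed widths, the SET and WORD portions together occupy bits 2 to 11 of the address, which corresponds to the logical address of a word within a single 4 Kbyte cache way.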
A word stored in the cache 90 may be read by specifying the memory address 47 of the word and by selecting the cache way which stores the word (the TAG portion 10 is used to determine in which cache way the word is stored, as will be described below). A logical address 45 (consisting of the SET portion 20 and WORD portion 30) then specifies the logical address of the word within that cache way. A word stored in the cache 90 may be overwritten to allow a new word for an address requested by the processor to be stored. Typically, when storing words in the cache 90, a so-called “linefill” technique is used whereby a complete cache line 55 of, for example, 8 words (32 bytes) will be fetched and stored.
FIG. 2 provides a schematic view of cache way 0 of cache 90. Each entry 130 in a TAG memory 115 is associated with a corresponding cache line 55 in a data memory 117, each cache line containing a plurality of data values. A cache controller (not shown) determines whether the TAG portion 10 of the memory address 47 issued by a processor (not shown) matches the TAG in one of the TAG entries 130 of the TAG memory 115 of any of the cache ways. If a match is found then the data value in the corresponding cache line 55 for that cache way identified by the SET and WORD portions 20, 30 of the memory address 47 will be output from the cache 90, assuming the cache line is valid (the marking of the cache lines as valid is discussed below).
In addition to the TAG stored in a TAG entry 130 for each cache line 55, a number of status bits (not shown) are preferably provided for each cache line. Preferably, these status bits are also provided within the TAG memory 115. Hence, a valid bit and a dirty bit are associated with each cache line. As will be appreciated by those skilled in the art, the valid bit is used to indicate whether a data value stored in the corresponding cache line is still considered valid or not. Hence, setting the valid bit will indicate that the corresponding data values are valid, whilst clearing the valid bit will indicate that at least one of the data values is no longer valid.
Further, as will be appreciated by those skilled in the art, the dirty bit is used to indicate whether any of the data values stored in the corresponding cache line are more up-to-date than the data value stored in a memory (not shown). The value of the dirty bit is relevant for write back regions of memory, where a data value output by the processor core and stored in the cache 90 is not immediately also passed to the memory for storage, but rather the decision as to whether that data value should be passed to memory is taken at the time that the particular cache line is overwritten, or “evicted”, from the cache 90. Accordingly, a dirty bit which is not set will indicate that the data values stored in the corresponding cache line correspond to the data values stored in memory, whilst a dirty bit being set will indicate that at least one of the data values stored in the corresponding cache line has been updated, and the updated data value has not yet been passed to the memory.
In a typical prior art cache, when the data values in a cache line are overwritten in the cache, they will be output to memory for storage if the valid and dirty bits indicate that the data values are both valid and dirty. If the data values are not valid, or are not dirty, then the data values can be overwritten without the requirement to pass the data values back to memory.
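The eviction rule described above may be sketched as follows. The names `LineStatus` and `must_write_back` are hypothetical and serve only to illustrate the decision taken when a cache line is about to be overwritten.

```python
from dataclasses import dataclass


@dataclass
class LineStatus:
    """Status bits held alongside the TAG for one cache line."""
    valid: bool
    dirty: bool


def must_write_back(status):
    """A line is passed back to memory on eviction only if it is both valid and dirty."""
    return status.valid and status.dirty


print(must_write_back(LineStatus(valid=True, dirty=True)))
print(must_write_back(LineStatus(valid=True, dirty=False)))
```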
The arrangement of cache 90 is shown in more detail in FIG. 3. The cache 90 has four cache ways and comprises the data memory 117 and a multiplexer 119. Because data values can be stored in any of the cache ways, additional cache access logic is also provided which enables the cache 90 to access data values from any of those cache ways. Hence, the cache access logic comprises the TAG memory 115, a comparator 111 associated with each cache way and a cache way selector 113. Whilst a single cache 90 is shown in FIG. 3 which stores both data and instructions, it will be appreciated that in some architectures, such as the so-called ‘Harvard’ architecture, two separate caches are provided with instructions and data being stored in the separate caches.
When an access request such as a read or a write is issued by the processor core, the memory address 47 of the data value to be accessed is placed on a processor address bus 54. The memory address 47 is received by the cache 90 from the processor address bus 54.
As explained above, data values are associated with memory addresses 47 and each of those data values may be accessed. In general, sequential access requests are requests which specify a data value which is in the same cache line as a data value which was the subject of the immediately preceding access request to that cache, whereas non-sequential access requests are those which specify a data value which is in a different cache line to a data value which was the subject of the immediately preceding access request to that cache.
However, in order to provide a convenient trade-off between performance and resources required to establish whether an access is sequential or non-sequential, previous ARM cores have adopted the following simple rule-set to establish whether an access request is sequential or non-sequential. Instruction accesses and data accesses are treated separately. For instruction accesses, if the instruction is a branch instruction or results in the value stored in the program counter being modified then it is assumed that the immediately following access is non-sequential, otherwise it is assumed that the immediately following access is sequential. For data accesses, if the access is the second or subsequent access of a multiple load or store instruction then it is assumed that the access is sequential, otherwise it is assumed that the access is non-sequential. However, for both instruction and data accesses, if the access is to the first word of a cache line then irrespective of the outcome of the preceding rules, it is assumed that the access is non-sequential.
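The rule-set set out above may be expressed, by way of a sketch only, as a single function. The function name and its parameters are hypothetical: `previous_changed_pc` stands for the preceding instruction being a branch or otherwise modifying the program counter, and `later_transfer_of_multiple` stands for the access being the second or subsequent access of a multiple load or store instruction.

```python
def is_sequential(kind, word_index, previous_changed_pc=False,
                  later_transfer_of_multiple=False):
    """Classify an access as sequential (True) or non-sequential (False)."""
    # An access to the first word of a cache line overrides the other rules.
    if word_index == 0:
        return False
    if kind == "instruction":
        # Sequential unless the preceding instruction modified the program counter.
        return not previous_changed_pc
    if kind == "data":
        # Sequential only for the second or subsequent access of a multiple transfer.
        return later_transfer_of_multiple
    raise ValueError("kind must be 'instruction' or 'data'")


print(is_sequential("instruction", word_index=3))
print(is_sequential("data", word_index=2))
```

It will be noted that the rule-set examines only the type of access and the position of the word within the cache line, and does not compare memory addresses.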
If the access request is determined to be a non-sequential access request then the cache access logic is utilised to access each of the cache ways of the cache 90. Further examples of non-sequential access requests are access requests for a data value where that data value is stored in a different cache way, set or cache line to the immediately preceding data value accessed.
In these circumstances, for an example read operation, the cache access logic is employed. TAG memory 115 in each cache way receives the memory address 47. The TAG memory 115 outputs the TAG value stored at the location specified by SET portion 20 of the memory address 47 to the associated comparator 111. Each comparator 111 then compares the TAG value output from that cache way with the TAG portion 10 of the memory address 47 placed on the processor address bus 54.
The data memory 117 in each cache way also receives the memory address 47. The data memory 117 outputs the data value stored at the location specified by the SET portion 20, WORD portion 30 and BYTE portion 40 of the address 47 to the multiplexer 119.
If the TAG value and TAG portion 10 match then a hit signal (e.g. logic ‘1’) is sent to the cache way selector 113. The cache way selector 113 then indicates a cache hit on path 120 to a cache controller (not shown) and outputs a select signal to multiplexer 119. The multiplexer 119 then selects and outputs the corresponding data value onto the processor data bus 56. Hence, the processor core is provided with the data value directly from the cache 90.
If the TAG value and TAG portion 10 do not match then a miss signal (e.g. logic ‘0’) is sent to the cache way selector 113. The cache way selector 113 then indicates a cache miss by supplying an appropriate signal on path 120 and the data value will be read from memory and stored in the cache 90. Hence, the processor core is provided with the data value over the processor data bus 56 only after a delay, during which the data value is read from memory. The data value and its TAG value are then stored in the cache 90, overwriting a data value and TAG value previously stored there. As explained previously, it would be typical for a linefill of the complete cache line to be performed, where a complete cache line, including the data value indicated by the access request, is read from memory and stored in the cache 90, thereby overwriting a whole cache line previously stored in the cache 90.
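The non-sequential read operation described above, including the linefill performed on a cache miss, may be modelled in simplified form as shown below. This is a behavioural sketch only and not a description of any particular implementation. The class `Cache`, the dictionary-based `memory`, and in particular the round-robin choice of the cache way to be overwritten are assumptions made for illustration; the status bits and the writing back of dirty cache lines are omitted for brevity.

```python
NUM_WAYS = 4
NUM_SETS = 128
WORDS_PER_LINE = 8


def split_address(address):
    """Return the (TAG, SET, WORD) portions under the assumed field widths."""
    return address >> 12, (address >> 5) & 0x7F, (address >> 2) & 0x7


class Cache:
    def __init__(self):
        # tags[way][set] holds a TAG value, or None when the line is not valid.
        self.tags = [[None] * NUM_SETS for _ in range(NUM_WAYS)]
        self.data = [[[0] * WORDS_PER_LINE for _ in range(NUM_SETS)]
                     for _ in range(NUM_WAYS)]
        self.next_victim = 0  # hypothetical round-robin replacement policy

    def read(self, address, memory):
        """Return (hit, way, value) for a non-sequential read of one word."""
        tag, set_index, word = split_address(address)
        # Every way reads out its TAG entry and its candidate data word.
        hits = [self.tags[way][set_index] == tag for way in range(NUM_WAYS)]
        candidates = [self.data[way][set_index][word] for way in range(NUM_WAYS)]
        if any(hits):
            way = hits.index(True)          # the way selector drives the multiplexer
            return True, way, candidates[way]
        # Cache miss: perform a linefill of the complete cache line from memory.
        way = self.next_victim
        self.next_victim = (way + 1) % NUM_WAYS
        line_base = address & ~0x1F         # address of the first word of the line
        self.data[way][set_index] = [memory.get(line_base + 4 * i, 0)
                                     for i in range(WORDS_PER_LINE)]
        self.tags[way][set_index] = tag
        return False, way, self.data[way][set_index][word]


cache = Cache()
memory = {0x1000: 11, 0x1004: 22}           # word-aligned address -> word value
print(cache.read(0x1004, memory))           # first access misses and fills the line
print(cache.read(0x1000, memory))           # same line, so this access hits
```

In this model all four TAG entries and all four candidate data words are read on every non-sequential access, even though at most one cache way can supply the requested data value.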
Should a sequential access take place then preferably the cache way selection associated with the immediately preceding data access will be retained. The multiplexer 119 continues to select that cache way and to output the data value that is the subject of the sequential access onto the processor data bus 56, without the need to utilise the cache access logic.
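The retention of the cache way selection for sequential accesses may be sketched as follows. The class `WaySelector`, its `tag_lookups` counter and the `compare_tags` callback are hypothetical names used for illustration. Falling back to a full lookup when no cache way has been retained (for example following a miss) is an assumption of this sketch.

```python
class WaySelector:
    """Retains the selected cache way so sequential accesses can skip the TAG lookup."""

    def __init__(self):
        self.retained_way = None
        self.tag_lookups = 0   # counts how often the cache access logic is used

    def select(self, sequential, compare_tags):
        """Return the selected way, or None if no way hits.

        compare_tags is a callable returning one hit signal per cache way; it is
        only invoked when the cache access logic is actually needed.
        """
        if sequential and self.retained_way is not None:
            return self.retained_way           # reuse the previous selection
        self.tag_lookups += 1
        hits = compare_tags()
        self.retained_way = hits.index(True) if True in hits else None
        return self.retained_way


selector = WaySelector()
print(selector.select(False, lambda: [False, False, True, False]))
print(selector.select(True, lambda: [False, False, False, False]))
print(selector.tag_lookups)
```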
It will be appreciated that handling access requests can result in significant power being consumed by the cache 90. Consuming significant power causes cooling difficulties and complicates chip layout. In battery or low power applications, this significant power consumption also results in shorter battery life.
It is an object of the present invention to provide a technique which reduces the power consumption of a cache when responding to access requests.