A cache is used to reduce the average time needed to access data or instructions from a main memory of a computing system. When a processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in the cache. If so, the processor reads from or writes to the cache, which is much faster than reading from or writing to main memory. Where in the cache a copy of a particular entry of main memory will go is decided by a replacement policy. If the replacement policy is free to choose any entry in the cache to hold the copy, the cache is called fully associative. At the other extreme, if each entry in main memory can go in just one place in the cache, the cache is described as direct mapped. Many caches implement a compromise in which each entry in main memory can go to any one of n places in the cache; such caches are described as n-way set associative.
FIG. 1 is a schematic diagram of a 2-way set associative cache 100. Each index position or line 101 in the cache (e.g. each row in the representation shown in FIG. 1) comprises two entries, one in each of the ways 102 (i.e. way 0) and 103 (i.e. way 1). Each entry of a way comprises a data field 104, 106 and a cache tag field 108, 110. Where the cache 100 is used to cache values from a larger memory device (i.e. the cache is a data cache), a given memory location can be mapped to two possible locations, i.e. an entry in way 0 or an entry in way 1, with the index often corresponding to lower-order bits of the memory address and the cache tag (which is stored in the entry) corresponding to the higher-order bits of the memory address. When storing data in the cache 100, if one of the entries at the required index is empty, that entry is used; if both entries are filled, one of the two entries is overwritten. A replacement algorithm such as 'least recently used' (LRU) may, for example, be used to determine which entry to overwrite.
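The storage and replacement behaviour described above can be illustrated with a minimal Python model. This is a sketch under stated assumptions, not a hardware description: the class and method names are hypothetical, each address is assumed to hold a single data item (no byte offset within a line), and the number of sets is assumed to be a power of two so that `divmod` splits the address into high-order tag bits and low-order index bits.

```python
class SetAssociativeCache:
    """Illustrative n-way set associative cache with LRU replacement (hypothetical model)."""

    def __init__(self, num_sets, num_ways):
        self.num_sets = num_sets
        self.num_ways = num_ways
        # ways[w][i] holds a (tag, data) tuple, or None if the entry in way w at index i is empty.
        self.ways = [[None] * num_sets for _ in range(num_ways)]
        # lru[i] lists the way numbers at index i, least recently used first.
        self.lru = [list(range(num_ways)) for _ in range(num_sets)]

    def _touch(self, index, way):
        # Mark `way` as the most recently used way at this index.
        self.lru[index].remove(way)
        self.lru[index].append(way)

    def read(self, address):
        # Assumed split: tag = high-order part, index = low-order part of the address.
        tag, index = divmod(address, self.num_sets)
        for way in range(self.num_ways):
            entry = self.ways[way][index]
            if entry is not None and entry[0] == tag:
                self._touch(index, way)
                return entry[1]
        return None  # cache miss

    def write(self, address, data):
        tag, index = divmod(address, self.num_sets)
        # Reuse an entry with a matching tag, else an empty entry, else evict the LRU way.
        for way in range(self.num_ways):
            entry = self.ways[way][index]
            if entry is not None and entry[0] == tag:
                break
        else:
            empty = [w for w in range(self.num_ways) if self.ways[w][index] is None]
            way = empty[0] if empty else self.lru[index][0]
        self.ways[way][index] = (tag, data)
        self._touch(index, way)
        return way  # the way that now holds the data


# Usage: three addresses that all map to index 0 of a 4-set, 2-way cache.
cache = SetAssociativeCache(num_sets=4, num_ways=2)
cache.write(0x10, "A")  # tag 4, index 0: stored in empty way 0
cache.write(0x20, "B")  # tag 8, index 0: stored in empty way 1
cache.read(0x10)        # hit in way 0, which becomes most recently used
cache.write(0x30, "C")  # both ways full: LRU way 1 ("B") is overwritten
```

Keeping an explicit per-index LRU list mirrors the way-by-way layout of FIG. 1; real hardware typically tracks LRU state with a few bits per index rather than a list.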
When reading data from such a cache 100, the data from all of the ways of the cache may be read in the same clock cycle, after which the cache tags are examined to determine which entry contains the required data item and the other data items are discarded. Alternatively, all the tags (from all the ways of the cache) may be read in a first clock cycle, and the data from the entry with the matching tag is then read in a second clock cycle.
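The trade-off between the two read strategies can be made concrete with a small cost model. The sketch below is hypothetical: the function names are illustrative, each way is modelled as a list of `(tag, data)` tuples (or `None` for an empty entry), and the assumption that a miss is known after the tag-read cycle is a simplification. Each function returns the data found, the clock cycles used, and the number of data fields accessed.

```python
from typing import List, Optional, Tuple

WayEntry = Optional[Tuple[int, str]]  # (tag, data), or None if the entry is empty


def read_parallel(ways: List[List[WayEntry]], index: int, tag: int):
    """One cycle: fetch the entry from every way, then keep only the tag match."""
    fetched = [way[index] for way in ways]  # every data field is accessed
    data = next((d for t, d in filter(None, fetched) if t == tag), None)
    return data, 1, len(ways)  # (data, cycles, data fields read)


def read_tags_first(ways: List[List[WayEntry]], index: int, tag: int):
    """Cycle 1: compare the tags of all ways; cycle 2: read data from the matching way only."""
    hit_way = next((w for w, way in enumerate(ways)
                    if way[index] is not None and way[index][0] == tag), None)
    if hit_way is None:
        return None, 1, 0  # assumed: a miss is known after the tag read
    return ways[hit_way][index][1], 2, 1


# Usage: a 2-way cache with two indexes; index 0 holds tags 4 and 8.
ways = [[(4, "A"), None], [(8, "B"), None]]
print(read_parallel(ways, 0, 8))    # one cycle, but both data fields are read
print(read_tags_first(ways, 0, 8))  # two cycles, but only one data field is read
```

The first strategy minimises latency at the cost of reading every data field; the second reads only the required data field but adds a cycle to every hit.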
In a multi-threaded processor, the data items stored in an n-way set associative cache (such as a jump register cache or an instruction cache) may be specific to a particular thread. Because the replacement algorithm determines where new data items are stored in the cache, there may be no fixed mapping between particular ways and particular threads. Consequently, each entry in the cache (i.e. each entry in each way of the cache) comprises a thread identifier (or thread ID), which may be incorporated as part of the tag or provided in a separate field. The thread ID in a cache entry identifies the thread to which the data belongs, i.e. the thread which requested or requires the data item (e.g. the thread for which the data was being fetched when the cache entry was written). When reading from a cache which comprises multiple ways (e.g. an n-way set associative cache), an entry is fetched from each of the ways of the cache, and the tag and thread ID of each fetched entry are then examined to determine which entry contains the required data item. This is power inefficient for a RAM-based cache (i.e. a cache implemented as a RAM), because every way must be powered up on each read even though one or more of the ways may hold data for another thread (and hence cannot contain the required data item). It is also data inefficient, because irrelevant data belonging to other threads is retrieved.
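The inefficiency of the conventional read can be quantified with a short model. In the hypothetical sketch below (the `Entry` class and `read_all_ways` function are illustrative names, not from the source), each entry carries a separate thread ID field, the entry at the requested index is fetched from every way, and the function reports the data found, the number of ways accessed, and how many of those accesses returned data belonging to a different thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    tag: int
    thread_id: int  # thread to which the cached data belongs
    data: str


def read_all_ways(ways: List[List[Optional[Entry]]], index: int,
                  tag: int, thread_id: int):
    """Conventional read: the entry at `index` is fetched from every way."""
    fetched = [way[index] for way in ways]  # every way is powered up and read
    # A hit requires both the tag and the thread ID to match.
    hit = next((e.data for e in fetched
                if e is not None and e.tag == tag and e.thread_id == thread_id),
               None)
    # Entries owned by other threads could never be the hit, yet were still read.
    wasted = sum(1 for e in fetched if e is not None and e.thread_id != thread_id)
    return hit, len(fetched), wasted


# Usage: four ways, one index; tag 3 is cached for two different threads.
ways = [
    [Entry(tag=3, thread_id=0, data="x")],
    [Entry(tag=3, thread_id=1, data="y")],
    [Entry(tag=7, thread_id=0, data="z")],
    [Entry(tag=9, thread_id=0, data="w")],
]
print(read_all_ways(ways, 0, tag=3, thread_id=1))
```

For thread 1 this read accesses all four ways, although three of them hold thread 0 data that cannot satisfy the request. The example also shows why the thread ID must be checked: the tag alone matches in two ways.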
The embodiments described below are provided by way of example only and are not limiting of implementations which solve any or all of the disadvantages of known cache arrangements and methods of operating caches in multi-threaded processors.