A multi-threaded processor is capable of processing multiple different instruction sequences (or threads) simultaneously. During execution of a thread, data and instructions need to be accessed from memory. Different threads may therefore need to access memory, and sometimes the same portion of memory, simultaneously. Some arbitration between threads for memory access is therefore needed.
A multi-threaded processor typically has an instruction cache and a data cache containing the most commonly accessed instructions and data, as shown in FIG. 1. If the required data or instructions are not found in the caches, then access to the memory on the memory bus must be requested. Access to the memory has to be controlled to ensure that threads do not conflict with each other. For this reason, memory accesses issued by different threads through the instruction and data caches each have their own dedicated data path up to the memory arbiter module.
FIG. 1 is a schematic illustration of a memory access system in a multi-threaded processor in accordance with the prior art. Threads running on the processor core 10 can request data and instructions from the data and instruction caches 11, 12. The instruction and data caches each have memory management units associated with them. If the requested data or instructions are not in one of the caches, the request is passed to the memory bus. In order to arbitrate between requests from different threads, the requests are routed first through a thread arbiter 13, 14, which orders the requests for that thread, and then through a memory arbiter 15, which controls access to the memory bus.
Within the main memory, data is typically stored and accessible in units of a fixed number of bits, called cache lines. So, in order to read a memory address from the memory, the entire cache line containing that address must be fetched. There are two types of cache line. One type is a local cache line that only stores data for a particular thread. The other is a global cache line that stores data accessible by different threads. Whether a piece of data is stored within a global or local cache line depends on its linear address. The present invention is concerned with memory resources that are shared between threads, i.e. global cache lines.
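The relationship between a linear address and the cache line that contains it can be illustrated with a short sketch. The line size, the word size, the two example addresses and all function names below are assumptions chosen for illustration; the text does not specify any of them.

```python
# Illustrative only: LINE_BYTES and WORD_BYTES are assumed values,
# not figures taken from the description.
LINE_BYTES = 64  # assumed cache line size in bytes (a power of two)
WORD_BYTES = 4   # assumed word size in bytes


def line_base(addr: int) -> int:
    """Return the address of the first byte of the cache line containing addr."""
    return addr & ~(LINE_BYTES - 1)


def word_index(addr: int) -> int:
    """Return the word position of addr within its cache line."""
    return (addr % LINE_BYTES) // WORD_BYTES


def same_line(addr_a: int, addr_b: int) -> bool:
    """True if both addresses are fetched together as part of one cache line."""
    return line_base(addr_a) == line_base(addr_b)


# Hypothetical addresses for two variables owned by different threads.
ADDR_A = 0x1008
ADDR_B = 0x1024

if __name__ == "__main__":
    print(hex(line_base(ADDR_A)), word_index(ADDR_A))
    print(hex(line_base(ADDR_B)), word_index(ADDR_B))
    print(same_line(ADDR_A, ADDR_B))
```

Under these assumed sizes, the two addresses fall in the same line at different word positions, so reading either one fetches the other as well.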
A global cache line might store the values of software local variables entered by different threads in different word positions within the cache line. It is expected that when a thread Tx reads its local variable from the cache line, it gets back its last written value. However, situations can arise when using write-through data caches in which accesses by other threads to their own local variables within the same cache line cause the thread Tx to read an old and wrong value. When this happens, Tx is said to have become “data incoherent”.
FIGS. 2a and 2b each illustrate an example sequence of accesses by different threads causing data incoherence on thread T0.
Referring to FIG. 2a, T0 first accesses its local variable, A, with a write request. T1 then accesses its local variable, B, with a read request. The physical addresses of A and B are such that they are cached within the same global data cache line. Initially, neither A nor B is in the cache.
Read requests typically take less time to reach the memory bus than write requests. In this case, the T1 read reaches the memory before the T0 write. As a result, an old value of the cache line is stored in the data cache. The T0 write request does not write to the data cache, only to the memory bus. So, subsequent reads of the cache line from the data cache will fetch the old values that are stored in the data cache as a result of the T1 read.
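The FIG. 2a sequence can be reproduced with a toy model. This is a minimal sketch, assuming a single cache line with A at word 0 and B at word 1, an initial value of 0 and a written value of 1; the function and variable names are hypothetical and the model ignores everything except the order in which the two requests reach memory.

```python
def run_fig_2a():
    """Replay the FIG. 2a ordering and return (value T0 reads, value in memory)."""
    memory = {0: [0, 0]}  # line 0: word 0 is A, word 1 is B (assumed layout)
    data_cache = {}       # initially neither A nor B is cached

    def read_reaches_memory(line):
        # A read that misses fetches the whole line into the data cache.
        data_cache[line] = list(memory[line])

    def write_reaches_memory(line, word, value):
        # Write-through as described in the text: memory is updated,
        # the data cache is not.
        memory[line][word] = value

    # Issue order is T0 write A, then T1 read B, but the read is faster,
    # so the arrival order at memory is reversed.
    read_reaches_memory(0)         # T1 read of B arrives first
    write_reaches_memory(0, 0, 1)  # T0 write of A arrives second

    # T0 now reads A: the line is in the cache, so the cached copy is used.
    return data_cache[0][0], memory[0][0]


if __name__ == "__main__":
    cached_a, memory_a = run_fig_2a()
    print("T0 reads A =", cached_a, "but memory holds A =", memory_a)
```

In this model T0 reads back the old value of A from the data cache even though its write has already updated memory, which is the incoherence described above.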
Referring to FIG. 2b, once again A and B are both in the same cache line, and initially not in the data cache. T1 first accesses B from the memory bus with a read request. Before B is fetched, i.e. between the time the read request leaves the data cache and the time the cache line containing B is stored in the data cache, a write request for A is issued to the memory bus from T0. Again, the write from T0 is not written to the data cache, so the data cache retains an old version of the cache line, which will be accessed by subsequent read requests.
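The timing window in FIG. 2b can be sketched as a small event-ordered model. The time stamps, event names and values here are assumptions made for illustration only; the sketch simply shows that a write serviced between the memory read and the arrival of the line in the data cache is missing from the cached copy.

```python
import heapq


def run_fig_2b():
    """Replay the FIG. 2b ordering and return (cached A, memory A)."""
    memory = {"A": 0, "B": 0}  # one global cache line, two word positions
    data_cache = None          # the line is not cached initially
    in_flight = None           # line data travelling back to the data cache

    # Assumed time stamps: only their relative order matters.
    events = [
        (1, "t1_read_serviced"),   # memory returns the line for T1's read of B
        (2, "t0_write_serviced"),  # T0's write of A reaches memory
        (3, "fill_arrives"),       # the returned line is stored in the data cache
    ]
    heapq.heapify(events)

    while events:
        _, kind = heapq.heappop(events)
        if kind == "t1_read_serviced":
            in_flight = dict(memory)  # copy of the line as it is right now
        elif kind == "t0_write_serviced":
            memory["A"] = 1           # write-through: memory only, not the cache
        elif kind == "fill_arrives":
            data_cache = in_flight    # old copy of the line lands in the cache

    return data_cache["A"], memory["A"]


if __name__ == "__main__":
    cached_a, memory_a = run_fig_2b()
    print("cached A =", cached_a, "memory A =", memory_a)
```

Any later read of A that hits in the data cache returns the value captured before the write, matching the behaviour described for FIG. 2b.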
As can be seen, when multiple threads access global cache memory from the memory bus, data incoherence can arise, particularly with write-through caches. This invention aims to address this problem by detecting the incoherency hazard and using a mechanism to ensure that read or write instructions are only issued to the memory bus when it is safe to do so.