In such systems, each processor has its own cache memory for storing working data, such as memory addresses pointing to instructions to execute. Since the processors operate in parallel, they share some of these data. Several processors may thus have read or write access to the same datum, possibly in order to modify it, in the course of executing a computer program.
To ensure that the data used by the processors are up to date, and to prevent two processors from processing two different versions of the same datum, "cache coherence" algorithms are implemented. The MESI algorithm is an example of one such algorithm.
Implementing cache coherence algorithms requires a large number of communications among the processors so that the location of a datum is known at all times: it is a matter of determining which cache memory holds the datum, as well as the datum's state.
The state of a datum in cache memory depends on the protocol used. In the example of the aforementioned MESI protocol, a datum is in “Modified” (M) state if the datum is present in only one cache memory and this datum has been modified in relation to the datum present in non-cache memory (from which the initial datum comes). In this case, a processor that wants to access the datum may, for example, wait until the datum is made consistent with the version in memory. Certain cache coherence protocols, however, may permit modified data to be transferred directly from cache to cache.
In the same example again, a datum is in the “Exclusive” (E) state if it is present in only one cache memory and if this datum indeed corresponds to the version in non-cache memory. A datum is in the “Shared” (S) state if it is present in several cache memories. Finally, a datum is in the “Invalid” (I) state if it is not up to date. It must then be ignored by the processors and not be used.
There are other protocols with more or fewer defined states. For example, the MSI protocol includes only the three states M, S and I defined above, while the MOESI protocol adds an “Owned” (O) state, equivalent to the “S” state but where the memory is not up to date.
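For illustration only (this sketch is not part of the system described), the MESI states above can be modeled as a small transition table. The table below is a simplified Python sketch: it assumes, for instance, that a read miss finds another sharer and therefore yields the Shared state rather than Exclusive, and it omits the bus transactions and write-backs a real protocol performs.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

# Simplified next-state function for one cache line in one cache,
# reacting to local events and to events observed from other caches.
# Events: "local_read", "local_write", "remote_read", "remote_write".
def next_state(state, event):
    transitions = {
        (State.INVALID, "local_read"): State.SHARED,    # assumption: another cache holds it
        (State.INVALID, "local_write"): State.MODIFIED,
        (State.SHARED, "local_write"): State.MODIFIED,  # other copies are invalidated
        (State.SHARED, "remote_write"): State.INVALID,
        (State.EXCLUSIVE, "local_write"): State.MODIFIED,
        (State.EXCLUSIVE, "remote_read"): State.SHARED,
        (State.EXCLUSIVE, "remote_write"): State.INVALID,
        (State.MODIFIED, "remote_read"): State.SHARED,  # after write-back to memory
        (State.MODIFIED, "remote_write"): State.INVALID,
    }
    return transitions.get((state, event), state)       # otherwise the state is unchanged
```

In this sketch, a write observed from another cache always invalidates the local copy, which corresponds to the "Invalid" state defined above.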
Most cache coherence protocols use lists, or directories, indicating the history of requests made on each datum. This is called a “directory-based” protocol.
Each processor maintains a list which, for each cache line, indicates the processors whose caches store the datum recorded there, as well as its state. The information contained in this list may be more or less complete.
By using this list, a history of the processors' requests concerning each datum can be kept within the processors. In particular, this permits filtering cache queries by avoiding, for example, querying the cache of a processor that has not manipulated a datum. In addition, if a datum does not appear in any processor's list, it may be deduced that the datum is not currently in use by a processor, and that it is therefore stored in (non-cache) memory and up to date.
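The directory just described can be sketched as a map from a cache-line address to the line's state and the set of processors holding a copy. The Python class below is a hypothetical illustration (the names `Directory`, `record_read` and `sharers` are not from the source); it shows the filtering property: only listed caches need to be queried, and an absent address means the up-to-date datum is in main memory.

```python
# Hypothetical directory sketch: address -> (state, set of processor IDs).
class Directory:
    def __init__(self):
        self.entries = {}

    def record_read(self, address, proc_id):
        state, sharers = self.entries.get(address, ("I", set()))
        sharers = sharers | {proc_id}
        # First reader gets the line Exclusive; later readers make it Shared.
        new_state = "E" if len(sharers) == 1 and state == "I" else "S"
        self.entries[address] = (new_state, sharers)

    def sharers(self, address):
        # Filtering: only these caches need to be queried; an unlisted
        # address means the datum is in (non-cache) memory and up to date.
        _state, procs = self.entries.get(address, ("I", set()))
        return procs
```

A query for an address no processor has touched returns an empty set, so no cache is interrogated at all.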
FIG. 1 schematically illustrates a multiprocessor system.
The system includes four modules 100, 110, 120, 130. Each module includes a plurality of processors. Module 100 includes two processors 101 and 102. Module 110 includes two processors 111 and 112. Module 120 includes two processors 121 and 122. Module 130 includes two processors 131 and 132. Each processor has a respective cache memory. These cache memories are not represented.
The number of modules in the system and the number of processors in the modules are provided for illustrative purposes only. The modules may contain different numbers of processors.
In order to manage communications among the different processors, including to manage cache coherence, each module 100, 110, 120, 130 has a respective proxy module 103, 113, 123, 133. In the interest of clarity, the interconnections among the proxy modules are not represented.
Thus, each processor has a single interface through which to communicate with the other processors. Everything happens as though each processor were addressing only one other processor at a time. In particular, the proxy modules maintain directories for the processors of their respective modules. The proxy modules may also maintain a directory for processors outside their respective modules.
The use of proxy modules proves desirable when the system has a large number of processors.
However, the directories maintained by the proxy modules are not identical to those which may be maintained by the processors. In fact, beyond a certain number of processors in a module, the size of the directory becomes too large; notably, the silicon area occupied by the proxy module would become prohibitive.
The proxy modules thus maintain a particular type of directory described as "inclusive." This involves maintaining a list containing only the addresses pointing to valid data, i.e., in the example of the MESI protocol, data in the "Modified," "Shared" or "Exclusive" state. If an address points to an "Invalid" datum (using the example of the MESI protocol again), it is not listed in the directory, and a processor wishing to access it must then query the non-cache memory.
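The lookup behavior of such an inclusive directory can be sketched in a few lines of Python. This is an illustrative sketch only (the function name `locate` is hypothetical): addresses in a valid state return the set of caches to query, while an absent or "Invalid" entry sends the requester to main memory.

```python
# Sketch of an "inclusive" directory lookup. Only addresses pointing to
# valid data (M, E or S in the MESI example) are listed; an absent or
# Invalid entry means the datum must be fetched from (non-cache) memory.
def locate(directory, address):
    entry = directory.get(address)      # directory: address -> (state, sharers)
    if entry is None or entry[0] == "I":
        return "memory"                 # not listed: query the non-cache memory
    _state, sharers = entry
    return sharers                      # query only these caches
```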
Thus, the directories of the proxy modules require regular and frequent updates in order to add new addresses, such as when a processor needs read access to a datum not yet listed. Since the size of the directories is limited, such an update may require removing a previously stored address.
This deletion of an address from the directory is called "eviction," an operation sometimes designated by the acronym "BINV" (back invalidation).
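The eviction mechanism can be sketched as a fixed-capacity directory that must drop an old entry before recording a new one. The Python sketch below is illustrative only: it uses an oldest-first victim policy for simplicity, whereas real hardware typically evicts within a set of the directory, and the class and method names are hypothetical.

```python
from collections import OrderedDict

# Sketch of eviction ("back invalidation", BINV): the directory has a
# fixed capacity, so recording a new address may force the removal of a
# previously stored one, whose cached copies must then be invalidated.
class InclusiveDirectory:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> (state, sharers)

    def insert(self, address, state, sharers):
        evicted = None
        if address not in self.entries and len(self.entries) >= self.capacity:
            # Eviction: drop the oldest entry; the caches listed in it
            # must invalidate their copies (the BINV operation).
            evicted, _ = self.entries.popitem(last=False)
        self.entries[address] = (state, sharers)
        return evicted
```

The value returned by `insert` identifies the address whose cached copies must be invalidated, which is precisely the eviction problem discussed below.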
The eviction operation poses problems in more than one regard.
In particular, problems occur when the data manipulated are computer program instructions. When such programs are executed, the processors carry out preloading of instructions, or "prefetch." This involves loading an instruction whose execution has not yet been requested, but which the processor knows or predicts will soon be called on.
Thus, if an eviction operation concerns an address pointing to a prefetched instruction, the processor is forced to wait for the instruction to be loaded again, which delays execution of the program (even though prefetching was originally intended to accelerate it).
The eviction operation may also pose a problem when the address points to a datum and not an instruction.