It is known to provide multi-processing systems in which two or more processing units, for example processor cores, share access to shared memory. Such systems are typically used to gain higher performance by arranging the different processor cores to execute respective data processing operations in parallel. Known data processing systems which provide such multi-processing capabilities include IBM370 systems and SPARC multi-processing systems. These particular multi-processing systems are high performance systems where power efficiency and power consumption are of little concern and the main objective is maximum processing speed.
To further improve speed of access to data within such a multi-processing system, it is known to provide each of the processing units with its own local cache in which to store a subset of the data held in the shared memory. Whilst this can improve speed of access to data, it complicates the issue of data coherency. In particular, it will be appreciated that if a particular processor performs a write operation with regard to a data value held in its local cache, that data value will be updated locally within the cache, but may not necessarily also be updated at the same time in the shared memory. For example, if the data value in question relates to a write back region of memory, then the updated data value in the cache will only be stored back to the shared memory when that data value is subsequently evicted from the cache.
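The write back behaviour described above can be illustrated with a minimal sketch. The class `WriteBackCache`, its methods and the address `0x100` are hypothetical names chosen for illustration only; the model tracks a single dirty flag per address and ignores line sizes and replacement policy.

```python
class WriteBackCache:
    """Toy local cache for a write back region (hypothetical model)."""

    def __init__(self, shared_memory):
        self.shared_memory = shared_memory  # dict: address -> value
        self.lines = {}                     # address -> (value, dirty flag)

    def write(self, address, value):
        # Only the local copy is updated; the line is marked dirty.
        self.lines[address] = (value, True)

    def read(self, address):
        # On a miss, allocate a clean copy from shared memory.
        if address not in self.lines:
            self.lines[address] = (self.shared_memory[address], False)
        return self.lines[address][0]

    def evict(self, address):
        # Dirty data reaches shared memory only at eviction time.
        value, dirty = self.lines.pop(address)
        if dirty:
            self.shared_memory[address] = value


shared_memory = {0x100: 1}
cache = WriteBackCache(shared_memory)

cache.write(0x100, 2)
stale_value = shared_memory[0x100]   # shared memory still holds the old value

cache.evict(0x100)
fresh_value = shared_memory[0x100]   # the update is visible only after eviction
```

Between the write and the eviction, any other processor reading address `0x100` directly from shared memory would observe the stale value, which is the coherency problem the following paragraphs address.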
Since the data may be shared with other processors, it is important to ensure that those processors will access the up-to-date data when seeking to access the associated address in shared memory. To ensure that this happens, it is known to employ a cache coherency protocol within the multi-processing system to ensure that if a particular processor updates a data value held in its local cache, that up-to-date data will be made available to any other processor subsequently requesting access to that data.
In accordance with a typical cache coherency protocol, certain accesses performed by a processor will require a coherency operation to be performed. The coherency operation will cause a notification to be sent to the other processors identifying the type of access taking place and the address being accessed. This will cause those other processors to perform certain actions defined by the cache coherency protocol, and may also in certain instances result in certain information being fed back from one or more of those processors to the processor initiating the access requiring the coherency operation. By such a technique, the coherency of the data held in the various local caches is maintained, ensuring that each processor accesses up-to-date data. One such cache coherency protocol is the “Modified, Exclusive, Shared, Invalid” (MESI) cache coherency protocol.
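By way of illustration only, the notification mechanism described above might be sketched as follows for a simplified MESI-style scheme. The names `Bus`, `Cache`, `snoop` and `State` are assumptions made for this sketch and do not correspond to any particular implementation; the model treats each address as a whole cache line and omits bus arbitration and timing.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Bus:
    """Hypothetical interconnect holding shared memory and the attached caches."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # address -> [state, value]
        bus.caches.append(self)

    def state(self, address):
        return self.lines.get(address, [State.INVALID, None])[0]

    def snoop(self, address, is_write):
        """React to another processor's access; return True if a copy was held."""
        line = self.lines.get(address)
        if line is None or line[0] is State.INVALID:
            return False
        if line[0] is State.MODIFIED:
            self.bus.memory[address] = line[1]  # make up-to-date data available
        line[0] = State.INVALID if is_write else State.SHARED
        return True

    def _notify_others(self, address, is_write):
        # The coherency operation: every other cache is told the access type
        # and the address, and reports back whether it held a copy.
        held = [c.snoop(address, is_write) for c in self.bus.caches if c is not self]
        return any(held)

    def read(self, address):
        if self.state(address) is State.INVALID:
            shared = self._notify_others(address, is_write=False)
            new_state = State.SHARED if shared else State.EXCLUSIVE
            self.lines[address] = [new_state, self.bus.memory[address]]
        return self.lines[address][1]

    def write(self, address, value):
        # A line already held as Modified or Exclusive needs no coherency operation.
        if self.state(address) not in (State.MODIFIED, State.EXCLUSIVE):
            self._notify_others(address, is_write=True)
        self.lines[address] = [State.MODIFIED, value]


bus = Bus({0x40: 10})
cpu0, cpu1 = Cache(bus), Cache(bus)

cpu0.read(0x40)                        # cpu0 holds the line as Exclusive
cpu0.write(0x40, 11)                   # silent upgrade to Modified
value_seen_by_cpu1 = cpu1.read(0x40)   # cpu0 supplies the data; both become Shared
states_after_read = (cpu0.state(0x40), cpu1.state(0x40))

cpu1.write(0x40, 12)                   # cpu0's copy is invalidated
states_after_write = (cpu0.state(0x40), cpu1.state(0x40))
```

In this sketch the second processor reads the value 11 even though it was never written back by an eviction, because the snoop forces the modified line out to memory before the read completes.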
If a particular piece of data can be guaranteed to be exclusively used by only one of the processors, then when that processor accesses that data, a coherency operation will not be required. However, in a typical multi-processing system, much of the data will be shared amongst the processors, either because the data is generally classed as shared data, or because the multi-processing system allows for the migration of processes between processors, or indeed for a particular process to be run in parallel on multiple processors, with the result that even data that is specific to a particular process cannot be guaranteed to be exclusively used by a particular processor.
Whilst a cache coherency protocol can be used to ensure that each processing unit accesses up-to-date data, there are still certain types of accesses which can be very complex to handle within a system having multiple processing units sharing memory. For example, if a region of the shared memory is specified as a write through region, and a write access request is made by a particular processing unit to that write through region of shared memory, then it is necessary for the memory to be updated at the same time as any update is performed in the cache associated with the processing unit originating the write access request. Performing such an update in a multi-processor system introduces a number of hazards. To enable the correct behaviour to occur, the cache control logic of the associated local cache requires additional logic, which increases its complexity, and/or significant delays must be introduced in accessing the cache, in order to ensure that the update to the cache and the shared memory occurs in an atomic way. The atomic operation must be performed in its entirety without any intervening read or write operations, so as to prevent any other read or write access to the same data location whilst the update operation is taking place.
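The atomicity requirement can be illustrated by a software analogy, offered only as a sketch: a lock stands in for the additional cache control logic, and the class `WriteThroughLine` and its fields are hypothetical. Holding the lock across both updates means that no reader can observe the cache and the shared memory in disagreement, at the cost of every access waiting for the lock.

```python
import threading


class WriteThroughLine:
    """One write through location, mirrored in a cache and in shared memory."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self.cache_value = value
        self.memory_value = value

    def write(self, value):
        # Both copies are updated inside one critical section, so no other
        # access to this location can intervene between the two updates.
        with self._lock:
            self.cache_value = value
            self.memory_value = value

    def read(self):
        with self._lock:
            return self.cache_value, self.memory_value


line = WriteThroughLine(0)
mismatches = []


def writer():
    for i in range(1000):
        line.write(i)


def reader():
    for _ in range(1000):
        cache_value, memory_value = line.read()
        if cache_value != memory_value:
            mismatches.append((cache_value, memory_value))


threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The serialisation imposed by the lock corresponds to the access delays mentioned above; a hardware implementation would instead pay for the same guarantee in additional control logic.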
As an example of the type of hazard that can arise when handling a write access request to a write through region of shared memory, consider the situation where a first write causes a hit in the cache, and is being processed by the cache coherency logic, and so remains pending inside the processor core. Whilst that processing is taking place, a second write is issued to a location contiguous with the first write. A standard way of dealing with the second write when the first one is still pending is to merge the two accesses when applicable. This is mostly done to save power (only one write will be made to the cache when the coherency logic has finished its work) and to increase performance (the merge of the two writes allows a single “slot” to be used for two memory accesses, hence freeing some resources for some subsequent memory accesses).
However, this merging should not be done if the target of these writes is some shareable memory region, as it could cause the first write to be issued twice by the coherency logic. When the first write has been processed, and the memory updated, the second one should still be processed by the coherency logic, to at least update the memory. Since the two writes have been merged together, the second coherency action (and the second memory update) will in fact consist of the merge of the two writes, and hence the first write will be repeated to the memory. This breaks any memory ordering model, and is hence prohibited.
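The double write can be reproduced in a minimal sketch. The slot model, the addresses `0x00` and `0x04`, and the helper `issue_to_memory` are hypothetical and serve only to make the sequence of events concrete.

```python
memory_log = []  # every (address, value) pair issued to shared memory, in order


def issue_to_memory(slot):
    """Issue the full contents of a pending write slot to shared memory."""
    for address, value in sorted(slot.items()):
        memory_log.append((address, value))


# The first write hits in the cache and is pending in the coherency logic.
slot = {0x00: 0xAA}

# A second write to a contiguous location is merged into the same slot.
slot[0x04] = 0xBB

# The coherency logic finishes processing the first write and updates memory.
memory_log.append((0x00, 0xAA))

# The second write still requires its own coherency action, but the slot
# now holds the merge of both writes, so the first write is issued again.
issue_to_memory(slot)

first_write_count = memory_log.count((0x00, 0xAA))
```

Here the first write appears twice in `memory_log`. If another processor had written to address `0x00` between the two memory updates, the repeated write would silently overwrite that newer value.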
To prevent this double write, a first approach consists of adding some logic (and hence some complexity) to prevent such merges for shareable memory regions. The other possible approach, which avoids this increase in complexity, consists of preventing such merges from happening at all, even in the standard cases, at the cost of reduced performance and increased power consumption.
Given the additional complexity and hazards introduced by providing such coherent write through write accesses, many systems providing cache coherent capable processors are unlikely to support such behaviour, and accordingly one option is to not allow write through write accesses to shared memory. However, even if such a position is taken, there are still other types of access that introduce similar hazards. In particular, a region of shared memory may be specified as non-cacheable by a particular processor, or by a particular process running on that processor, and write accesses may be performed in relation to that non-cacheable region. It may be assumed that for a non-cacheable write access, there is no need to perform any lookup in the cache. However, if a processor uses a non-cacheable memory region, this only means that the processor itself will not allocate into the cache any data pertaining to that non-cacheable region. Other processors in the system, including closely coupled coprocessors also able to access the local cache, may have a different view of memory, and in particular a region that is viewed as non-cacheable by one processor may not be viewed as non-cacheable by another processor. Further, different memory maps may be used by different processes executing within the same processor, and accordingly it is possible that the data that is the subject of a non-cacheable write access request may in fact reside within a cache.
As a result, when handling a non-cacheable write access to shared memory it will typically be necessary to perform a lookup in the cache, and the behaviour that must be handled is therefore very similar to the earlier-discussed coherent write through write access. Hence, even if the decision is taken not to support coherent write through write accesses to shared memory, it is still necessary to provide some capability to handle non-cacheable shared write accesses. However, introducing such capability is very expensive in terms of the additional complexity introduced and/or the additional access delays incurred, particularly if, as is often the case, the actual likelihood of a non-cacheable shared write access resulting in a hit in the cache is very low.
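The lookup requirement can be sketched as follows. The function `noncacheable_shared_write` and the dictionary-based cache are hypothetical, and the sketch deliberately omits the atomicity that a real implementation would also have to guarantee between the two updates.

```python
def noncacheable_shared_write(cache_lines, shared_memory, address, value):
    """Handle a non-cacheable write to shared memory (illustrative only).

    The originating process never allocates this address into the cache,
    but another process or a coprocessor with a different memory map may
    already have done so, hence the lookup.
    """
    hit = address in cache_lines
    if hit:
        cache_lines[address] = value   # keep the existing cached copy coherent
    shared_memory[address] = value     # shared memory is always updated
    return hit


# Address 0x20 was allocated earlier under a different, cacheable memory map.
cache_lines = {0x20: 5}
shared_memory = {0x20: 5, 0x24: 7}

hit_on_cached = noncacheable_shared_write(cache_lines, shared_memory, 0x20, 6)
hit_on_uncached = noncacheable_shared_write(cache_lines, shared_memory, 0x24, 8)
```

The second call represents the common case noted above, in which the lookup misses and only shared memory is updated, yet the cost of supporting the rare hit is still incurred on every such access.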
Accordingly, it would be desirable to provide a more cost effective solution for enabling the correct behaviour for write access requests of the type that require both the cache associated with the originating processing unit and the shared memory to be updated.