It is known to provide multi-processing systems in which two or more master devices, for example processor cores, share access to shared memory. Such systems are typically used to gain higher performance by arranging the different processor cores to execute respective data processing operations in parallel. Known data processing systems which provide such multi-processing capabilities include IBM370 systems and SPARC multi-processing systems. These particular multi-processing systems are high performance systems where power efficiency and power consumption are of little concern and the main objective is maximum processing speed.
It should be noted that the various master devices need not be processor cores. For example, a multi-processing system may include one or more processor cores, along with other master devices such as a Direct Memory Access (DMA) controller, a graphics processing unit (GPU), etc.
In order to control access to various shared resources within the shared memory, it is known to use semaphores. A semaphore is a protected variable or abstract data type used to restrict access to a shared resource in a multi-processing system environment. The value of a particular semaphore identifies the status of a shared resource, and through the use of such semaphores it is possible to maintain an ordering between certain data processing operations being performed by one master device and other data processing operations being performed by another master device.
For example, if one master device is being used to perform one or more setup operations resulting in the storing of certain data values in memory which will then be used by another master device, a semaphore may be provided in memory which is set by the first master device when it has completed its setup operations, with the other master device polling that semaphore, and only proceeding to perform its associated data processing tasks once the semaphore has been set. In other examples, particular semaphores can be directly associated with particular data structures in memory, for example particular data values, buffers, etc. The semaphores can take a variety of forms, from a simple lock value which is either set or unset, to counter values where each counter value indicates a different state of the associated shared resource. In other examples, the semaphore can take the form of a pointer. For example, considering a shared resource in the form of a circular buffer, one master device producing data to be stored in that buffer may maintain a write pointer as a semaphore, whilst another master device acting as a consumer of the data in that buffer may maintain a read pointer as a semaphore. Both master devices may use the values of the read and write pointers to control their operations, so as to ensure that the actions of both master devices stay synchronised.
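The read/write pointer arrangement for a circular buffer can be sketched as follows. This is a minimal single-threaded model, assuming one producer and one consumer; the class and method names are hypothetical, and the model ignores memory ordering, which real concurrent hardware would need to respect. The point it shows is that each pointer is written by only one master device, yet both devices read both pointers to decide whether they may proceed.

```python
class CircularBuffer:
    """Hypothetical single-producer/single-consumer ring buffer in which the
    write pointer acts as the producer's semaphore and the read pointer as
    the consumer's semaphore."""

    def __init__(self, capacity):
        # One slot is left unused so that "full" and "empty" can be told apart.
        self.slots = [None] * (capacity + 1)
        self.write_ptr = 0  # updated only by the producer
        self.read_ptr = 0   # updated only by the consumer

    def _next(self, ptr):
        return (ptr + 1) % len(self.slots)

    def is_empty(self):
        return self.read_ptr == self.write_ptr

    def is_full(self):
        return self._next(self.write_ptr) == self.read_ptr

    def produce(self, value):
        """Producer side: returns False when it must wait for the consumer."""
        if self.is_full():
            return False
        self.slots[self.write_ptr] = value
        # Advancing the write pointer publishes the slot to the consumer.
        self.write_ptr = self._next(self.write_ptr)
        return True

    def consume(self):
        """Consumer side: returns None when it must wait for the producer."""
        if self.is_empty():
            return None
        value = self.slots[self.read_ptr]
        # Advancing the read pointer hands the slot back to the producer.
        self.read_ptr = self._next(self.read_ptr)
        return value
```

With a capacity of 2, two calls to `produce` succeed, a third returns `False` until the consumer has advanced the read pointer, and `consume` returns `None` once the pointers are equal again.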
It will be appreciated from the above discussion that, dependent on the value of a semaphore accessed by a particular master device, that master device may determine that it cannot currently perform the data processing tasks that it is waiting to perform, and often in such situations the master device may determine that it cannot do anything except wait for the semaphore value to change to a value that allows it to continue processing. For example, if a processor is handling an interrupt from a device, it may need to add data received from the device to a shared queue. If another processor is removing data from the queue, the first processor must wait until the queue is in a consistent state and the associated semaphore (typically a lock) has been released before it can set the semaphore and add the new data to the queue. It cannot return from the interrupt handler until the data has been added to the queue, and hence it must wait.
In circumstances such as these, it is known to provide a “spin-lock” mechanism. In accordance with this mechanism, a master device requiring access to a protected shared resource will continually poll the semaphore until the semaphore has a value which indicates it can proceed. Typically, the master device will then set the semaphore to a value indicating it now has access to the shared resource.
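A spin-lock of the kind just described might be sketched as below. This is an illustrative model only: it assumes that `threading.Lock.acquire(blocking=False)` can stand in for an atomic test-and-set on the semaphore, and the `SpinLock` class and its poll counter are hypothetical additions that make the wasted polling visible. The shared counter in the usage stands in for a protected resource such as the shared queue in the interrupt-handler example above.

```python
import threading


class SpinLock:
    """Hypothetical spin-lock: the master keeps polling the semaphore and, on
    the poll that finds it clear, sets it to claim the shared resource."""

    def __init__(self):
        # Unlocked here means "semaphore clear". acquire(blocking=False) is
        # used only as an atomic test-and-set, never as a blocking wait.
        self._flag = threading.Lock()

    def acquire(self):
        polls = 0
        # Continually poll; the successful attempt also sets the semaphore.
        while not self._flag.acquire(blocking=False):
            polls += 1
        return polls  # number of failed polls, i.e. work done to no effect

    def release(self):
        # Clearing the semaphore lets the next poller succeed.
        self._flag.release()


lock = SpinLock()
counter = [0]


def worker():
    for _ in range(200):
        lock.acquire()
        counter[0] += 1  # critical section on the shared resource
        lock.release()


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every failed iteration of the `while` loop corresponds to the continual polling that, as discussed next, consumes energy without doing useful work.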
In a multi-processing system where low power consumption is desirable, this continual polling of the semaphore is undesirable because energy is consumed to no useful effect. Additionally, in systems implementing multi-threading, the execution of spin-locks by waiting threads may be highly detrimental to performance, since the spinning threads occupy execution resources that other threads could otherwise use.
As discussed in commonly assigned U.S. Pat. No. 7,249,270, the entire contents of which are hereby incorporated by reference, it is known to provide a master device with the ability to execute a wait for event operation, whereby the master device enters a power saving mode until an event is received at an event input port of that master device, whereupon the master device wakes up. In the above described patent, this process is used as a mechanism for conserving energy in the above-mentioned spin-lock situation. However, if a master device enters such a power saving mode of operation, it is clearly important to provide a mechanism that wakes that master device when the semaphore value changes, so that it can continue its operation. In U.S. Pat. No. 7,249,270, this is achieved by arranging for the other master devices in the system to issue a notification over a dedicated path when they update the value of a semaphore, for example when they clear a semaphore lock to identify that they no longer require access to a particular shared resource. Such notifications are issued by the master devices via performance of a send event operation (in U.S. Pat. No. 7,249,270, such send event operations being performed by execution of a send event (SEV) instruction). A number of problems can arise with such an approach.
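The wait for event / send event interplay can be modelled roughly as follows. This is a simplified sketch, not the behaviour of any particular processor: the `Core` class, its event register, and the helper functions `send_event` and `poll_lock` are hypothetical, and the model assumes that a write clearing the lock is visible immediately, so the send event arrives only after the lock has actually changed.

```python
class Core:
    """Hypothetical master device with an event register and a power saving
    state entered by a wait for event operation."""

    def __init__(self, name):
        self.name = name
        self.event_register = False  # latches an event arriving while awake
        self.sleeping = False
        self.wakeups = 0

    def wait_for_event(self):
        # A latched event is consumed at once; otherwise enter power saving mode.
        if self.event_register:
            self.event_register = False
        else:
            self.sleeping = True

    def receive_event(self):
        # An event at the event input port wakes a sleeping core, or is
        # latched for the next wait_for_event() if the core is awake.
        if self.sleeping:
            self.sleeping = False
            self.wakeups += 1
        else:
            self.event_register = True


def send_event(cores, sender):
    """SEV modelled as a notification to every other master device."""
    for core in cores:
        if core is not sender:
            core.receive_event()


def poll_lock(memory, core):
    """One pass of the waiting loop: claim the lock if clear, else sleep."""
    if memory["lock"] == 0:
        memory["lock"] = 1
        return True
    core.wait_for_event()
    return False


memory = {"lock": 1}                     # lock currently held by core A
core_a, core_b = Core("A"), Core("B")
first_try = poll_lock(memory, core_b)    # B sees the lock set and sleeps
memory["lock"] = 0                       # A clears the lock...
send_event([core_a, core_b], core_a)     # ...then issues SEV, waking B
second_try = poll_lock(memory, core_b)   # B repolls and claims the lock
```

In this ideal ordering core B sleeps once, is woken exactly once, and acquires the lock on its repoll.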
Firstly, software may, through programming errors, fail to take account of write latencies within the data processing system, arising for example from the use of write buffers within master devices. As a result, a first master device that is releasing the lock for a particular shared resource may issue a send event to a second master device which is currently in the power saving mode, with that second master device then waking up and polling the lock before the state of the lock has actually changed, because of the delay incurred while the write buffer drains. In that scenario, the second master device that has woken up and repolled the lock will see that the lock is still set and will go back to sleep, but will not receive any further send event notification. Typically the second master device will eventually wake up due to some other stimulus, for example receipt of an interrupt, but this problem can clearly have a significant performance impact by causing master devices to spend more time offline than necessary.
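The lost wake-up described above can be reproduced in a small deterministic model. This is a hypothetical sketch: the `Master` class, its posted-write buffer, and `poll_or_sleep` are illustrative names, and the send event is modelled simply as clearing the waiter's `sleeping` flag at a point where the releasing write is still sitting in the buffer.

```python
class Master:
    """Hypothetical master device with a posted-write buffer and a
    wait-for-event style sleep state."""

    def __init__(self):
        self.write_buffer = []  # pending (address, value) writes
        self.sleeping = False

    def store(self, address, value):
        # The write is posted to the buffer and is not yet visible in memory.
        self.write_buffer.append((address, value))

    def drain(self, memory):
        # Some time later the buffer drains and the writes become visible.
        for address, value in self.write_buffer:
            memory[address] = value
        self.write_buffer.clear()


def poll_or_sleep(memory, waiter):
    """Waiter's loop body: claim the lock if clear, otherwise sleep."""
    if memory["lock"] == 0:
        memory["lock"] = 1
        return True
    waiter.sleeping = True
    return False


memory = {"lock": 1}
releaser, waiter = Master(), Master()
poll_or_sleep(memory, waiter)             # waiter sleeps on the held lock
releaser.store("lock", 0)                 # release is still in the write buffer
waiter.sleeping = False                   # send event issued too early wakes it
acquired = poll_or_sleep(memory, waiter)  # repoll still sees lock == 1
releaser.drain(memory)                    # release finally reaches memory
# No further send event is issued, so the waiter sleeps with the lock free.
```

At the end the lock is clear in memory but the waiter is asleep with nothing left to wake it. The usual software remedy is to ensure the releasing write has completed, for example with a memory barrier, before the send event is issued; it is exactly this step that the programming errors mentioned above omit.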
Another problem is that such an approach assumes that all of the master devices capable of updating semaphores have also been provided with SEV capability, and hence will issue send event notifications when updating semaphores in order to wake up any master devices that are currently in a power saving mode following performance of a wait for event operation. Whilst this may not be a problem in a symmetric multi-processing system, where multiple (typically identical) processor cores share access to memory, it is more likely to be an issue in heterogeneous systems where the various master devices may be different, for example one or more processor cores, a DMA engine, a GPU, etc.
Another issue is that the send event notification generated in response to execution of an SEV instruction is typically broadcast globally to all of the other master devices, potentially causing spurious wake-ups. This results in unnecessary power consumption as each of those master devices wakes up, repolls the semaphore of interest to it, determines that there has been no change in the status of its semaphore, and then re-enters the power saving state.
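The cost of a global broadcast can be illustrated by counting how many woken master devices actually find their semaphore released. This is a hypothetical sketch: the semaphore names (`queue_lock`, `dma_lock`, `gpu_lock`) and master names are invented for illustration, and every sleeping master is assumed to wait on exactly one semaphore.

```python
def broadcast_sev(waiting_on, semaphores):
    """Hypothetical model: a broadcast send event wakes every sleeping master;
    each repolls its own semaphore and either proceeds or goes back to sleep.
    Returns (useful_wakeups, spurious_wakeups)."""
    useful = spurious = 0
    for sem in waiting_on.values():
        if semaphores[sem] == 0:
            useful += 1    # its semaphore was released: it can proceed
        else:
            spurious += 1  # no change of status: re-enter power saving state
    return useful, spurious


# Only queue_lock has just been released; the other locks are still held.
semaphores = {"queue_lock": 0, "dma_lock": 1, "gpu_lock": 1}
waiting_on = {"cpu1": "queue_lock", "cpu2": "dma_lock", "cpu3": "gpu_lock"}
useful, spurious = broadcast_sev(waiting_on, semaphores)
```

Here one release produces one useful wake-up and two spurious ones, and the spurious count grows with the number of masters sleeping on unrelated semaphores.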
Accordingly, it would be desirable to provide an alternative mechanism for waking up a processor that has entered a power saving mode following performance of a wait for event operation, which does not suffer from the above disadvantages associated with the use of SEV instructions.
Entirely separate from the issue of semaphore usage within multi-processing systems, it is known to provide one or more of the master devices with their own local cache in which to store a subset of the data held in the shared memory so as to improve speed of access to data within such multi-processing systems. Whilst this can improve speed of access to data, it complicates the issue of data coherency. In particular, it will be appreciated that if a particular master device performs a write operation with regard to a data value held in its local cache, that data value will be updated locally within the cache, but will not necessarily be updated at the same time in the shared memory. For example, if the data value in question relates to a write back region of memory, then the updated data value in the cache will only be stored back to the shared memory when that data value is subsequently evicted from the cache.
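The gap between a write-back cache and shared memory can be shown with a toy model. This is a sketch under stated assumptions: the `WriteBackCache` class is hypothetical, it holds one value per address rather than real cache lines, and it assumes a write-allocate policy so that a write miss first fills the line.

```python
class WriteBackCache:
    """Hypothetical write-back local cache sitting in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # address -> locally cached value
        self.dirty = set()  # addresses updated locally but not yet in memory

    def read(self, address):
        if address not in self.lines:
            self.lines[address] = self.memory[address]  # line fill on a miss
        return self.lines[address]

    def write(self, address, value):
        self.read(address)            # allocate the line (write-allocate assumed)
        self.lines[address] = value   # only the local copy is updated
        self.dirty.add(address)

    def evict(self, address):
        if address in self.dirty:
            self.memory[address] = self.lines[address]  # write back on eviction
            self.dirty.discard(address)
        self.lines.pop(address, None)


memory = {0x100: 1}
cache = WriteBackCache(memory)
cache.write(0x100, 2)
stale = memory[0x100]  # shared memory still holds the old value
cache.evict(0x100)
fresh = memory[0x100]  # the update reaches memory only on eviction
```

Between the write and the eviction, any other master device reading address `0x100` from shared memory would see the stale value, which is the coherency problem the following paragraphs address.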
Since the data may be shared with other master devices, it is important to ensure that those master devices will access the up-to-date data when seeking to access the associated address in shared memory. To ensure that this happens, it is known to employ a cache coherency protocol within the multi-processing system to ensure that if a particular master device updates a data value held in its local cache, that up-to-date data will be made available to any other master device subsequently requesting access to that data. The above type of multi-processing system, where multiple master devices share data, use caches to improve performance, and use a cache coherency protocol to ensure that all of the master devices have a consistent view of the data, is often referred to as a coherent cache system.
The use of such cache coherency protocols can also give rise to power consumption benefits by avoiding the need for accesses to memory in situations where data required by a master device can be found within one of the caches, and hence accessed without needing to access memory.
In accordance with a typical cache coherency protocol, certain accesses performed by a master device will require a coherency operation to be performed. The coherency operation will cause a coherency request to be sent to the other master devices identifying the type of access taking place and the address being accessed. This will cause those other master devices to perform certain coherency actions defined by the cache coherency protocol, and may also in certain instances result in certain information being fed back from one or more of those master devices to the master device initiating the access requiring the coherency operation. By such a technique, the coherency of the data held in the various local caches is maintained, ensuring that each master device accesses up-to-date data. One such cache coherency protocol is the “Modified, Owned, Exclusive, Shared, Invalid” (MOESI) cache coherency protocol.
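How a cache might react when it observes another master's coherency request can be sketched for MOESI states. This covers only the snooping side of one common variant of the protocol; the request names `read` and `read_unique` are hypothetical labels, and real implementations differ in details such as whether clean lines in E or S also supply data.

```python
def snoop(state, request):
    """Return (next_state, must_supply_data) for a MOESI line in `state` when
    another master's coherency request of type `request` is observed.
    Simplified sketch of one common variant, not a full protocol."""
    if state not in ("M", "O", "E", "S", "I"):
        raise ValueError(f"unknown state {state!r}")
    if state == "I":
        return "I", False          # nothing cached, nothing to do
    dirty = state in ("M", "O")    # shared memory is stale for M and O lines
    if request == "read":          # requester wants a shared copy
        # M keeps responsibility for the dirty data by moving to O; O stays O;
        # clean E and S lines end up in S. Only dirty lines must supply data.
        return ("O" if dirty else "S"), dirty
    if request == "read_unique":   # requester intends to write
        return "I", dirty          # invalidate our copy, passing on dirty data
    raise ValueError(f"unknown request {request!r}")
```

For example, a line held in M that snoops a `read` moves to O and supplies the up-to-date data, so the requester does not fetch the stale copy from memory.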
Cache coherency protocols will typically implement either a write update mechanism or a write invalidate mechanism when a master device seeks to perform a write operation in respect of a data value. In accordance with the write update mechanism, any cache that currently stores the data value is arranged to update its local copy to reflect the update performed by the master device. In accordance with a write invalidate mechanism, the cache coherency hardware causes all other cached copies of the data value to be invalidated, with the result that the writing master device's cache then holds the only valid copy in the system. At that point, the master device can continue to write to the locally cached version without causing coherency problems.
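The difference between the two mechanisms can be seen in a few lines. This is a hypothetical model: each master's cache is represented as a dictionary of valid lines keyed by address, and `coherent_write` and `holders` are illustrative helper names rather than part of any real coherency interface.

```python
def coherent_write(caches, writer, address, value, policy):
    """Hypothetical coherent write. `caches` maps master name -> {address: value}
    holding that master's valid cached lines."""
    caches[writer][address] = value
    for name, lines in caches.items():
        if name == writer or address not in lines:
            continue
        if policy == "update":
            lines[address] = value  # write update: refresh every other copy
        elif policy == "invalidate":
            del lines[address]      # write invalidate: drop every other copy
        else:
            raise ValueError(f"unknown policy {policy!r}")


def holders(caches, address):
    """Names of the masters whose cache holds a valid copy of `address`."""
    return sorted(name for name, lines in caches.items() if address in lines)


caches = {"cpu0": {0x40: 1}, "cpu1": {0x40: 1}, "gpu": {}}
coherent_write(caches, "cpu0", 0x40, 7, "update")
after_update = holders(caches, 0x40)      # both copies kept, both now 7
coherent_write(caches, "cpu0", 0x40, 9, "invalidate")
after_invalidate = holders(caches, 0x40)  # only the writer's copy remains
```

After the invalidating write, `cpu0` holds the only valid copy, so further writes by `cpu0` need no coherency traffic until another master requests the line.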
Within such coherent cache systems, it is typically the case that local caches continue to service coherency requests even if the associated master device is idle or in a power saving state. For example, the PowerPoint presentation “ARM11 MPCore and its impact on Linux Power Consumption” by J Goodacre, ARM Limited, available on the internet at http://elinux.org/images/7/76/MPCore_and_Linux_Power.pdf describes a multi-processing system where a processor can perform a wait for event operation to enter a power saving state, while the associated local cache remains able to service coherency requests.