It is known to provide multi-processing systems in which two or more processing units, for example processor cores, share access to shared memory. Such systems are typically used to gain higher performance by arranging the different processor cores to execute respective data processing operations in parallel.
To further improve the speed of access to data within such a multi-processing system, it is known to provide each of the processing units with at least one local cache structure in which to store a subset of the data held in the shared memory. Such local cache structures can take a variety of forms, for example a data cache used to store data processed by the processing unit, an instruction cache used to store instructions for execution by the processing unit, a translation lookaside buffer (TLB) used to store page table information used when translating virtual addresses issued by the processing unit to physical addresses, etc.
Within a multi-processing system, applications may be migrated from one processing unit to another. As a result, there is the possibility that data used by an application when executing on one processing unit may remain cached in the local cache structure of that processing unit after the application has been migrated to another processing unit. Whilst it is known to provide coherency mechanisms to keep track of data retained in the various local cache structures, with the aim of ensuring that a processing unit will always access the most up-to-date version of the data, instances can still arise where operations performed on one or more entries of a local cache structure may not cause corresponding operations to be performed on data held in a local cache structure of another processing unit, when the performance of such operations would be appropriate. One example of such an instance is the performance of cache maintenance operations.
Often, cache maintenance operations are issued by an operating system to update the state of one or more entries in the local cache structure. If the operating system is not fully aware of the plurality of processing units provided by the data processing apparatus, as for example may be the case if the operating system is a mono-processor operating system shielded from the hardware platform by a hypervisor software layer, then it may issue a cache maintenance operation which will only be performed in respect of the local cache structure associated with the processing unit on which the operating system is running, even though data to which that cache maintenance operation would be applicable may be stored in the local cache structure of another processing unit. Purely by way of example, consider the situation where the cache maintenance operation identifies that any cached entries for a particular address range, or for a particular process identifier (process ID), should be invalidated. When that operation is performed in respect of the local cache structure of the processing unit on which the operating system is currently running, then such a cache maintenance operation will correctly invalidate any entries cached in that local cache structure that fall within the specified address range, or are associated with the specified process ID. However, no action will be taken in respect of the data held in a corresponding local cache structure of any of the other processing units. As mentioned earlier, these may in fact still retain data that was intended to be the subject of such a cache maintenance operation, but due to the operating system not being aware of the hardware architecture, those entries will not be subjected to the cache maintenance operation.
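The situation described above can be illustrated with a minimal sketch. It is a toy model only: the `LocalTLB` class, the two-core arrangement, and the process ID and page numbers are all hypothetical and are not taken from any real architecture or operating system.

```python
from dataclasses import dataclass, field


@dataclass
class LocalTLB:
    """Toy per-core TLB mapping (process_id, virtual_page) -> physical_page."""
    entries: dict = field(default_factory=dict)

    def fill(self, process_id, virtual_page, physical_page):
        self.entries[(process_id, virtual_page)] = physical_page

    def invalidate_by_process_id(self, process_id):
        # Affects this core's structure only; other cores are untouched.
        self.entries = {
            key: value for key, value in self.entries.items()
            if key[0] != process_id
        }


def stale_entries_after_local_invalidate():
    core0_tlb, core1_tlb = LocalTLB(), LocalTLB()

    # Process 7 (hypothetical) runs on core 0 and populates its TLB.
    core0_tlb.fill(7, 0x10, 0xA0)

    # The process is migrated to core 1, which caches the same translation.
    core1_tlb.fill(7, 0x10, 0xA0)

    # A mono-processor operating system, now running on core 1, invalidates
    # all entries for process 7. Only core 1's TLB is affected.
    core1_tlb.invalidate_by_process_id(7)

    return core0_tlb.entries, core1_tlb.entries
```

In this sketch the TLB of core 1 ends up empty, whereas core 0 still holds the entry for process 7 that the operation was intended to remove.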
Whilst this problem is not confined to systems in which hypervisor software is used, it is particularly likely to arise when a hypervisor software layer is used. In particular, the multi-processing system may execute hypervisor software to support the execution of at least one virtual machine on the processing circuitry, each virtual machine comprising an operating system running one or more application programs. In such an environment, both the operating system and the one or more application programs need have no knowledge of the underlying hardware platform, and in particular will not necessarily be aware that a multi-processing system is being used. The application programs and/or the operating system may hence issue cache maintenance operations that assume a mono-processor environment, and are therefore likely to give rise to the earlier-mentioned problem.
One way to address such a problem would be for the hypervisor to perform a variety of cache maintenance operations at the time the operating system and/or applications are migrated (also referred to herein as “switched”) from one processing unit to another. For example, the hypervisor could extensively perform data cache clean and invalidate operations, instruction cache invalidate operations, TLB invalidate operations, etc., before the switched operating system and/or application program is allowed to begin operation on the new processor core. However, whilst such an approach would address the problem, it significantly impacts performance, and in particular prevents the potential benefits of using a multi-processing platform from being realised.
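The cost of this blanket approach can be sketched as follows. The `Core` class and its three dictionaries are hypothetical stand-ins for the local cache structures. The sketch assumes that the maintenance is applied to the source core, and that "clean" can be modelled as simply discarding entries (a real clean would first write dirty data back to shared memory).

```python
class Core:
    """Toy core with three local cache structures keyed by (process_id, address)."""

    def __init__(self):
        self.data_cache = {}
        self.instruction_cache = {}
        self.tlb = {}

    def clean_and_invalidate_everything(self):
        # Blanket maintenance: every entry is discarded, whether or not it
        # belongs to the software being migrated.
        structures = (self.data_cache, self.instruction_cache, self.tlb)
        dropped = sum(len(structure) for structure in structures)
        for structure in structures:
            structure.clear()
        return dropped


def migrate_with_full_flush(source_core):
    """Hypothetical hypervisor step performed before the switch completes."""
    return source_core.clean_and_invalidate_everything()


def demo_full_flush():
    source = Core()
    # Entries belonging to the migrated process 7 (hypothetical ID).
    source.data_cache[(7, 0x1000)] = "data of process 7"
    source.tlb[(7, 0x10)] = 0xA0
    # Entries belonging to an unrelated process 9 that stays on this core.
    source.data_cache[(9, 0x2000)] = "data of process 9"
    source.instruction_cache[(9, 0x3000)] = "code of process 9"
    return migrate_with_full_flush(source)
```

In this toy run four entries are discarded although only two relate to the migrated process. The remaining work is overhead, and the unrelated process must subsequently refill its entries.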
An alternative approach might be for the hypervisor software, when migrating the operating system and/or applications from a source processing unit to a destination processing unit, to mark the source processing unit as being likely to be a target for some of the operations that will later be performed on the destination processor. The hypervisor would then have to further program the destination processor so as to trap any such operations when they are encountered, so that the hypervisor is notified when such operations are issued. At that time, the hypervisor software would decide whether such operations also need to be performed on the source processor as well as the destination processor. However, a significant drawback of such an approach is the need to trap operations performed on the destination processor. This gives rise to a significant performance penalty, because the hypervisor software is called more often than required. In particular, if the trapping functionality is not designed on a fine-grained basis, the hypervisor software may be called for many operations where no action is required in connection with the source processor. There is also a significant complexity issue, as the hypervisor software needs to understand the operations in order to decide whether or not they need to be performed on the source processor as well as the destination processor.
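The trap-based alternative can be sketched in the same toy style. The `TrappingHypervisor` class, the operation names, and the `affects_migrated_data` flag are all hypothetical. The flag stands in for the decision logic that the hypervisor would need to implement by decoding each operation.

```python
class TrappingHypervisor:
    """Toy hypervisor that traps every maintenance operation on a destination core."""

    def __init__(self):
        self.marked_sources = {}  # destination core id -> marked source core id
        self.trap_count = 0
        self.forwarded = []       # (source core id, operation) pairs

    def migrate(self, source_core, destination_core):
        # Mark the source as a possible target of later operations and
        # (conceptually) program the destination to trap them.
        self.marked_sources[destination_core] = source_core

    def on_trapped_operation(self, core, operation, affects_migrated_data):
        # Coarse-grained trapping: the hypervisor is entered for every
        # operation, even when nothing needs doing on the source core.
        self.trap_count += 1
        source_core = self.marked_sources.get(core)
        if source_core is not None and affects_migrated_data:
            self.forwarded.append((source_core, operation))


def demo_trapping():
    hypervisor = TrappingHypervisor()
    hypervisor.migrate(source_core=0, destination_core=1)
    operations = [
        ("tlb_invalidate_by_process_id", True),
        ("icache_invalidate", False),
        ("dcache_clean_line", False),
        ("dcache_clean_line", False),
    ]
    for operation, affects_migrated_data in operations:
        hypervisor.on_trapped_operation(1, operation, affects_migrated_data)
    return hypervisor.trap_count, len(hypervisor.forwarded)
```

Here the hypervisor is entered four times but forwards only one operation to the source core, which illustrates the unnecessary invocations described above.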
Accordingly, it would be desirable to provide an improved technique for handling access operations issued to local cache structures within a data processing system having a plurality of processing units, each of which has such a local cache structure.