1. Field of the Invention
The present invention relates to techniques for handling instruction execution within a data processing apparatus having a plurality of processing units.
2. Description of the Prior Art
In a data processing apparatus having a plurality of processing units, for example a plurality of CPUs (Central Processing Units), it is known to arrange at least a subset of those processing units to form a cluster to perform some dedicated processing activity. The actual choice of processing units contained within the cluster is often configurable and can change over time. Further, any individual processing unit will typically not know which other processing units are in the cluster. Each processing unit within the cluster is arranged to execute a sequence of instructions in order to perform associated operations, and at certain times there is a need for each processing unit to perform a particular operation. For example, when the cluster of processing units is arranged to form a Symmetric Multi-Processor (SMP) system, the individual processing units need to operate with a coherent view of memory, and it is often necessary for certain cache maintenance operations and the like to be performed in each of the processing units. However, this requirement for each processing unit to perform the same operation is not restricted to situations where coherency is an issue; in other situations, for example, it may be desirable for each of the processing units to perform the same operation, but on different sets of data values.
A problem arises in how to efficiently and effectively enable the processing units in the cluster to perform such operations. One known approach is to cause one of the processing units to execute an interprocessor interrupt routine, which results in interrupt signals being sent to the other processing units in the cluster. Each processing unit receiving the interrupt signal halts its current execution and branches to an interrupt handler, which causes that processing unit to execute a specific piece of code so that the required operation is performed within it. However, this approach has a significant performance impact, since each of the other processing units must halt its current execution and perform an interrupt handling routine. Further, such a mechanism can be very complex to implement from a software point of view, since there is a significant risk of deadlock arising within the cluster. Such a deadlock could arise, for example, if a first processor stalls waiting for a second processor to perform some action, but the second processor cannot perform that action because it has itself reached a point at which it must send an interrupt to the other processors.
Accordingly, it would be desirable to provide an improved technique for enabling operations to be executed by each of the processing units in a cluster.