Multi-threaded processors execute multiple program threads concurrently, each thread comprising a sequence of instructions. This concurrency may be achieved by scheduling threads in an interleaved manner, for example by issuing their instructions into the execution unit according to a round-robin scheme. Concurrency may alternatively or additionally be achieved by parallel execution.
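By way of illustration only, interleaved issue under a round-robin scheme may be modelled in software as follows. This is a minimal sketch and not a description of any particular processor: the function name `round_robin`, the thread names and the instruction labels are all hypothetical, and a real scheduler would operate in hardware on instructions rather than on strings.

```python
from collections import deque


def round_robin(threads):
    """Issue one instruction at a time from each unfinished thread in turn.

    `threads` maps a hypothetical thread name to its list of instruction
    labels. Returns the order in which the instructions are issued.
    """
    queue = deque((name, iter(instrs)) for name, instrs in threads.items())
    issued = []
    while queue:
        name, instrs = queue.popleft()
        instr = next(instrs, None)
        if instr is None:
            continue  # thread has finished; drop it from the rotation
        issued.append((name, instr))
        queue.append((name, instrs))  # rejoin the back of the rotation
    return issued


# Three hypothetical threads of different lengths.
schedule = round_robin({
    "T0": ["a0", "a1", "a2"],
    "T1": ["b0"],
    "T2": ["c0", "c1"],
})
```

In this model each thread receives one issue slot per rotation, so the instructions of different threads interleave and a finished thread simply drops out of the rotation.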
Program threads may interact or communicate with one another, such that dependencies may exist between threads. In this case, it is necessary to synchronise the execution of the threads in order to bring them to a common point of execution. For example, if a first thread is to generate some data which is to be operated upon by a second thread, then the generation of the data by the first thread must occur before the operation by the second thread.
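The dependency in the above example may be sketched in software as follows, assuming a general-purpose threading library rather than any processor-level mechanism. The names `producer`, `consumer`, `shared` and `data_ready` are hypothetical and chosen for illustration only.

```python
import threading

shared = {}                      # data passed from the first thread to the second
data_ready = threading.Event()   # set once the data has been generated
results = []


def producer():
    shared["value"] = 21         # first thread: generate the data
    data_ready.set()             # signal that the data now exists


def consumer():
    data_ready.wait()            # second thread: block until signalled
    results.append(shared["value"] * 2)  # operate upon the data


# The consumer is deliberately started first: the start order of the
# threads does not determine the order of the dependent operations.
second = threading.Thread(target=consumer)
first = threading.Thread(target=producer)
second.start()
first.start()
first.join()
second.join()
```

Without the wait on `data_ready`, the second thread could attempt to read `shared["value"]` before the first thread had written it.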
Ensuring this is not as straightforward for a software developer as simply arranging the dependent instructions in the correct order. The reason is that the different threads may be scheduled at different times for reasons that are unpredictable or beyond the software developer's control. For example, one improved approach to scheduling threads is discussed in our earlier U.S. application Ser. No. 11/717623, our ref. 314563.US, entitled “Processor Register Architecture”, according to which a multi-threaded processor suspends the execution of threads pending specific activities such as input/output events from an external port. These external activities are typically unpredictable from the point of view of the processor.
Therefore a synchronisation scheme is required to ensure that an instruction in one thread does not attempt to execute before an instruction in another thread upon which it is dependent. This type of synchronisation is known in the art, and is referred to as “barrier synchronisation”.
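A barrier of this kind may be sketched as follows, assuming the `threading.Barrier` class of the Python standard library as a stand-in for whatever mechanism a given system provides. The worker function, the thread count `N` and the two-phase computation are hypothetical.

```python
import threading

N = 3
barrier = threading.Barrier(N)   # releases only when N threads have arrived
phase1 = [None] * N
phase2 = [None] * N


def worker(i):
    phase1[i] = i + 1            # phase 1: each thread produces its own value
    barrier.wait()               # common point: no thread passes until all arrive
    phase2[i] = sum(phase1)      # phase 2: every phase-1 result is now available


workers = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in workers:
    t.start()
for t in workers:
    t.join()
```

Every thread reads the complete set of phase-1 results, because none of them can leave the barrier until the last one has reached it.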
However, such synchronisation schemes must typically be coded into the threads themselves by the software developer, which is burdensome for the developer and also inefficient in terms of code density because a relatively large amount of code in each thread must be dedicated to achieving the required synchronisation. Further, the synchronisation code slows the program due to the additional memory accesses required.
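To give a sense of this burden, the following sketch shows a barrier coded by hand from a lock, a condition variable and shared counters. It is one conventional software construction, offered under the assumption of a general-purpose threading library, and the class name `HandCodedBarrier` and its fields are hypothetical. Each call to `wait` reads and writes the shared `count` and `generation` fields, which is the kind of additional memory access referred to above.

```python
import threading


class HandCodedBarrier:
    """A reusable barrier built from shared state (illustrative only)."""

    def __init__(self, parties):
        self.parties = parties
        self.count = 0            # threads that have arrived in this round
        self.generation = 0       # incremented each time the barrier opens
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            gen = self.generation
            self.count += 1
            if self.count == self.parties:
                # Last arrival: reset for the next round and release everyone.
                self.count = 0
                self.generation += 1
                self.cond.notify_all()
            else:
                # Earlier arrivals: sleep until the generation changes.
                while gen == self.generation:
                    self.cond.wait()


PARTIES = 4
hand_barrier = HandCodedBarrier(PARTIES)
arrived = []
seen_after = []


def participant(i):
    arrived.append(i)                  # work done before the barrier
    hand_barrier.wait()
    seen_after.append(len(arrived))    # work done after the barrier


participants = [threading.Thread(target=participant, args=(i,))
                for i in range(PARTIES)]
for t in participants:
    t.start()
for t in participants:
    t.join()
```

Even this small example needs bookkeeping code in every participating thread in addition to the work the thread actually performs.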