A multi-threaded processor is a processor which is capable of executing multiple program threads alongside one another. The processor may comprise some hardware that is common to the multiple different threads (e.g. a common instruction memory, data memory and/or execution unit), but to support multi-threading, the processor also comprises some dedicated hardware specific to each thread.
The dedicated hardware comprises at least a respective context register file for each of the threads that can be executed at once. A “context”, when talking about multi-threaded processors, refers to the program state of a respective one of the threads being executed alongside one another (e.g. program counter value, status and current operand values). The context register file refers to the respective collection of registers for representing this program state of the respective thread. Registers in a register file are distinct from general memory in that register addresses are fixed as bits in instruction words, whereas memory addresses can be computed by executing instructions. The registers of a given context typically comprise a respective program counter for the respective thread, and a respective set of operand registers for temporarily holding the data acted upon and output by the respective thread during the computations performed by that thread. Each context may also have a respective status register for storing a status of the respective thread (e.g. whether it is paused or running). Thus each of the currently running threads has its own separate program counter, and optionally its own operand registers and status register(s).
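As a rough illustration of the per-thread state described above, the following Python sketch models each context register file as a small record holding a program counter, a bank of operand registers and a status flag. The register-file sizes, the `Status` values and the `reg_field` instruction encoding (the low three bits of the instruction word select the register) are hypothetical choices made only for illustration; they do not describe any particular processor.

```python
from dataclasses import dataclass, field
from enum import Enum

NUM_CONTEXTS = 4       # hypothetical: number of threads resident at once
NUM_OPERAND_REGS = 8   # hypothetical: operand registers per context


class Status(Enum):
    PAUSED = 0
    RUNNING = 1


@dataclass
class Context:
    """Context register file for one thread (illustrative layout only)."""
    pc: int = 0                                                          # program counter
    operands: list[int] = field(default_factory=lambda: [0] * NUM_OPERAND_REGS)
    status: Status = Status.PAUSED                                       # status register


def reg_field(instr_word: int) -> int:
    """Hypothetical encoding: the low 3 bits of the instruction word name the register.

    The register index is fixed when the instruction is encoded; unlike a
    memory address, it is not computed by executing other instructions.
    """
    return instr_word & 0b111


# One context per thread: each has its own PC, operand registers and status.
contexts = [Context() for _ in range(NUM_CONTEXTS)]

# Thread 0 writes its register r3; thread 1's r3 is unaffected.
instr = 0b1010_011   # low bits 011 select r3
contexts[0].operands[reg_field(instr)] = 42
contexts[0].status = Status.RUNNING
```

Because every thread gets its own `Context` instance, the same register index in two threads refers to two physically separate registers.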
One possible form of multi-threading is parallelism. That is, as well as multiple contexts, multiple execution pipelines are provided: i.e. a separate execution pipeline for each stream of instructions to be executed in parallel. However, this requires a great deal of hardware duplication.
Instead, therefore, another form of multi-threaded processor employs concurrency rather than parallelism, whereby the threads share a common execution pipeline (or at least a common part of a pipeline) and different threads are interleaved through this same, shared execution pipeline. Such a processor can still perform better than one with no concurrency or parallelism, because interleaving threads creates more opportunities to hide pipeline latency. Also, this approach does not require as much extra hardware dedicated to each thread as a fully parallel processor with multiple execution pipelines, and so does not consume as much extra silicon area.
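The interleaving of threads through one shared pipeline can be sketched as a simple round-robin (sometimes called "barrel") schedule, in which each issue slot goes to the next thread in turn. This is a minimal software model under assumed simplifications: the `HwThread` class, the instruction mnemonics and the strict round-robin policy are hypothetical, and only the program counter is modelled as per-thread context.

```python
from dataclasses import dataclass


@dataclass
class HwThread:
    name: str
    program: list[str]   # hypothetical instruction stream
    pc: int = 0          # per-thread program counter (part of its context)

    def done(self) -> bool:
        return self.pc >= len(self.program)


def interleave(threads: list[HwThread]) -> list[str]:
    """Give each thread one issue slot per round, in strict round-robin order.

    All threads share a single (modelled) execution pipeline; only the
    per-thread context, here just the PC, is duplicated. A finished thread's
    slot is left empty rather than handed to another thread.
    """
    issued = []
    slot = 0
    while not all(t.done() for t in threads):
        t = threads[slot % len(threads)]
        if not t.done():
            issued.append(f"{t.name}:{t.program[t.pc]}")
            t.pc += 1
        slot += 1
    return issued


trace = interleave([
    HwThread("T0", ["ld", "add", "st"]),
    HwThread("T1", ["mul", "sub"]),
])
print(trace)
```

In this model a strict round-robin over N threads spaces consecutive instructions of the same thread N slots apart, which illustrates how interleaving can give one thread's earlier instruction time to progress through the pipeline before its next instruction is issued.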
A multi-threaded processor also requires some means for coordinating the execution of the different concurrent threads. For example, it needs to be determined which computation tasks are to be allocated to which threads. As another example, one or more first threads among the concurrent threads may contain a computation that depends on the result of a computation by one or more other concurrent threads. In this case a barrier synchronization needs to be performed to bring the threads in question to a common point of execution, so that the one or more first threads do not attempt to perform these dependent computations before the one or more other threads have performed the computations upon which they depend. Instead, the barrier synchronization requires the other thread(s) to reach a specified point before the first thread(s) can proceed.
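A software analogue of this barrier synchronization can be written with Python's standard `threading.Barrier`. This is not the processor's hardware mechanism; it is a sketch under the assumption that operating-system threads stand in for the hardware threads. Three hypothetical "producer" threads each compute a partial result, and one "consumer" thread (a first thread in the terminology above) must not sum the partials until every producer has reached the barrier.

```python
import threading

NUM_PRODUCERS = 3
partials = [0] * NUM_PRODUCERS
total = []

# The barrier releases only when all producers plus the consumer have arrived.
barrier = threading.Barrier(NUM_PRODUCERS + 1)


def producer(i: int) -> None:
    partials[i] = (i + 1) * 10   # the computation the consumer depends on
    barrier.wait()               # signal that this partial result is ready


def consumer() -> None:
    barrier.wait()               # block until every producer has arrived
    total.append(sum(partials))  # safe: all partials have been written


threads = [threading.Thread(target=producer, args=(i,)) for i in range(NUM_PRODUCERS)]
threads.append(threading.Thread(target=consumer))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)
```

Whatever order the threads happen to run in, the consumer only reads `partials` after all three producers have written them, so `total` is always `[60]`.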