In our European Patent Application No. EP0891588 there is described a multithreaded microprocessor and data processing management system in which a plurality of executing threads are routed between a plurality of data inputs and a plurality of data outputs via a data processing means. The data processing means has access to a data storage means and the system repeatedly determines which routing operations and which data processing operations are capable of being performed, and commences execution of at least one of the routing and data processing operations on each clock cycle.
In such a system, a memory management unit is used to fetch data either from an internal memory cache or from an external memory. Typically, external memory has only a single data path, and a memory prearbiter is therefore used to arbitrate between requests from different threads for data from memory.
Our European Patent Application No. EP1738259 proposes a scheme for improved memory arbitration using various metrics attached to the various threads which are executing.
Improvements to the type of multithreaded processor described above have introduced the concept of what we refer to as “superthreading.” This involves issuing instructions for more than one executing thread on a given clock cycle. At most, instructions for all the executing threads can be issued on a single clock cycle which, in a four thread implementation, would involve issuing four instructions per clock cycle. Such an implementation is, however, only fully utilised when every one of the threads has an instruction available and ready to run.
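By way of illustration only, the issue behaviour described above may be modelled in software as follows. This is a minimal sketch and not a description of any actual hardware: the names `issue_cycle` and `utilisation`, and the use of one queue of ready instructions per thread, are assumptions made for the example.

```python
from collections import deque


def issue_cycle(thread_queues):
    """Model one clock cycle: issue at most one instruction per thread.

    thread_queues is a list of deques, one per thread, each holding the
    instructions that are available and ready to run for that thread
    (a hypothetical representation chosen for this sketch).
    Returns a list of (thread_id, instruction) pairs issued this cycle.
    """
    issued = []
    for thread_id, queue in enumerate(thread_queues):
        if queue:  # only threads with a ready instruction can issue
            issued.append((thread_id, queue.popleft()))
    return issued


def utilisation(issued, num_threads):
    """Fraction of the available issue slots used in one cycle."""
    return len(issued) / num_threads


# Four thread example in which thread 1 has nothing ready to run.
queues = [deque(["add", "mul"]), deque(), deque(["load"]), deque(["branch"])]
first = issue_cycle(queues)   # threads 0, 2 and 3 issue
second = issue_cycle(queues)  # only thread 0 still has an instruction
```

In this example three of the four issue slots are used on the first cycle and one on the second, which shows why full utilisation requires every thread to have an instruction ready.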
In some implementations of multithreaded processors, each thread is provided with an instruction buffer which may, for example, hold up to eight instructions for that thread. The instruction buffer is filled by an instruction fetch routine which is able to fetch instructions in advance of the current instruction to be used and which preferably is also able to determine actions, such as branch or hardware loop prediction, which may result from those instructions.
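A per-thread instruction buffer of this kind may be sketched as below. The class `InstructionBuffer`, its method names and the representation of the program as a list are hypothetical and are used for illustration only; branch and hardware loop prediction are omitted from the sketch.

```python
from collections import deque


class InstructionBuffer:
    """Illustrative per-thread buffer holding prefetched instructions."""

    def __init__(self, program, capacity=8):
        self.program = program    # hypothetical instruction memory for one thread
        self.capacity = capacity  # e.g. up to eight instructions
        self.fetch_pc = 0         # index of the next instruction to prefetch
        self.slots = deque()      # instructions fetched but not yet used

    def free_slots(self):
        return self.capacity - len(self.slots)

    def refill(self, max_fetch):
        """Prefetch up to max_fetch instructions ahead of the current one.

        The fetch is limited by the free space in the buffer and by the
        end of the program. Returns the number of instructions fetched.
        """
        remaining = len(self.program) - self.fetch_pc
        count = min(max_fetch, self.free_slots(), remaining)
        for _ in range(count):
            self.slots.append(self.program[self.fetch_pc])
            self.fetch_pc += 1
        return count

    def next_instruction(self):
        """Remove and return the oldest buffered instruction, or None."""
        return self.slots.popleft() if self.slots else None


program = [f"insn{i}" for i in range(12)]
buf = InstructionBuffer(program)
fetched_a = buf.refill(4)        # buffer empty, so all four are fetched
fetched_b = buf.refill(8)        # only four slots remain free
current = buf.next_instruction() # consuming one instruction frees one slot
fetched_c = buf.refill(8)
```

The fetch pointer runs ahead of the instruction currently being used, so the thread can continue to issue from the buffer while further instructions are being fetched.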
When multiple threads are being utilised, there are typically only one or two sources from which instruction data may be fetched. These are the instruction cache and embedded instruction random access memory (RAM). In a device with four executing threads, there are therefore more threads than sources of instruction data, and arbitration between the threads needs to be implemented if access to instruction data is to be optimised.
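The contention described above may be illustrated with a simple software model in which each source of instruction data grants one thread per clock cycle. The round-robin policy shown is an assumption made purely for the example and is not taken from the foregoing description; the names `RoundRobinArbiter` and `arbitrate` and the source labels `"icache"` and `"iram"` are likewise hypothetical.

```python
class RoundRobinArbiter:
    """Illustrative arbiter for one source of instruction data."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.next_thread = 0  # thread given first refusal on the next grant

    def grant(self, requesting):
        """Return the id of the thread granted this cycle, or None.

        requesting is the set of thread ids asking for this source.
        """
        for offset in range(self.num_threads):
            candidate = (self.next_thread + offset) % self.num_threads
            if candidate in requesting:
                # The thread after the winner gets first refusal next time.
                self.next_thread = (candidate + 1) % self.num_threads
                return candidate
        return None


def arbitrate(requests, arbiters):
    """Grant at most one thread per source in a single clock cycle.

    requests maps thread id to the name of the source it wants;
    arbiters maps source name to its arbiter.
    Returns a dict mapping source name to the granted thread id.
    """
    grants = {}
    for source, arbiter in arbiters.items():
        wanting = {t for t, s in requests.items() if s == source}
        winner = arbiter.grant(wanting)
        if winner is not None:
            grants[source] = winner
    return grants


# Four threads contend for two sources of instruction data.
arbiters = {"icache": RoundRobinArbiter(4), "iram": RoundRobinArbiter(4)}
requests = {0: "icache", 1: "icache", 2: "iram", 3: "icache"}
cycle1 = arbitrate(requests, arbiters)
cycle2 = arbitrate(requests, arbiters)  # same requests presented again
```

With four threads and two sources, at most two threads can be served per cycle, so the remaining threads must wait for a later cycle; the choice of which threads wait is what the arbitration scheme determines.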