British patent application no. 9607153.5 describes a multi-threaded processor and data processing management system in which a plurality of execution threads are routed between a plurality of data inputs and a plurality of data outputs via data processing means. The data processing means has access to data storage means. The system repeatedly determines which routing operations and which data processing operations are capable of being performed, and commences execution of at least one of the routing or data processing operations on each clock cycle.
The typical sub-modules of such a multi-threaded processor are shown in FIG. 1. For the purposes of this example, the microprocessor core in this figure has only two execution threads.
The microprocessor core 1 issues memory requests to the memory management unit 2. When the required data is not in its local memory (i.e., a cache miss), the data has to be fetched from the external memory. Since the external memory has only a single data path, the memory prearbiter 3 is used to arbitrate between requests from different threads.
The simplest kind of arbitration scheme that can be used in FIG. 1 is round-robin. However, such a scheme has two main problems, as follows:
Firstly, conventional dynamic random access memories (DRAM) are organized into banks. These banks are divided into regions called pages. Generally speaking, before a location of the memory can be accessed, the relevant page has to be opened. After the access, the memory may either keep the current page open (open page policy) or close it (close page policy). For example, if the memory is operating on the open page policy and the prearbiter chooses to send a memory access which is not in the same page as the last access, a high memory cycle latency will result due to the amount of time needed to open a new page. On the other hand, if the memory is operating on the close page policy, sending a memory request in the same page as the last access would similarly result in a high latency.
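The dependence of latency on page policy described above can be illustrated with a toy cost model. This is a minimal sketch only: the cycle counts and the names access_latency and total_latency are hypothetical, and the close page branch assumes that an access to a different page can overlap with the closing of the previous page (for example, because it falls in another bank), while an access to the same page must wait for the close to finish and then reopen the page.

```python
from typing import Iterable, Optional

# Hypothetical cycle costs, chosen only for illustration.
ACCESS_CYCLES = 2  # the access itself, once the page is open
OPEN_CYCLES = 4    # opening a page
CLOSE_CYCLES = 3   # closing a page


def access_latency(policy: str, last_page: Optional[int], page: int) -> int:
    """Cycles for one access, given the page touched by the previous access."""
    if policy not in ("open", "close"):
        raise ValueError("policy must be 'open' or 'close'")
    if last_page is None:
        # Very first access: nothing is open and nothing is closing.
        return OPEN_CYCLES + ACCESS_CYCLES
    same_page = page == last_page
    if policy == "open":
        if same_page:
            return ACCESS_CYCLES  # page is still open
        # The old page must be closed before the new one is opened.
        return CLOSE_CYCLES + OPEN_CYCLES + ACCESS_CYCLES
    # Close page policy (see the assumption stated in the lead-in).
    if same_page:
        # Wait for the pending close, then reopen the same page.
        return CLOSE_CYCLES + OPEN_CYCLES + ACCESS_CYCLES
    # Assumed to overlap with the close of the previous page.
    return OPEN_CYCLES + ACCESS_CYCLES


def total_latency(policy: str, pages: Iterable[int]) -> int:
    """Sum the latency of a sequence of accesses, identified by page number."""
    total = 0
    last_page: Optional[int] = None
    for page in pages:
        total += access_latency(policy, last_page, page)
        last_page = page
    return total
```

Under these assumed costs, a request stream that alternates between two pages is expensive under the open page policy, while a stream that stays within one page is expensive under the close page policy. An arbiter that ignores page addresses cannot steer the stream toward the cheaper pattern.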
Secondly, the AMA of the multi-threaded processor addresses the problem of controlling the use of processor resources such that the processing requirements of all programs running on all threads are met. The arbitration scheme of FIG. 1 does not take into account the current status of the threads. Thus, it is possible that the operation of the AMA could be impeded. In particular, during periods of intense memory activity, the AMA control system could be overloaded simply because the prearbiter 3 has no AMA information about the threads.
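For reference, the round-robin scheme criticized above can be sketched as follows. This is an illustrative model rather than the arrangement of FIG. 1: the class RoundRobinPrearbiter and its methods submit and grant are hypothetical names, and each thread is assumed to have its own queue of pending request addresses.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class RoundRobinPrearbiter:
    """Grants one pending memory request at a time, rotating over threads."""

    def __init__(self, num_threads: int) -> None:
        # One FIFO of pending request addresses per thread (an assumption).
        self.queues: List[Deque[int]] = [deque() for _ in range(num_threads)]
        self.next_thread = 0  # thread that has priority for the next grant

    def submit(self, thread_id: int, address: int) -> None:
        """Queue a memory request from the given thread."""
        self.queues[thread_id].append(address)

    def grant(self) -> Optional[Tuple[int, int]]:
        """Return (thread_id, address) of the granted request, or None."""
        n = len(self.queues)
        for offset in range(n):
            tid = (self.next_thread + offset) % n
            if self.queues[tid]:
                # Priority moves on to the thread after the one just served.
                self.next_thread = (tid + 1) % n
                return tid, self.queues[tid].popleft()
        return None  # no thread has a pending request
```

The grant decision depends only on which thread was served last. It consults neither the page address of the request nor any status information about the threads, which is the source of both problems set out above.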