Multi-threaded processors are known. Each thread comprises a sequence of instructions generally directed to performing a self-contained operation or function particular to that thread. Threads can be executed in sequence or in parallel depending on the architecture of the processor, and can be scheduled or descheduled depending on the operations and functions that they are intended to achieve. A particular problem with multi-threaded processors is that it is difficult to guarantee the performance of any individual thread.
In multi-threaded processors, it is necessary either to reserve processor cycles for each thread, even for threads which may have nothing to do, or to provide extra resources such as more memory access ports so as to make better use of reserved cycles. For example, in a conventional instruction fetch scheme there is a queue of instruction fetches and another queue of data accesses to the memory. If the instruction fetches are prioritised, a pipeline executing the instructions may have to be stalled while it waits for data. If the data accesses are prioritised, an individual thread may be delayed by several cycles while waiting for its next instruction. In either case, the performance of a thread is unpredictably affected by other threads. This problem can be overcome by using dual-ported memories (which are, however, expensive and power hungry), or by having independent program and data memories (with the corresponding overhead).
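By way of illustration only, the effect of prioritising one queue over the other can be sketched with a small model. The sketch assumes a single memory port that serves one request per cycle and assumes that all requests are queued at cycle 0. The names arbitrate and fetch_cycle are hypothetical and do not correspond to any particular processor.

```python
from collections import deque


def arbitrate(fetch_queue, data_queue, prioritise="fetch"):
    """Model a single memory port serving one request per cycle.

    fetch_queue and data_queue list the thread ids with a pending
    instruction fetch or data access. Returns the service order as
    a list of (cycle, kind, thread) tuples.
    """
    queues = {"fetch": deque(fetch_queue), "data": deque(data_queue)}
    order = ["fetch", "data"] if prioritise == "fetch" else ["data", "fetch"]
    schedule = []
    cycle = 0
    while queues["fetch"] or queues["data"]:
        # Serve the first non-empty queue in fixed priority order.
        for kind in order:
            if queues[kind]:
                schedule.append((cycle, kind, queues[kind].popleft()))
                break
        cycle += 1
    return schedule


def fetch_cycle(schedule, thread):
    """Cycle at which the given thread's instruction fetch is served."""
    return next(c for c, kind, t in schedule if kind == "fetch" and t == thread)


# Thread 0 wants its next instruction while threads 1 to 3 access data.
alone = arbitrate([0], [], prioritise="data")
contended = arbitrate([0], [1, 2, 3], prioritise="data")
print(fetch_cycle(alone, 0), fetch_cycle(contended, 0))
```

In this model the fetch for thread 0 is served in cycle 0 when no other thread is active, but only in cycle 3 when three other threads have data accesses pending and data accesses are prioritised. The delay seen by thread 0 thus depends entirely on the activity of the other threads.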
In the past, multi-threaded processors have been used to hide delays in memory access, and so there has not been a real demand to improve the real-time performance of individual threads, because such performance is not critical in that case.
One of the challenges facing processor designers is the handling of an ever-increasing number of external devices which wish to communicate with the processor. Generally this is done by providing some kind of interrupt handling capability for the processor for handling activity arising at ports connected to external devices. Increasingly, more sophisticated interface logic is used at these ports to deal with, for example, multiple external devices per port.
Interfacing is needed in a wide variety of different contexts. One context which is discussed herein by way of a background example is in mobile applications processing.
FIG. 1 shows an exemplary application of a mobile applications processor 2. The applications processor 2 comprises a CPU 4 and a plurality of interface controllers 6 which interface with a plurality of peripheral devices 8. The interface controllers include: a memory controller 6a for interfacing with a hard-drive (HDD) 8a and a SDRAM memory 8b; a video controller 6b for interfacing with a camera 8c; a display controller 6c for interfacing with an LCD display 8d; an audio controller 6d for interfacing with a microphone 8e, speaker 8f and headset 8g; and a connectivity controller 6e for interfacing with a keyboard 8h, a Universal Serial Bus (USB) device 8i, a Secure Digital (SD) card 8j, a Multi-Media Card (MMC) 8k, and a Universal Asynchronous Receiver/Transmitter (UART) device 8l. The interface controllers 6 are typically connected to the CPU 4 via a bus 3. The system also comprises a power controller 10 and radio processor 12.
Note that the interface controllers 6 are shown somewhat schematically, but represent generally some kind of dedicated I/O logic or specially configured ports.
Conventionally, external interfacing is achieved either using interrupts or by polling. When interrupts are used, an external peripheral device sends a signal to inform the processor either that it has data ready to input to the processor or that it requires data from the processor. When polling is used, the processor continually checks the state of the device to determine whether or not it is ready to supply or accept data.
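The difference between the two approaches can be sketched as follows. This is a simplified software model under stated assumptions: the Device class, its ready_after parameter and the functions poll and run_with_interrupt are hypothetical and serve only to contrast the two styles of interfacing.

```python
class Device:
    """Hypothetical peripheral that has data ready after a fixed number of ticks."""

    def __init__(self, ready_after):
        self.ready_after = ready_after
        self.ticks = 0
        self.handler = None  # interrupt handler, if one is registered

    def tick(self):
        """Advance device time by one step and signal the processor when ready."""
        self.ticks += 1
        if self.ticks == self.ready_after and self.handler is not None:
            self.handler(self)

    def is_ready(self):
        return self.ticks >= self.ready_after


def poll(device):
    """Polling: the processor repeatedly checks the state of the device.

    Returns the number of checks made before the device was found ready.
    """
    checks = 0
    while True:
        checks += 1
        if device.is_ready():
            return checks
        device.tick()


def run_with_interrupt(device, steps):
    """Interrupts: the device informs the processor; no checking is performed.

    Returns the tick counts at which the handler was invoked.
    """
    events = []
    device.handler = lambda dev: events.append(dev.ticks)
    for _ in range(steps):
        device.tick()
    return events


print(poll(Device(ready_after=3)))
print(run_with_interrupt(Device(ready_after=3), steps=5))
```

In the polling case the processor spends a check on every step until the device is ready, whereas in the interrupt case the handler runs once, at the step on which the device signals readiness.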
One possibility for implementing an applications processor 2 such as that of FIG. 1 is to use an Application Specific Integrated Circuit (ASIC) microcontroller. ASICs are hardwired devices, possibly including microprocessors, which are dedicated to a particular application and optimised to suit that application. For a given function, they are generally cheaper and consume less power than other options. However, they are complex to design, must be pre-designed and cannot readily be reconfigured.
Another possibility is to use Field Programmable Gate Array (FPGA) devices. FPGAs are semiconductor devices that can be configured “in the field” after manufacture. To configure an FPGA, first a computer is used to model the desired logical functions, for example by drawing a schematic diagram or creating a text file describing the functions. The FPGA comprises an array of look-up tables which communicate via statically configured interconnects. The computer model is compiled using software provided by the FPGA vendor, which creates a binary file that can be downloaded into the FPGA look-up tables. This allows manufacturers of equipment to tailor the FPGA to meet their own individual needs.
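As a purely illustrative sketch of the look-up table principle, the following model treats the configuration bits of each table as a truth table and treats the wiring between tables as fixed once configured. It is a simplification and does not represent any vendor's binary file format or tools. The names LUT, XOR, AND and half_adder are assumptions made for the example.

```python
class LUT:
    """k-input look-up table whose configuration bits form its truth table."""

    def __init__(self, bits):
        self.bits = list(bits)

    def __call__(self, *inputs):
        # The inputs form the address of the configuration bit to output,
        # with the first input as the most significant address bit.
        index = 0
        for bit in inputs:
            index = (index << 1) | int(bool(bit))
        return self.bits[index]


# "Downloading" a configuration amounts to choosing the table contents.
XOR = LUT([0, 1, 1, 0])
AND = LUT([0, 0, 0, 1])


def half_adder(a, b):
    """Static interconnect: both tables are wired to the same two inputs."""
    return XOR(a, b), AND(a, b)  # (sum, carry)


print(half_adder(1, 1))
```

Changing the contents of the tables changes the logical function without changing the hardware, which is what allows the same generic device to be tailored to different needs.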
In this example, the interface controllers 6 are implemented as FPGAs. This has the benefit that the manufacturer of the mobile telephone can purchase generic FPGA devices 2 and then configure them on site (i.e. “in the field”) to be specific to their desired application. The disadvantage of FPGAs however is that they are more expensive, slower and consume more power than ASICs.
In alternative examples, the whole chip 2 could be implemented in FPGA, or the chip 2 could be a general purpose processor with separate FPGA chips connected between the chip 2 and the respective peripherals 8. However, these options would be even more expensive and power-consuming—prohibitively so for most mobile phones and other consumer devices.
Some of the above difficulties can be overcome by using a multi-threaded processor where each thread is associated with a specific activity, in particular with input-output operations. Such a multi-threaded processor is described for example in our earlier U.S. application Ser. No. 11/717,623 filed 14 Mar. 2007 (our ref. 314563US/VRD), and is described more fully in the following. With such a multi-threaded processor, it is important that the performance of an individual thread can be guaranteed. Potential problems arise if, for example, all of the threads require memory accesses for data or instruction fetches at the same time, or if several input-output operations arise simultaneously. In such situations, one thread may be delayed waiting for all of the other threads to complete their accesses, or an input-output request to activate a thread may be delayed until the requests to activate all the other threads have been processed.
One way to avoid this problem is to construct a computer architecture with sufficient resources to ensure that every thread can always progress, but this would be prohibitively expensive and would be a highly redundant design.