1. Field of the Invention
The present invention relates to a data processing system.
2. Description of the Prior Art
Management of power consumption is a major design goal for designers of system-on-chip integrated circuits and data processing apparatuses in general. With the increased prevalence of portable data processing devices such as portable telephones, personal organisers and personal computers, careful control of power consumption is becoming more of a key factor in system design. Even in non-portable data processing devices, reduction of power dissipation and power consumption is important because it reduces the running costs, simplifies the design of cooling and power supplies and increases the reliability of operation.
There is a need for different power-performance modes in data processing devices because many power-constrained applications of processors require relatively low processor performance for the majority of the run time of the device but sometimes require considerably higher performance for relatively short periods of processing time. Data processing devices are likely to incorporate several, and perhaps even tens of, processors to implement a number of different processing tasks, and any processors that are unused, even temporarily, during operation of the data processing device may well have both (a) a high-performance, high-power consumption mode; and (b) one or more lower-performance, lower-power consumption modes. This allows the performance of the processor to be tailored to the demands of the current processing workload (i.e. the operating system and one or more program applications), thus saving power when maximum processing performance is not required.
Processor power dissipation is often divided into a dynamic (or switching) power component and a static (or leakage) power component. The dynamic power component is associated with electrical signals changing voltage levels. In processor designs that use typical clocked complementary metal-oxide semiconductor (CMOS) circuits, dynamic power is consumed when circuits are clocked or inputs change logic levels. By way of contrast, static power is consumed for the entire duration of time that power is supplied to the processing circuitry.
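The distinction between the two components can be illustrated with a minimal sketch. It assumes the standard first-order CMOS approximations, which are not taken from the present description: dynamic power is modelled as activity factor times switched capacitance times supply voltage squared times clock frequency, and static power as supply voltage times leakage current. All function names, parameter names and numerical values are hypothetical.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order switching power in watts (assumed model: a * C * V^2 * f)."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def static_power(voltage_v, leakage_current_a):
    """First-order leakage power in watts (assumed model: V * I_leak)."""
    return voltage_v * leakage_current_a


# Hypothetical figures chosen only for illustration.
running = dynamic_power(0.5, 1e-9, 1.0, 1e9) + static_power(1.0, 0.05)

# With the clock stopped (frequency of zero) the dynamic term vanishes,
# but the static term remains for as long as the supply is applied.
clock_stopped = dynamic_power(0.5, 1e-9, 1.0, 0.0) + static_power(1.0, 0.05)

print(running, clock_stopped)
```

Under this assumed model, stopping the clock removes only the dynamic term; the static term disappears only when the supply voltage itself is removed.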
It is known to control power and performance of a data processing apparatus by using dynamic voltage and frequency scaling (DVFS). In the DVFS approach, low energy consumption modes are entered by reducing the voltage supply to the processing circuits so that they use less power. This reduction in voltage also makes the transistors of the processing circuitry switch more slowly, which in turn means that the frequency of the processor clock must be reduced in correspondence with the reduction in voltage. A simple DVFS processing device will typically have a full-voltage, full-performance point and at least one lower-voltage, lower-performance point.
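By way of illustration only, a simple two-point DVFS arrangement of the kind described above might be sketched as follows. The operating point names, voltages and frequencies are hypothetical, as are the helper functions `select_point` and `transition_steps`. The ordering in `transition_steps` reflects the constraint noted above that a lower voltage supports only a lower clock frequency, so the voltage is assumed to be raised before the frequency and lowered after it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    voltage_v: float
    frequency_hz: float


# Hypothetical operating points: one full-performance, one lower-performance.
POINTS = [
    OperatingPoint("low", 0.8, 200e6),
    OperatingPoint("full", 1.2, 600e6),
]


def select_point(required_hz, points=POINTS):
    """Return the slowest point meeting the demand, else the fastest point."""
    for point in sorted(points, key=lambda p: p.frequency_hz):
        if point.frequency_hz >= required_hz:
            return point
    return max(points, key=lambda p: p.frequency_hz)


def transition_steps(current, target):
    """Order the voltage and clock changes so the clock never outruns the supply."""
    if target.frequency_hz > current.frequency_hz:
        return ["set_voltage", "set_frequency"]  # speeding up: raise voltage first
    return ["set_frequency", "set_voltage"]      # slowing down: lower clock first


low, full = POINTS
print(select_point(150e6).name, transition_steps(low, full))
```

A real DVFS controller would additionally have to wait for the supply and clock to settle between steps, which is part of the power supply and clocking complexity associated with this approach.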
One problem with power management using DVFS is that it requires relatively complex power supply and clocking systems. This can increase the complexity of processor designs in which the processing circuitry is voltage-frequency scaled but external circuitry is either not scaled or is voltage-frequency scaled by a different amount.
DVFS power management also has a limit to the lowest power point that it can provide in a data processing system. This is because transistors of the processing circuitry cannot operate below a certain characteristic minimum voltage for a given fabrication technology. This means that large high performance processors cannot be voltage-frequency scaled down to an arbitrarily low performance-power point. Furthermore, voltage scaling typically does not eliminate static power consumption. Large high performance processors that are required for the more demanding processing applications typically comprise large numbers of transistors and correspondingly have large static power consumption. Thus, it is desirable to provide a data processing system that provides performance scaling in a manner that enables high-performance processing to be achieved yet provides a lower, more efficient power point at the lower performance end of the scale, and which has simplified power supply and clocking circuitry.
It is also known to provide data processing systems having heterogeneous multiprocessors. Such heterogeneous multiprocessors are made up of at least two processors of different types, for example, one high-performance high-power processor and at least one lower performance lower power processor. Such multiprocessor systems are designed to be capable of concurrently executing two or more separate instruction streams, one for each of the processors making up the multiprocessor. Each processor's instruction stream contains instructions from the process it is currently running and, when running at full performance, multiprocessor systems will typically have all of the high performance processors as well as all of the low power processors running substantially simultaneously. Such heterogeneous multiprocessors have operating systems specially written to handle multiprocessing, comprising a scheduler that serves to re-allocate processors to different processing tasks, albeit relatively infrequently.
To migrate a process between different processors of a multiprocessor system, the multiprocessor operating system is required to save all of the run-time state of the process from a source processor to external memory and then to reload all (or a substantial portion) of that run-time state on to the destination processor. It typically takes the operating system several thousand processing cycles to reload the necessary processor state. Furthermore, once the state has been reloaded there will typically be some start-up performance cost associated with the task migration on the destination processor while the contents of structures such as caches, the translation lookaside buffer (TLB) and branch history tables adjust to the migrated instruction stream. The cost of entering the operating system and the cost of the transfer of processor state mean that migration of a process has a high cost in terms of both processing time and energy in such known heterogeneous multiprocessor systems. Thus there is a requirement for more efficient migration of one or more processing tasks between different processors.
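The save-and-reload sequence described above can be sketched, purely for illustration, as follows. The `Core` class, the `migrate` function, the register names and the cycle counts are all hypothetical; the cycle costs are supplied by the caller as assumptions rather than measured values, and the cost of entering the operating system is not modelled.

```python
class Core:
    """Hypothetical processor model holding architectural state and a cache flag."""

    def __init__(self, name):
        self.name = name
        self.registers = {}
        self.cache_warm = False


def migrate(source, destination, external_memory, save_cycles, reload_cycles):
    """Move run-time state via external memory and return the assumed cycle cost."""
    # Step 1: save the run-time state from the source processor to memory.
    external_memory["saved_state"] = dict(source.registers)
    source.registers.clear()
    # Step 2: reload that state on to the destination processor.
    destination.registers = dict(external_memory["saved_state"])
    # Step 3: caches, TLB and branch history on the destination start cold.
    destination.cache_warm = False
    return save_cycles + reload_cycles


big = Core("big")
big.registers = {"pc": 0x1000, "sp": 0x8000}
big.cache_warm = True
little = Core("little")
memory = {}

cost = migrate(big, little, memory, save_cycles=2000, reload_cycles=3000)
print(cost, little.registers, little.cache_warm)
```

The sketch makes explicit that the state passes through external memory in both directions and that the destination begins with cold caches, which are the two sources of migration cost identified above.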
It is also known to manage processor power consumption by using fine-grain power configurable processors in which individual components of the high performance processor each have a high-performance configuration and a lower performance (but more energy-efficient) configuration. In this case the overall structure of the processor remains the same and instructions travel through the processor via much the same path. For example, a high performance processor may comprise a superscalar processor with multiple ALU (arithmetic logic unit) pipelines, where all but one ALU pipeline can be powered down to improve energy efficiency. A further example is a high power, high performance data processor comprising a branch predictor that can switch between highly aggressive and less aggressive speculation about the future course of the instruction stream.
In such fine-grain power configurable processors no state transfer is required to move between the high performance and the high efficiency modes. However, a disadvantage of this fine-grain power configurable processing is that the inclusion of the high efficiency mode can introduce extra transistor gates into critical paths, which can in turn reduce the maximum performance of the high performance mode. Furthermore, in the high efficiency mode signals are required to travel over the full area of the high performance processor, which means that signals are required to propagate over considerable distances. This increases the signal loading and thus increases power dissipation. Furthermore, reducing static power consumption in fine-grain dynamically configurable processors is problematic because it is not easy to cut off power to unused circuits in these systems, due to the fact that power switching is necessarily distributed throughout the design of the high performance processor.
Thus there is a requirement for an alternative way of implementing processor performance scaling that simplifies implementation of the high performance and the high efficiency modes of operation and enables static leakage current to be reduced, and a further requirement to migrate processing tasks between processors more efficiently.