1. Field of the Invention
The present invention relates generally to the optimization of mixed hardware/software systems. More specifically, the invention is directed to optimizing a system to minimize power consumption in systems where power usage is important, such as mobile computing systems.
2. Description of the Related Art
Minimizing power dissipation of embedded systems is a crucial task. One reason is that high power dissipation may destroy integrated circuits through overheating. Another reason is that mobile computing devices (like cell phones, PDAs, digital cameras, etc.) draw their current from batteries, which limits the amount of energy that can be dissipated between two recharging phases. Hence, minimizing the power dissipation of those systems means increasing the devices' "mobility", an important factor in the purchase decision for such a device. To reduce cost and power, most of those systems are integrated onto one single chip (SOC: System-On-a-Chip). This is possible through today's feature sizes of 0.18μ (and smaller), which allow for integration of more than 100 million transistors on a single chip. (It is noted that due to the design gap, as discussed in M. Keaton and P. Bricaud, Reuse Methodology Manual For System-On-A-Chip Designs, Kluwer Academic Publishers, 1998, current systems on a chip hardly exceed 10 million transistors (not counting on-chip memory).) By 2001, even larger systems, of up to 400 million transistors, may be integrated onto a single chip. See 0.07 Micron CMOS Technology Ushers In Era of Gigahertz DSP and Analog Performance, Texas Instruments, Published on the Internet, http://www.ti.com/sc/docs/news/1998/98079.htm, 1998. In order to cope with this complexity, the state-of-the-art design methodology deployed is core-based system design: the designer composes a system of cores, i.e. system components like, for example, an MPEG encoder engine, a standard off-the-shelf processor core, peripherals, etc., as seen in FIG. 1b). See, for example, M. Keaton and P. Bricaud, cited above.
Still, the designer has a high degree of freedom to optimize her/his design according to the related design constraints, since cores are available in different forms: as "hard", "firm" or "soft" versions. For a more detailed introduction to core-based design, please refer to R. K. Gupta, Y. Zorian, Introducing Core-Based System Design, IEEE Design and Test of Computers Magazine, Vol. 13, No. 4, pp. 15-25, 1997. In the case of a hard core, all design steps down to layout and routing have already been completed, whereas a soft core is highly flexible since it is a structural or even behavioral description of the core's functionality. Hence, after purchasing a soft core in behavioral description, the designer may still decide whether to implement the core's functionality completely as a software program (running on a standard off-the-shelf processor core) or as hard-wired hardware (ASIC core). Alternatively, the designer may partition the core's functionality between those (hardware and software) parts. As an example, FIG. 1b) shows a system-on-a-chip with an MPEG encoder core composed of blocks like MPEG video, MPEG audio, Video Res., JPEG Acc. Under certain circumstances, a different hardware/software partitioning might be more advantageous in terms of the power dissipation of the whole system. For instance, parts of these MPEG encoder engines might be run on the μP core as shown in FIG. 1c).
The present invention employs a novel approach that deploys a hardware/software partitioning methodology to minimize the power consumption of a whole system (standard off-the-shelf processor core, instruction cache, data cache, main memory and application specific cores (ASIC cores) like, for example, MPEG, FFT etc.). The present invention focuses on the low power hardware/software partitioning method and uses a framework to estimate and optimize the power consumption of the other cores that are not subject to hardware/software partitioning (like main memory, caches, etc.). It is noted that those other cores have to be adapted efficiently (e.g. size of memory, size of caches, cache policy etc.) according to the particular hardware/software partitioning chosen. This is because, in the case of the cache for example, the access patterns may change when a different hardware/software partition is used. Hence, power consumption is likely to differ.
The present invention is a totally new approach to reduce the power consumption of a system. This reduction is performed by adding hardware, such that this additional hardware is executing in mutual exclusion with the software parts that it replaces. As such, the part that is not executing can be shut down entirely and thus is not consuming energy. Also, the additional hardware is especially adapted for the specific calculations, thus it achieves a better resource utilization rate. Lastly, the additional hardware can work faster (i.e. require fewer clock cycles) and thus allow for even greater savings of energy.
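The energy argument above can be illustrated with a minimal sketch. It assumes a deliberately simple model that is not taken from the specification: each part draws a constant average power while it is switched on, draws nothing while it is shut down, and energy equals power times execution time. The function names and all numeric values are hypothetical.

```python
def energy_joules(power_watts, cycles, clock_hz):
    """Energy of a part that is switched on for `cycles` clock cycles.

    Assumption: constant average power while switched on.
    """
    return power_watts * cycles / clock_hz


def partitioned_energy(p_proc, p_asic, cycles_sw_rest, cycles_asic, clock_hz):
    """Energy when processor and added hardware run in mutual exclusion.

    Assumption: whichever part is not executing is shut down entirely
    and therefore contributes no energy during that time.
    """
    return (energy_joules(p_proc, cycles_sw_rest, clock_hz)
            + energy_joules(p_asic, cycles_asic, clock_hz))


# Hypothetical numbers: software alone needs 1,000,000 cycles at 2 W.
software_only = energy_joules(2.0, 1_000_000, 1_000_000)
# After partitioning, 500,000 cycles stay in software, and the moved
# half takes only 250,000 cycles on added hardware drawing 1 W.
partitioned = partitioned_energy(2.0, 1.0, 500_000, 250_000, 1_000_000)
```

With these illustrative inputs the partitioned system uses less energy than the software-only one, because the added hardware both draws less power and needs fewer cycles. Real savings depend on measured power figures for the actual cores.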
According to the invention, a method is provided for optimizing a system for power usage, where the system has a set of cores, with each of the cores having a plurality of functional units. The method includes calculating a utilization rate of each of the functional units and each of the cores, where the utilization rate of a functional unit is defined as the number of clock cycles in which the functional unit is actually performing an operation in relation to all clock cycles in which this functional unit is switched on, and the utilization rate of each of the cores is defined as the average utilization rate of all of its functional units; selecting cores from the set of cores that have a low utilization rate and dividing functions of those selected cores into partitioning objects; executing the partitioning objects and calculating utilization rates for the executed partitioning objects; comparing the utilization rates of the partitioning objects with the utilization rates of the selected cores from which the partitioning objects were extracted; synthesizing the partitioning objects for which the utilization rates of the partitioning objects are lower than the utilization rates of the selected cores from which the partitioning objects were extracted, where the synthesized partitioning objects represent new cores to be added to the system as small ASICs; and building the system using the new cores and some of the cores from the set of cores. As used herein, partitioning objects are software pieces, such as nested loops, etc., that can be executed separately, and their individual utilization rates may be compared to the utilization rate of the undivided core.
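The two utilization-rate definitions above can be sketched as follows. This is an illustration only: the class names, the `select_low_utilization` helper and the 0.5 threshold are hypothetical, since the text does not state what counts as a "low" utilization rate.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionalUnit:
    name: str
    active_cycles: int    # cycles in which the unit performs an operation
    powered_cycles: int   # cycles in which the unit is switched on

    def utilization(self) -> float:
        # Active cycles in relation to all switched-on cycles.
        if self.powered_cycles == 0:
            return 0.0
        return self.active_cycles / self.powered_cycles


@dataclass
class Core:
    name: str
    units: list = field(default_factory=list)

    def utilization(self) -> float:
        # Average utilization rate over the core's functional units.
        if not self.units:
            return 0.0
        return sum(u.utilization() for u in self.units) / len(self.units)


def select_low_utilization(cores, threshold=0.5):
    """Return cores below the threshold (hypothetical cutoff)."""
    return [c for c in cores if c.utilization() < threshold]


# Illustrative cycle counts, not measured data.
processor = Core("processor", [FunctionalUnit("alu", 50, 100),
                               FunctionalUnit("multiplier", 25, 100)])
codec = Core("codec", [FunctionalUnit("dct", 75, 100),
                       FunctionalUnit("quantizer", 100, 100)])
candidates = select_low_utilization([processor, codec])
```

Here only the hypothetical `processor` core falls below the cutoff, so it would be the one whose functions are divided into partitioning objects for further evaluation.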
In a preferred embodiment, the method also comprises a step of determining the functional units of the cores by examining code segments that make up a set of operations of the cores. Further, the method may preselect some of the functional units that are expected to yield energy savings, based on bus transfers between a main memory of the system and the cores corresponding to those functional units.
The method takes into consideration all of said cores, caches and a main memory of said system to minimize power usage of the entire system.
The new cores in the built system are executed in mutual exclusion of the selected cores from which the new cores were extracted. Additionally, the system is optimized for energy savings (through adapted synthesis) while maintaining or increasing the performance of the system as compared to an initial, unoptimized design of the system.
In addition, a method of hardware/software partitioning of a system for low power usage having a plurality of cores is also disclosed. The method includes: determining a plurality of instruction-level functional units taken from the plurality of cores; calculating a utilization rate of each of the functional units and each of the cores, where the utilization rate of each of the cores is defined as the average utilization rate of all of the functional units; and adding additional hardware components to the system to replace certain functional units of the plurality of functional units such that an overall utilization rate of the system is minimized.
In the above method, the additional hardware components may execute in mutual exclusion with the replaced functional units, such that the replaced functional units can be shut down and not consume energy. Additionally, the additional hardware components may be especially adapted to specific calculations to achieve the minimization of the overall utilization rate of the system. Also, the additional hardware components may use fewer clock cycles than the functional units they replace, thus providing for additional energy savings.