1. Field of the Invention
The invention relates to a method and computer program product for power estimation, more particularly to a method and computer program product for register transfer level power estimation in chip design.
2. Description of the Related Art
As chip functionality grows more complicated, the number of logic gates in a logic chip increases quickly, which results in higher power consumption of the chip. Moreover, after the logic chip is implemented, its power consumption is often found not to comply with system specifications, so that the design of the internal components of the chip must be modified repeatedly to obtain a chip with lower power consumption. However, considerable costs and time have to be spent at each implementation of the logic chip. Therefore, if the power consumption of a chip can be estimated by simulation before the chip is implemented, the required implementation costs can be reduced effectively.
A conventional method for simulating the power consumption of a logic chip is to estimate the power consumption of the chip at the gate-level. The gate-level circuit of the logic chip is composed of a plurality of logic gates, and the toggle count of output signals at the logic gates is correlated to the power consumption of the chip. Therefore, by compiling statistics of the switching activity of the output signal at each logic gate, the power consumption of the chip can be obtained.
The output signals at the logic gates switch because the input signals supplied to the logic chip change from one clock cycle to the next. Therefore, once the input signals change, the outputs of the logic gates of the logic chip may also change.
To illustrate using an example, reference is made to FIG. 1, which shows a gate-level circuit inside a chip. The gate-level circuit includes a first NAND gate 41, a second NAND gate 42, a first NOR gate 43, a second NOR gate 44, and a NOT gate 45.
The logic chip can receive an input of four signals x1, x2, x3 and x4. The value of each of these input signals may be 0 or 1, and may vary with different clock cycles. For the sake of illustration, an input vector pi=[x1, x2, x3, x4] is used to represent the value of each of the input signals x1, x2, x3 and x4 during the ith clock cycle.
During the first clock cycle, the input vector is p1, and the value thereof is [0, 1, 1, 0]. Besides, at this stage, the output of the first NAND gate 41 is 0; the output of the second NAND gate 42 is 1; the output of the first NOR gate 43 is 0; the output of the second NOR gate 44 is 0; and the output of the NOT gate 45 is 1.
After the first clock cycle, i.e., during the second clock cycle, the input vector p1 switches to p2, and the value of p2 is [1, 0, 1, 0]. At this stage, since the input signal switches, the output of the first NAND gate 41 will switch from 0 to 1, the output of the second NAND gate 42 will switch from 1 to 0, and the output of the first NOR gate 43 will switch from 0 to 1, whereas the outputs of the second NOR gate 44 and the NOT gate 45 will not switch.
The fan-out of an input signal xj refers to the logic gates whose outputs may be affected when the input signal xj switches. As shown in FIG. 1, switching of the input signal x1 may cause the outputs of the second NAND gate 42 and the first NOR gate 43 to switch. Switching of the input signal x2 may cause the outputs of the first NAND gate 41, the second NAND gate 42, and the first NOR gate 43 to switch. Switching of the input signal x3 may cause the outputs of the first NAND gate 41, the second NAND gate 42, the first NOR gate 43, the second NOR gate 44, and the NOT gate 45 to switch. Similarly, switching of the input signal x4 may cause the outputs of the first NOR gate 43, the second NOR gate 44, and the NOT gate 45 to switch. In the present example, three gate outputs (those of the first NAND gate 41, the second NAND gate 42, and the first NOR gate 43) switch when the first clock cycle gives way to the second clock cycle, so the logic gates of the logic chip have a total toggle count of 3. Thus, by counting the switching activities that occur at the logic gates, the power consumption of the logic chip under these two input vectors can be inferred.
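The toggle count in this example can be reproduced with a short sketch. It uses only the gate output values stated above for the two clock cycles; the function name toggle_count and the ordering of the gates in the lists are assumptions made for illustration, and the sketch does not simulate the circuit of FIG. 1 itself.

```python
# Gate outputs taken from the walkthrough above, in the (assumed) order:
# NAND 41, NAND 42, NOR 43, NOR 44, NOT 45.
outputs_cycle_1 = [0, 1, 0, 0, 1]
outputs_cycle_2 = [1, 0, 1, 0, 1]


def toggle_count(before, after):
    """Count gate outputs whose value differs between two clock cycles."""
    if len(before) != len(after):
        raise ValueError("output vectors must have the same length")
    return sum(1 for b, a in zip(before, after) if b != a)


total_toggles = toggle_count(outputs_cycle_1, outputs_cycle_2)
print(total_toggles)  # 3
```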
However, it should be noted that there are many ways of switching the input vector p1 during the next clock cycle. In particular, the input vector p2 is not limited to [1, 0, 1, 0], and may have 16 (= 2^4) possibilities, including [0, 0, 0, 0], [0, 0, 0, 1], and [0, 0, 1, 0]. In addition, during the first clock cycle, the value of p1 is also not limited to [0, 1, 1, 0] as mentioned above. Therefore, two input vectors of any two clock cycles may have many possible combinations, and some specific combinations may enable the logic chip to execute specific functions. For instance, switching from one input vector [0, 0, 0, 1] to [1, 1, 1, 1] may represent that the logic chip is being switched to a power-save mode, whereas switching from one input vector [1, 0, 0, 0] to [1, 1, 1, 0] may represent that the logic chip is executing a logic operation, such as multiplication.
However, if the input vector switches from [0, 0, 0, 0] to [1, 1, 1, 1], and this switching activity does not activate the logic chip, such change in the input vector has no meaning for the logic chip. When estimating the power consumption of a chip, all the meaningful input vector switching activities of the logic chip have to be considered so as to obtain a value that can represent the average power consumption of the logic chip.
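The size of the search space described above can be checked by enumeration. The following is a minimal sketch for the four input signals of FIG. 1; the names NUM_INPUTS, all_vectors, and all_transitions are hypothetical, and the sketch counts every ordered pair of input vectors, including pairs that would have no meaning for the logic chip.

```python
from itertools import product

NUM_INPUTS = 4  # x1..x4 in FIG. 1

# Every possible value of a 4-bit input vector.
all_vectors = list(product((0, 1), repeat=NUM_INPUTS))

# Every ordered (previous cycle, next cycle) pair of input vectors,
# including pairs in which the vector does not change.
all_transitions = list(product(all_vectors, repeat=2))

print(len(all_vectors))      # 16
print(len(all_transitions))  # 256
```

Only the subset of these pairs that is meaningful for the logic chip would enter the average, and which pairs are meaningful depends on the chip's function.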
Although estimation of the power consumption of the chip at the gate level has a high accuracy, an actual chip has far more than the four input signals shown in FIG. 1. Simulation performed at this level is therefore very time-consuming, because the activities of such a large number of input signals must be considered.
To describe the internal circuitry design of a chip, apart from using gate level as a basis, the register transfer level (hereinafter referred to as RT-level), a higher level, can also serve as the basis. At the RT-level, a register transfer level code (hereinafter referred to as RTL code) is used to describe the internal circuitry design of the chip. In this RTL code, an input vector qi=[x1, x2, . . . , xn-1, xn] can also be used to describe the values of all the signals x1, x2, . . . , xn-1, xn of the RTL code during the ith clock cycle.
FIG. 2 lists a typical RTL code written in the Verilog hardware description language. The illustrative RTL code is primarily used to calculate the greatest common divisor (GCD) of two integers u and v. In the example shown in FIG. 2, the elements included in the input vector qi are a value of an input signal start, a value of a state control register state, and values of two data registers u and v.
It is noted that the typical RTL code as shown in FIG. 2 generally includes conditional constructs, such as if-then-else statements and case statements.
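To illustrate how such conditional constructs select different operations from cycle to cycle, the following is a hypothetical behavioral model in Python. It is not the Verilog code of FIG. 2: the state encoding (IDLE, RUN), the subtraction-based GCD algorithm, and the functions step and run_gcd are all assumptions made for this sketch. It only reuses the signal names start, state, u, and v mentioned above, and it assumes positive integer operands.

```python
IDLE, RUN = 0, 1  # hypothetical encoding of the state control register


def step(start, state, u, v):
    """Advance the model by one clock cycle and return (state, u, v)."""
    if state == IDLE:
        # Control-only cycle: wait for start, no arithmetic is performed.
        return (RUN if start else IDLE), u, v
    if u == v:
        # Result is ready in u; return to IDLE.
        return IDLE, u, v
    if u > v:
        # Datapath cycle: subtract the smaller register from the larger one.
        return RUN, u - v, v
    return RUN, u, v - u


def run_gcd(u, v, max_cycles=10_000):
    """Run the model to completion and return (gcd, cycles used)."""
    if u <= 0 or v <= 0:
        raise ValueError("this sketch assumes positive integers")
    state, u, v = step(1, IDLE, u, v)  # assert start for one cycle
    cycles = 1
    while state == RUN:
        if cycles >= max_cycles:
            raise RuntimeError("cycle limit reached")
        state, u, v = step(0, state, u, v)
        cycles += 1
    return u, cycles


print(run_gcd(12, 18)[0])  # 6
```

Each branch of step corresponds to a different kind of cycle (waiting, subtracting, finishing), which is the sort of input-dependent behavior that conditional constructs introduce into RTL code.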
Reference is made to FIG. 3, which illustrates a conventional RT-level power estimation method. This method was proposed in the paper entitled “Clustered Table-Based Macromodels for RTL Power Estimation” in Proc. of Great Lakes Symposium on VLSI by R. Corgnati, E. Macii, and M. Poncino in 1999. The aforesaid method includes the following steps.
In step T1, a power model is built based on the RTL code, logic circuit diagram, and characterized input vector sets of a chip. The building of the power model involves a plurality of look-up tables that record each switching activity to be generated by the logic chip. The size of the look-up tables is determined by the number of input signals of the chip, and increases exponentially with the number of input signals.
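The exponential growth can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that a table stores one entry per ordered pair of input vectors; the actual table organization in the cited method may differ, and the function name table_entries is hypothetical.

```python
def table_entries(num_inputs):
    """Entries needed under the assumption of one entry per ordered
    (previous vector, next vector) pair of num_inputs-bit input vectors."""
    vectors = 2 ** num_inputs
    return vectors * vectors


for n in (4, 8, 16, 32):
    print(n, table_entries(n))
```

Under this assumption, four input signals require 256 entries, and each additional input signal multiplies the table size by four.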
In step T2, the model built in step T1 is used to calculate, one by one, the power values to which all the meaningful input vector sets of the logic chip correspond, and an arithmetic mean of the power values thus obtained is calculated to obtain an average power value representative of the logic chip.
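Step T2 amounts to a table look-up followed by an arithmetic mean. The following is a minimal sketch of that calculation; the table power_table, its power values (placeholders in arbitrary units, not measured data), and the function average_power are hypothetical, and the input vector pairs are borrowed from the four-signal example above.

```python
from statistics import mean

# Hypothetical look-up table: (previous vector, next vector) -> power value.
# The numbers are placeholders in arbitrary units, not measured data.
power_table = {
    ((0, 1, 1, 0), (1, 0, 1, 0)): 3.0,
    ((0, 0, 0, 1), (1, 1, 1, 1)): 5.0,
    ((1, 0, 0, 0), (1, 1, 1, 0)): 4.0,
}


def average_power(table, transitions):
    """Look up each meaningful transition and return the arithmetic mean."""
    return mean(table[t] for t in transitions)


meaningful_transitions = list(power_table)
print(average_power(power_table, meaningful_transitions))  # 4.0
```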
Since the computations in such a conventional RT-level method are simpler than the logic computations at the gate level of the chip, the method takes less computing time than power estimation at the gate level. However, with the advance of technology, the number of circuits within a chip keeps growing. When the functionality of the chip becomes so complicated that the number of input signals becomes large, the look-up tables adopted by the prior art grow so large that the aforesaid method becomes impracticable.
In addition, the look-up table-based method fails to take into account that conditional expressions are often used at the RTL code design stage of the logic chip, and these conditional expressions cause the logic operation modes of the chip to vary with different combinations of the input signals. For example, the logic chip may perform a simple logic operation mode for one input signal combination, and a complicated multiplication operation mode for another input signal combination. Different operation modes mean that the circuit has different switching activities, and different switching activities consume different amounts of power, which gives rise to different power modes. Therefore, using such a method to estimate power consumption of a chip will result in substantial errors.
In sum, the conventional method ignores the diversity of operation modes of large logic chips, and the frequencies of occurrence of the operation modes may vary to a great extent with different input signals, so that the frequencies of the induced power mode are very different. In addition, when the operational clock frequency is increased, or when the control signals and data signals become more complicated, such a difference will become more obvious, thereby resulting in relatively large errors during power estimation.
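The effect of ignoring how often each operation mode occurs can be seen in a small numerical sketch. The two modes, their power values, and their cycle counts below are hypothetical placeholders chosen for illustration, not data from any chip; the function names are likewise assumptions.

```python
# Hypothetical operation modes: name -> (power per cycle, cycles observed).
# All numbers are placeholders in arbitrary units.
modes = {
    "simple_logic": (1, 90),
    "multiplication": (9, 10),
}


def unweighted_mean(modes):
    """Average the mode powers as if every mode occurred equally often."""
    return sum(power for power, _ in modes.values()) / len(modes)


def frequency_weighted_mean(modes):
    """Average the mode powers weighted by how often each mode occurs."""
    total_cycles = sum(cycles for _, cycles in modes.values())
    weighted = sum(power * cycles for power, cycles in modes.values())
    return weighted / total_cycles


print(unweighted_mean(modes))          # 5.0
print(frequency_weighted_mean(modes))  # 1.8
```

With these placeholder numbers, treating the two modes as equally likely overstates the average power by almost a factor of three, which illustrates the kind of error described above.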