In circuit design, a designer may start with a behavioural description, which contains an algorithmic specification of the functionality of the circuit. High-level synthesis converts the behavioural description of a very large scale integrated (VLSI) circuit into a structural, register-transfer level (RTL) implementation. The RTL implementation describes an interconnection of macro blocks (e.g., functional units, registers, multiplexers, buses, memory blocks, etc.) and random logic.
A behavioural description of a sequential circuit may contain almost no information about the cycle-by-cycle behaviour of the circuit or its structural implementation. High-level synthesis (HLS) tools typically compile a behavioural description into a suitable intermediate format, such as a Control-Data Flow Graph (CDFG). Vertices in the CDFG represent the operations of the behavioural description. Data edges represent data dependencies between operations, and control edges represent the flow of control.
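By way of illustration only, a CDFG of the kind described above can be sketched as a small data structure. The class name, method names and the example behaviour (a comparison guarding an addition followed by a multiplication) are assumptions introduced for this sketch and are not taken from any particular HLS tool.

```python
from dataclasses import dataclass, field


@dataclass
class CDFG:
    """Toy control-data flow graph: vertices are operations, edges are typed."""
    ops: dict = field(default_factory=dict)            # vertex name -> operation kind
    data_edges: list = field(default_factory=list)     # (producer, consumer) pairs
    control_edges: list = field(default_factory=list)  # (source, target) pairs

    def add_op(self, name, kind):
        self.ops[name] = kind

    def add_data_edge(self, src, dst):
        self.data_edges.append((src, dst))

    def add_control_edge(self, src, dst):
        self.control_edges.append((src, dst))

    def predecessors(self, name):
        """Operations whose results `name` consumes (data dependencies only)."""
        return [s for s, d in self.data_edges if d == name]


def build_example():
    # Hypothetical behaviour: if (a < b) then y = (a + b) * c
    g = CDFG()
    g.add_op("cmp1", "lt")
    g.add_op("add1", "add")
    g.add_op("mul1", "mul")
    g.add_data_edge("add1", "mul1")      # the product consumes the sum
    g.add_control_edge("cmp1", "add1")   # both operations are guarded by the comparison
    g.add_control_edge("cmp1", "mul1")
    return g
```

Keeping data edges and control edges in separate lists mirrors the distinction drawn above between data dependencies and the flow of control.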
High-level synthesis tools typically perform one or more of the following tasks: transformation, module selection, clock selection, scheduling, resource allocation and assignment (also called resource sharing or hardware sharing). Scheduling determines the cycle-by-cycle behaviour of the design by assigning each operation to one or more clock cycles or control steps. Allocation decides the number of hardware resources of each type that will be used to implement the behavioural description. Assignment refers to the binding of each variable (and the corresponding operation) to one of the allocated registers (and the corresponding functional units).
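The relationship between scheduling, allocation and assignment may be sketched as follows. This is a minimal illustration under stated assumptions: every operation takes exactly one control step, scheduling is as-soon-as-possible, and only the binding of operations to functional units is shown (the binding of variables to registers is omitted). The function names and the example graph are hypothetical.

```python
from collections import defaultdict


def asap_schedule(ops, deps):
    """Scheduling: place each operation in the earliest control step that
    follows all of its predecessors (one step per operation is assumed)."""
    step = {}

    def visit(name):
        if name not in step:
            step[name] = 1 + max((visit(p) for p in deps.get(name, [])), default=0)
        return step[name]

    for name in ops:
        visit(name)
    return step


def allocate(ops, step):
    """Allocation: units needed per kind = peak number of operations of
    that kind scheduled in any single control step."""
    usage = defaultdict(int)
    for name, s in step.items():
        usage[(ops[name], s)] += 1
    need = defaultdict(int)
    for (kind, _), count in usage.items():
        need[kind] = max(need[kind], count)
    return dict(need)


def assign(ops, step):
    """Assignment: bind each operation to a unit instance of its kind,
    reusing instances across control steps (hardware sharing)."""
    used_in_step = defaultdict(int)
    binding = {}
    for name in sorted(ops, key=lambda n: (step[n], n)):
        key = (ops[name], step[name])
        binding[name] = f"{ops[name]}{used_in_step[key]}"
        used_in_step[key] += 1
    return binding


# Hypothetical data flow: m1 = a1 * a2, then a3 consumes m1
ops = {"a1": "add", "a2": "add", "m1": "mul", "a3": "add"}
deps = {"m1": ["a1", "a2"], "a3": ["m1"]}
step = asap_schedule(ops, deps)
units = allocate(ops, step)
binding = assign(ops, step)
```

In this example the third addition is scheduled in a later control step than the first two, so it can be bound to an adder that is already allocated rather than requiring a third one.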
In VLSI circuits, power dissipation is often dominated by the dynamic component, which is incurred whenever signals in a circuit undergo logic transitions. However, not all parts of the circuit need to function during each clock cycle. As such, several low power design techniques have been proposed based on suppressing or eliminating unnecessary signal transitions. Such techniques are generally referred to as power management. In data path allocation, power management can be applied using the following techniques:
i) Operand Isolation
Operand isolation inserts transparent latches at the inputs of an embedded combinational logic block, together with additional control circuitry to detect idle conditions for the logic block. The outputs of the control circuitry are used to disable the latches, so that the values at the inputs of the logic block are prevented from changing. Thus, the previous cycle's input values are retained at the inputs of the logic block under consideration, eliminating unnecessary power dissipation.
The operand isolation technique has two disadvantages. First, the signals that detect idle conditions for various sub-circuits typically arrive late (for example, due to the presence of nested conditionals within each controller state, the idle conditions may depend on outputs of comparators from the data path). Therefore, the timing constraints that must be imposed (i.e. the enable signal to the transparent latches must settle before their data inputs can change) are often not met, making the suppression ineffective. Second, the insertion of transparent latches in front of functional units can lead to additional delays in a circuit's critical path, which may not be acceptable in signal and image-processing applications that need to be fast as well as power efficient.
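The intended effect of operand isolation can be illustrated with a simple cycle-level model. The sketch below is an idealisation: it counts changes of whole operand values rather than individual bit transitions, and it assumes the latch enable always settles in time, which is precisely the assumption that the first disadvantage above calls into question. The function name and the sample data are hypothetical.

```python
def input_transitions(values, active, isolate):
    """Count the cycles in which the operand seen by a logic block changes.

    values:  operand presented by the upstream register in each cycle
    active:  True in cycles where the logic block's result is actually used
    isolate: if True, model a transparent latch that is closed in idle cycles
    """
    seen = None      # value currently held at the logic block input
    changes = 0
    for value, busy in zip(values, active):
        if isolate and not busy and seen is not None:
            new = seen           # latch closed: previous cycle's value is retained
        else:
            new = value          # latch transparent (or absent): value passes through
        if seen is not None and new != seen:
            changes += 1
        seen = new
    return changes


# Hypothetical operand stream; the block is only needed in cycles 0 and 3
values = [3, 7, 7, 2, 9]
active = [True, False, False, True, False]

without_isolation = input_transitions(values, active, isolate=False)
with_isolation = input_transitions(values, active, isolate=True)
```

With this data the unprotected block sees three operand changes, of which only one belongs to a cycle in which its result is used; with the idealised latch, only that one change remains.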
ii) Constrained Register Sharing Technique for Low Power VLSI Design
Spurious operations are avoided by constraining register sharing in data path allocation, according to U.S. Pat. No. 6,195,786 B1, issued on 27 Feb. 2001 to Raghunathan et al. In this scheme, registers are shared judiciously so that the output variables of operations, when they share a common register, do not cause unnecessary switching at the inputs of other functional units.
The data path allocation utilises perfect power management, at least in so far as it relates to spurious switching activities. This requires the input of every functional unit to be connected to registers that are not connected to the input of any other functional unit. This rigid requirement results in an architecture with power savings and area overheads that vary in proportion to the amount of sharable resources and the nature of the CDFG. In constrained register sharing, if the sequence of values appearing at one input of a functional unit in any iteration corresponds to v1_1, v2_1, . . . , vi_1 and the sequence of values appearing at the other input of the functional unit in any iteration corresponds to v1_2, v2_2, . . . , vi_2, then the values at the inputs of the functional unit in any clock cycle correspond to some pair (vj_1, vj_2), 1≤j≤i.
FIG. 1 is an illustration of non-constrained use of two registers. FIG. 2 is an illustration of constrained sharing of a register. In FIG. 1, a first register R1 provides a first variable Var A to an arithmetic logic unit (ALU) during a first clock period and a second variable Var B to a multiplier (MULT), which also receives an input from elsewhere, during a second clock period after the first clock period. With constrained sharing of a register, as exemplified in FIG. 2, a second register R2 is additionally provided. The first register R1 provides the first variable Var A to the arithmetic logic unit (ALU) during the first clock period and the second register R2 provides the second variable Var B to the multiplier (MULT) during the second clock period. Spurious switching activity is avoided by constraining sharing. However, this leads to an increased number of registers.
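The difference between the arrangements of FIG. 1 and FIG. 2 can be illustrated with a small model that counts spurious activations, i.e. clock periods in which a register feeding a functional unit changes value although no operation is scheduled on that unit. The sketch assumes that each value is loaded at the start of the clock period in which it is consumed. The register name RX (standing for the multiplier's other input "from elsewhere"), the numeric values and the function name are hypothetical.

```python
def spurious_activations(writes, wiring, scheduled):
    """Count (unit, period) pairs where an input register of the unit
    changes value but no operation is scheduled on that unit.

    writes:    per clock period, {register: value loaded in that period}
    wiring:    {unit: [registers feeding its inputs]}
    scheduled: per clock period, the set of units doing useful work
    """
    regs = {}
    spurious = 0
    for new_values, busy in zip(writes, scheduled):
        changed = {r for r, v in new_values.items() if regs.get(r) != v}
        regs.update(new_values)
        for unit, inputs in wiring.items():
            if unit not in busy and changed.intersection(inputs):
                spurious += 1
    return spurious


scheduled = [{"ALU"}, {"MULT"}]   # ALU works in period 1, MULT in period 2

# FIG. 1 style: R1 holds Var A, then Var B, and feeds both units
fig1 = spurious_activations(
    writes=[{"R1": 5}, {"R1": 9, "RX": 4}],
    wiring={"ALU": ["R1"], "MULT": ["R1", "RX"]},
    scheduled=scheduled,
)

# FIG. 2 style: Var B is placed in a separate register R2
fig2 = spurious_activations(
    writes=[{"R1": 5}, {"R2": 9, "RX": 4}],
    wiring={"ALU": ["R1"], "MULT": ["R2", "RX"]},
    scheduled=scheduled,
)
```

Under these assumptions the shared register disturbs the idle multiplier in the first period and the idle ALU in the second, whereas the arrangement with the additional register R2 produces no spurious activations, at the cost of one more register.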
Constrained register sharing has been reported as providing power savings of 8.4% to 33.0% and area overheads spanning from −4.0% to 6.4%. A related paper was published by the inventors named in the above-mentioned U.S. Pat. No. 6,195,786 B1: G. Lakshminarayana, A. Raghunathan, N. K. Jha, and S. Dey, “Power management in high level synthesis,” IEEE Trans. VLSI Systems, vol. 7, no. 1, pp. 7-15, March 1999. In this paper, a reduction in power (mean 23.05%, standard deviation 13.3%) at an area overhead of 0% to 13.8% is reported for a bandpass example tested under throughput constraints of 59, 72 and 86 cycles. It was noted that the power savings and area overheads increased with the sampling period, because longer sampling periods facilitate more register sharing, which in turn leads to the execution of more spurious operations. In order to eliminate these spurious operations, register sharing has to be inhibited to a greater extent than in tightly performance-constrained designs.
Constrained register sharing management, though resulting in good power savings, has area overheads that depend on the properties of the Data Flow Graph (DFG). For a DFG with many sharable operations and variables, the area overhead incurred in this allocation scheme is large. Large area overheads are also incurred for a DFG in which many output variables of operations extend to more than one operation in the subsequent control steps, i.e. operations have inputs connected to other operations. This is because the chances of spurious switching activity arising from the sharing of registers for such variables are higher. The rigid conditions of constrained register sharing do not give a developer the flexibility to generate RTL designs that use fewer registers than the scheme requires at the data path allocation phase.
The cost of a die is generally proportional to four times its area. It is therefore cost efficient to produce RTL designs of minimal area to lower production costs. However, with the emergence of portable or mobile computing and communication devices such as laptops, palmtop computers, mobile telephones, wireless modems, handheld video games, etc., power consumption has become another major consideration in all RTL designs. Excessive power dissipation in integrated circuits not only discourages their use in a portable environment, but also causes overheating, which degrades performance and reduces chip lifetime. To control their temperature levels, high power chips require specialised and costly packaging and heat sink arrangements. Excessive power consumption is a limiting factor in integrating more transistors on a single chip or on a multiple-chip module.
The area and power requirements of a chip design differ in scale according to the application the chip serves. For portable devices and high-density microelectronic devices, the power dissipation of VLSI circuits is a critical concern. For non-portable devices and less dense microelectronic devices, power dissipation may be a less significant consideration than die area.