1. Field of the Invention
The present invention relates to a method for designing arithmetic device allocation in which arithmetic operations in a data flow graph are allocated to arithmetic devices based on a scheduling result when performing high-level synthesis for automatically synthesizing a digital circuit from behavioral descriptions of an LSI circuit.
2. Description of the Related Art
Conventionally, high-level synthesis technologies are known as useful technologies for designing an LSI (Large Scale Integration) circuit in a short period of time. The high-level synthesis technologies are technologies for automatically synthesizing a circuit from behavioral descriptions which do not include information about a hardware structure and include only a processing algorithm. For example, “High-Level Synthesis”, Kluwer Academic Publishers, is one publication including a detailed description of conventional high-level synthesis technologies.
A brief description is given below with respect to a process of automatically synthesizing a circuit from behavioral descriptions using a conventional high-level synthesis technology. The high-level synthesis is performed according to a procedure shown in FIG. 1.
Firstly, at step 1, a flow of data in behavioral descriptions is analyzed so as to create a model referred to as a “data flow graph” (DFG). The DFG is a graph similar to a flowchart of a program and includes nodes and branches. The branches and nodes respectively represent data and arithmetic operations. An input of an arithmetic operation corresponds to an input branch of a node and an output of an arithmetic operation corresponds to an output branch of a node.
For example, a behavioral description shown in FIG. 2 is represented by a DFG shown in FIG. 3. The DFG of FIG. 3 includes nodes 31 and 32 representing two multiplications and a node 33 representing an addition, and represents that a result of multiplication of inputs a and b and a result of multiplication of inputs b and c are added together so as to output an arithmetic operation result x.
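For illustration only, the DFG of FIG. 3 could be held in memory as in the following Python sketch. The `Node` class, the `evaluate` helper and the intermediate branch names `t1` and `t2` are assumptions introduced for this sketch; only the reference numerals 31 to 33 and the names a, b, c and x are taken from FIG. 3.

```python
from dataclasses import dataclass
import operator


@dataclass(frozen=True)
class Node:
    """One arithmetic operation (a node) of a data flow graph."""
    name: str      # label of the node, here the reference numeral of FIG. 3
    op: str        # "*" for a multiplication, "+" for an addition
    inputs: tuple  # names of the two input branches (data)
    output: str    # name of the output branch (data)


# DFG of FIG. 3: x = (a * b) + (b * c).
# t1 and t2 are hypothetical names for the two intermediate branches.
DFG_FIG3 = [
    Node("31", "*", ("a", "b"), "t1"),
    Node("32", "*", ("b", "c"), "t2"),
    Node("33", "+", ("t1", "t2"), "x"),
]

OPERATORS = {"*": operator.mul, "+": operator.add}


def evaluate(dfg, values):
    """Evaluate the nodes in the listed (topological) order.

    Returns a dict holding the value of every branch, inputs included.
    """
    data = dict(values)
    for node in dfg:
        lhs, rhs = (data[name] for name in node.inputs)
        data[node.output] = OPERATORS[node.op](lhs, rhs)
    return data
```

For example, `evaluate(DFG_FIG3, {"a": 2, "b": 3, "c": 4})["x"]` gives 2×3 + 3×4.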
Next, at step 2 of FIG. 1, scheduling is performed so as to determine when to perform arithmetic operations corresponding to the nodes in the DFG, i.e., it is determined which arithmetic operation is performed in which clock cycle. In this case, the arithmetic operations scheduled in each clock cycle must be completed within the clock period, and therefore the delay times of all the arithmetic operations must be taken into consideration.
Examples of scheduling of the DFG shown in FIG. 3 are shown in FIGS. 4 and 6. In FIG. 4, the DFG is scheduled such that two multiplications and an addition are performed in a single clock cycle (cycle 1). For example, in the case where delay times of an adder and a multiplier are respectively 5 nanoseconds (ns) and 60 ns, when a clock period is equal to or more than 65 ns, the DFG can be scheduled as shown in FIG. 4.
In FIG. 6, the DFG is scheduled such that a single multiplication 61 is performed in cycle 1 and the rest of the arithmetic operations, i.e., a multiplication 62 and an addition 65, are performed in cycle 2. The scheduling shown in FIG. 6 is also possible when the clock period is equal to or more than 65 ns. In FIG. 6, data represented by a branch 63 crossing a border of a clock cycle is stored in a register R1 and data represented by a branch 64 crossing the border of a clock cycle is stored in a register (not shown) which preserves the value of input b.
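The timing condition that both schedules satisfy can be checked with a short sketch such as the following. The node names `mul_ab`, `mul_bc` and `add`, and both helper functions, are hypothetical; the delays of 60 ns and 5 ns are the ones given above. A result produced in an earlier cycle is assumed to be read from a register and therefore to add no delay in the cycle that consumes it.

```python
DELAY_NS = {"*": 60, "+": 5}  # multiplier and adder delays given in the text

# Hypothetical node names for the DFG of FIG. 3:
# node -> (operation, predecessor nodes)
NODES = {
    "mul_ab": ("*", ()),
    "mul_bc": ("*", ()),
    "add": ("+", ("mul_ab", "mul_bc")),
}


def longest_chain_ns(schedule, cycle):
    """Longest chained delay among the operations scheduled in `cycle`."""
    finish = {}

    def finish_time(node):
        if node not in finish:
            op, preds = NODES[node]
            # Only predecessors in the same cycle are chained combinationally.
            start = max(
                (finish_time(p) for p in preds if schedule[p] == cycle),
                default=0,
            )
            finish[node] = start + DELAY_NS[op]
        return finish[node]

    return max(
        (finish_time(n) for n in NODES if schedule[n] == cycle),
        default=0,
    )


def schedule_fits(schedule, clock_ns):
    """True if every cycle of the schedule completes within the clock period."""
    return all(
        longest_chain_ns(schedule, cycle) <= clock_ns
        for cycle in set(schedule.values())
    )


FIG4 = {"mul_ab": 1, "mul_bc": 1, "add": 1}  # everything in cycle 1
FIG6 = {"mul_ab": 1, "mul_bc": 2, "add": 2}  # one multiplication per cycle
```

Under these assumptions, both `FIG4` and `FIG6` fit a clock period of 65 ns, and neither fits a shorter one, because a multiplication followed by an addition is chained within a single cycle in both schedules.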
The scheduling result shown in FIG. 4 can be realized by the circuit shown in FIG. 5. The circuit of FIG. 5 includes two multipliers and an adder. Inputs a and b are input to one of the two multipliers so as to be multiplied together, and inputs b and c are input to the other one of the two multipliers so as to be multiplied together. The results of the multiplications performed by both of the two multipliers are input to the adder so as to be added together, so that an operation result x is output.
On the other hand, the scheduling result shown in FIG. 6 can be realized by the circuit shown in FIG. 7. The circuit of FIG. 7 includes a selector (sel) 71, a multiplier 75, an adder 74, a register 72 and a controller 73.
The selector 71 outputs a left-side input, i.e., input a, when a select signal 76 indicated by a dotted arrow corresponds to 1, and outputs a right-side input, i.e., input c when the select signal 76 corresponds to 0. The register 72 stores a value of an input at the instant of the rise of a clock when an enable signal 77 indicated by another dotted arrow corresponds to 1, and retains a value stored therein when the enable signal 77 corresponds to 0. Then, the register 72 outputs the stored (or retained) value. The controller 73 generates the signals 76 and 77 to respectively control the selector 71 and the register 72.
The operation of the circuit of FIG. 7 is now described. In cycle 1 (FIG. 6), both the select signal 76 and the enable signal 77 correspond to 1, and therefore the selector 71 selects input a, the multiplier 75 calculates multiplication (a×b), and the register 72 stores the value of the multiplication (a×b). In cycle 2 (FIG. 6), both the select signal 76 and the enable signal 77 correspond to 0, and therefore the selector 71 selects input c and multiplication (c×b) is calculated so that the adder 74 receives the result of the multiplication (c×b), while the value of the multiplication (a×b) retained in the register 72 is output to the adder 74 so as to be added to the result of the multiplication (c×b). The adder 74 outputs the addition result x.
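The two-cycle behavior described above can be mimicked by the following cycle-level sketch. The function name `run_fig7` and the initial register value of 0 are assumptions made for this sketch; the signal values per cycle follow the description of the controller 73.

```python
def run_fig7(a, b, c):
    """Cycle-level sketch of the circuit of FIG. 7; returns x after cycle 2."""
    r1 = 0    # register 72 (the initial value is an assumption; it is overwritten in cycle 1)
    x = None
    # Controller 73: (select signal 76, enable signal 77) for cycles 1 and 2.
    for select, enable in [(1, 1), (0, 0)]:
        sel_out = a if select == 1 else c  # selector 71
        product = sel_out * b              # multiplier 75
        x = r1 + product                   # adder 74 (meaningful in cycle 2 only)
        if enable == 1:
            # Register 72 captures its input at the clock edge ending the cycle;
            # when enable is 0 it retains the stored value.
            r1 = product
    return x
```

With a = 2, b = 3 and c = 4, the register holds 6 after cycle 1 and the adder outputs 6 + 12 in cycle 2, which matches the result of evaluating (a×b) + (b×c) directly.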
The circuit of FIG. 5 completes an operation thereof in one cycle, but two multipliers are required. On the other hand, the circuit of FIG. 7 requires two cycles for completing an operation thereof, but only one multiplier is required. In the high-level synthesis technologies, it is possible to synthesize the circuit of FIG. 5 when a high-speed circuit is required and the circuit of FIG. 7 when a circuit having a small area is required.
Next, at step 3 of FIG. 1, register allocation is performed. In the scheduling result shown in FIG. 6, it is necessary to store in a register data represented by a branch crossing the border of the clock cycle, such as the branches 63 and 64. The reason for this is that in a synchronous circuit, a value of a register is changed for each clock cycle. For example, in order to use the calculation result (a×b) of the multiplication 61 for the addition 65, the calculation result (a×b) is required to be temporarily stored in the register at the border of the clock cycle. In the register allocation, a register is allocated to such a branch crossing a border of each clock cycle. In the following description, a register allocation result refers to register(s) each represented by a rectangle including a name of the register, e.g., reference numeral 66 in FIG. 6 denotes a register having a name R1. A register for preserving a value of variable b can be used as a register for the branch 64, but such a register for the branch 64 is not shown in FIG. 7.
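The selection of branches that need a register can be sketched as follows. The node names and the helper function are hypothetical. The sketch also assumes that a primary input becomes available in the first cycle in which it is used, which is why input b (used in cycles 1 and 2) needs to be preserved while input c (used only in cycle 2) does not.

```python
# Schedule of FIG. 6: operation -> clock cycle (node names are hypothetical).
SCHEDULE = {"mul61": 1, "mul62": 2, "add65": 2}

# Branches of the DFG as (source, consuming operation) pairs.
BRANCHES = [
    ("a", "mul61"), ("b", "mul61"),
    ("b", "mul62"), ("c", "mul62"),
    ("mul61", "add65"), ("mul62", "add65"),
]


def branches_needing_register(schedule, branches):
    """Return the branches whose data is consumed in a later cycle than the
    cycle in which the data becomes available."""

    def available(source):
        if source in schedule:
            # Result of an operation: available in the cycle of that operation.
            return schedule[source]
        # Assumption: a primary input is available from its first use onward.
        return min(schedule[dst] for src, dst in branches if src == source)

    return [
        (src, dst) for src, dst in branches if schedule[dst] > available(src)
    ]
```

For the schedule of FIG. 6 this yields the branch from input b to the multiplication 62 (branch 64) and the branch from the multiplication 61 to the addition 65 (branch 63).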
Next, at step 4 of FIG. 1, arithmetic device allocation for allocating arithmetic operations in a DFG to arithmetic devices based on scheduling and register allocation results is performed. In the scheduling result shown in FIG. 6, two multiplications 61 and 62 can be performed while sharing a single multiplier 75 shown in FIG. 7. However, in the case where there are a plurality of methods for sharing an arithmetic device between arithmetic operations, a procedure for determining an optimum sharing method is required. This procedure is referred to as the “arithmetic device allocation”.
In conventional arithmetic device allocation design methods (for example, see “High-Level Synthesis”, Kluwer Academic Publishers, Japanese Laid-Open Patent Publication No. 2000-242669, etc.), as in the case of allocating the two multiplications 61 and 62 of FIG. 6 to the single multiplier 75 of FIG. 7, reduction in circuit area is achieved by allocating arithmetic operations to arithmetic devices such that the number of arithmetic devices to be used becomes as small as possible.
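One simple way to obtain an allocation that uses as few arithmetic devices as possible is sketched below. This greedy binding is an assumption chosen for illustration and is not asserted to be the procedure of the cited publications; it further assumes that every arithmetic operation occupies its device for exactly one clock cycle. The operation names are hypothetical.

```python
def allocate_min_devices(ops):
    """Bind each operation to the lowest-numbered device of its kind that is
    still free in the operation's clock cycle.

    ops: list of (name, kind, cycle) tuples.
    Returns {name: (kind, device index)}.
    """
    busy = set()   # (kind, device index, cycle) triples already taken
    binding = {}
    for name, kind, cycle in ops:
        index = 0
        while (kind, index, cycle) in busy:
            index += 1
        busy.add((kind, index, cycle))
        binding[name] = (kind, index)
    return binding


# Schedule of FIG. 6: the two multiplications are in different cycles.
FIG6_OPS = [("mul61", "mul", 1), ("mul62", "mul", 2), ("add65", "add", 2)]

# Schedule of FIG. 4: both multiplications are in cycle 1.
FIG4_OPS = [("mul_ab", "mul", 1), ("mul_bc", "mul", 1), ("add", "add", 1)]
```

For `FIG6_OPS` both multiplications are bound to the same multiplier, as in FIG. 7, whereas for `FIG4_OPS` two multipliers are needed, as in FIG. 5.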
Next, at step 5 of FIG. 1, a circuit at an RTL (Register Transfer Level) is created by creating data paths based on the branches in the DFG and a control logic for controlling the registers, selectors, etc. The RTL circuit includes hardware structures, such as registers and arithmetic devices, and processes data flowing between the registers for each operation cycle.
In the conventional arithmetic device allocation design methods, the arithmetic operations are allocated to the arithmetic devices such that the number of arithmetic devices to be used becomes as small as possible, and therefore when performing the arithmetic device allocation based on the scheduling and register allocation results shown in FIG. 6, the arithmetic device allocation result shown in FIG. 7 is obtained.
However, as shown in FIG. 8, arithmetic device allocation for separately allocating the two multiplications 61 and 62 of FIG. 6 to different multipliers 101 and 102 is also possible. The circuit of FIG. 8 includes: a multiplier 101 for multiplying inputs a and b together; a multiplier 102 for multiplying inputs b and c together; a register (R1) 103 for storing and retaining a multiplication result of the multiplier 101; an adder for adding an output of the register 103 and a multiplication result of the multiplier 102 together so as to output an operation result x; and a controller for generating a signal to control the register 103.
The circuit of FIG. 8 requires two multipliers, but a selector, which is required in the arithmetic device allocation result shown in FIG. 7, is not required. Therefore, in the case where an area of a multiplier is smaller than that of a selector, the arithmetic device allocation result shown in FIG. 8 allows the entire circuit area to be small as compared to the arithmetic device allocation result shown in FIG. 7. On the other hand, in the case where the area of the selector is smaller than that of the multiplier, the arithmetic device allocation result shown in FIG. 7 allows the entire circuit area to be small as compared to the arithmetic device allocation result shown in FIG. 8.
In general, a multiplier has an area larger than that of a selector, and therefore in many cases, the arithmetic device allocation result shown in FIG. 7 is preferable. However, in the case of an arithmetic operation for which an arithmetic device having an area smaller than that of the selector is used, as in the case of FIG. 8, an arithmetic device allocation result in which an arithmetic device is not shared between arithmetic operations allows the entire circuit area to be small as compared to the arithmetic device allocation result shown in FIG. 7.
Now, a case where arithmetic device allocation is performed based on the scheduling and register allocation results shown in FIG. 9 is examined. In FIG. 9, inputs a and b are added together (addition 111) in cycle 1, inputs c and d are added together (addition 112) in the next cycle 2, and the results of the additions 111 and 112 are multiplied together. In FIG. 9, data (the addition result of inputs a and b) represented by the branch crossing the border of the clock cycle is stored in the register R1.
In this case, arithmetic device allocations shown in FIGS. 10 and 11 are possible. The circuit shown in FIG. 10 includes: a selector for selecting either input a or input c; a selector for selecting either input b or input d; an adder for adding the selection results of both selectors together; a register for storing and retaining the addition result of the adder; a multiplier for multiplying an output of the register and the addition result of the adder together so as to output a multiplication result x; and a controller for generating signals to respectively control the register and the selectors.
The circuit shown in FIG. 11 includes: an adder for adding inputs a and b together; a register for storing and retaining the addition result (a+b); an adder for adding inputs c and d together; a multiplier for multiplying the addition result (c+d) and an output of the register together so as to output a multiplication result x; and a controller for generating a signal to control the register.
The circuit of FIG. 10 realizes the additions 111 and 112 using a single adder, but two selectors are required. On the other hand, the circuit of FIG. 11 realizes the additions 111 and 112 using two adders, but no selector is required. Therefore, when the total area of two selectors is smaller than the area of an adder, the arithmetic device allocation result shown in FIG. 10 in which the additions 111 and 112 are realized by the single adder allows the entire circuit area to be small as compared to the arithmetic device allocation result shown in FIG. 11. On the contrary, when the total area of two selectors is larger than the area of an adder, the arithmetic device allocation result shown in FIG. 11 in which the additions 111 and 112 are realized by the two adders allows the entire circuit area to be small as compared to the arithmetic device allocation result shown in FIG. 10.
In general, an area of an adder is about twice as large as that of a selector having the same bit width as that of the adder, and therefore by using a single selector so as to share an adder between additions, rather than separately allocating the additions to adders, the entire circuit area is made small. On the other hand, in the case where an arithmetic device (an adder, a multiplier, or the like) is smaller than a selector, by separately allocating arithmetic operations to arithmetic devices, the entire circuit area is naturally made small. Moreover, even in the case where an adder is larger than a selector, there are some cases, such as the case of FIG. 10 in which two selectors are required in order to share a single adder, where the entire circuit area is made small by using a separate adder for each addition so as to reduce the number of selectors, rather than sharing an adder between additions.
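The area comparisons made above for FIGS. 7 and 8 and for FIGS. 10 and 11 can be reproduced with the following sketch. Only the devices that differ between the alternatives are counted, as in the text. All area values are hypothetical relative units: the selector is taken as 1, the adder as 2 in line with the "about twice" estimate above, and the multiplier value of 20 is an arbitrary placeholder.

```python
def differing_area(counts, unit_area):
    """Total area of the listed devices.

    counts: {device kind: number of devices}; only the devices that differ
    between the alternatives being compared need to be listed.
    """
    return sum(unit_area[kind] * n for kind, n in counts.items())


# Hypothetical relative areas (selector = 1).
AREA = {"selector": 1.0, "adder": 2.0, "multiplier": 20.0}

FIG7 = {"multiplier": 1, "selector": 1}   # shared multiplier
FIG8 = {"multiplier": 2}                  # one multiplier per multiplication
FIG10 = {"adder": 1, "selector": 2}       # shared adder
FIG11 = {"adder": 2}                      # one adder per addition
```

With these placeholder values, sharing is clearly better for the multiplier (21 versus 40), while for the adder the two alternatives come out equal (4 versus 4). If the adder area is lowered to 1.5, the unshared allocation of FIG. 11 becomes the smaller one, so minimizing the number of arithmetic devices does not by itself guarantee a minimum circuit area.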
However, in the conventional arithmetic device allocation design method for minimizing the number of arithmetic devices, as shown in FIGS. 7 and 10, the number of arithmetic devices is minimum, but more selectors are required, and therefore the entire circuit area may be increased rather than being reduced.
Next, a case where the arithmetic device allocation is performed based on the scheduling result shown in FIG. 12 is examined. In FIG. 12, inputs a and b are added together (addition 141) in cycle 1, and the addition result and input c are multiplied together (multiplication 142), so that an arithmetic operation result x is output. In the next cycle 2, inputs c and d are multiplied together (multiplication 143), and the multiplication result and input b are added together (addition 144), so that an arithmetic operation result y is output.
In the case of using the conventional arithmetic device allocation design method for minimizing the number of arithmetic devices, the circuit shown in FIG. 13 is obtained. The circuit of FIG. 13 includes selectors 154 and 155, an adder 152, a multiplier 153 and a controller 151. The additions 141 and 144 are allocated to a single adder 152, and the multiplications 142 and 143 are allocated to a single multiplier 153.
The operation of the circuit shown in FIG. 13 is now described. In cycle 1, control signals 157 and 158 output by the controller 151 respectively correspond to 1 and 0. Therefore, the selector 154 selects input a so as to be input to the adder 152. The adder 152 calculates an addition (a+b) so as to output the addition result to the selector 155. The selector 155 selects the result of the addition (a+b) so as to be input to the multiplier 153. The multiplier 153 calculates a multiplication ((a+b)×c) so as to output a multiplication result x. In cycle 2, the control signals 157 and 158 output by the controller 151 respectively correspond to 0 and 1. Therefore, the selector 155 selects input d so as to be input to the multiplier 153. The multiplier 153 calculates a multiplication (d×c) so as to output the multiplication result to the selector 154. The selector 154 selects the result of the multiplication (d×c) so as to be input to the adder 152. The adder 152 calculates an addition ((d×c)+b) so as to output an addition result y. In this manner, the circuit of FIG. 13 can obtain the arithmetic operation results x and y shown in FIG. 12 using a single adder and a single multiplier.
However, in the circuit of FIG. 13, as indicated by the bold line, a loop 156 including only a combination circuit is created. The combination circuit refers to a logic circuit in which a logic output thereof is determined solely by the present logic inputs thereof, and examples of the combination circuit include inverters, NAND circuits, NOR circuits or the like, and a combination thereof. A sequential circuit, such as a flip flop or a latch circuit, is not included in the combination circuit. Ideally, the control signals 157 and 158 output by the controller 151 do not correspond to 0 simultaneously, and therefore the selectors 154 and 155 do not select a left-side input simultaneously, whereby the loop 156 is not activated. However, in an actual circuit, timings at which the control signals 157 and 158 output by the controller 151 vary can differ from each other, and therefore the loop 156 may be activated for a short period of time.
In the loop 156 including only the combination circuit as described above, data returns to the same arithmetic devices, and therefore there is a possibility that oscillation might be caused based on a principle similar to a ring oscillator. Once the oscillation is caused, power consumption is increased and moreover, circuit operation becomes unstable, thereby causing circuit malfunction. Furthermore, the presence of such a loop prevents correct evaluation of delays in the steps of logic synthesis, floor planning (layout by units of blocks), routing of layout (layout by units of a gate in a block), etc. Therefore, the circuit created by the arithmetic device allocation design method for minimizing the number of arithmetic devices is unreliable.
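A loop such as the loop 156 can be found by searching the structural connections of the circuit for a cycle. The following depth-first search is a generic sketch and is not asserted to be part of any conventional tool flow. The connection table is read off the description of FIG. 13, and the node names are hypothetical. A register would break such a loop, so register outputs would be treated as starting points rather than as connections in this model.

```python
def find_combinational_loop(fanout):
    """Return one cycle of the directed graph `fanout` as a list of nodes,
    or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for succ in fanout.get(node, ()):
            state = colour.get(succ, WHITE)
            if state == GREY:
                # succ is on the current path: the loop runs from succ onward.
                return path[path.index(succ):]
            if state == WHITE:
                found = visit(succ)
                if found is not None:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(fanout):
        if colour.get(start, WHITE) == WHITE:
            found = visit(start)
            if found is not None:
                return found
    return None


# Structural connections of FIG. 13 (driver -> driven devices or outputs).
FIG13 = {
    "a": ["sel154"], "b": ["adder152"], "c": ["mult153"], "d": ["sel155"],
    "sel154": ["adder152"],
    "adder152": ["sel155", "y"],
    "sel155": ["mult153"],
    "mult153": ["sel154", "x"],
}
```

For `FIG13` the search reports the loop through the selector 154, the adder 152, the selector 155 and the multiplier 153, which contains no register.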
Next, a case where the arithmetic device allocation is performed based on the scheduling result shown in FIG. 14 is examined. In cycle 1 of FIG. 14, inputs a and b are multiplied together, and the multiplication result and input c are added together (addition 161) so that an arithmetic operation result x is output. In the next cycle 2, inputs c and d are added together (addition 162) and the addition result is divided by input e so that an arithmetic operation result y is output.
Here, the delay for an operation of an adder is 5 ns, the delay for an operation of each of a multiplier and a divider is 60 ns, the delay for an operation of a selector is 1 ns, and a clock period is 100 ns. In cycle 1 of FIG. 14, the multiplication and the addition are successively performed, and therefore a period of 65 ns is required. However, the clock period is 100 ns, and therefore there is a sufficient amount of time to complete these arithmetic operations. Also, in cycle 2, there is a sufficient amount of time to complete arithmetic operations.
In the case where the arithmetic device allocation is performed based on the scheduling result shown in FIG. 14 using the conventional arithmetic device allocation method for minimizing the number of arithmetic devices, the circuit shown in FIG. 15 can be obtained. The circuit of FIG. 15 includes: a multiplier 171; a selector 174; an adder 172; a divider 173; and a controller 175, and two additions 161 and 162 (FIG. 14) are allocated to a single adder 172.
The operation of the circuit shown in FIG. 15 is now described. In cycle 1, a control signal 177 output by the controller 175 corresponds to 0. Therefore, the selector 174 selects an output (a×b) of the multiplier 171 so as to be input to the adder 172. The adder 172 adds the output (a×b) and input c together so as to output an arithmetic operation result x. In this case, if each of the delays in the paths from inputs a, b and c to the arithmetic operation result x is equal to or less than the clock period of 100 ns, the arithmetic operation result x is correctly output. The path from input a to the arithmetic operation result x extends through the multiplier 171, the selector 174 and the adder 172, and therefore the delay in that path is 66 ns. Further, the respective delays in the paths from inputs b and c to x are 66 ns and 5 ns, and therefore each of the delays in all the paths is equal to or less than the clock period. In cycle 2, the control signal 177 output by the controller 175 corresponds to 1. Therefore, the adder 172 receives input d and calculates an addition (d+c) so as to output the addition result to the divider 173. The divider 173 calculates a division (d+c)/e so as to output an arithmetic operation result y. In this case, if each of the delays in the paths from inputs c, d and e to the arithmetic operation result y is equal to or less than the clock period of 100 ns, the arithmetic operation result y is correctly output. The path from input d to the arithmetic operation result y extends through the selector 174, the adder 172 and the divider 173, and therefore the delay in that path is 66 ns. Further, the respective delays in the paths from inputs c and e to the arithmetic operation result y are 65 ns and 60 ns, and therefore each of the delays in all the paths is equal to or less than the clock period.
In the circuit of FIG. 15, there is a path 176 from input a to the arithmetic operation result y indicated by the bold line. The path 176 extends through the multiplier 171, the selector 174, the adder 172 and the divider 173, and therefore the sum of the delays is 126 ns. If data flows through the path 176, a delay in the entire circuit becomes greater than the clock period of 100 ns, and therefore circuit malfunction is caused. However, when the control signal 177 output by the controller 175 corresponds to 0, the selector 174 selects data so as to flow through the path from input a to the arithmetic operation result x, and when the control signal 177 corresponds to 1, the selector 174 selects data so as to flow through the path from input d to the arithmetic operation result y, and therefore data does not flow through the path from input a to the arithmetic operation result y. Thus, even if the sum of the delays along the path 176 from input a to the arithmetic operation result y exceeds the clock period, the circuit functions normally, and therefore there is no need to consider the path 176 from input a to the arithmetic operation result y. A path such as the path 176 is referred to as a “false path”.
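The path delays quoted above, and the reason why the path 176 is false, can be reproduced with the following sketch. The device names and the function `path_delay_ns` are hypothetical. The sketch assumes, following the description of FIG. 15, that the selector 174 passes the output of the multiplier 171 when the control signal 177 is 0 and input d when it is 1, and that the result x is used in cycle 1 and the result y in cycle 2.

```python
# Delays given in the text for the devices of FIG. 15.
DELAY_NS = {"mul171": 60, "sel174": 1, "add172": 5, "div173": 60}

# Structural fan-in of each device and output of FIG. 15.
FANIN = {
    "mul171": ["a", "b"],
    "sel174": ["mul171", "d"],   # [input chosen when 177 is 0, when 177 is 1]
    "add172": ["sel174", "c"],
    "div173": ["add172", "e"],
    "x": ["add172"],
    "y": ["div173"],
}


def path_delay_ns(src, dst, select=None):
    """Longest delay from `src` to `dst`, or None if no path exists.

    select=None follows both selector inputs (purely structural analysis);
    select=0 or 1 follows only the input chosen by the control signal 177.
    """
    if src == dst:
        return 0
    fanin = FANIN.get(dst)
    if fanin is None:
        return None   # dst is a primary input other than src
    if dst == "sel174" and select is not None:
        fanin = [fanin[select]]
    upstream = [
        t for t in (path_delay_ns(src, n, select) for n in fanin)
        if t is not None
    ]
    if not upstream:
        return None
    return max(upstream) + DELAY_NS.get(dst, 0)
```

A purely structural analysis, `path_delay_ns("a", "y")`, reports 126 ns and therefore a violation of the 100 ns clock period. When the selector setting of the cycle in which y is used is taken into account, `path_delay_ns("a", "y", select=1)` finds no path at all, and the longest path to y is the 66 ns path from input d.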
However, it is not possible to distinguish whether a path is a false path or a path through which data actually flows using an automatic logic synthesis tool, a floor planning tool, a layout routing tool, or the like which are presently available. Therefore, a pseudo-timing error is caused because of the generation of a false path longer than a clock period, or an arithmetic device on the false path is replaced by an arithmetic device, which is fast but has a large area, at the time of the optimization in a logic synthesis step, so that a circuit area is increased for no reason. If any tool capable of recognizing the false path is realized in the future, such problems would not be caused, but these problems are frequently caused at present.
As a method for detecting a timing error due to a false path as described above, there is a method disclosed in Japanese Laid-Open Patent Publication No. 2000-203555, for example. However, this method detects a false path by performing a simulation so as to confirm that a timing error does not happen in reality, and therefore there is a problem that the detection of the false path requires a very long period of time. Further, when there is any omission in simulation patterns, there is a possibility that a path through which a signal is actually transferred is mistakenly recognized as being a false path.
Furthermore, in order to eliminate a loop including only a combination circuit, a long false path, or the like, after the arithmetic device allocation has been performed without considering the generation of such a loop or false path, it is necessary to add an arithmetic device to the circuit such that a single arithmetic device is not shared between a plurality of arithmetic operations, thereby increasing the circuit area.