1. Field of the Invention
The present invention relates to a behavioral synthesis system for generating a pipelined control data flow graph corresponding to behavioral descriptions including a description of a loop process. The present invention also relates to a behavioral synthesis method for automatically synthesizing a circuit at a Register Transfer Level (RTL) using the behavioral synthesis system. The present invention also relates to a control program for causing a computer to execute a procedure for automatically synthesizing a circuit, and to a computer readable recording medium storing the control program. These are used in designing, for example, a logic circuit. The present invention also relates to a method for producing a logic circuit using the behavioral synthesis system, and a logic circuit produced by the method.
2. Description of the Related Art
Conventionally, a behavioral synthesis process for producing an RTL logic circuit diagram based on behavioral descriptions is employed in designing large-scale circuits, such as system LSIs comprising a logic circuit.
The behavioral synthesis process is also called a high-level synthesis process. In this process, RTL hardware (a circuit diagram) is automatically synthesized from behavioral descriptions, which contain only the algorithms for data processing and no information on the hardware structure.
“A C-based Synthesis System, Bach, and its Application”, Proceedings of the ASP-DAC 2001, 2001 (IEEE Catalog Number 01EX455, ISBN: 0-7803-6633-6) discloses a behavioral synthesis system, which is a computer system for synthesizing hardware (circuit diagram) by using an extended C language for hardware design as a behavioral description language.
“High Level Synthesis”, Kluwer Academic Publishers, 1992 (ISBN:0-7923-9194-2) reviews conventional behavioral synthesis techniques in detail. Conventional behavioral synthesis technology relates to a process for obtaining desired hardware (circuit diagram) from a control data flow graph (CDFG).
Hereinafter, a conventional behavioral synthesis apparatus and a conventional behavioral synthesis method (high-level synthesis method) for automatically synthesizing a desired circuit diagram based on behavioral descriptions will be described.
The behavioral synthesis apparatus comprises a computer system. The behavioral synthesis apparatus comprises a CDFG generating section, a scheduling section, an allocation section, a data path generating section, and a controller generating section. The behavioral synthesis apparatus is used to successively execute steps in a behavioral synthesis process, thereby automatically synthesizing (designing) an RTL (register transfer level) hardware (circuit diagram).
FIG. 10 shows a behavioral synthesis process which is executed by the behavioral synthesis apparatus.
In step S1, a control data flow graph (CDFG) is generated. The CDFG generating section analyzes the data flow indicated by an algorithm description (behavioral description) to produce a model called a CDFG. The CDFG comprises at least one node, which represents an operation, and at least one branch, which represents a data flow. An input branch and an output branch are connected to each node. An input branch represents data used in an operation, while an output branch represents data resulting from an operation. Each node also carries information on the type of its operation.
FIG. 11 shows a behavioral description which is described in the C language.
FIG. 12 shows a CDFG corresponding to the behavioral description of FIG. 11.
The CDFG of FIG. 12 comprises a node 1 representing a multiplication, a node 2 representing a multiplication, and a node 3 representing an addition. At the node 1, input data a is multiplied by input data b. At the node 2, input data b is multiplied by input data c. At the node 3, the result of the multiplication at the node 1 is added to the result of the multiplication at the node 2. The result of the addition at the node 3 is output as output data x.
FIG. 13 shows a data structure corresponding to the CDFG of FIG. 12.
In the data structure of FIG. 13, a node is represented by a structure Node. The structure Node comprises a node_id representing a node-specific node number, an in_edge array, an out_edge array, and an op_type. The in_edge array has the branch number of at least one input branch. The out_edge array has the branch number of at least one output branch. The node of FIG. 13 represents an operation having 2 inputs and 1 output. The in_edge has two elements, while the out_edge has one element. The op_type has a number representing the type of each operation (addition, subtraction, multiplication, etc.).
In the data structure of FIG. 13, a branch is represented by a structure Edge. The structure Edge comprises an edge_id representing a branch identifying number, a from_node, and a to_node. The from_node and the to_node each have a node number representing a node connected to a branch having a prescribed branch number.
The data structure of FIG. 13 (data representing the connections between the nodes of the CDFG) is stored in a memory of a computer (computing machine).
To connect a branch to a node, or to find a node that feeds into or is fed by a prescribed node, a node number or a branch number is written to or read from the memory of the computer.
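The data structure of FIG. 13 can be sketched in C as follows. The field names (node_id, in_edge, out_edge, op_type, edge_id, from_node, to_node) come from the text; the operation codes, the branch numbering, the NO_NODE marker, and the helper functions find_edge, find_node and input_node are hypothetical, added only to show how the CDFG of FIG. 12 might be stored and how a node feeding a prescribed node is found by following branch numbers.

```c
#include <stddef.h>

/* Hypothetical operation codes; FIG. 13 only states that op_type holds a number. */
enum { OP_ADD = 1, OP_SUB = 2, OP_MUL = 3 };

/* Hypothetical marker for a branch end with no node (primary input or output). */
#define NO_NODE (-1)

/* Node with 2 inputs and 1 output, as in FIG. 13. */
typedef struct {
    int node_id;     /* node-specific node number */
    int in_edge[2];  /* branch numbers of the input branches */
    int out_edge[1]; /* branch number of the output branch */
    int op_type;     /* type of operation */
} Node;

/* Branch connecting two nodes. */
typedef struct {
    int edge_id;   /* branch identifying number */
    int from_node; /* node producing the data */
    int to_node;   /* node consuming the data */
} Edge;

/* CDFG of FIG. 12 (x = a*b + b*c); branch numbers are assumed. */
static const Node nodes[] = {
    {1, {1, 2}, {5}, OP_MUL}, /* node 1: a * b */
    {2, {3, 4}, {6}, OP_MUL}, /* node 2: b * c */
    {3, {5, 6}, {7}, OP_ADD}, /* node 3: (a*b) + (b*c) */
};

static const Edge edges[] = {
    {1, NO_NODE, 1}, /* input a -> node 1 */
    {2, NO_NODE, 1}, /* input b -> node 1 */
    {3, NO_NODE, 2}, /* input b -> node 2 */
    {4, NO_NODE, 2}, /* input c -> node 2 */
    {5, 1, 3},       /* node 1 -> node 3 */
    {6, 2, 3},       /* node 2 -> node 3 */
    {7, 3, NO_NODE}, /* node 3 -> output x */
};

#define NUM_NODES (sizeof nodes / sizeof nodes[0])
#define NUM_EDGES (sizeof edges / sizeof edges[0])

/* Look up a branch by its number; returns NULL if it does not exist. */
static const Edge *find_edge(int edge_id) {
    for (size_t i = 0; i < NUM_EDGES; i++)
        if (edges[i].edge_id == edge_id) return &edges[i];
    return NULL;
}

/* Look up a node by its number; returns NULL if it does not exist. */
static const Node *find_node(int node_id) {
    for (size_t i = 0; i < NUM_NODES; i++)
        if (nodes[i].node_id == node_id) return &nodes[i];
    return NULL;
}

/* Number of the node feeding input `port` (0 or 1) of node `node_id`,
   or NO_NODE if that input is a primary input. */
static int input_node(int node_id, int port) {
    const Node *n = find_node(node_id);
    if (n == NULL || port < 0 || port > 1) return NO_NODE;
    const Edge *e = find_edge(n->in_edge[port]);
    return e != NULL ? e->from_node : NO_NODE;
}
```

For example, input_node(3, 0) follows branch 5 back to node 1, which is how the addition at node 3 is found to consume the result of the multiplication at node 1.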
Hereinafter, for the sake of simplicity, the explanation will refer to FIG. 12, which represents a CDFG visually: a branch is shown connected to a prescribed node, and other nodes are shown connected to the inputs or output of that node.
Referring to FIG. 10 again, a procedure for behavioral synthesis will be described.
In step S2, the scheduling section performs a scheduling process on the generated CDFG.
The scheduling process determines when the operation represented by each node is executed. In the scheduling process, the nodes of a CDFG are divided into several steps. One step is executed during one clock cycle.
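As one illustration of scheduling, the sketch below applies ASAP (as-soon-as-possible) scheduling, a common textbook technique that is not necessarily the one used by the scheduling section, to the CDFG of FIG. 12. It assumes every operation takes one step and that nodes are listed in topological order; the pred table and the asap_schedule function are hypothetical.

```c
#define NUM_NODES 3
#define NONE (-1) /* operand comes from a primary input, not a node */

/* Predecessor nodes of each node of FIG. 12 (index 0 = node 1). */
static const int pred[NUM_NODES][2] = {
    {NONE, NONE}, /* node 1: a * b */
    {NONE, NONE}, /* node 2: b * c */
    {0, 1},       /* node 3: result of node 1 + result of node 2 */
};

/* ASAP scheduling: place each node in the earliest step that follows all of
   its predecessors. Assumes one step per operation and topological order,
   so every predecessor's step is already known when a node is visited. */
static void asap_schedule(int step[NUM_NODES]) {
    for (int n = 0; n < NUM_NODES; n++) {
        int s = 1;
        for (int k = 0; k < 2; k++) {
            int p = pred[n][k];
            if (p != NONE && step[p] + 1 > s) s = step[p] + 1;
        }
        step[n] = s;
    }
}
```

Under these assumptions, the two multiplications (nodes 1 and 2) land in step 1 and the addition (node 3) in step 2, so the CDFG of FIG. 12 completes in two clock cycles.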
In step S3, the allocation section performs an allocation process. An allocation process is also called a binding process, and comprises determining a register for storing data represented by a branch, and determining an operator for executing an operation represented by a node. In some behavioral synthesis methods (high-level synthesis methods), an allocation process may be executed before a scheduling process.
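A simple way to think about operator allocation is sketched below: operations in different steps may share one operator, so the number of operators of each type is the largest number of operations of that type scheduled in the same step. The schedule (nodes 1 and 2 in step 1, node 3 in step 2), the operation codes, and the functions count_operators and bind_operators are assumptions for illustration; actual allocation also assigns registers to branches, which is omitted here.

```c
#define NUM_NODES 3
#define NUM_STEPS 2

/* Hypothetical operation types. */
enum { OP_ADD, OP_MUL, NUM_OP_TYPES };

/* CDFG of FIG. 12 with an assumed schedule (index 0 = node 1). */
static const int op_type[NUM_NODES] = {OP_MUL, OP_MUL, OP_ADD};
static const int step_of[NUM_NODES] = {1, 1, 2};

/* Operators needed per type = maximum number of operations of that type
   scheduled in any single step (operators are shared across steps). */
static void count_operators(int needed[NUM_OP_TYPES]) {
    int per_step[NUM_STEPS + 1][NUM_OP_TYPES] = {{0}};
    for (int n = 0; n < NUM_NODES; n++)
        per_step[step_of[n]][op_type[n]]++;
    for (int t = 0; t < NUM_OP_TYPES; t++) {
        needed[t] = 0;
        for (int s = 1; s <= NUM_STEPS; s++)
            if (per_step[s][t] > needed[t]) needed[t] = per_step[s][t];
    }
}

/* Binding: give each node an operator instance of its type, numbering the
   instances from 0 within each step. */
static void bind_operators(int instance[NUM_NODES]) {
    int used[NUM_STEPS + 1][NUM_OP_TYPES] = {{0}};
    for (int n = 0; n < NUM_NODES; n++)
        instance[n] = used[step_of[n]][op_type[n]]++;
}
```

With this schedule two multipliers and one adder are needed. If node 2 were instead scheduled in a separate step from node 1, the same counting would yield a single shared multiplier, which is why scheduling and allocation are closely linked.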
In step S4, the data path generating section generates a data path based on the results of scheduling and allocation.
In step S5, the controller generating section generates a controller based on the results of scheduling and allocation.
By executing steps S1 to step S5, RTL (register transfer level) hardware (circuit diagram) is automatically synthesized (designed). The details of step S1 to step S5 are described in “High Level Synthesis” (supra). An example of this technique is described in Japanese Laid-Open Publication No. 2001-229217.
Behavioral descriptions representing a process to be executed by an actually designed circuit typically contain a description of a loop process. Since a loop process is executed repeatedly, the time required to process data is dominated by the time spent in loop processes.
Therefore, an effective way to speed up data processing is to speed up loop processes. For example, image processing and speech processing must be completed within a prescribed time, so it is essential to design a circuit capable of high-speed loop processing in order to meet the demand for high-speed data processing.
For example, a loop process is pipelined in order to realize a high-speed loop process.
“Percolation Based Synthesis”, Proceedings of Design Automation Conference 1990, pp. 444–448 (IEEE) discloses a method for pipelining a conventional loop process. This method is also illustrated in FIG. 3(e) of Japanese Laid-Open Publication No. 2001-142937.
Hereinafter, a method for pipelining a conventional loop process will be described.
FIG. 14 shows an exemplary behavioral description including a description representing a loop process.
The behavioral description of FIG. 14 represents an operation in which functions f(i), g(i) and h(i) are executed while a condition i≦10 is satisfied, where an integer variable i is incremented by one from i=0.
FIG. 15 shows a CDFG corresponding to the behavioral description of FIG. 14. A loop process represented by the CDFG is not yet pipelined.
The CDFG of FIG. 15 comprises a loop process portion 10. The loop process portion 10 comprises an increment operation node 14, a port 12 at an upper end of the loop process portion 10, and a port 13 at a lower end of the loop process portion 10. The port 12 and the port 13 each store the variable i of FIG. 14. The value (data) of the variable i stored in the port 12 is incremented by one by the increment operation node 14, and the resultant value is stored in the port 13. The data reaching the port 13 is fed back to the port 12 for the next loop. The data reaching the port 13 and the data fed back to the port 12 for the next loop represent the same variable. A node 11 located outside the loop process portion 10 outputs a constant. The node 11 provides an initial value of the variable i to the port 12.
In the behavioral description of FIG. 14, only the variable i is repeatedly used in the loop process portion 10. If two or more variables are repeatedly used in the loop process portion, a number of pairs of ports (e.g., the port 12 and the port 13) corresponding to the number of the variables are provided.
The CDFG of FIG. 15 further comprises a terminating condition determining node 15, an “EXIT” node 16, and nodes 17 to 19.
The terminating condition determining node 15 determines whether or not a terminating condition is satisfied. If the terminating condition is “true” (i≦10), data “1” is output. If the terminating condition is “false”, data “0” is output.
The “EXIT” node 16 is a special node which controls the termination of the loop process portion 10. If the data output by the terminating condition determining node 15 and input to the “EXIT” node 16 is “0” (the variable i is greater than 10), the “EXIT” node 16 instructs a controller (not shown) to end the process of the loop process portion 10 and transition to the next state.
The node 17 executes an operation relating to the function f. The node 18 executes an operation relating to the function g. The node 19 executes an operation relating to the function h. The value of the variable i is input to the node 17, the node 18 and the node 19. The node 17, the node 18, and the node 19 are scheduled to be executed in step 1, step 2, and step 3, respectively.
During a first clock cycle, the node 17 contained in step 1 of a first loop process calculates f(1). During a second clock cycle, the node 18 contained in step 2 of the first loop process calculates g(1). During a third clock cycle, the node 19 contained in step 3 of the first loop process calculates h(1).
During a fourth clock cycle, the node 17 contained in step 1 of a second loop process calculates f(2). During a fifth clock cycle, the node 18 contained in step 2 of the second loop process calculates g(2). During a sixth clock cycle, the node 19 contained in step 3 of the second loop process calculates h(2). Therefore, 30 cycles of processes are required to perform a loop process 10 times.
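The cycle count of the unpipelined loop can be checked with the small simulation below. It assumes, as the walkthrough above does, that the loop body runs for i = 1 to 10 and that f, g and h each occupy one step (one clock cycle); the function run_unpipelined and its output arrays are hypothetical.

```c
#define NUM_STEPS 3 /* step 1: f, step 2: g, step 3: h */

/* Non-pipelined execution: iteration i runs steps 1..NUM_STEPS in consecutive
   clock cycles before iteration i + 1 starts. For each cycle, records which
   function runs ('f', 'g' or 'h') and on which value of i, and returns the
   total number of cycles. fn and arg must hold n_iter * NUM_STEPS entries. */
static int run_unpipelined(int n_iter, char fn[], int arg[]) {
    static const char names[NUM_STEPS] = {'f', 'g', 'h'};
    int cycle = 0;
    for (int i = 1; i <= n_iter; i++) {       /* loop condition: i <= n_iter */
        for (int s = 0; s < NUM_STEPS; s++) { /* one step per clock cycle */
            fn[cycle] = names[s];
            arg[cycle] = i;
            cycle++;
        }
    }
    return cycle;
}
```

For 10 iterations the function returns 30, and the fourth recorded cycle (index 3) is f(2), matching the sequence described above.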
Note that no fixed format is used to represent a loop process with a CDFG. Although the format varies from document to document, the behavior is the same.
Next, an exemplary method for pipelining a loop process represented by the behavioral description of FIG. 14 will be described.
A loop process is pipelined using, for example, three stages. In other words, a single loop process is divided into three stages. Note that a stage means a group of processes for a pipeline operation.
Without pipelining, the loop process in the behavioral description of FIG. 14 is scheduled using only three steps per loop iteration. Therefore, when the loop process is pipelined, step 1, step 2, and step 3 are assigned to stage 1, stage 2, and stage 3, respectively.
Alternatively, suppose an unpipelined loop process is scheduled using 4 steps. When the 4 steps are divided into 2 stages for a pipelined process, steps 1 and 2 are assigned to stage 1 and steps 3 and 4 are assigned to stage 2. In this case, 2 cycles are required to execute one stage.
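The assignment of steps to stages in both examples above follows a simple rule, sketched here under the assumption that the number of steps divides evenly into the number of stages; the function name stage_of_step is hypothetical.

```c
/* Map a 1-based step to its pipeline stage when num_steps steps are split
   evenly into num_stages stages. steps_per_stage is also the number of clock
   cycles needed to execute one stage. Assumes num_steps % num_stages == 0. */
static int stage_of_step(int step, int num_steps, int num_stages) {
    int steps_per_stage = num_steps / num_stages;
    return (step - 1) / steps_per_stage + 1;
}
```

With 3 steps and 3 stages each step forms its own stage; with 4 steps and 2 stages, steps 1 and 2 map to stage 1 and steps 3 and 4 map to stage 2.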
In a pipelined circuit, processes as shown in FIG. 16 are performed.
Referring to FIG. 16, during a first clock cycle, only f(1) is executed. Next, during a second clock cycle, g(1) is executed, while a second loop process is started so that f(2) is executed. During a third clock cycle, h(1) is executed, ending the first loop process, while g(2) in the second loop process and f(3) in a third loop process are executed simultaneously. Thereafter, during a kth clock cycle (3≦k≦10), h(k−2), g(k−1) and f(k) are executed simultaneously. Subsequently, during an 11th clock cycle, only h(9) and g(10) are executed. During a 12th clock cycle, only h(10) is executed. Thus, the loop process is ended.
As described above, 30 cycles are required for execution of the behavioral description of FIG. 14 when a loop process is not pipelined. When a loop process is pipelined, only 12 cycles are required.
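The 12-cycle figure follows from the timing of FIG. 16: with one cycle per stage, stage s works on iteration c − s + 1 during clock cycle c, so N iterations through S stages finish in N + S − 1 cycles (10 + 3 − 1 = 12) instead of N × S = 30. The sketch below, with the hypothetical function run_pipelined, counts how many of f, g and h run in each cycle under that assumption.

```c
#define NUM_STAGES 3 /* stage 1: f, stage 2: g, stage 3: h */

/* Pipelined execution with one clock cycle per stage: in cycle c, stage s
   works on iteration c - s + 1 if that iteration exists. Stores the number of
   operations issued in each cycle and returns the total number of cycles,
   n_iter + NUM_STAGES - 1. ops_per_cycle must hold that many entries. */
static int run_pipelined(int n_iter, int ops_per_cycle[]) {
    int total = n_iter + NUM_STAGES - 1;
    for (int c = 1; c <= total; c++) {
        int ops = 0;
        for (int s = 1; s <= NUM_STAGES; s++) {
            int i = c - s + 1; /* iteration handled by stage s in cycle c */
            if (i >= 1 && i <= n_iter) ops++;
        }
        ops_per_cycle[c - 1] = ops;
    }
    return total;
}
```

For 10 iterations the per-cycle counts are 1, 2, then 3 for cycles 3 to 10, then 2 and 1, reproducing the fill and drain of the pipeline in FIG. 16.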
FIG. 17 shows a CDFG in which the loop process in the behavioral description of FIG. 14 is pipelined with a conventional method. In the CDFG of FIG. 17, scheduling has also been performed, so that the nodes are divided into steps.
As shown in the CDFG of FIG. 17, only f(1) is executed in step 1, and g(1) and f(2) are simultaneously executed in step 2.
Next, the process goes to a loop process portion 21 contained in step 3. Note that in the loop process portion 21, one cycle is required per loop iteration, so the number of cycles required depends on the number of repetitions of the loop process. Therefore, from step 4 onward, the step number no longer equals the cycle number.
In the loop process portion 21, stages 1 to 3 are arranged in parallel. A node 22 provides an initial value 3 to a loop variable i. Nodes 23 and 24 subtract 1 and 2, respectively, from an input value. Therefore, during a first loop process, nodes 25, 26 and 27 simultaneously execute f(3), g(2) and h(1), respectively. Next, during a second loop process, f(4), g(3) and h(2) are simultaneously executed. After third to seventh loop processes are executed, f(10), g(9) and h(8) are simultaneously executed during an eighth loop process. At a node 28, the condition (i≦10) is examined. If the condition is not satisfied, the process of the loop process portion 21 is ended. The process goes to the next step 4.
In step 4, g(10) and h(9) are executed. Next, in step 5, only h(10) is executed.
In the CDFG of FIG. 17, a portion 20 executed before the loop process portion 21 is designated as a prologue portion, while a portion 29 executed after the loop process portion 21 is designated as an epilogue portion.
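The conventional prologue/loop/epilogue structure of FIG. 17 can be generated mechanically, as in the sketch below. It assumes one cycle per stage, at least as many iterations as stages, and a kernel loop variable that starts at the number of stages (the value 3 provided by node 22); the functions emit_cycle and pipeline_with_prologue are hypothetical, and the kernel check is placed before the body rather than after it as at node 28, which gives the same result under the stated assumption.

```c
#include <stdio.h>

#define NUM_STAGES 3 /* stage 1: f, stage 2: g, stage 3: h */

static const char stage_fn[NUM_STAGES] = {'f', 'g', 'h'};

/* Print one clock cycle in which stage s (1-based) works on iteration
   base - s + 1, for every s in [s_lo, s_hi]. Returns the number of operations. */
static int emit_cycle(const char *part, int base, int s_lo, int s_hi) {
    int ops = 0;
    printf("%-8s:", part);
    for (int s = s_lo; s <= s_hi; s++) {
        printf(" %c(%d)", stage_fn[s - 1], base - s + 1);
        ops++;
    }
    printf("\n");
    return ops;
}

/* Prologue fills the pipeline, the kernel loop runs all stages in parallel,
   and the epilogue drains it. Requires n_iter >= NUM_STAGES. Returns the
   number of clock cycles; *ops_out receives the number of operations. */
static int pipeline_with_prologue(int n_iter, int *ops_out) {
    int cycles = 0, ops = 0;
    for (int p = 1; p < NUM_STAGES; p++, cycles++)         /* prologue */
        ops += emit_cycle("prologue", p, 1, p);
    for (int i = NUM_STAGES; i <= n_iter; i++, cycles++)   /* kernel loop */
        ops += emit_cycle("kernel", i, 1, NUM_STAGES);
    for (int e = 1; e < NUM_STAGES; e++, cycles++)         /* epilogue */
        ops += emit_cycle("epilogue", n_iter + e, e + 1, NUM_STAGES);
    *ops_out = ops;
    return cycles;
}
```

For 10 iterations this prints two prologue cycles (f(1), then f(2) and g(1)), eight kernel cycles starting with f(3), g(2) and h(1), and two epilogue cycles ending with h(10): 12 cycles and 30 operations in total, even though the CDFG itself has only five steps.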
By producing such a CDFG, a loop process can be pipelined, thereby making it possible to speed up the process.
However, when a loop process is pipelined using a conventional method as shown in FIG. 17, a prologue portion and an epilogue portion are required before and after a loop process portion in a CDFG, respectively. As a result, the resultant CDFG is complicated. Therefore, when behavioral synthesis is performed from the CDFG to produce hardware (circuit diagram), the resultant hardware has a large area, resulting in an increase in manufacturing cost for a chip.
In a CDFG obtained by pipelining a loop process with such a conventional method, the prologue portion and the epilogue portion are always executed without checking the loop repetition condition. Therefore, if the loop should end before the prologue portion has finished, the behavior is incorrect. For example, in the case of the behavioral description of FIG. 14, if the loop process is repeated only once, only f(1), g(1) and h(1) need to be executed. However, f(2) is still executed during the second cycle. If f(2) includes a process such as memory writing or external communication, an improper value may be written into a memory or an unnecessary value may be sent out externally, i.e., the circuit may malfunction.
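The malfunction can be quantified with the sketch below, which assumes one cycle per stage and a prologue of (number of stages − 1) cycles executed unconditionally, as in FIG. 17. In prologue cycle p, stage s works on iteration p − s + 1; any such iteration beyond the actual repetition count is an operation the original loop would never have performed. The function spurious_prologue_ops is hypothetical.

```c
#define NUM_STAGES 3 /* stage 1: f, stage 2: g, stage 3: h */

/* Count prologue operations that act on an iteration the loop would never
   run. The prologue executes cycles 1..NUM_STAGES-1 without checking the
   loop condition; in prologue cycle p, stage s works on iteration p - s + 1. */
static int spurious_prologue_ops(int n_iter) {
    int spurious = 0;
    for (int p = 1; p < NUM_STAGES; p++)
        for (int s = 1; s <= p; s++)
            if (p - s + 1 > n_iter) spurious++;
    return spurious;
}
```

With a single repetition, exactly one spurious operation (f(2)) is issued; with two or more repetitions there are none, which is why the problem appears only when the number of repetitions is small.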
To avoid such a malfunction, one could introduce a CDFG in which the case where the number of repetitions of a loop process is small is handled by a separate portion of the CDFG. In this case, however, the resultant CDFG is complicated and the hardware (circuit diagram) area is increased.