Conventionally, in designing a large-scale circuit (e.g., a system LSI), behavioral synthesis is performed, which generates an RTL (Register Transfer Level) description from a behavioral description of the circuit. Behavioral synthesis is also called high-level synthesis.
In this behavioral synthesis, a circuit diagram for hardware is automatically synthesized from a behavioral description, which only describes an algorithm for processing but does not include information regarding the structure of the hardware.
For example, according to a behavioral synthesis system described in Reference 1, it is possible to synthesize a circuit diagram for hardware using, as the behavioral description language, a language in which the C language is extended for hardware design.
Hereinafter, a procedure of this behavioral synthesis will be briefly described.
In this behavioral synthesis, first, the flow of data in the algorithm described in the behavioral description language is analyzed, and a model called a data flow graph is created.
In a digital circuit, by performing various computations on a plurality of data, the processing intended by the digital circuit is performed. The graph which represents the computations and the flow of data in this case is the data flow graph.
This data flow graph is structured with a plurality of nodes and branches which connect the nodes. A node represents one computation performed in the digital circuit. A branch represents the flow of data from one computation to another computation. By appropriately connecting branches with nodes representing computations, it is possible to represent the behavior of the digital circuit as the data flow graph.
Each node in the data flow graph is connected by an input branch and an output branch. The input branch represents data to be given for computation. The output branch represents data obtained as a result of the computation. In addition, each node includes information regarding the type of a computation and the like.
For example, the behavioral description described with C language shown in FIG. 13 can be represented by the data flow graph shown in FIG. 14.
FIG. 14 includes two nodes 101 and 102 representing multiplication and one node 103 representing addition. FIG. 14 shows adding the result obtained by multiplying inputs a and b and the result obtained by multiplying inputs b and c and then outputting the result of adding to x.
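FIG. 13 is not reproduced in this text. As a hedged illustration only, a behavioral description consistent with the data flow described for FIG. 14 might be written in C as follows. The function name behavior, the temporaries t1 and t2, and the int types are assumptions made for this sketch.

```c
/* Hypothetical behavioral description matching the data flow
   described for FIG. 14. The function name, the temporaries and
   the integer types are assumptions made for illustration. */
int behavior(int a, int b, int c)
{
    int t1 = a * b;   /* node 101: multiplication of inputs a and b */
    int t2 = b * c;   /* node 102: multiplication of inputs b and c */
    int x = t1 + t2;  /* node 103: addition of the two products */
    return x;         /* the sum is output as x */
}
```

Note that the description states only the computation; it says nothing about how many multipliers exist or when each multiplication is executed.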
On a computer, the data flow graph in FIG. 14 is represented by, for example, a data structure shown in FIG. 15.
In FIG. 15, a node is represented by the Node structure (struct Node), which includes a node number node_id specific to each node. in_edge and out_edge store the branch numbers of the input branches and of the output branches of the node, respectively. In the example in FIG. 15, the node represents a computation with two inputs and one output; hence, in_edge has two elements and out_edge has one element. op_type stores a number representing the type of computation, such as addition, subtraction or multiplication.
A branch is represented by the Edge structure (struct Edge), which includes a branch number edge_id specific to each branch. from_node and to_node store the node numbers of the nodes connected by that branch.
With these data structures, the connections between the nodes and the branches in the data flow graph are stored in a memory (database) of a computer.
In the data flow graph, when nodes are connected to each other, or when another node connected to an input or output of a given node is to be found, branch numbers and node numbers are registered in the elements of the database described above, and those elements are referred to. Hereinafter, for ease of understanding, connecting nodes by branches and finding the nodes which precede and follow a given node will be described with reference to a diagram which visually shows the data flow graph, as in FIG. 14.
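FIG. 15 is not reproduced in this text. The following is a minimal sketch of such data structures in C, using the member names given above. The op_type codes, the branch numbers 0 to 6, the NODE_NONE marker for external inputs and outputs, and the helper producer_of_input are assumptions introduced for illustration.

```c
#include <stddef.h>

enum { OP_ADD = 0, OP_SUB = 1, OP_MUL = 2 };  /* assumed op_type codes */
enum { NODE_NONE = -1 };  /* assumed marker: branch end not tied to a node */

struct Node {
    int node_id;      /* node number specific to each node */
    int in_edge[2];   /* branch numbers of the two input branches */
    int out_edge[1];  /* branch number of the output branch */
    int op_type;      /* type of computation */
};

struct Edge {
    int edge_id;    /* branch number specific to each branch */
    int from_node;  /* node number of the node producing the data */
    int to_node;    /* node number of the node consuming the data */
};

/* The graph described for FIG. 14, with assumed branch numbers.
   Input b feeds two nodes, so it appears as two branches here. */
static const struct Node fig14_nodes[3] = {
    {101, {0, 1}, {4}, OP_MUL},
    {102, {2, 3}, {5}, OP_MUL},
    {103, {4, 5}, {6}, OP_ADD},
};

static const struct Edge fig14_edges[7] = {
    {0, NODE_NONE, 101},  /* input a */
    {1, NODE_NONE, 101},  /* input b */
    {2, NODE_NONE, 102},  /* input b */
    {3, NODE_NONE, 102},  /* input c */
    {4, 101, 103},
    {5, 102, 103},
    {6, 103, NODE_NONE},  /* output x */
};

/* Returns the node number of the node that produces the given input
   of a node, or NODE_NONE if the input is external or not found. */
int producer_of_input(const struct Node *node, int input_index,
                      const struct Edge *edges, size_t n_edges)
{
    for (size_t i = 0; i < n_edges; i++) {
        if (edges[i].edge_id == node->in_edge[input_index])
            return edges[i].from_node;
    }
    return NODE_NONE;
}
```

Finding the node connected to an input of a given node is then a lookup of the branch number in the Edge table followed by a read of from_node.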
Next, a scheduling process and an allocation process are performed on the data flow graph. The scheduling is a process for determining when each node in the data flow graph is executed. The allocation is also called binding and includes a process for determining a register for storing data represented by a branch in the data flow graph and a process for determining which computing unit is used to perform a computation represented by a node in the data flow graph. Depending on the behavioral synthesis method, the allocation may be performed prior to the scheduling.
Next, based on the result of the scheduling and the result of the allocation, a data path and a controller are generated, and hardware is obtained. An example of the processing for obtaining a circuit from the data flow graph is disclosed in Reference 2.
In the behavioral synthesis, it is important to use one computing unit a plurality of times in order to perform a plurality of computations. This reduces the number of computing units in a chip and therefore the area of the chip. Thus, it is possible to reduce the cost of manufacturing the chip.
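As a hedged illustration of what a scheduling process computes, the following sketch applies as-soon-as-possible (ASAP) scheduling, a simple policy chosen here for illustration and not the method of any cited reference. Each node is placed one step after the latest of its preceding nodes. The structure SchedNode, the function asap_schedule, and the requirement that nodes be listed so that every preceding node has a smaller index are assumptions.

```c
#define MAX_PREDS 2

/* Hypothetical node record for scheduling: indices of preceding nodes. */
struct SchedNode {
    int n_preds;          /* number of preceding nodes (0 to MAX_PREDS) */
    int pred[MAX_PREDS];  /* indices of the preceding nodes */
};

/* ASAP scheduling sketch. Assumes the nodes are listed in an order in
   which every preceding node has a smaller index than its successor.
   On return, step[i] holds the execution step (starting at 1) of node i. */
void asap_schedule(const struct SchedNode *nodes, int n_nodes, int *step)
{
    for (int i = 0; i < n_nodes; i++) {
        int s = 1;  /* a node with no preceding node runs in step 1 */
        for (int k = 0; k < nodes[i].n_preds; k++) {
            int p = nodes[i].pred[k];
            if (step[p] + 1 > s)
                s = step[p] + 1;  /* run after the latest preceding node */
        }
        step[i] = s;
    }
}
```

For a graph with two multiplications feeding one addition, this policy places both multiplications in step 1 and the addition in step 2.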
Accordingly, in the behavioral synthesis, it is important to contrive such that as many computations as possible share a computing unit.
When computations of the same type are executed at different steps, they can be processed by one computing unit. Thus, in the scheduling process, it is necessary to contrive so as to obtain a schedule in which computations of the same type are performed at different steps. Such a scheduling method is, for example, disclosed in Reference 3.
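The effect of the schedule on sharing can be sketched as follows. For one type of computation, the number of computing units required is the largest number of computations of that type placed in the same step. The function units_required and the array layout are assumptions made for this sketch.

```c
/* Returns the number of computing units of the given type that a
   schedule requires: the largest number of computations of that type
   which are executed in the same step. op_type[i] and step[i] describe
   node i. The quadratic loop is kept for clarity, not for speed. */
int units_required(const int *op_type, const int *step, int n_nodes, int type)
{
    int max_units = 0;
    for (int i = 0; i < n_nodes; i++) {
        if (op_type[i] != type)
            continue;
        int count = 0;  /* computations of this type sharing step[i] */
        for (int j = 0; j < n_nodes; j++) {
            if (op_type[j] == type && step[j] == step[i])
                count++;
        }
        if (count > max_units)
            max_units = count;
    }
    return max_units;
}
```

With two multiplications and one addition, a schedule that places both multiplications in step 1 requires two multipliers, whereas a schedule that places them in steps 1 and 2 requires only one.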
In the allocation process, when there is a plurality of methods of sharing a computing unit, it is necessary to select a sharing method such that the area of a circuit becomes as small as possible. Such an allocation method is, for example, disclosed in Reference 4.
As described above, the method of sharing a computing unit among computations of the same type is effective in reducing the area of the circuit. However, when a combination of a series of computations appears a plurality of times during the entire process, it is effective, in order to further reduce the area, to create a circuit for a sub-graph (partial graph) representing the combination of the series of computations in the data flow graph and to share the entire circuit.
This is because a selector is required only for an input of the entire shared circuit, and no selector is required for the individual computations in the circuit.
For example, in Portion (a) of FIG. 16, sub-graphs 111 and 112, each including two additions and one multiplication, are the same. Thus, when the scheduling is performed such that these are executed at different clock cycles, it is possible to share one circuit 113, as shown in Portion (b) of FIG. 16. This circuit 113 is a circuit for executing the sub-graphs 111 and 112 by repeating the processing.
In the first clock step, the result obtained by an adder 114 is selected by a selector 115 and then input to the circuit 113. The result computed by the circuit 113 is stored in a register 116.
In the second clock step, an output of the register 116 is selected by the selector 115 and then input to the circuit 113. The result computed by the circuit 113 is output to the outside.
In Portion (b) of FIG. 16, control signals for the selector 115 and the register 116 are omitted.
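The two clock steps described above can be sketched in C as follows. FIG. 16 is not reproduced in this text, so the body of circuit_113, its additional inputs p and q, and the function run_shared are placeholders assumed for illustration; only the roles of the adder 114, the selector 115, the circuit 113 and the register 116 are taken from the description.

```c
/* Placeholder for the shared circuit 113: two additions and one
   multiplication. The operands p and q are assumptions. */
static int circuit_113(int in, int p, int q)
{
    return (in + p) * (in + q);
}

/* Selector 115: passes the adder output when sel is 0 and the
   register output when sel is 1. */
static int selector_115(int sel, int adder_out, int reg_out)
{
    return sel == 0 ? adder_out : reg_out;
}

/* Executes both sub-graphs on the one shared circuit in two clock steps. */
int run_shared(int u, int v, int p, int q)
{
    int reg_116 = 0;
    int adder_114 = u + v;

    /* First clock step: the adder result is selected, and the result
       of the circuit 113 is stored in the register 116. */
    reg_116 = circuit_113(selector_115(0, adder_114, reg_116), p, q);

    /* Second clock step: the register output is selected, and the
       result of the circuit 113 is output to the outside. */
    return circuit_113(selector_115(1, adder_114, reg_116), p, q);
}
```

The value returned is the same as that of two cascaded copies of the circuit, but only one copy, one selector and one register are used.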
When the circuit is not shared, it is necessary to have another circuit equivalent to the circuit 113. Thus, when the area of the circuit 113 is larger than that of the selector 115 and the register 116 which are required when the circuit is shared, it is possible to reduce the size of the area by sharing the circuit.
As described above, in order to share the circuit, it is necessary to search the data flow graph for a plurality of identical sub-graphs.
However, it is commonly known that, in the problem of searching a data flow graph for identical sub-graphs, the amount of search processing increases exponentially with the number of nodes in the data flow graph. Thus, when the data flow graph becomes large, the time required for processing becomes extremely long, which is not practical. As such, a contrivance is needed in order to search the data flow graph for identical sub-graphs easily.
For example, Reference 5 discloses a method of searching a data flow graph for similar sub-graphs. Here, “similar” sub-graphs are sub-graphs which become identical when computations that do not change a value (e.g., addition of “0” and multiplication by “1”) are inserted. In other words, the graph searching method described in Reference 5 determines whether sub-graphs are similar to each other, instead of whether they are identical to each other, and is otherwise essentially the same as a search for identical sub-graphs.
In this similar sub-graph searching method, the following contrivance is used in order to find similar sub-graphs at a high speed. Specifically, as shown in Portion (a) of FIG. 17 and Portion (b) of FIG. 17, the sub-graphs to be searched are limited to those having n stages with no branching.
When it is assumed that the number of inputs for each computation in this data flow graph is “2”, then, as shown in FIG. 18, the three-stage sub-graphs having node F as an output are the four combinations A-D-F, B-D-F, B-E-F and C-E-F. Generally, some computations have only one input, and computations having three or more inputs are rare. Therefore, there are often fewer than four combinations for one output node. Thus, when the total number of nodes in the data flow graph is N, the number of three-stage sub-graphs in the data flow graph is at most 4N, which a computer can handle. Hence, it is possible to compare these sub-graphs and find similar sub-graphs.
Generally, the number of sub-graphs for n stages is at most N×2^(n-1). In a large-scale circuit, the number of nodes N in a data flow graph is about 10,000. When n is small, this number of sub-graphs can be handled by a computer (for example, at most 40,000 for n = 3).
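The count of n-stage sub-graphs ending at one node can be sketched as follows. FIG. 18 is not reproduced in this text; the graph below is inferred from the four combinations listed above (D takes A and B as inputs, E takes B and C, and F takes D and E) and is an assumption, as are the names ChainNode, fig18 and count_chains.

```c
#define MAX_INPUTS 2

/* Hypothetical node record: indices of the nodes feeding this node. */
struct ChainNode {
    int n_preds;           /* number of input nodes (0 to MAX_INPUTS) */
    int pred[MAX_INPUTS];  /* indices of the input nodes */
};

/* Assumed graph for FIG. 18. Indices: A=0, B=1, C=2, D=3, E=4, F=5. */
static const struct ChainNode fig18[6] = {
    {0, {0, 0}},  /* A */
    {0, {0, 0}},  /* B */
    {0, {0, 0}},  /* C */
    {2, {0, 1}},  /* D: inputs A and B */
    {2, {1, 2}},  /* E: inputs B and C */
    {2, {3, 4}},  /* F: inputs D and E */
};

/* Counts the n-stage sub-graphs with no branching whose output is the
   given node. With at most two inputs per node, the count is at most
   2^(n-1) for each node. Assumes n >= 1. */
long count_chains(const struct ChainNode *g, int node, int n)
{
    if (n == 1)
        return 1;  /* the node by itself is a one-stage sub-graph */
    long total = 0;
    for (int k = 0; k < g[node].n_preds; k++)
        total += count_chains(g, g[node].pred[k], n - 1);
    return total;
}
```

For node F (index 5) and n = 3, the function counts the four combinations A-D-F, B-D-F, B-E-F and C-E-F. Summing the count over all N nodes gives the bound N×2^(n-1).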
[Reference 1] Japanese Laid-Open Publication No. 2001-229217
[Reference 2] Japanese Laid-Open Publication No. 2000-348069
[Reference 3] Japanese Laid-Open Publication No. 2003-76728
[Reference 4] “A C-based Synthesis System, Bach, and its Application” Proceedings of the ASP-DAC 2001, 2001 (IEEE Catalog Number: 01EX455, ISBN: 0-7803-6633-6)
[Reference 5] “Improved Interconnect Sharing by Identity Operation Insertion” Proceedings of the ICCAD 1999, 1999 IEEE (ISBN: 0-7803-5832)