1. Field of the Invention
The present invention relates to a high level synthesis method for automatically generating a logic circuit of a register transfer level (RTL) from an operation description, a thread produced using the high level synthesis method, and a method for producing a circuit including the thread. The present invention is especially effective for the design of an ASIC (Application Specific Integrated Circuit) or other circuits which must be designed in a short period of time.
2. Description of the Related Art
A high level synthesis method automatically generates a logic circuit of an RTL which includes a hardware structure (registers, calculators and the like), data flow between registers per operation cycle, and processing, based on an operation description which describes only processing operations but does not include information on the hardware structure. This high level synthesis method is disclosed in, for example, Japanese Laid-Open Publication No. 6-101141. Hereinafter, a flow of such a conventional high level synthesis method will be described.
(i) Conversion of an operation description to a control data flowgraph (CDFG)
In high level synthesis, an operation description is first analyzed and converted to a model, referred to as a CDFG, which represents the dependency relationship among calculations, input to and output from an external device, and a memory access execution order.
FIG. 1 shows an exemplary operation description. The operation description shown in FIG. 1 is performed as follows. In a description 101, a memory content corresponding to address (adr) is substituted for a variable a. In a description 102, a memory content corresponding to address (adr+1) is substituted for a variable b. In a description 103, a memory content corresponding to address (adr−1) is substituted for a variable c. In a description 104, the value of (a+b+c) is substituted for a variable d.
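The operation description above can be rendered in Python for illustration (a hypothetical rendering; the original description language is not specified), modeling the memory as a plain list indexed by address:

```python
# Hypothetical Python rendering of the operation description in FIG. 1.
# The memory is modeled as a plain list; adr is an index into it.
def operation(mem, adr):
    a = mem[adr]      # description 101: read memory at adr
    b = mem[adr + 1]  # description 102: read memory at adr+1
    c = mem[adr - 1]  # description 103: read memory at adr-1
    return a + b + c  # description 104: d = a + b + c
```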
FIG. 2 shows an exemplary CDFG obtained by converting the operation description shown in FIG. 1. In the CDFG shown in FIG. 2, a node 105 represents an input to the circuit from an external device, and a node 106 represents an output from the circuit to an external device. Nodes 107 through 109 each represent a read request to the memory (memory read request or memory access request), and nodes 110 through 112 each represent data read from the memory. A node 133 represents an increment, and a node 134 represents a decrement. Nodes 135 and 136 each represent an addition.
Branches 113 and 114, each represented by a chain line in FIG. 2, are each a data dependency edge (or control dependency edge). The data dependency edge 113 connects the node 107 to the node 108. The data dependency edge 114 connects the node 108 to the node 109. A node to which another node is connected must be scheduled to a step later than the step to which that other node is scheduled. For example, in a scheduling stage described below, the node 108 is scheduled to a step later than the step to which the node 107 is scheduled. When a pipeline-accessible memory is used, the memory read requests 107 through 109 are executed in the same order as in the operation description and are scheduled to different steps from each other. Herein, the term "pipeline-accessible memory" is defined to refer to a memory which can accept an access request in each clock cycle. In the operation description shown in FIG. 1, memory read is performed three times. The data dependency edges 113 and 114 are provided so that these three memory read operations are performed in different steps in the order described.
Branches 117 through 119, each represented by a dashed line in FIG. 2, are each a data dependency edge. The data dependency edge 117 connects the node 107 to the node 110. The data dependency edge 118 connects the node 108 to the node 111. The data dependency edge 119 connects the node 109 to the node 112. A node to which another node is connected must be scheduled to a step later by n steps than the step to which that other node is scheduled. Here, "n" represents the relative step number 120, 121 or 122 which is associated with each data dependency edge. For example, in the scheduling stage described below, the node 110 is scheduled to a step two steps later than the step to which the node 107 is scheduled.
FIG. 3 is a timing diagram illustrating a read timing of a pipeline-accessible memory. When a memory which causes read data RDATA to be valid two clock cycles after a rise of a memory read request signal RREQ is used as shown in FIG. 3, the data dependency edges 117 through 119, each having a relative step number of 2, are provided. In this specification, a "memory which causes read data to be valid n clock cycles after a rise of a memory read request signal and is also pipeline-accessible" is referred to as a "pipeline memory having a latency of n".
Branches 123 through 132 each represented by a solid line in FIG. 2 are each a data dependency edge. The data dependency edge 123 connects the node 107 representing a read request to the memory to the node 105 representing a memory address which is input from an external device. The data dependency edge 126 connects the node 108 representing a read request to the memory to the node 133 representing an increment. The data dependency edge 124 connects the node 133 representing an increment to the node 105 representing a memory address which is input from the external device. The data dependency edge 127 connects the node 109 representing a read request to the memory and the node 134 representing a decrement. The data dependency edge 125 connects the node 134 representing a decrement to the node 105 representing a memory address which is input from the external device. The data dependency edge 128 connects the node 110 representing data read from the memory to the node 135 representing an addition. The data dependency edge 129 connects the node 111 representing data read from the memory to the node 135 representing an addition. The data dependency edge 130 connects the node 135 representing an addition to the node 136 representing an addition. The data dependency edge 131 connects the node 112 representing data read from the memory to the node 136 representing an addition. The data dependency edge 132 connects the node 136 representing an addition to the node 106 representing an output to an external device. The processing result is output to the external device through the node 106.
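One possible in-memory representation of such a CDFG (an assumed encoding, not one prescribed by the method) lists each dependency edge as a (source, target, relative step) triple, where the control dependency edges between read requests carry a relative step of 1 and the read-latency edges carry a relative step of 2:

```python
# Assumed encoding of part of the CDFG in FIG. 2: each edge is
# (source node, target node, relative step number).
cdfg_edges = [
    ("req107", "req108", 1),   # edge 113: requests in description order
    ("req108", "req109", 1),   # edge 114
    ("req107", "data110", 2),  # edge 117: memory latency of 2
    ("req108", "data111", 2),  # edge 118
    ("req109", "data112", 2),  # edge 119
]

# A target node must be scheduled at least `rel` steps after each source.
def min_step_of(node, steps, edges):
    return max((steps[src] + rel for src, tgt, rel in edges
                if tgt == node), default=0)
```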
(ii) Scheduling
In the scheduling stage, each node of the CDFG is assigned to a time slot, referred to as a step, which corresponds to a state of a controller (finite state transfer machine).
FIG. 4 shows an exemplary result obtained by scheduling the CDFG shown in FIG. 2. In this example, each node is scheduled to one of five steps of steps 0 through 4. Calculations which are scheduled to different steps can share one calculator. For example, in FIG. 4, the node 135 representing an addition and the node 136 also representing an addition are scheduled to different steps from each other and thus can share one calculator. In the scheduling stage, the nodes are scheduled to the steps such that the number of hardware devices is as small as possible so as to reduce the cost.
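The constraint that each node lands at least its relative step number after its predecessors can be satisfied by a simple ASAP (as-soon-as-possible) relaxation, sketched below. This is an illustrative scheduler, not the one the method mandates, and the edge weights are one possible assignment that reproduces a five-step schedule (steps 0 through 4) like that of FIG. 4:

```python
# Illustrative ASAP scheduler: step[v] >= step[u] + w for every edge
# (u, v, w). Repeated relaxation suffices for an acyclic graph.
def asap(nodes, edges):
    step = {n: 0 for n in nodes}
    for _ in nodes:
        for u, v, w in edges:
            step[v] = max(step[v], step[u] + w)
    return step

nodes = ["req107", "req108", "req109", "data110", "data111", "data112",
         "add135", "add136"]
edges = [
    ("req107", "req108", 1), ("req108", "req109", 1),    # request order
    ("req107", "data110", 2), ("req108", "data111", 2),  # latency of 2
    ("req109", "data112", 2),
    ("data110", "add135", 0), ("data111", "add135", 0),  # same-step use
    ("add135", "add136", 1),                             # via register 140
    ("data112", "add136", 0),
]
```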
(iii) Allocation
In an allocation stage, calculators, registers and input and output pins which are required to execute the scheduled CDFG are generated. Nodes representing calculations of the CDFG are allocated to the calculators. Data dependency edges crossing the borders between the steps are allocated to the registers. External inputs and outputs and memory accesses are allocated to the input and output pins.
FIG. 5 shows an exemplary manner of allocation. In this example, an incrementor 137, a decrementor 138 and an adder 139 are generated. As represented by a dashed line in FIG. 5, the node 133 representing an increment is allocated to the incrementor 137, the node 134 representing a decrement is allocated to the decrementor 138, and the nodes 135 and 136 representing an addition are allocated to the adder 139.
A register 140 is also generated. As represented by a dashed line in FIG. 5, the data dependency edges 124, 125, 128 and 130 crossing the borders between the steps are allocated to the register 140.
Input pins 141 and 142 and output pins 143 and 144 are generated. As represented by a dashed line in FIG. 5, the node 105 (input from an external device) is allocated to the input pin 141, and the node 106 (output to an external device) is allocated to the output pin 144. The nodes 107 through 109 (memory read requests) are allocated to the output pin 143, and the nodes 110 through 112 (read data) are allocated to the input pin 142.
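Sharing in the allocation stage is possible whenever the operations bound to one unit never occupy the same step. A hypothetical greedy binding illustrating this rule (the method itself does not prescribe a particular binding algorithm):

```python
# Hypothetical greedy binding: operations scheduled to different steps
# may share one calculator; same-step operations need separate units.
def bind(ops):  # ops: list of (operation name, scheduled step)
    units = []
    for name, step in ops:
        for unit in units:
            if all(s != step for _, s in unit):
                unit.append((name, step))  # reuse this calculator
                break
        else:
            units.append([(name, step)])   # allocate a new calculator
    return units
```

For example, the two additions scheduled to different steps fit into a single adder, while two operations in the same step force two units.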
(iv) Generation of a data path
In a data path generation stage, data paths corresponding to the data dependency edges of the CDFG are generated, and selectors are generated when necessary.
FIG. 6 shows an exemplary manner of data path generation. In the example shown in FIG. 6, as represented by a dashed line, paths 145 and 146 from the input pin 141 (to which the node 105 (external input) is allocated) to the output pin 143 (to which the node 107 (memory read request) is allocated) are generated in correspondence with the data dependency edge 123 from the node 105 (external input) to the node 107 (memory read request).
Also as represented by a dashed line in FIG. 6, paths 147 and 148 from the input pin 141 (to which the node 105 (external input) is allocated) to the register 140 (to which the data dependency edge 124 is allocated) and a path 149 from the register 140 (to which the data dependency edge 124 is allocated) to the incrementor 137 (to which the node 133 (increment) is allocated) are generated in correspondence with the data dependency edge 124 from the node 105 (external input) to the node 133 (increment).
As represented by a dashed line in FIG. 6, paths 150 and 148 from the input pin 142 (to which the node 110 (read data) is allocated) to the register 140 (to which the data dependency edge 128 is allocated) and a path 151 from the register 140 (to which the data dependency edge 128 is allocated) to the adder 139 (to which the node 135 (addition) is allocated) are generated in correspondence with the data dependency edge 128 from the node 110 (read data) to the node 135 (addition).
In a similar manner, the following data paths are generated: a data path corresponding to the data dependency edge from the node 105 (external input) to the node 134 (decrement), a data path corresponding to the data dependency edge from the node 133 (increment) to the node 108 (memory read request), a data path corresponding to a data dependency edge from the node 134 (decrement) to the node 109 (memory read request), a data path corresponding to a data dependency edge from the node 111 (read data) to the node 135 (addition), a data path corresponding to a data dependency edge from the node 112 (read data) to the node 136 (addition), and a data path corresponding to a data dependency edge from the node 136 (addition) to the node 106 (external output).
In the case where the calculators, registers and output pins and the like are shared as in this example, selectors 152 and 153 are generated for selecting data which is to be input to the calculators, registers, output pins and the like.
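A selector of this kind is a multiplexer whose select lines come from the control logic. A minimal behavioral sketch (the one-hot interface is an assumption for illustration):

```python
# Behavioral sketch of a selector such as 152 or 153: one-hot select
# lines choose which data input is passed through.
def selector(select, inputs):
    for sel, value in zip(select, inputs):
        if sel:
            return value
    return None  # no select line active
```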
(v) Generation of a control logic
In a control logic generation stage, a control logic for controlling the registers, selectors and the like generated in the allocation stage and the data path generation stage is generated.
FIG. 7 shows an exemplary manner of control logic generation.
(1) Generation of input and output pins for a control logic
As input and output pins for a control logic, an input pin 154 for receiving a clock signal and an output pin 155 for outputting a memory read request are generated. When the memory read request is output, a memory address is output from the output pin 143 and the memory read request is output from the output pin 155.
(2) Generation of a finite state transfer machine
Next, a finite state transfer machine is generated as follows. The same number of states 403 through 407 (S0 through S4) as the total number of the steps in the scheduling result are generated. Then, state transfer logics 408 through 411 are generated such that the state is transferred sequentially from S0 in each clock cycle, under the condition that a state corresponding to one step is transferred to the state corresponding to the next step when the finite state transfer machine is active. After that, state output logics and state output pins 158 through 162 are generated. Each of the state output pins 158 through 162 becomes active when the finite state transfer machine is in its corresponding state, and becomes inactive in any other state.
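Behaviorally, the finite state transfer machine of (2) can be modeled as follows (a software sketch assuming the five states S0 through S4 of the running example, advancing once per clock cycle while active):

```python
# Software model of the finite state transfer machine (states S0..S4).
NUM_STATES = 5

def next_state(state, active):
    # transfer to the next state each clock cycle while active
    if active and state < NUM_STATES - 1:
        return state + 1
    return state

def state_outputs(state):
    # state output pins 158..162: exactly one pin is active per state
    return [state == i for i in range(NUM_STATES)]
```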
(3) Generation of a memory read request signal logic
As can be appreciated from the above-described scheduling result, a memory read is requested in state S0 corresponding to step 0, state S1 corresponding to step 1, and state S2 corresponding to step 2. Accordingly, a logic 163 is generated such that when one of states S0, S1 and S2 respectively output from the state output pins 158, 159 and 160 of the finite state transfer machine 156 is active, the output from the output pin 155 is active.
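The logic 163 thus reduces to an OR of the S0, S1 and S2 state output pins. A sketch of this decode (the list-of-booleans interface is an assumption):

```python
# Logic 163: the memory read request output (pin 155) is active when
# the finite state transfer machine is in state S0, S1 or S2.
def read_request(state_outputs):
    return state_outputs[0] or state_outputs[1] or state_outputs[2]
```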
(4) Generation of a selector selection logic
As can be appreciated from the above-described scheduling, allocation and data path generation results, in step 0, the paths 145 and 146 from the input pin 141 to the output pin 143 are used. Therefore, a logic 165 is generated such that when state S0, which is output from the state output pin 158 of the finite state transfer machine 156, is active, the input 164 of the selector 153, connected to the input pin 141 via the path 145, is selected.
The logic 165 is further generated such that when states S1 and S2, which are respectively output from the state output pins 159 and 160 of the finite state transfer machine 156, are active, an input connected to the incrementor and an input connected to the decrementor are respectively selected. Similarly, regarding the selector 152, a logic is generated such that when states S0, S2 and S3, which are respectively output from the state output pins 158, 160 and 161 of the finite state transfer machine 156, are active, an input connected to the input pin 141 via the path 145, an input connected to the input pin 142 via the path 150, and an input connected to the adder 139 are respectively selected. Regarding the logic (OR) connected to the register 140, a logic is generated such that when any of states S0 through S3 is active, the register 140 becomes active.
In the above-described manner, the logic circuit of an RTL is generated from an operation description.
A circuit configuration in which a plurality of threads operating in parallel share a memory will be described. Herein, a "thread" refers to a "circuit having an independent finite state transfer machine". In the case where each thread in such a circuit configuration is independently generated by the above-described conventional high level synthesis method, access competition occurs when the plurality of threads access the shared common memory simultaneously. As a result, correct memory access is not performed.
According to one aspect of the invention, a high level synthesis method for generating a logic circuit of a register transfer level from an operation description is provided. The method includes a control data flowgraph generation stage of analyzing the operation description describing a processing operation and not including information regarding a hardware structure, and generating a control data flowgraph which represents a dependency relationship among calculations, input to and output from an external device, and a memory access execution order; a scheduling stage of scheduling each of a plurality of nodes of the control data flowgraph to a step corresponding to a state of a controller; an allocation stage of generating a calculator, a register, and input and output pins which are required to execute the scheduled control data flowgraph, and allocating a calculation of the control data flowgraph to the calculator, allocating a data dependency edge crossing a border between steps to the register, and allocating an external input, an external output and a memory access to the input and output pins; a data path generation stage of generating a data path corresponding to the data dependency edge of the control data flowgraph and generating a selector when necessary; and a control logic generation stage of generating a control logic for controlling the register and the selector generated in the allocation stage and the data path generation stage. When generating a thread sharing a common memory with another thread operating in parallel therewith, a memory access request is represented by a node of the control data flowgraph so as to perform scheduling, and a control logic is generated, which outputs a memory access request signal to a common memory interface in a state corresponding to a step to which the node is scheduled, and which keeps the state until a memory access request acceptance signal from the common memory interface is changed to be active.
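The key behavior this aspect adds relative to the conventional flow is that the state machine stalls in a request-issuing state until the common memory interface asserts its acceptance signal. A behavioral sketch (the five states and the request states S0 through S2 are assumptions taken from the running example, not requirements of the method):

```python
# Behavioral sketch of the state-holding rule: a state that issues a
# memory access request is kept until the request acceptance signal
# from the common memory interface becomes active.
REQUEST_STATES = {0, 1, 2}  # assumed: requests issued in S0..S2

def next_state(state, request_accepted, num_states=5):
    if state in REQUEST_STATES and not request_accepted:
        return state  # stall: hold the state until the request is accepted
    if state < num_states - 1:
        return state + 1
    return state
```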
In one embodiment of the invention, the high level synthesis method further includes the stage of generating a read data storage and selection circuit, which includes a read data storage circuit for temporarily storing data which is read from the common memory and a read timing generation circuit for generating a timing for reading the data from the common memory.
In one embodiment of the invention, the read data storage and selection circuit includes a continuous transfer determination circuit for determining whether or not a finite state transfer machine included in the thread has been continuously transferred, and a read data selection circuit for selecting whether data which is read from the common memory is to be used or data which is stored in the read data storage circuit is to be used.
In one embodiment of the invention, the read data storage circuit includes a queue.
In one embodiment of the invention, the read timing generation circuit includes a shift register.
In one embodiment of the invention, the continuous transfer determination circuit includes a shift register.
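One way to realize these embodiments (a sketch under assumed names, not the claimed circuit itself) uses a queue for the read data storage circuit and a shift register whose depth equals the memory latency for the read timing generation circuit:

```python
from collections import deque

# Sketch: read data storage circuit (queue) plus read timing generation
# circuit (shift register of depth = memory latency).
class ReadDataStorage:
    def __init__(self, latency):
        self.queue = deque()             # read data storage circuit
        self.timing = [False] * latency  # read timing shift register

    def clock(self, request_accepted, rdata):
        # a request accepted `latency` cycles ago means rdata is valid now
        data_valid = self.timing.pop()
        self.timing.insert(0, request_accepted)
        if data_valid:
            self.queue.append(rdata)

    def read(self):
        return self.queue.popleft()
```

With a latency of 2, data presented two clock cycles after the accepted request is captured into the queue, and the thread can then read it whenever necessary.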
According to another aspect of the invention, a thread generated using the above-described high level synthesis method is provided.
According to still another aspect of the invention, a method for generating a circuit including a plurality of threads which are generated using a high level synthesis method according to claim 1 and a common memory interface connected to each of the plurality of threads is provided. The method includes the stages of generating the common memory interface for, when the plurality of threads send a read request signal, accepting the read request signal from the thread having the highest priority among the plurality of threads having an active read request signal, and changing the request acceptance signal for that thread to active; generating the plurality of threads; and connecting the plurality of threads with the common memory interface.
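The common memory interface described here acts as a fixed-priority arbiter. A behavioral sketch (the priority order, lowest index first, is an assumption; the method only requires that the highest-priority active request be accepted):

```python
# Sketch of the common memory interface's arbitration: accept the read
# request of the highest-priority thread (here, lowest index) and make
# only that thread's request acceptance signal active.
def arbitrate(read_requests):
    accept = [False] * len(read_requests)
    for i, req in enumerate(read_requests):
        if req:
            accept[i] = True
            break
    return accept
```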
According to the present invention, when a plurality of threads operating in parallel access a common memory simultaneously, mediation can be performed such that access competition among the plurality of threads does not occur.
In an embodiment in which the read data storage and selection circuit including the read data storage circuit and the read timing generation circuit is generated, data which is read from the memory with the correct timing is first stored in the read data storage circuit. Therefore, the thread can read data stored in the read data storage circuit whenever necessary.
In an embodiment in which the read data storage circuit includes the queue and an embodiment in which the read timing generation circuit includes the shift register, the area of the circuit is reduced.
In an embodiment in which the read data storage and selection circuit includes the continuous transfer determination circuit and the read data selection circuit, it is determined whether or not the finite state transfer machine included in the thread has been continuously transferred. Based on the determination result, either one of the data read from the memory or the data stored in the read data storage circuit is selected. Therefore, the memory access can be performed with a latency which is equal to the latency of the memory.
In an embodiment in which the continuous transfer determination circuit includes the shift register, the area of the circuit is reduced.
Thus, the invention described herein makes possible the advantages of providing a high level synthesis method for preventing access competition among a plurality of threads when the plurality of threads operating in parallel simultaneously access a common memory, a thread generated using the high level synthesis method, and a circuit including a plurality of threads generated using the high level synthesis method.
These and other advantages of the present invention will become apparent to those skilled in the art upon reading and understanding the following detailed description with reference to the accompanying figures.