1. Field of the Invention
The present invention relates generally to data driven information processors executing a data flow graph, and more particularly to a data driven information processor capable of readily debugging a data flow graph to be executed.
2. Description of the Related Art
In a data driven processor, processing proceeds in parallel based on a simple rule: "if all data necessary for an operation has been collected and the operational resources necessary for the operation have been allocated, the operation is executed". Data is transmitted in a data packet together with its destination information.
FIG. 1 is a block diagram showing a conventional data driven information processing system for video signal processing. Referring to FIG. 1, the conventional system includes a data driven processor 61, and an image memory portion 11. Image memory portion 11 includes a memory interface 2 and an image memory 3.
Data driven processor 61 includes input ports IA, IB and IV, and output ports OA, OB and OV. Input ports IA and IB are connected with transmission paths 7 and 8, respectively, and provided with a data packet including a video signal to be processed. Input port IV is connected with a transmission path 5, and provided with a data packet including a result of accessing image memory 3 executed in image memory portion 11. Output ports OA and OB are connected with transmission paths 9 and 10, respectively, and output a data packet including data based on a result of processing executed in the system. Output port OV is connected with a transmission path 4, through which a data packet including data for accessing image memory portion 11 is output to memory interface 2.
Memory interface 2 and image memory 3 are connected with each other through a memory access control line 6. Data transmitted on memory access control line 6 is also a data packet.
FIG. 2 shows an arrangement of the field of a data packet transmitted through data transmission paths 7-10 within the system shown in FIG. 1 by way of illustration. A data packet 120 input/output to/from processor 61 includes an instruction code 122 indicating the content of processing in the processor, a processor number 124 for uniquely specifying a data driven processor to process the data packet in the system, a node number 126 for uniquely specifying an instruction to be executed on the processor to process the data packet, a generation number 128, i.e., an identifier attached based on the order of input time series, and data 130. Generation number 128 is used in queuing for data in processor 61, and has a meaning of an address to image memory 3 for memory interface 2.
Note that, as shown in FIG. 2, in this example instruction code 122, processor number 124, node number 126, generation number 128 and data 130 have bit lengths of 8 bits, 9 bits, 6 bits, 24 bits, and 12 bits, respectively, but the bit length of each field and the packet length of the entire data packet may take other values.
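The field layout of FIG. 2 can be illustrated by packing the five fields into a single 59-bit word. The following is only a sketch: the field widths are those given above, while the bit ordering (instruction code in the most significant position), the names FIELDS, pack and unpack, and the sample values are assumptions made for illustration.

```python
# Field widths in bits, as listed for FIG. 2.
# ASSUMPTION: fields are laid out most-significant first in the order below;
# the text does not specify the bit ordering within the packet.
FIELDS = [
    ("instruction_code", 8),
    ("processor_number", 9),
    ("node_number", 6),
    ("generation_number", 24),
    ("data", 12),
]
PACKET_BITS = sum(width for _, width in FIELDS)  # 8 + 9 + 6 + 24 + 12 = 59


def pack(values):
    """Pack a dict of field values into one integer word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split an integer word back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


# Hypothetical sample packet addressed to processor number 1.
sample = {
    "instruction_code": 0x12,
    "processor_number": 1,
    "node_number": 5,
    "generation_number": 1000,
    "data": 7,
}
assert unpack(pack(sample)) == sample
```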
The system shown in FIG. 1 operates as follows in data processing. A signal input packet having a generation number is provided in time series to processor 61 through input port IA or IB. Processor 61 stores a data flow program for video processing. Processor 61 processes provided data based on the program and outputs the result of processing through one of output ports OA and OB.
If image memory 3 should be accessed (for referring to or updating data stored in image memory 3), a data packet storing a request for accessing image memory 3 is output to image memory portion 11 through the output port OV of processor 61. Memory interface 2, upon receiving the access request, accesses image memory 3 through memory access control line 6, and provides a data packet storing the resultant data to the input port IV of processor 61 through transmission path 5. Processor 61 receives the data packet through input port IV and continues the processing based on the above-described data flow program.
FIG. 3 is a block diagram showing a conventional data driven processor 61 for use in a data driven system for video processing. Referring to FIG. 3, the conventional data driven processor 61 includes an input processing portion 17 having its input terminal connected with input ports IA and IB, a junction portion 12, a main body processing portion 13 storing a data flow program and executing a processing based on the program, a branch portion 14, an output processing portion 15 having its output terminal connected with output ports OA and OB, a PE# register 16 for storing an identification number PE# for uniquely identifying processor 61 of interest within the system in a network, and a branch control parameter register group 18 storing branch conditions for determining whether to output a data packet to output port OA or OB.
Input processing portion 17 receives a data packet input through input port IA or IB, compares a processor number in the input packet and the content of PE# register 16 and selectively provides the data packet to junction portion 12 or output processing portion 15 based on the result of comparison. More specifically, if the processor number in the input data packet and the content of PE# register 16 are in coincidence, input processing portion 17 determines that the data packet is directed to its processor, and provides the packet to junction portion 12. If no coincidence is found, input processing portion 17 determines that the packet is directed to another processor, and provides the packet to output processing portion 15.
Junction portion 12 joins the data packet provided from input processing portion 17 with a data packet provided from branch portion 14 which will be described later, and provides the resultant packet to main body processing portion 13. Main body processing portion 13 processes the data packet provided from junction portion 12 based on a prescribed data flow program. At the time, if image memory 3 (shown in FIG. 1) should be accessed, main body processing portion 13 outputs the packet to image memory portion 11 through output port OV. A data packet resulting from the processing in image memory portion 11 is received by data driven processor 61 through input port IV.
Branch portion 14 receives the data packet output from main body processing portion 13 and selectively provides the data packet to output processing portion 15 or junction portion 12 based on whether or not the processor number in the data packet and the content of PE# register 16 are in coincidence. More specifically, if the processor number in the input data packet coincides with the content of PE# register 16, branch portion 14 provides the data packet to junction portion 12. If no coincidence is found, branch portion 14 provides the input data packet to output processing portion 15.
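Input processing portion 17 and branch portion 14 apply the same test: the processor number in the packet is compared with the content of PE# register 16. A minimal sketch of that decision follows; the function name and the string labels for the destinations are hypothetical.

```python
def route_by_processor_number(packet_pe, own_pe):
    """Next stage for a packet, as decided by input processing portion 17
    or branch portion 14 from the processor number comparison."""
    if packet_pe == own_pe:
        return "junction portion 12"          # packet is for this processor
    return "output processing portion 15"     # packet is for another processor


OWN_PE = 0  # example content of PE# register 16
print(route_by_processor_number(0, OWN_PE))
print(route_by_processor_number(1, OWN_PE))
```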
Output processing portion 15 receives the data packet provided from branch portion 14 or input processing portion 17, and selectively outputs the data packet to one of output ports OA and OB based on a branch condition set in branch control parameter register group 18 by referring to the processor number or the generation number in the input data packet.
Such a data driven processor having the arrangement as described above and shown in FIG. 3 is disclosed, for example, by Japanese Patent Laying-Open No. 6-162228. In the data driven processor in the disclosure, three kinds of registers including an ID parameter register (PE), a branch comparison data parameter register (RD), and a branch comparison mask parameter register (RM) are prepared as branch control parameter register group 18. Among them, ID parameter register (PE) is considered to be equivalent to PE# register 16 shown in FIG. 3, and therefore the device shown in FIG. 3 would be equivalent to the data flow processor shown in Japanese Patent Laying-Open No. 6-162228. As a result, provision of two kinds of registers, the branch comparison data parameter register and the branch comparison mask parameter register, should be sufficient as branch control parameter register group 18. According to Japanese Patent Laying-Open No. 6-162228, the branch condition may be expressed as follows.
(RM .and. pe#) .exor. (RM .and. RD)   (1)
wherein RM and RD are values stored in branch comparison mask parameter register RM and branch comparison data parameter register RD, respectively, pe# is a processor number in the input packet to output processing portion 15, and the operators .and. and .exor. represent a bit-based logical multiplication and a bit-based exclusive OR, respectively.
If the operands (RM .and. pe#) and (RM .and. RD) in the exor operation are in coincidence, the value of expression (1) becomes zero. In this case, output processing portion 15 outputs the input data packet onto transmission path 9 through output port OA. If the operands (RM .and. pe#) and (RM .and. RD) in the exor operation are not in coincidence, the result of expression (1) will not be zero. In this case, output processing portion 15 outputs the input data packet onto transmission path 10 through output port OB.
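Expression (1) and the port selection that follows from it can be written out directly with bitwise operators. This is a sketch for illustration only; the function names and the register values used in the example (a mask and comparison value of 1) are assumptions and are not taken from any figure.

```python
def branch_condition(rm, rd, pe):
    """Expression (1): (RM and pe#) exor (RM and RD), evaluated bitwise."""
    return (rm & pe) ^ (rm & rd)


def select_output_port(rm, rd, pe):
    """Output port OA when expression (1) is zero, output port OB otherwise."""
    return "OA" if branch_condition(rm, rd, pe) == 0 else "OB"


# Hypothetical setting: the mask keeps only the least significant bit of the
# processor number, and the comparison value requires that bit to be 1.
RM, RD = 0b1, 0b1
for pe in range(4):
    print(pe, select_output_port(RM, RD, pe))
```

With this hypothetical setting, odd processor numbers are sent to output port OA and even ones to output port OB, since only the masked bit takes part in the comparison.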
FIG. 4 shows the functional configuration of branch portion 14 in the conventional data driven processor 61. Referring to FIG. 4, branch portion 14 includes a branch destination determination portion 142 for comparing a processor number in a data packet provided from main body processing portion 13 and the content of PE# register 16 and outputting the result of determination, and a selector 141 controlled by the output of branch destination determination portion 142 for selectively outputting the data packet provided from main body processing portion 13 to junction portion 12 through one terminal A or to output processing portion 15 through the other terminal B.
Branch destination determination portion 142 applies "1" to selector 141 if the processor number in the data packet and the content of PE# register 16 are in coincidence and "0" if there is no coincidence. Selector 141 provides the data packet to junction portion 12 if the output of branch destination determination portion 142 is "1", and to output processing portion 15 for "0".
FIG. 5 shows a more specific arrangement of conventional branch portion 14 together with portions associated with junction portion 12 and output processing portion 15. Referring to FIG. 5, branch portion 14 includes data transmission paths 72, 74 and 76 constituting a pipeline through which a data packet propagates and data latch circuits 54 and 56. The input side of data transmission path 72 is connected with a data latch circuit (not shown) in main body processing portion 13. Data transmission path 76 branches to be connected with the input of data latch circuit 60 in junction portion 12 and the input of data latch circuit 58 in output processing portion 15.
Branch portion 14 further includes transfer control elements 44 and 46 for controlling timings for latching data by data latch circuits 54 and 56. Similarly, junction portion 12 includes a transfer control element 50, and output processing portion 15 includes a transfer control element 48. Transfer control elements 44, 46, 48 and 50 generate clock pulses for controlling timings for latching data by data latch circuits 54, 56, 58 and 60, respectively, and provide the generated pulses to their corresponding data latch circuits. Each transfer control element 44, 46, 48 or 50 has an input for data hold signal CI, an output for data hold signal CO, an input for an empty signal RI from a succeeding stage, and an output for an empty signal RO for a preceding stage. Transfer control elements 44, 46, 48 and 50 each control propagation of data on each data latch circuit by exchanging data hold signals CO and CI, and empty signals RO and RI, with transfer control elements in preceding and succeeding stages.
Branch portion 14 further includes a coincidence determination circuit 42 for comparing a processor number in a data packet provided from main body processing portion 13 through data latch circuit 54 with the content of PE# register 16 and providing a signal indicating "1" if coincidence is found and "0" if no coincidence is found, an inverter 90 having its input connected to the CO output of transfer control element 46, a NAND circuit 82 having one input connected with the output of inverter 90 and the other input connected to the output of coincidence determination circuit 42 through data latch circuit 56, an inverter 92 having its input connected with the output of coincidence determination circuit 42 through data latch circuit 56, a NAND circuit 86 having two inputs connected to the outputs of inverters 90 and 92, and an AND circuit 88 having two inputs connected to the outputs RO of transfer control elements 50 and 48. The output of NAND circuit 82 is connected with the input CI of transfer control element 50. The output of NAND circuit 86 is connected with the input CI of transfer control element 48.
The operation of branch portion 14 shown in FIG. 5 will briefly be described. Assume that a processor number in a data packet from main body processing portion 13 coincides with the content of PE# register 16. The output of coincidence determination circuit 42 then attains a high level. Assume that signals CO and RO output from transfer control elements 44, 46, 50 and 48 are all at a high level. In this case, empty signal RI to transfer control element 46 is at a high level. If empty signal RI from a succeeding stage is at a high level, the succeeding stage may receive data from a preceding stage. On the contrary, if empty signal RI from the succeeding stage is at a low level, the succeeding stage is not prepared to receive data.
If data hold signal CI from a preceding stage to transfer control element 46 falls to a low level, transfer control element 46 pulls empty signal RO to be applied to the preceding stage to a low level. In response to empty signal RI applied from transfer control element 46 attaining the low level, transfer control element 44 in the preceding stage pulls data hold signal CO to a high level. In response to data hold signal CI attaining the high level, transfer control element 46 raises clock pulse CP to data latch circuit 56, and pulls empty signal RO to a low level. Data latch circuit 56 latches the outputs of data latch circuit 54 and coincidence determination circuit 42 in response to the rising of the clock pulse.
Transfer control element 46 pulls data hold signal CO to a low level in response to the rising of clock pulse CP. The output of inverter 90 is pulled from the low level to a high level. As described above, since the output of coincidence determination circuit 42 is at a high level, the output of NAND circuit 82 is pulled to a low level from the high level. Transfer control element 50 pulls empty signal RO from the high level to a low level in response to data hold signal CI attaining the low level.
Since the output of inverter 92 is at a low level, the output of NAND circuit 86 is always at a high level irrespectively of the output of inverter 90. Transfer control element 48 does not operate and its empty signal RO remains at the high level.
The output of AND circuit 88 attains a low level in response to the empty signal RO of transfer control element 50 attaining the low level. Transfer control element 46 pulls clock pulse CP to a low level in response to empty signal RI attaining the low level, and pulls data hold signal CO to a high level from the low level. Transfer control element 50, in response, raises a clock pulse to data latch circuit 60 in order to have the data latched, pulls empty signal RO once again to a high level, and therefore the empty signal RI of transfer control element 46 rises to a high level. At the time, transfer control element 48 does not operate. More specifically, in this case, data latched by data latch circuit 56 in branch portion 14 is latched only by data latch circuit 60 in junction portion 12, and is not latched by data latch circuit 58 in output processing portion 15.
If the output of coincidence determination circuit 42 indicates that there is no coincidence, in other words the value "0", the operation of each circuit within branch portion 14 is reversed from the above, the data of data latch circuit 56 is latched only by data latch circuit 58 in output processing portion 15, and is not latched by data latch circuit 60 in junction portion 12.
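The steering performed by inverters 90 and 92, NAND circuits 82 and 86, and AND circuit 88 can be summarized as combinational logic. The sketch below models only these gates as described for FIG. 5, with True standing for a high level and False for a low level; the timing behavior of the transfer control elements is not modeled, and the function and variable names are hypothetical.

```python
def steering_logic(co46, match, ro50, ro48):
    """Gate outputs around data latch circuit 56.

    co46  : data hold signal CO of transfer control element 46
    match : latched output of coincidence determination circuit 42
    ro50  : empty signal RO of transfer control element 50 (junction portion)
    ro48  : empty signal RO of transfer control element 48 (output processing)
    Returns (CI of element 50, CI of element 48, RI of element 46).
    """
    inv90 = not co46
    inv92 = not match
    ci50 = not (inv90 and match)   # NAND circuit 82
    ci48 = not (inv90 and inv92)   # NAND circuit 86
    ri46 = ro50 and ro48           # AND circuit 88
    return ci50, ci48, ri46


# CO of element 46 low with coincidence found: only CI of element 50 goes low.
print(steering_logic(False, True, True, True))
# CO of element 46 low with no coincidence: only CI of element 48 goes low.
print(steering_logic(False, False, True, True))
```

In this model exactly one of the two CI signals goes low while CO of transfer control element 46 is low, which corresponds to the data being latched by only one of data latch circuits 60 and 58.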
Thus, based on the result of determination by coincidence determination circuit 42, branch portion 14 selectively outputs data to junction portion 12 or to output processing portion 15, so that the data is branched.
FIG. 6 shows an example of an arrangement of a system using four data driven processors 61 for video processing. Referring to FIG. 6, the four processors 61 of the system are allocated with identification numbers PE#0, PE#1, PE#2 and PE#3, respectively, to uniquely identify the respective processors. The numbers 0 to 3 are stored in the PE# registers 16 of processors 61 (see FIG. 3). In the following description, processors 61 are referred to by their allocated identification numbers.
In the system shown in FIG. 6, a network is formed so that a data packet may be provided to an arbitrary processor from another processor. Assume that a data packet is provided from processor PE#0 to processor PE#1. A data packet having processor number 124 shown in FIG. 2 set to the identification number of target processor PE#1 is output. The data packet is once provided to the input port IA of processor PE#3. The data packet is then output from the output port OA of processor PE#3 and provided to the input port IA of target processor PE#1.
In order to form such a network, the RM and RD of the branch control parameter register group for each processor are set as shown in FIG. 6. Determination of selection of an output portion in each processor is based on expression (1) given above.
In the example shown in FIG. 6, an output port in each processor is selected as follows.
In processor PE#0, if the least significant bit of processor number 124 in the output data packet is 1, output port OA is selected, and output port OB is selected otherwise. In processor PE#1, if the least significant bit of processor number 124 in the data packet is 0, output port OA is selected, and output port OB is selected otherwise. In processors PE#2 and PE#3, if processor number 124 in the data packet is between 0 and 3, output port OA is selected and output port OB is selected otherwise.
FIG. 7 shows a simple data flow graph as an example to be executed by a data driven processor. The data flow graph shown in FIG. 7 is a program for finding a solution to y=(x+2)(x-1) for input data x.
Referring to FIG. 7, the data flow graph includes an input node 102, a copy node 104 duplicating data provided from input node 102 and branching it into two, a "+" node 106 for adding the data provided from copy node 104 with "2", a "-" node 108 subtracting 1 from data provided from copy node 104, a "*" node 110 for multiplying data provided from "+" node 106 and "-" node 108, and an output node 112 for receiving data provided from node 110. Based on the data flow graph, y=(x+2) (x-1) results at output node 112.
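The graph of FIG. 7 can be executed with a small token-driven interpreter in which a node fires only when all of its operands have arrived. The following is a sketch under assumptions: the node table uses the reference numerals from the text as keys, while the table layout, the port numbering, and the names NODES and run are hypothetical and do not describe the internal structure of processor 61.

```python
from collections import deque

# node: (number of operands, operation, [(destination node, input port), ...])
NODES = {
    104: (1, lambda a: a,        [(106, 0), (108, 0)]),  # copy node
    106: (1, lambda a: a + 2,    [(110, 0)]),            # "+" node, adds 2
    108: (1, lambda a: a - 1,    [(110, 1)]),            # "-" node, subtracts 1
    110: (2, lambda a, b: a * b, [(112, 0)]),            # "*" node
}
OUTPUT_NODE = 112


def run(x):
    """Feed x into copy node 104 and return the value reaching node 112."""
    waiting = {}                    # node -> {port: value} collected so far
    tokens = deque([(104, 0, x)])   # (destination node, input port, value)
    while tokens:
        node, port, value = tokens.popleft()
        if node == OUTPUT_NODE:
            return value
        arity, operation, destinations = NODES[node]
        slots = waiting.setdefault(node, {})
        slots[port] = value
        if len(slots) == arity:     # firing rule: all operands are present
            del waiting[node]
            result = operation(*(slots[p] for p in range(arity)))
            for dest_node, dest_port in destinations:
                tokens.append((dest_node, dest_port, result))
    raise RuntimeError("no token reached the output node")


print(run(3))  # (3 + 2) * (3 - 1) = 10
```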
An example of an arrangement of a system executing the data flow graph is shown in FIG. 8. The arrangement shown in FIG. 8 uses one data driven processor 61 for video processing. Processor 61 is allocated with identification number PE#0. A description follows on the operation of processor PE#0 when the simple data flow graph shown in FIG. 7 is allocated to the processor.
A data packet storing an input value x is provided to the input port IA of processor PE#0. The processor number 124 (FIG. 2) of the data packet is the identification number of processor PE#0. An instruction code 122 shown in FIG. 2 is a copy instruction to copy node 104.
Referring to FIG. 3, the data packet is provided to input processing portion 17. Input processing portion 17 determines that the data packet is destined for its own processor and provides the data packet to main body processing portion 13 through junction portion 12. In main body processing portion 13, the processing corresponding to the "copy" node is executed on the input data packet, and the input data packet and its duplicate are output as a result. The node numbers of these packets indicate the "+" operation and "-" operation shown in FIG. 7, respectively. The processor numbers 124 of these data packets (see FIG. 2) both indicate processor PE#0.
The two data packets after the duplication are output from main body processing portion 13 and provided to branch portion 14. Branch portion 14 provides these data packets to junction portion 12, because both input data packets have processor numbers 124 (see FIG. 2) in coincidence with the content of PE# register 16. The data packets are once again provided to main body processing portion 13 and the "+" and "-" operations are executed.
The two data packets after the operations both have node numbers 126 (see FIG. 2) set to the "*" node, and are output from main body processing portion 13. Their processor numbers 124 (see FIG. 2) hold a value indicating processor PE#0. The two data packets resulting from execution of the "+" and "-" operations are both provided to branch portion 14 from main body processing portion 13.
Branch portion 14 provides these data packets to junction portion 12, because the two input data packets have processor numbers 124 in coincidence with its processor number. These data packets are once again provided through junction portion 12 to main body processing portion 13, matched as the left and right inputs of the "*" node, and the "*" operation is executed.
Main body processing portion 13 (see FIG. 3) is assumed to be programmed so that a resultant data packet after execution of the "*" operation is provided with a processor number such as "PE#1". The data packet is provided from main body processing portion 13 to branch portion 14. Branch portion 14 provides the data packet to output processing portion 15, because the processor number 124 of the input data packet is not in coincidence with its processor number. Output processing portion 15 determines an output port based on the content set in the branch control parameter register group as described above. If the registers are set as shown in FIG. 8, in other words, if processor number 124 in the data packet is set to "PE#1", the data packet is output from output port OA.
If the data flow graph shown in FIG. 7 as described above is executed by the data driven processor shown in FIG. 8, output results are sometimes not correct. This may be because of mistakes in describing the data flow graph. In such a case, the data flow graph should be debugged.
In a conventional method of debugging a data flow graph, the graph is revised so that intermediate operation results are output during execution, and each intermediate result thus obtained is checked against the value expected up to that point. FIG. 9 shows a revised version of the data flow graph shown in FIG. 7 for debugging.
Referring to FIG. 9, in the revised example, results of "+" and "-" operations are both output externally from processor PE#0. More specifically, as shown in FIG. 9, the data flow graph is revised and two output nodes 114 and 116 for debugging are newly provided. The output of "+" node 106 and the output of "-" node 108 are output, separately from "*" node 110, to these output nodes 114 and 116 for debugging.
Thus, as shown in FIG. 10, data packets from output nodes 114 and 116 are output to output port OV, and the contents of the data packets may be checked. The respective results output externally from the processor are compared with their expected values, in order to check which part of the data flow graph has an error.
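The comparison of externally output intermediate results with their expected values can be illustrated as follows. This is a sketch only: it assumes that output node 114 carries the result of "+" node 106 and output node 116 carries the result of "-" node 108, and the function names, the parameters plus_const and minus_const, and the injected mistake are all hypothetical.

```python
def revised_graph(x, plus_const=2, minus_const=1):
    """Graph revised as in FIG. 9: intermediate results are also output."""
    plus_result = x + plus_const     # "+" node 106
    minus_result = x - minus_const   # "-" node 108
    return {
        114: plus_result,                   # debugging output of "+" node (assumed)
        116: minus_result,                  # debugging output of "-" node (assumed)
        112: plus_result * minus_result,    # regular output node
    }


def mismatches(actual, expected):
    """Output nodes whose value differs from the expected value."""
    return [node for node in sorted(expected) if actual[node] != expected[node]]


x = 3
expected = {114: x + 2, 116: x - 1, 112: (x + 2) * (x - 1)}

# Hypothetical mistake: the "-" node subtracts 2 instead of 1.
faulty = revised_graph(x, minus_const=2)
print(mismatches(faulty, expected))
```

In this example the final output at node 112 and the debugging output at node 116 both differ from their expected values while node 114 matches, which points to the "-" node as the location of the mistake.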
Since the example shown in FIG. 7 is a very simple data flow graph, it is not difficult to revise for debugging. Data flow graphs for use in practical purposes are far more complicated, and it is usually difficult to know the intermediate result of which part should be output to facilitate debugging. A data flow graph is often revised a number of times without success in order to obtain an intermediate result facilitating debugging. The efficiency of debugging such a data flow graph has been low in the conventional system, and construction of the system requires a long period of time.