1. Field of the Invention
The present invention relates to a data driven type information processor and, more specifically, to a data driven type information processor which can execute data flow programs at high speed by improving capability of parallel execution of instructions.
2. Description of the Related Art
In a conventional von-Neumann computer, various instructions are stored in advance as a program in a program memory, and addresses in the program memory are sequentially specified by a program counter, so that the instructions are read out and executed in sequence.
On the other hand, a data driven type information processor is one type of non-von-Neumann computer, having no concept of sequential execution of instructions by a program counter. Such a data driven type information processor employs an architecture based on parallel processing of instructions. In the data driven type information processor, execution of an instruction is enabled upon collection of the data to be operated on, and a plurality of instructions are simultaneously driven by data, so that programs are executed in parallel in accordance with the natural flow of the data. As a result, the time required for the operation is expected to be drastically reduced.
Referring to FIG. 5, which shows a data packet to be processed by the conventional data driven type information processor, a data packet Pa includes a destination field F1, an instruction field F2, a data 1 field F3 and a data 2 field F4. The destination field F1 stores a destination node number ND. The instruction field F2 stores an instruction code OP. The data 1 field F3 or the data 2 field F4 stores operand data OPD. Referring to FIG. 1, the conventional data driven type information processor includes a program storing unit 50, a paired data detecting unit 56, an operation processing unit 60, an input unit 52, a junction unit 54 and a branch unit 58.
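By way of illustration only, the packet format of FIG. 5 may be sketched in Python as follows. The class name, attribute names, operand type and example values are hypothetical and are not taken from the figure; only the four fields F1 to F4 and their contents come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPacket:
    """Sketch of data packet Pa; Python names are hypothetical."""
    destination: int             # destination field F1: destination node number ND
    instruction: str             # instruction field F2: instruction code OP
    data1: Optional[int] = None  # data 1 field F3: operand data OPD
    data2: Optional[int] = None  # data 2 field F4: operand data OPD


# Example packet carrying a single operand (values are made up).
packet = DataPacket(destination=1, instruction="ADD", data1=3)
```

Leaving the data 2 field empty by default reflects the fact that a packet may carry only one operand until its partner has been found.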
The junction unit 54 has two input ports I1 and I2 and two output ports O1 and O2, as shown in FIG. 2. Data packets output from output ports O1 and O2 are synchronized with each other.
Branch unit 58 includes two input ports i1 and i2 and two output ports o1 and o2.
Program storing unit 50 of FIG. 1 includes a memory access unit 102, a memory unit 100 and a packet generation unit 104, as shown in FIG. 3. The memory access unit 102 has its input end connected to output port o1 of branch unit 58. Packet generation unit 104 has its output end connected to input port I1 of junction unit 54. Memory unit 100 stores data flow program 110 shown in FIG. 4.
Memory access unit 102 reads a set of destination node number 112 (ND) and an instruction code 114 (OP) of the data flow program 110 stored in memory unit 100 as shown in FIG. 4, by address designation on the basis of the destination node number ND of a given data packet, and provides the destination node number 112 and the instruction code 114 to packet generation unit 104. Packet generation unit 104 stores the read destination node number 112 and the instruction code 114 in destination field F1 and instruction field F2 of the data packet Pa, respectively, and outputs the data packet. Program storing unit 50 is capable of outputting one instruction per one address designation.
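The fetch described above may be sketched in Python as follows. This is an illustration under stated assumptions: the destination node number ND is used directly as the address (the description only says the address is designated "on the basis of" ND), and the memory contents, function names and instruction mnemonics are hypothetical.

```python
# Hypothetical contents of memory unit 100, indexed by address.
# Each entry holds (destination node number 112, instruction code 114).
MEMORY_UNIT = [
    (1, "ADD"),
    (2, "MUL"),
    (3, "SUB"),
]


def memory_access(nd):
    """Memory access unit 102: read one entry, assuming ND is the address."""
    return MEMORY_UNIT[nd]


def packet_generation(nd, op):
    """Packet generation unit 104: fill destination field F1 and instruction field F2."""
    return {"F1": nd, "F2": op}


def program_storing_unit(packet):
    """One address designation yields exactly one instruction."""
    next_nd, next_op = memory_access(packet["F1"])
    return packet_generation(next_nd, next_op)


fetched = program_storing_unit({"F1": 0})
```

Each call performs a single read, which mirrors the limit of one instruction per address designation.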
Paired data detecting unit 56 matches input data packets Pa. More specifically, upon detection of two different data packets having the same destination node number ND, paired data detecting unit 56 stores operand data OPD of one of these data packets, for example the content of data 1 field F3 of FIG. 5, in data 2 field F4 of the other data packet, and outputs that other data packet Pa.
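The matching operation may be sketched in Python as follows. The sketch assumes that the packet arriving first waits and that the operand of the packet arriving second is written into its data 2 field; the description does not say which packet plays which role. The class and key names are hypothetical, and only two operand instructions are handled.

```python
class PairedDataDetector:
    """Sketch of paired data detection for two operand instructions."""

    def __init__(self):
        # Destination node number ND -> packet waiting for its partner.
        self.waiting = {}

    def accept(self, packet):
        """Return the completed packet, or None while the partner is missing."""
        nd = packet["nd"]
        partner = self.waiting.pop(nd, None)
        if partner is None:
            # First packet for this ND: keep it until the second one arrives.
            self.waiting[nd] = packet
            return None
        # Assumption: the later packet's data 1 goes into the waiting packet's data 2.
        partner["data2"] = packet["data1"]
        return partner


detector = PairedDataDetector()
first = detector.accept({"nd": 1, "op": "ADD", "data1": 3, "data2": None})
second = detector.accept({"nd": 1, "op": "ADD", "data1": 4, "data2": None})
```

Here `first` is `None` because no partner exists yet, and `second` is the merged packet holding both operands.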
Operation processing unit 60 performs operation processing on one or two operand data OPD stored in the given data packet Pa in accordance with the instruction code OP stored therein, stores the result in data 1 field F3 of the data packet Pa and outputs this data packet.
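The operation processing step may be sketched in Python as follows. The instruction mnemonics, the table of operations and the dictionary keys are hypothetical; the sketch only shows an instruction code selecting an operation on one or two operands, with the result stored in the data 1 field. Leaving the data 2 field unchanged is an assumption.

```python
import operator

# Hypothetical instruction codes: mnemonic -> (number of operands, function).
OPERATIONS = {
    "ADD": (2, operator.add),
    "MUL": (2, operator.mul),
    "SUB": (2, operator.sub),
    "DEC": (1, lambda x: x - 1),
    "INC": (1, lambda x: x + 1),
}


def operate(packet):
    """Apply instruction code OP to the operand data and store the result in data 1."""
    operand_count, function = OPERATIONS[packet["op"]]
    operands = [packet["data1"], packet["data2"]][:operand_count]
    result = dict(packet)  # copy so the input packet is not modified
    result["data1"] = function(*operands)
    return result


added = operate({"nd": 1, "op": "ADD", "data1": 3, "data2": 4})
decremented = operate({"nd": 4, "op": "DEC", "data1": 5, "data2": None})
```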
FIGS. 6 (a) to (h) and 7 (a) to (e) show field structures of data packets flowing through the data driven type information processor shown in FIG. 1 during execution of a program.
Referring to FIG. 1, two input ports of input unit 52 are connected to data transmission paths 62 and 64. A data packet 134 including a destination field 130 and an instruction field 132 shown in FIG. 6 (a) is applied to data transmission path 62. A data packet 138 including such a data field as shown in FIG. 6 (b) is applied to data transmission path 64. Two output ports of input unit 52 are connected to input ports I1 and I2 of junction unit 54 through data transmission paths 68 and 70.
The output port of program storing unit 50 is connected to a data transmission path 82, which in turn is connected to input port I1 of junction unit 54. A data packet 144 including a destination field 140 and an instruction field 142 such as shown in FIG. 6 (c) is applied to data transmission path 82.
The output port of operation processing unit 60 is connected to a data transmission path 92, which in turn is connected to input port I2 of junction unit 54. A data packet 148 including a data field 146 such as shown in FIG. 6 (d) is applied to data transmission path 92.
Two output ports O1 and O2 of junction unit 54 are connected to two input ports of paired data detecting unit 56 through data transmission paths 72 and 74, respectively. A data packet 154 including a destination field 150 and an instruction field 152 such as shown in FIG. 6 (e) is applied to data transmission path 72. A data packet 158 including a data field 156 such as shown in FIG. 6 (f) is applied to data transmission path 74.
Two output ports of paired data detecting unit 56 are connected to two input ports i1 and i2 of branch unit 58 through data transmission paths 76 and 78. A data packet 164 including a destination field 160 and an instruction field 162 such as shown in FIG. 6 (g) is applied to data transmission path 76. A data packet 170 including a data 1 field 166 and a data 2 field 168 such as shown in FIG. 6 (h) is applied to data transmission path 78. One output port o1 of branch unit 58 is connected to an input port of program storing unit 50 through a data transmission path 80 and to one input port of operation processing unit 60 through a data transmission path 86. Branch unit 58 is connected to the outside of the processor through a data transmission path 84. The other output port o2 of branch unit 58 is connected to the outside of the processor through a data transmission path 88 and to the other input port of operation processing unit 60 through a data transmission path 90. A data packet 182 including a destination field 180 such as shown in FIG. 7 (a) is applied to data transmission path 80, and a data packet 186 including an instruction field 184 such as shown in FIG. 7 (b) is applied to data transmission path 86. A data packet 192 including a data 1 field 188 and a data 2 field 190 such as shown in FIG. 7 (c) is applied to data transmission path 90. A data packet 198 including a destination field 194 and an instruction field 196 such as shown in FIG. 7 (d) is applied to data transmission path 84, and a data packet 202 including a data field 200 such as shown in FIG. 7 (e) is applied to data transmission path 88.
Referring to FIG. 8, the conventional data driven type information processor operates in the following manner. First, a set of data packets 134 and 138 are externally input to input unit 52. These data packets 134 and 138 are transmitted to input ports I1 and I2 of junction unit 54, respectively. Initially, the data packets 134 and 138 are transmitted as they are to paired data detecting unit 56 through output ports O1 and O2, as data packets 154 and 158, respectively. When two sets of different data packets having the same destination node number are detected at paired data detecting unit 56, a set of data packets 164 and 170 are output from paired data detecting unit 56. Branch unit 58 selects either continuation of internal processing related to these data packets 164 and 170 or external transmission of these data packets 164 and 170. If internal processing is to be continued, branch unit 58 separates data packet 164 into a data packet 182 including the destination field and a data packet 186 including an instruction field, transmits data packet 182 to program storing unit 50, and transmits data packet 186 to operation processing unit 60. Branch unit 58 transmits data packet 170 to operation processing unit 60 as data packet 192. If data packets 164 and 170 are to be output externally, data packet 164 is not separated. Data packet 164 to be externally output is output as a data packet 198, and similarly, data packet 170 is output as data packet 202.
Operation processing unit 60 performs operation processing related to one or two operand data OPD stored in data packet 192 in accordance with instruction code OP stored in data packet 186, and outputs a data packet 148 which stores only the data representing the result of operation.
In program storing unit 50, by address designation on the basis of destination node number ND stored in data packet 182, next destination node number and next instruction code in the data flow program 110 shown in FIG. 4 are read. A data packet 144 including the destination node number and the instruction code read from program storing unit 50 is output to data transmission path 82.
Thereafter, processing in accordance with data flow program 110 proceeds as each of the data packets circulates through the processing units in a certain order, in the same manner as described above.
Junction unit 54 arbitrates between externally applied data packets and internally processed data packets. When an internally processed data packet 144 contends with an externally applied data packet 134, the internally processed data packet 144 is given priority to be output from output port O1. The data packet to which priority is not given is kept waiting in an internal buffer, not shown, at junction unit 54 until there remains no contender. At output port O2, if internally processed data packet 144 is selected at output port O1, then data packet 148 output from operation processing unit 60 is selected. If the externally applied data packet 134 is selected at output port O1, the externally applied data packet 138 is selected. Data packet 158 is output in synchronization with data packet 154. The data packet not selected is kept waiting in the internal buffer.
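The arbitration rule may be sketched in Python as follows. The sketch assumes that the two packets of a pair arrive together and that the internal buffer, which is not shown in the figures, behaves as a first-in first-out queue per source. The class and method names are hypothetical.

```python
from collections import deque


class JunctionUnit:
    """Sketch of junction unit 54: internally processed packets take priority."""

    def __init__(self):
        self.internal = deque()  # pairs such as (packet 144, packet 148)
        self.external = deque()  # pairs such as (packet 134, packet 138)

    def push_internal(self, tag_packet, data_packet):
        self.internal.append((tag_packet, data_packet))

    def push_external(self, tag_packet, data_packet):
        self.external.append((tag_packet, data_packet))

    def select(self):
        """Return the synchronized (O1, O2) pair for one cycle, or None if idle."""
        if self.internal:
            # Internal pair wins; any external pair keeps waiting in its buffer.
            return self.internal.popleft()
        if self.external:
            return self.external.popleft()
        return None


junction = JunctionUnit()
junction.push_external("packet 134", "packet 138")
junction.push_internal("packet 144", "packet 148")
order = [junction.select(), junction.select(), junction.select()]
```

Although the external pair was queued first, the internal pair is output first, and the selection at output port O2 always follows the selection at output port O1.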
The data packet 170 output from paired data detecting unit 56 is generated in the following manner. When instruction code OP of the corresponding data packet 164 indicates a one operand instruction in which only one operand data is required, operand data OPD is stored only in data 1 field 166. When the corresponding instruction code OP indicates a two operand instruction requiring two operand data, operand data OPD is stored in each of data 1 field 166 and data 2 field 168.
Therefore, it becomes possible to combine (merge) a new data packet read from program storing unit 50 with a corresponding data packet processed by operation processing unit 60 without adding any special identification information to the data packet separated by branch unit 58.
Again referring to FIG. 4, each line of data flow program 110 includes a destination node number 112, an instruction code 114 as well as a copy present/absent information 116 and a constant present/absent information 118. If the constant present/absent information indicates "present", it means that constant data 119 is stored in the next line. The copy present/absent information 116 and the constant present/absent information 118 are used in the following manner.
FIG. 9 shows an example of a data flow graph. Referring to FIG. 9, a node N1 indicates an addition instruction, node N2 a multiplication instruction, and node N3 a subtraction instruction. A node N4 indicates a decrement instruction while a node N5 indicates an increment instruction. Instructions at nodes N1, N2 and N3 are two operand instructions while instructions at nodes N4 and N5 are one operand instructions. The result of operation at node N1 is referred to by nodes N2 and N3. In this case, an output from node N1 must be applied to two nodes N2 and N3, and therefore a copy is provided at program storing unit 50.
Copying is done in the following manner. At first, according to the destination node number of the input data packet, content of the line at the designated address is read from data flow program 110. If copy present/absent information 116 indicates "absent" at this time, a data packet in which the contents of the destination field and the instruction field are updated is provided as an output and the process ends.
If copy present/absent information 116 indicates "present", the data packet in which the contents of the destination field and of the instruction field are updated is output, and in addition, the destination node number 112, instruction code 114, copy present/absent information 116 and constant present/absent information 118 stored in the next line of data flow program 110 are read. If the newly read copy present/absent information 116 indicates "absent", the same data as that of the input data packet is stored in the data 1 field of the new data packet, the newly read destination node number and instruction code are respectively stored in the destination field and the instruction field of the new data packet, and the new data packet is output. If the newly read copy present/absent information 116 indicates "present", a similar copying operation is further continued.
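The copying procedure described above may be sketched in Python as follows. The program layout, addresses and mnemonics are hypothetical, the destination node number of the input packet is assumed to be used directly as the address, and handling of constant present/absent information 118 is omitted for brevity.

```python
# Hypothetical data flow program: each line holds
# (destination node number 112, instruction code 114, copy present?).
PROGRAM = [
    (2, "MUL", True),   # address 0: copy "present", so the next line is also read
    (3, "SUB", False),  # address 1: copy "absent", so copying ends here
    (4, "DEC", False),  # address 2: ordinary line without copying
]


def fetch_with_copy(program, packet):
    """Return every packet generated for one input packet."""
    address = packet["nd"]  # assumption: ND is the address itself
    outputs = []
    while True:
        nd, op, copy_present = program[address]
        # Each output carries the same data as the input packet.
        outputs.append({"nd": nd, "op": op, "data1": packet["data1"]})
        if not copy_present:
            return outputs
        address += 1  # continue with the next line of the program


copies = fetch_with_copy(PROGRAM, {"nd": 0, "op": "ADD", "data1": 7})
single = fetch_with_copy(PROGRAM, {"nd": 2, "op": "SUB", "data1": 5})
```

In this sketch the first call yields two packets, one per destination, while the second yields a single packet. Every additional copy costs one more read of the program memory.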
In the above described information processor, when packet data is processed in accordance with the data flow graph shown in FIG. 9, it requires only three steps. The operation processing unit 60 makes use of pipelining so as to enable such processing.
The operation processing unit employing pipelining is disclosed in an article entitled "Floating Point Processors", 1989 IEEE International Solid State Circuits Conference, DIGEST OF TECHNICAL PAPERS, pp. 46 and 47.
FIG. 10 is a block diagram showing an outline of the operation processing unit disclosed in the aforementioned article.
Referring to FIG. 10, the operation processing unit includes an input unit 210, an output unit 213, a multiplication unit 211, an accumulator 212, a pre-processing unit 214, an arithmetic logic unit 215 and a post-processing unit 216.
Input unit 210 has its input connected to an output of paired data detecting unit 56 through branch unit 58 (FIG. 1) and its output connected to multiplication unit 211 and to pre-processing unit 214. Output unit 213 has its output connected to an input of junction unit 54 (FIG. 1). Multiplication unit 211 has its output connected to an input of output unit 213 through accumulator 212. Pre-processing unit 214 has its output connected to arithmetic logic unit 215. Arithmetic logic unit 215 has its output connected to an input of output unit 213 through post-processing unit 216.
Namely, the operation processing unit includes two parallel transmission paths, that is, a data path passing through input unit 210, multiplication unit 211, accumulator 212 to output unit 213, and a data path passing through input unit 210, pre-processing unit 214, arithmetic logic unit 215, post-processing unit 216 to output unit 213.
In operation, input unit 210 and output unit 213 convert 2 words (operand data of data 1 field and data 2 field) included in the input packet data into 1 word for internal processing. In integer operation, multiplication unit 211 multiplies two data which have been converted to 1 word. Accumulator 212 is used to carry out "multiply and accumulate" operation included in the integer operation instruction. Pre-processing unit 214 recognizes floating point arithmetic instruction, realigns data for floating point arithmetic operation and applies the same to arithmetic logic unit 215. However, if it is not a floating point arithmetic instruction, pre-processing unit 214 applies the input data as it is to the arithmetic logic unit 215. Arithmetic logic unit 215 has a plurality of functions such as floating point arithmetic operation, logical operation, integer operation and executes various operations in accordance with instruction information included in the instruction field of the data packet. Post-processing unit 216 converts results of floating point arithmetic operation to normalized data. As for the results of other operations, post-processing unit 216 provides the applied data as it is to output unit 213.
The above described operation processing unit is capable of processing at most two operand data based on instruction information included in the packet data output from paired data detecting unit 56.
FIG. 10 shows typical values of the process time in each unit, normalized by using the process time in arithmetic logic unit 215 as a reference unit. The process time in multiplication unit 211 is "2", the process time in accumulator 212 is "1", the process time in pre-processing unit 214 is "0.5" and the process time in post-processing unit 216 is "0.5".
Recently, improvement in processing rate of the data driven type information processor has been particularly desired. To meet such demand, the processing rate must be improved in each of program storing unit 50, paired data detecting unit 56 and operation processing unit 60.
Conventionally, program storing unit 50 reads one data per access and provides the read data to paired data detecting unit 56. Paired data detecting unit 56 performs one paired data detection for one input data, and provides the data storing the result of detection to program storing unit 50 as well as to operation processing unit 60. Since operation processing unit 60 carries out operation processing for one input data, once such operation is completed, the operation processing unit 60 is kept waiting for the next data input before it can effect the next processing. Accordingly, the rate of operation processing by the operation processing unit 60 can be improved by reducing the wait time until the next processing.
In order to reduce the wait time, the number of paired data detections per unit time in paired data detecting unit 56 should be increased so that the amount of data supplied from paired data detecting unit 56 to operation processing unit 60 per unit time is increased. In order to increase the amount of data to be supplied, the amount of data read per unit time in program storing unit 50 should be increased. However, the program storing unit 50 in the conventional processor reads only one data per access. The only way to read a large amount of data per unit time is therefore to increase the access speed of the program storing unit. However, when a memory allowing high speed access is employed in the program storing unit 50, the cost of the processor itself as well as the cost of the system including the processor is increased, which makes this approach impractical.
Further, even if a high speed access memory is employed in the program storing unit 50, the improvement in the processing rate of the information processor is limited by the access speed of the program storing unit 50.
The operation processing unit of the conventional processor has the following problem related to the processing rate. In the operation processing unit disclosed in the aforementioned article and shown in FIG. 10, all input data except data for integer multiplication pass through pre-processing unit 214, and the results of all operations except multiplication pass through post-processing unit 216. Therefore, a process time of "0.5" is necessary for data to pass through pre-processing unit 214, and in addition, a process time of "0.5" is necessary for data to pass through post-processing unit 216. However, of the data processed by arithmetic logic unit 215, only floating point data require realignment in pre-processing unit 214 and conversion to normalized data in post-processing unit 216; all other data simply pass through units 214 and 216. Thus, for such data, the total process time of "1" for passage therethrough is wasted.
Further, although all results of operations in the multiplication unit 211 are applied to accumulator 212, it is not necessary to accumulate all the results of multiplication. Therefore, the process time "1" for the passage through accumulator 212 may be wasteful.
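The wasted process time can be checked with a short Python calculation using the normalized process times of FIG. 10. The dictionary keys and function name are hypothetical, and the process times of input unit 210 and output unit 213 are not given in the description and are therefore left out.

```python
# Normalized process times from FIG. 10 (arithmetic logic unit 215 = 1).
STAGE_TIME = {
    "multiplication unit 211": 2.0,
    "accumulator 212": 1.0,
    "pre-processing unit 214": 0.5,
    "arithmetic logic unit 215": 1.0,
    "post-processing unit 216": 0.5,
}

MULTIPLY_PATH = ["multiplication unit 211", "accumulator 212"]
ALU_PATH = [
    "pre-processing unit 214",
    "arithmetic logic unit 215",
    "post-processing unit 216",
]


def path_time(path, skipped=()):
    """Total process time along a path, ignoring any stages listed in skipped."""
    return sum(STAGE_TIME[stage] for stage in path if stage not in skipped)


alu_full = path_time(ALU_PATH)
alu_needed = path_time(
    ALU_PATH, skipped=("pre-processing unit 214", "post-processing unit 216")
)
multiply_full = path_time(MULTIPLY_PATH)
multiply_needed = path_time(MULTIPLY_PATH, skipped=("accumulator 212",))
```

Under these figures, an operation that is not a floating point operation spends 2 units on the arithmetic logic path although only 1 is needed, and a multiplication that needs no accumulation spends 3 units although only 2 are needed.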
In summary, the operation processing unit of the conventional processor leaves room for improvement in processing rate.