1. Field of the Invention
The present invention relates generally to data processors and particularly to those having a function of performing an operation that calculates a plurality of sums for a result of an arithmetic operation.
2. Description of the Background Art
The Applicant has disclosed, in Japanese Patent Laying-Open Nos. 6-60206 and 8-329038, techniques used in a data driven information processor to perform a product-sum operation (an operation to calculate the sum of a plurality of products). A product-sum operation performed in a data driven information processor will be described hereinafter based on the disclosures in these publications.
A data driven information processor processes data in an order in which the data become executable. As such, a program can be executed regardless of the order of entry of data and data can thus be processed in parallel.
FIG. 14 shows a configuration of a data driven information processor as disclosed in Japanese Patent Laying-Open No. 6-60206, and FIGS. 15A and 15B show field configurations of data packets used in the FIG. 14 information processor. FIG. 15A shows a data packet 36 containing a field 51F having generation information stored therein, a field 52F having instruction information stored therein, a field 53F having destination information stored therein, and a field 54F having data stored therein. FIG. 15B shows a data packet 38 corresponding to data packet 36 with field 54F replaced by fields 54FA and 54FB each having data stored therein.
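The two packet layouts can be pictured with the following minimal sketch. The class names, the Python types of the fields, and the helper `to_packet38` are illustrative assumptions only; the publications specify the fields, not any particular encoding.

```python
from dataclasses import dataclass


@dataclass
class DataPacket36:
    """Single-data packet of FIG. 15A (field types are assumed)."""
    generation: int   # field 51F: generation information
    instruction: str  # field 52F: instruction information
    destination: int  # field 53F: destination information
    data: int         # field 54F: one data word


@dataclass
class DataPacket38:
    """Paired-data packet of FIG. 15B (field types are assumed)."""
    generation: int   # field 51F
    instruction: str  # field 52F
    destination: int  # field 53F
    data_a: int       # field 54FA
    data_b: int       # field 54FB


def to_packet38(first: DataPacket36, second: DataPacket36) -> DataPacket38:
    """Hypothetical helper: merge two single-data packets into one paired packet."""
    return DataPacket38(first.generation, first.instruction,
                        first.destination, first.data, second.data)
```

For example, merging two packets carrying 2 and 5 yields a packet whose fields 54FA and 54FB hold 2 and 5.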
The FIG. 14 data driven information processor includes an input/output control portion 1, a program storage portion 2, a data pair generation portion 3, and an operation portion 4. The data driven information processor uses data packet 36 or 38 to communicate data externally. Input/output control portion 1 temporarily stores therein data packet 36 received from outside the information processor or from operation portion 4, reads the generation and destination information of the stored data packet, and in accordance with the read information selectively outputs data packet 36 to the outside of the information processor or to program storage portion 2.
Program storage portion 2 stores a data flow program having a plurality of sets formed of destination information and instruction information. When program storage portion 2 receives data packet 36 of FIG. 15A, it reads subsequent destination information and subsequent instruction information from the data flow program by addressing based on destination information of the received data packet 36, stores the read destination and instruction information to the received data packet 36 at fields 53F and 52F, respectively, and outputs the received data packet 36.
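The fetch performed by program storage portion 2 can be sketched as a table lookup. The dictionary-based packet, the table `PROGRAM`, its addresses, and the function name `fetch` are hypothetical and serve only to illustrate addressing by destination information.

```python
# Hypothetical data flow program: address -> (next destination, next instruction).
PROGRAM = {
    1: (3, "MUL"),
    2: (3, "MUL"),
    3: (5, "ADD"),
}


def fetch(packet: dict, program: dict) -> dict:
    """Overwrite fields 53F and 52F with the entry addressed by the packet's destination."""
    next_destination, next_instruction = program[packet["destination"]]
    # Generation information and data pass through unchanged.
    return {**packet,
            "destination": next_destination,
            "instruction": next_instruction}
```

A packet arriving with destination 1 would leave with destination 3 and instruction "MUL", its data untouched.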
Data pair generation portion 3 receives data packet 36 from program storage portion 2, and if it is necessary, as determined in accordance with the instruction information in field 52F of the received data packet 36, data pair generation portion 3 stores data to data field 54F for executing the instruction information and outputs the received data packet 36. Otherwise, data pair generation portion 3 outputs the received data packet 36 as it is.
More specifically, data pair generation portion 3 receives data packet 36 from program storage portion 2 and, as required, makes data packet 36 wait based on the packet's instruction information. If a decision is made in accordance with the instruction information that waiting is required, two different data packets 36 matching in generation and destination information are detected, and the content of field 54F of one of the two detected data packets 36 is additionally stored to the other data packet 36, so that the other data packet 36 is output in the configuration of data packet 38 shown in FIG. 15B. If a decision is made that waiting is not required, data packet 36 is output as it is without waiting.
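The waiting and pairing behavior can be sketched as follows. The class name, the set of two-operand instructions, and the choice of which packet's data lands in field 54FA are assumptions made for illustration; the description above does not fix them.

```python
class DataPairGenerator:
    """Toy model of data pair generation portion 3 (names are hypothetical)."""

    def __init__(self, two_operand_instructions=("MUL", "ADD")):
        self.two_operand = set(two_operand_instructions)
        self.waiting = {}  # (generation, destination) -> waiting packet

    def receive(self, packet: dict):
        # Instructions that need no partner pass straight through.
        if packet["instruction"] not in self.two_operand:
            return packet
        key = (packet["generation"], packet["destination"])
        partner = self.waiting.pop(key, None)
        if partner is None:
            # No matching packet yet: wait for the partner.
            self.waiting[key] = packet
            return None
        # Match found: emit a paired packet (assumed: earlier arrival -> 54FA).
        paired = {k: v for k, v in partner.items() if k != "data"}
        paired["data_a"] = partner["data"]
        paired["data_b"] = packet["data"]
        return paired
```

The first packet of a pair returns `None` (it waits); the second returns a paired packet carrying both data words.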
Operation portion 4 receives data packet 38 or 36 from data pair generation portion 3 and decodes the instruction information stored in the data packet at field 52F. In accordance with a result of the decoding, operation portion 4 performs an operation on data stored in the received data packet, stores a result of the operation to the received data packet as data, and outputs the data packet in the form of data packet 36 as shown in FIG. 15A.
Data packet 36 or 38 thus continues to circulate round a ring of input/output control portion 1, program storage portion 2, data pair generation portion 3 and operation portion 4 to allow an operation to proceed in accordance with the data flow program stored in program storage portion 2.
The data flow program stored in program storage portion 2 is represented by describing a flow of a data packet. This program description will be referred to as a “data flow graph.” FIG. 16 shows a data flow graph corresponding to a product-sum operation performed in the FIG. 14 data driven information processor. The FIG. 16 data flow graph has nodes ND1–ND5 and ND10 and arrows connecting the nodes. Each arrow indicates a data flow. Nodes ND1 and ND2 are input nodes for inputting data and node ND10 is an output node for outputting data. Nodes ND3–ND5 are each assigned instruction information to be executed in operation portion 4. To perform the operation according to the instruction information assigned to each of nodes ND3–ND5, a corresponding data packet circulates round the FIG. 14 loop formed by input/output control portion 1 through operation portion 4. As such, performing a product-sum operation once in accordance with the FIG. 16 data flow graph requires circulating round the loop of the information processor of FIG. 14 more than once.
With reference to the FIG. 14 processor and the FIG. 16 graph, a procedure of a product-sum operation in accordance with the following equation:

ACC = ACC + X*Y  (1)

will be described, wherein ACC represents accumulation data, and X and Y represent operands (data to be operated on).
When expression (1) is executed in the FIG. 14 processor, in the FIG. 16 graph nodes ND1 and ND2 are assigned “Y” and “X”, respectively, node ND3 receives “X” and “Y”, the instruction information (a multiplication instruction “MUL”) assigned to node ND3 is executed, and the operation's result is obtained. During this period, data flows as follows. When data packet 36 having field 54F with “X” stored therein is output via input/output control portion 1 to program storage portion 2, program storage portion 2 reads subsequent instruction information and destination information by addressing based on destination information of data packet 36 and stores them to data packet 36 at fields 52F and 53F, respectively. Data packet 36 is then output to data pair generation portion 3, where it waits for entry of data packet 36 having stored therein the data (“Y”) paired therewith.
A procedure similar to the above is followed to process data packet 36 having field 54F with “Y” stored therein. More specifically, data packet 36 is fed via input/output control portion 1 to program storage portion 2, where the subsequent instruction information and destination information are stored thereto, and it is then output to data pair generation portion 3. There, a data pair of the received data packet and the waiting data packet 36 with “X” stored therein is detected, and data packet 38 having the data pair (“X” and “Y”) stored at fields 54FA and 54FB is generated and output to operation portion 4. In operation portion 4, “X” and “Y” are multiplied in accordance with the instruction information of data packet 38, and data packet 36 storing the product at field 54F is output to input/output control portion 1.
The product thus obtained flows in the FIG. 16 graph as follows: the product follows an arrow to enter a left input of node ND5, while node ND5 has a right input receiving the accumulation value provided at node ND4. In accordance with instruction information (an addition instruction indicated by “ADD”) assigned to node ND5, a procedure similar to that followed for the aforementioned instruction information (“MUL”) is followed to add the product and the accumulation together. The sum is then fed to and stored in node ND4. This storage is also regarded as a single operation. As such, in accordance with the FIG. 16 graph, performing a product-sum operation once entails circulating data round the FIG. 14 loop three times.
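The three circulations can be counted with a minimal sketch, assuming one loop circulation per node operation as described above. The function name and the integer operands are illustrative only.

```python
def product_sum_fig16(acc: int, x: int, y: int):
    """Evaluate ACC = ACC + X*Y node by node, counting loop circulations."""
    circulations = 0

    product = x * y          # node ND3: MUL
    circulations += 1

    total = product + acc    # node ND5: ADD with the accumulation from ND4
    circulations += 1

    acc = total              # node ND4: store the new accumulation
    circulations += 1

    return acc, circulations
```

With ACC = 10, X = 3 and Y = 4, the sketch returns the new accumulation 22 after three circulations.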
As disclosed in Japanese Patent Laying-Open Nos. 6-60206 and 8-329038, data can circulate round the loop a reduced number of times to perform an operation faster.
In Japanese Patent Laying-Open No. 6-60206 the FIG. 14 operation portion 4 is provided with an accumulator to perform a product-sum operation. With an accumulator incorporated in operation portion 4, as described above, a data driven information processor performs a product-sum operation as represented in the data flow graph shown in FIG. 17. The FIG. 17 graph indicates that the FIG. 16 nodes ND3–ND5 are represented by a single node ND7 and that simply executing the instruction information (a product-sum operation instruction “MULA”) assigned to node ND7 can achieve a product-sum operation.
If the FIG. 17 graph is applied to perform expression (1), data flows, as follows: two arrows to node ND7 are respectively assigned “X” and “Y” to execute product-sum operation instruction MULA.
More specifically, in the FIG. 14 processor, data packet 36 having field 54F with “X” stored therein is fed through input/output control portion 1 to program storage portion 2, where subsequent destination information and instruction information are stored to fields 53F and 52F, respectively, and data packet 36 is output to data pair generation portion 3 and waits there for entry of data (“Y”) paired therewith.
In accordance with a procedure similar to the above, data packet 36 having field 54F with “Y” stored therein is fed through input/output control portion 1 to program storage portion 2, where subsequent destination information and instruction information are stored to fields 53F and 52F, respectively, and data packet 36 is fed to data pair generation portion 3. In data pair generation portion 3 a data pair of input data packet 36 and data packet 36 having waited with “X” stored therein is detected and data packet 38 having fields 54FA and 54FB with “X” and “Y” stored therein is obtained and output to operation portion 4.
Operation portion 4 performs an operation based on content of data packet 38 and outputs data packet 36 with a result of the operation stored at field 54F. In doing so, operation portion 4 allows the product of “X” and “Y” to be accumulated by the accumulator therein, and data packet 36 is output with the accumulation stored at field 54F.
Thus operation portion 4 incorporating an accumulator allows the FIG. 17 process to complete a product-sum operation with one third of the number of loop circulations required for a product-sum operation performed in accordance with the FIG. 16 graph.
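A minimal sketch of an operation portion with a built-in accumulator follows. The class name, the dictionary packet layout, and the handling of instructions other than “MULA” are assumptions for illustration and are not taken from the publication.

```python
class OperationPortionWithAccumulator:
    """Toy model of operation portion 4 holding an internal accumulator."""

    def __init__(self):
        self.acc = 0  # accumulation data ACC kept inside the operation portion

    def execute(self, packet38: dict) -> dict:
        a, b = packet38["data_a"], packet38["data_b"]
        if packet38["instruction"] == "MULA":
            # Multiply and accumulate within a single pass through the loop.
            self.acc += a * b
            result = self.acc
        elif packet38["instruction"] == "MUL":
            result = a * b
        else:
            raise ValueError("instruction not modeled in this sketch")
        # Output in the single-data form of data packet 36 (field 54F).
        return {"generation": packet38["generation"],
                "instruction": packet38["instruction"],
                "destination": packet38["destination"],
                "data": result}
```

Each “MULA” packet thus updates ACC in one circulation, instead of the three circulations needed when multiplication, addition and storage are separate nodes.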
Furthermore, in the data processor disclosed in Japanese Patent Laying-Open No. 8-329038 a data packet is provided with an additional field serving as an accumulator to implement a product-sum operation. The data driven information processor disclosed in the publication has operation portion 4 improved as shown in FIG. 18. Operation portion 4 includes an operation circuit 24, an addition circuit 28, a shifter 30 and a selector 32. Operation portion 4 receives a data packet 42 having fields 51F–53F, as aforementioned, fields 62F and 64F storing data therein, and a field 66F storing therein accumulation data ACC of product-sum operations. Operation portion 4 outputs a data packet 40 having fields 51F–53F, as aforementioned, a field 58F storing data therein, and a field 60F storing accumulation data ACC therein.
Field 52F of data packet 42 fed to operation portion 4 contains instruction information that specifies, for operation circuit 24, shifter 30 and selector 32, respectively, the content of the operation to be performed in operation circuit 24, the amount of shift by shifter 30, and the selection of input data and of a destination of data by selector 32.
In operation, fields 62F and 64F in data packet 42 fed to operation portion 4 have stored therein operands (“X” and “Y”), which are fed to operation circuit 24 and processed in accordance with corresponding instruction information, and a result thereof is added by addition circuit 28 to corresponding accumulation data ACC. The sum is shifted by shifter 30 by the amount indicated by the corresponding instruction information and is then fed to selector 32. Selector 32 receives a value output from shifter 30, a value output from addition circuit 28 and accumulation data ACC of field 66F, and decides based on corresponding instruction information which of the inputs is selected. In accordance with the decision, output data packet 40 can have field 60F updated in value, a result can be output to field 58F, an output from shifter 30 can be stored to field 58F, an output from addition circuit 28 can be stored to field 60F, or the like.
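The FIG. 18 datapath can be sketched as below. The function signature, the string-valued selector controls, the set of supported operations, and the use of a right shift are all assumptions; the description above does not state the shift direction or how the instruction information is encoded.

```python
def operation_portion_fig18(packet42: dict, op: str, shift: int,
                            data_select: str, acc_select: str) -> dict:
    """Toy model of operation circuit 24, addition circuit 28, shifter 30, selector 32."""
    x, y, acc = packet42["x"], packet42["y"], packet42["acc"]  # fields 62F, 64F, 66F

    # Operation circuit 24: operation chosen by the instruction information.
    operations = {"MUL": lambda a, b: a * b, "ADD": lambda a, b: a + b}
    result = operations[op](x, y)

    summed = result + acc      # addition circuit 28
    shifted = summed >> shift  # shifter 30 (right shift is an assumption)

    # Selector 32: route one of its three inputs to each output field.
    sources = {"shifter": shifted, "adder": summed, "acc": acc}
    return {"generation": packet42["generation"],
            "instruction": packet42["instruction"],
            "destination": packet42["destination"],
            "data": sources[data_select],  # field 58F
            "acc": sources[acc_select]}    # field 60F
```

Because ACC travels in the packet rather than in the operation portion, each packet carries its own accumulation, which is how the ordering constraint on operations can be relaxed.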
While providing data packet 42 or 40 with data field 66F or 60F for storing accumulation data ACC makes the data packet larger, it eliminates constraints such as the order of data operations, so that a product-sum operation can be performed efficiently.
The conventional product-sum operation process described above performs a single product-sum operation in a single operation, as it is applied mainly to image data processing and is intended to prevent a data packet residing in an information processor from degrading data processing efficiency. Accordingly, performing a product more than once entails circulating a data packet in a data driven information processor as many times as there are products. Such a system is therefore not suitable, for example, for finite impulse response (FIR) filtering or any other similar process in which a single product-sum operation requires many products. For example, for FIR filtering of merely 10 taps (i.e., the product-sum is performed ten times), a data packet needs to circulate in a data driven information processor ten times. This means that the data packet moves past input/output control portion 1 through operation portion 4 ten times. If the data packet requires one unit period of time to move past each portion, it would require as many as 40 unit periods of time to circulate in the data driven information processor ten times.
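The count of 40 unit periods follows from 10 circulations through 4 portions at one unit period each, as the following minimal sketch shows. The function and parameter names are illustrative only.

```python
PORTIONS_IN_LOOP = 4  # portions 1 through 4 of FIG. 14


def unit_periods(taps: int,
                 portions: int = PORTIONS_IN_LOOP,
                 periods_per_portion: int = 1) -> int:
    """Unit periods needed when each tap costs one full loop circulation."""
    circulations = taps  # one product-sum, hence one circulation, per tap
    return circulations * portions * periods_per_portion


print(unit_periods(10))  # 10 circulations x 4 portions x 1 period = 40
```

The cost grows linearly with the number of taps, which is why a circulation per product becomes a burden for FIR filtering.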