1. Field of the Invention
The present invention relates to data-driven processors and data processing methods thereof and, more particularly, to a data-driven processor capable of performing a data-driven operation on data in a multiple-precision form (hereinafter referred to as multiple-precision data) and a data processing method thereof.
2. Description of the Background Art
For processing a large amount of data at high speed, parallel processing is effective. Among architectures designed for parallel processing, what is called a data-driven architecture has received particular attention.
In a data-driven information processing system, processing proceeds in parallel according to a rule that “a process is performed when all input data required for that process is available and a resource such as an operating device required for that process is allocated.”
FIG. 13 is a block diagram showing a data-driven information processing system employed in the prior art and an embodiment of the present invention. FIG. 14 is a diagram showing a conventional data-driven processor. FIGS. 15A and 15B are diagrams showing fields of data packets used in the prior art and the embodiment of the present invention.
FIG. 15A shows a basic structure of an input/output data packet PA of the data-driven processor. FIG. 15B shows a basic structure of a data packet PA1 transmitted within the data-driven processor.
Data packet PA shown in FIG. 15A includes a field 18 storing a processing element PE, a field 19 storing a node number N, a field 20 storing a generation number G and a field 21 storing data D. Data packet PA1 shown in FIG. 15B includes fields 19-21 as in FIG. 15A, and a field 22 storing an instruction code C.
Referring to FIG. 13, the data-driven information processing system includes a conventional data-driven processor 1 (a data-driven processor 10 of the embodiment of the present invention which will later be described), a data memory 3 preliminarily storing a plurality of data, and a memory interface 2. Data-driven processor 1 (10) includes input ports IA, IB and IV respectively connected to data transmission lines 4, 5 and 9 as well as output ports OA, OB and OV, respectively connected to data transmission lines 6, 7 and 8.
Data packet PA is input to data-driven processor 1 (10) in time series through input port IA or IB from data transmission line 4 or 5. A prescribed content to be processed is preliminarily stored in data-driven processor 1 (10) as a program, based on which a process is performed.
Memory interface 2 receives through data transmission line 8 an access request to data memory 3 (a request for referring/updating the content of data memory 3) output from output port OV of data-driven processor 1 (10). Memory interface 2 makes an access to data memory 3 through a memory access control line SSL in accordance with the received access request, and applies the access result to data-driven processor 1 (10) through data transmission line 9 and input port IV.
Data-driven processor 1 (10) performs a process on input data packet PA, and then outputs data packet PA through output port OA and data transmission line 6 or output port OB and data transmission line 7.
FIG. 14 shows the structure of a conventional data-driven processor 1. Referring to FIG. 14, data-driven processor 1 includes an input/output controlling portion 11, a joint portion 12, a firing controlling portion 13 used for a data-driven process, an operating portion 14 connected to a built-in memory 15, a program storing portion 16, and a branch portion 17.
Referring to FIGS. 15A and 15B, processing element PE is information used for identifying data-driven processor 1 where corresponding data packet PA should be processed in a system provided with a plurality of data-driven processors 1. Node number N is used as an address for making an access to the content of program storing portion 16. Generation number G is used as an identifier for uniquely identifying a data packet which is input in time series to data-driven processor 1. If data memory 3 is an image data memory, generation number G is also used as an address for making an access to data memory 3. In this case, generation number G indicates a field number F#, line number L# and pixel number P# successively from an upper bit.
In operation, when applied through data transmission line 4 or 5 to data-driven processor 1 designated by processing element PE, data packet PA of FIG. 15A is converted into data packet PA1 of FIG. 15B at input/output controlling portion 11. Namely, input/output controlling portion 11 discards field 18 of processing element PE of input data packet PA, acquires instruction code C and new node number N based on node number N of input data packet PA, stores them respectively in fields 22 and 19, and then outputs the resulting data packet PA1 to joint portion 12. Thus, data packet PA1 applied from input/output controlling portion 11 to joint portion 12 has the structure shown in FIG. 15B. Note that generation number G and data D remain unchanged at input/output controlling portion 11.
Joint portion 12 successively receives data packet PA1 from input/output controlling portion 11 and data packet PA1 output from branch portion 17, and outputs them in turn to firing controlling portion 13.
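The conversion from data packet PA to data packet PA1 described above can be sketched as follows. This is only an illustrative model: the class names, the instruction mnemonics, and the lookup table `ENTRY_TABLE` are assumptions introduced here, since the text does not state how instruction code C and the new node number N are acquired from the input node number N.

```python
from dataclasses import dataclass


@dataclass
class PacketPA:
    """Input/output data packet PA (FIG. 15A)."""
    pe: int          # field 18: processing element PE
    node: int        # field 19: node number N
    generation: int  # field 20: generation number G
    data: int        # field 21: data D


@dataclass
class PacketPA1:
    """Internal data packet PA1 (FIG. 15B)."""
    code: str        # field 22: instruction code C
    node: int        # field 19: node number N
    generation: int  # field 20: generation number G
    data: int        # field 21: data D


# Hypothetical table: input node number -> (instruction code, new node number).
ENTRY_TABLE = {0: ("ADD", 1), 1: ("MUL", 2)}


def to_internal(pa, table=ENTRY_TABLE):
    """Drop field 18 (PE) and attach instruction code C and the new node number N."""
    code, new_node = table[pa.node]
    # Generation number G and data D pass through unchanged.
    return PacketPA1(code=code, node=new_node,
                     generation=pa.generation, data=pa.data)
```

For example, `to_internal(PacketPA(pe=3, node=0, generation=7, data=42))` yields a packet PA1 carrying the hypothetical instruction code `"ADD"`, node number 1, and the original generation number and data.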
Firing controlling portion 13 includes a waiting memory 731 for detecting a pair of data packets PA1 (this detection is referred to as firing), and a constant data memory 732 storing at least one constant data. Firing controlling portion 13 makes data packet PA1 applied from joint portion 12 wait, if necessary, in waiting memory 731 until its counterpart arrives. When two different data packets PA1 having the same node number N and generation number G, i.e., a pair, are detected, data D in field 21 of one data packet PA1 of the pair is additionally stored in field 21 of the other data packet PA1. The other data packet PA1 is output to operating portion 14, and the one data packet PA1 is deleted at that time. If the operation target is constant data rather than another data packet PA1, waiting is not performed at firing controlling portion 13. In this case, the constant data is read out from constant data memory 732 and additionally stored in field 21 of data packet PA1, which is then output to operating portion 14.
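The waiting and firing behavior described above may be modeled roughly as follows. The class `FiringControl` and its method names are hypothetical. The sketch also assumes that a constant operand is selected by node number and that the earlier-arriving data becomes the first operand; the text specifies neither point.

```python
class FiringControl:
    """Toy model of firing controlling portion 13."""

    def __init__(self, constants=None):
        # Waiting memory 731: (node number N, generation number G) -> waiting data D.
        self.waiting = {}
        # Constant data memory 732: node number N -> constant (assumed keying).
        self.constants = constants or {}

    def accept(self, node, generation, data):
        """Return (node, generation, (operand_1, operand_2)) on firing, else None."""
        if node in self.constants:
            # Constant operand: no waiting is performed.
            return (node, generation, (data, self.constants[node]))
        key = (node, generation)
        if key in self.waiting:
            # The counterpart is already waiting: merge its data and delete it.
            partner = self.waiting.pop(key)
            return (node, generation, (partner, data))
        # No counterpart yet: store the data and wait.
        self.waiting[key] = data
        return None
```

A packet with a given pair (N, G) returns `None` on first arrival and fires when a second packet with the same pair arrives; packets with other pairs remain in the waiting memory.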
Operating portion 14 receives data packet PA1 from firing controlling portion 13 and decodes instruction code C of received data packet PA1. Based on the decoding result, it performs a prescribed process. If instruction code C indicates an operation instruction with respect to the content of data packet PA1 including data D, a prescribed operation is performed on the content of data packet PA1 in accordance with instruction code C. The result is stored in data packet PA1, which is then output to program storing portion 16. Alternatively, if instruction code C of data packet PA1 indicates a memory access instruction, an access to built-in memory 15 is made, and data packet PA1 storing the access result is output to program storing portion 16. Note that the memory connected to operating portion 14 is not necessarily memory 15 which is contained in data-driven processor 1, but may be a memory externally connected to the processor.
If instruction code C indicates an access instruction with respect to data memory 3, operating portion 14 applies data packet PA1 to memory interface 2 through data transmission line 8 as an access request.
Memory interface 2 receives data packet PA1 applied through data transmission line 8 and makes an access to data memory 3 through memory access control line SSL in accordance with the content of received data packet PA1. The access result is stored in field 21 of input data packet PA1 as data D, and data packet PA1 is applied to operating portion 14 through data transmission line 9.
Program storing portion 16 has a program memory 161 in which a data flow program consisting of a plurality of subsequent instruction codes C and node numbers N is stored. Program storing portion 16 receives data packet PA1 applied from operating portion 14 and reads out the subsequent node number N and subsequent instruction code C from program memory 161 by addressing based on node number N of received data packet PA1. Program storing portion 16 then stores the read out node number N and instruction code C respectively in fields 19 and 22 of received data packet PA1, which is then output to branch portion 17.
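The program fetch described above amounts to a table lookup addressed by node number N. A minimal sketch follows, in which the contents of `PROGRAM_MEMORY`, the instruction mnemonics, and the dictionary representation of data packet PA1 are all assumptions made for illustration.

```python
# Hypothetical contents of program memory 161:
# current node number N -> (subsequent node number N, subsequent instruction code C).
PROGRAM_MEMORY = {
    1: (2, "MUL"),
    2: (3, "OUT"),
}


def fetch_next(packet, program=PROGRAM_MEMORY):
    """Overwrite fields 19 (node) and 22 (code) with the subsequent values."""
    next_node, next_code = program[packet["node"]]
    # Generation number G and data D are carried over unchanged.
    return {**packet, "node": next_node, "code": next_code}
```

Applied to a packet with node number 1, the function returns a copy whose node number is 2 and whose instruction code is the hypothetical `"MUL"`.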
Branch portion 17 determines if instruction code C of applied data packet PA1 is to be executed in operating portion 14 within data-driven processor 1 or in an operating portion 14 of external data-driven processor 1. If it is determined that instruction code C is to be executed in operating portion 14 of external data-driven processor 1, data packet PA1 is output to input/output controlling portion 11 which then outputs data packet PA1 to an external portion of the processor from an appropriate output port. On the other hand, if it is determined that instruction code C is to be executed in operating portion 14 within data-driven processor 1, data packet PA1 is applied to joint portion 12.
Thus, data packet PA1 circulates within data-driven processor 1, whereby processing proceeds in accordance with a data flow program stored in advance in program memory 161.
The data packet is asynchronously transmitted by handshake in data-driven processor 1. The process in accordance with the data flow program stored in program memory 161 proceeds in parallel by pipeline processing as the data packet circulates in data-driven processor 1. Thus, in the data-driven processing method, parallelism of data packet processing is high, and the flow rate of the data packets circulating in the processor partly governs processing performance.
In recent years, this feature of the data-driven processing method has been applied in the fields of image processing and video signal processing, which require intensive high-speed operation. Because of their nature, image data and video signal data have a small bit length. Accordingly, data of a small bit length is processed in image processing or video signal processing. Presently, field 21 of data D shown in FIGS. 15A and 15B has a 12-bit length. Similarly, 1 word in data memory 3 or built-in memory 15 has a 12-bit length.
Unlike the above described image processing or video signal processing, some processes involve processing of data having an extremely large bit length. Examples of such a process include a public key encryption using a public key or corresponding decryption.
Here, the above mentioned public key encryption will be described. If a certain text (data) is to be transmitted to a specific party while being kept secret from other parties, the text (data) to be transmitted is called a plaintext and the encrypted text to be transmitted is called a cipher text. A parameter for converting (encrypting) the plaintext to the cipher text according to a certain rule, or for converting (decrypting) that cipher text back to the plaintext, is called a key. In a public key cryptosystem, because of its mathematical nature, a third party who knows the cipher text or a public key cannot decrypt the cipher text, or at least cannot readily decrypt it, unless the secret keys of the transmitter and receiver are known. RSA (Rivest, Shamir, Adleman) and DH (Diffie Hellman) are representative of such public key cryptosystems. In the following, key exchange in accordance with DH will be described by way of example.
Assume that two persons, A and B, perform key exchange. A and B respectively generate their own secret keys S (A) and S (B) based on which their own public keys P (A) and P (B) are generated in the following manner. Note that secret keys S (A) and S (B) are both 1024 bits in length. In the public key encryption, a secret key generally has a 1024-bit length.
Public key P (A)=G^S (A) modP and public key P (B)=G^S (B) modP are found. Here, “^” and “mod” respectively represent power operation and residue (modulo) operation. Variables G and P are determined in advance as constants. A and B exchange their own public keys. Upon receipt of the public key of the counterpart, each of A and B generates a common key C as follows. Specifically, A generates common key C in accordance with C=P (B)^S (A) modP, and B generates common key C in accordance with C=P (A)^S (B) modP.
Common keys C generated by A and B have the same value. In this manner, the key can be shared by the transmitter and receiver while preventing the secret keys from being known to the third party. Note that S (A), S (B) and P are data of a 1024-bit length. Similarly, P (A), P (B) and C are also data of a 1024-bit length.
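The exchange can be checked numerically with toy values. In the sketch below, the function name `dh_exchange` and the small parameters (G=5, P=23, S (A)=6, S (B)=15) are assumptions chosen only to keep the arithmetic readable; they are far shorter than the 1024-bit values discussed above. Python's built-in three-argument `pow` performs the "^ ... mod" operation.

```python
def dh_exchange(g, p, s_a, s_b):
    """Return (P(A), P(B), C computed by A, C computed by B)."""
    p_a = pow(g, s_a, p)    # P(A) = G^S(A) mod P
    p_b = pow(g, s_b, p)    # P(B) = G^S(B) mod P
    c_a = pow(p_b, s_a, p)  # A computes C = P(B)^S(A) mod P
    c_b = pow(p_a, s_b, p)  # B computes C = P(A)^S(B) mod P
    return p_a, p_b, c_a, c_b


p_a, p_b, c_a, c_b = dh_exchange(g=5, p=23, s_a=6, s_b=15)
assert c_a == c_b  # both sides arrive at the same common key
```

Both sides obtain the same value because P (B)^S (A) and P (A)^S (B) both equal G^(S (A)×S (B)) modulo P.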
To obtain the operation result of “X^YmodZ” used for generating the above described public key, a multiplication by X or a square operation and a division by divisor Z for finding the residue are alternately repeated. To store intermediate results of the repetitive operations, operation regions U (2048 bits) and V (2048 bits) are prepared. A process for the operation of “X^YmodZ” is shown in FIG. 16.
FIG. 16 is a flow chart showing a process for the operation of X^YmodZ in a von Neumann machine. The process flow shown in FIG. 16 will be described. Variables X, Y and Z have a 1024-bit length. These variables are stored in an internal memory of the machine and read out therefrom at the start of the process. Thereafter, the intermediate operation and storage of the result are alternately repeated. Note that in the process flow, variable Y[k] represents a value of the k-th bit of variable Y.
First of all, initial setting is made in step S1. Namely, the content of operation region U is reset, 1 is set in operation region V, and 1023 is set to control variable k. Thereafter, the following operations are repeated while control variable k is decremented by 1 from 1023 down to 0.
In step S2, the process branches depending on whether variable Y[k] is 1 or 0. If variable Y[k] is 1, the process goes to step S3. If 0, the process goes to step S6 which will later be described.
In step S3, an operation of (V×X) is performed and the result is stored in operation region U. In the following step S4, an operation in accordance with U%Z is performed, i.e., a residue of (the value stored in operation region U÷Z) is found and stored in operation region V. In the following step S5, a determination is made as to whether control variable k is 0. If k is not 0, in step S6, the value of operation region V is raised to the second power and the result is stored in operation region U. In the next step S7, an operation in accordance with U%Z is performed, i.e., a residue of (the value stored in operation region U÷Z) is found and stored in operation region V. In the next step S8, the value of control variable k is decremented by 1. Steps S2-S8 are repeated until it is determined that k=0 in step S5. As a result, the value stored in operation region V is taken as the operation result of “X^YmodZ”.
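The flow of steps S1-S8 may be sketched as follows, with Python's unbounded integers standing in for the 2048-bit operation regions U and V. The function name `mod_exp` and the `bits` parameter are assumptions. The sketch also assumes that the k=0 test of step S5 is reached on both branches of step S2, so that no squaring follows the last bit; the text routes the Y[k]=0 branch directly to step S6, and FIG. 16 is not reproduced here to confirm either reading.

```python
def mod_exp(x, y, z, bits=1024):
    """Left-to-right binary computation of X^Y mod Z following steps S1-S8."""
    u = 0         # S1: reset operation region U
    v = 1         # S1: set 1 in operation region V
    k = bits - 1  # S1: control variable k starts at the top bit
    while True:
        if (y >> k) & 1:  # S2: branch on Y[k]
            u = v * x     # S3: U = V x X
            v = u % z     # S4: V = U % Z
        if k == 0:        # S5: last bit processed (assumed to apply to both branches)
            break
        u = v * v         # S6: U = V squared
        v = u % z         # S7: V = U % Z
        k -= 1            # S8: decrement k
    return v
```

The result can be compared against the built-in `pow(x, y, z)` for moduli greater than 1. Each pass handles one bit of Y, so a 1024-bit exponent requires 1024 passes, each involving a multiplication whose product may be up to 2048 bits long.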
As described above, the need exists for processing multiple-precision data as represented by the public key encryption and decryption. However, no method of processing multiple-precision data by the conventional data-driven processor 1 has been established. In detail, the public key encryption requires a bit length of about 1024 bits. However, it is extremely difficult to form in data-driven processor 1 a calculator, data packet and memory words with such a bit length because of physical restrictions of a circuit mounting area and bus width when data-driven processor 1 is implemented by an LSI (abbreviation for Large Scale Integration).
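To give a sense of scale, a 1024-bit value does not fit in the 12-bit data field 21 or in a 12-bit memory word and would have to be divided into many words. The sketch below is not a method described in the text; it merely illustrates one generic multiple-precision representation. The function names and the least-significant-word-first order are assumptions.

```python
WORD_BITS = 12  # bit length of field 21 and of one memory word, per the text


def to_words(value, total_bits=1024, word_bits=WORD_BITS):
    """Split a non-negative integer into word_bits-wide words, least significant first."""
    count = -(-total_bits // word_bits)  # ceiling division
    mask = (1 << word_bits) - 1
    return [(value >> (i * word_bits)) & mask for i in range(count)]


def from_words(words, word_bits=WORD_BITS):
    """Reassemble the integer from its words."""
    return sum(w << (i * word_bits) for i, w in enumerate(words))
```

Under these assumptions a 1024-bit value occupies 86 words of 12 bits each, since 85 words hold only 1020 bits.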