1. Field of the Invention
The present invention relates to a self-synchronous transfer control circuit and a data driven information processing device using the same, and more particularly, to a data driven information processing device in which a self-synchronous transfer control circuit enabling transfer of a plurality of pulses from one pulse is used for multi-output instructions, to enhance program performance.
2. Description of the Background Art
As the use of multimedia has increased in recent years, a large number of operations are required in image processing and the like. The data driven information processing device (hereinafter referred to as a data driven processor) has been proposed as a device for rapidly processing such a large number of operations. In the data driven processor, processing proceeds according to a rule that a process is performed when all the input data required for the process are available and a resource required for the process, such as an operation device, is allocated. A data transmission device employing an asynchronous handshake system is used for a data processing device including information processing operation of the data driven type. In such a data transmission device, a plurality of data transmission paths are connected with each other, which mutually transmit/receive data transfer request signals (hereinafter referred to as SEND signals) and transfer enabling signals indicating whether or not the data transfer is permitted (hereinafter referred to as ACK signals), to autonomously transfer data.
FIG. 12 shows a format of a data packet to which a conventional art and the present invention are applied. In FIG. 12, the data packet includes a destination node number area F1 for storing a destination node number ND#, a generation number area F2 for storing a generation number GN#, an instruction code area F3 for storing an instruction code OPC and a data area F4 for storing data DATA. The generation number herein represents a number for distinguishing data groups to be subjected to parallel processing from each other. The destination node number represents a number for distinguishing input data within one generation from each other. The instruction code is for executing instructions stored in an instruction decoder.
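For illustration only, the packet format of FIG. 12 may be modeled as follows. This is a minimal sketch: the class name `Packet`, the Python field names, and the representation of data area F4 as a tuple of words are assumptions made for the example, and the 12-bit word width is taken from the 12-bit data field described for FIG. 12 in the multiplication example of this section.

```python
from dataclasses import dataclass

DATA_BITS = 12  # width of one data word, per the 12-bit data field of FIG. 12


@dataclass(frozen=True)
class Packet:
    nd: int      # destination node number ND# (area F1)
    gn: int      # generation number GN# (area F2)
    opc: str     # instruction code OPC (area F3)
    data: tuple  # data DATA (area F4); a tuple of words is an assumption

    def __post_init__(self):
        # Reject any word that would not fit in the data area.
        for word in self.data:
            if not 0 <= word < (1 << DATA_BITS):
                raise ValueError("data word does not fit in the data area")


# Two packets of the same generation addressed to the same node.
left = Packet(nd=3, gn=0, opc="MUL", data=(5,))
right = Packet(nd=3, gn=0, opc="MUL", data=(7,))
```

Packets with equal `nd` and `gn` values, such as `left` and `right` above, are the ones that belong to the same operation within one generation.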
FIG. 13 is a block diagram showing a configuration of a data transmission path. The data transmission path includes a self-synchronous transfer control circuit (hereinafter referred to as a C element) 1a and a data holding circuit (hereinafter referred to as a pipeline register) 1b constituted by a D type flip-flop. C element 1a includes a pulse input terminal CI receiving a pulse, a transfer enabling output terminal RO outputting a transfer enabling signal indicating enabling or disabling of the transfer, a pulse output terminal CO outputting a pulse, a transfer enabling input terminal RI receiving the transfer enabling signal indicating enabling or disabling of transfer, and a pulse output terminal CP for applying a clock pulse controlling the data holding operation of pipeline register 1b. 
FIGS. 14A to 14E are timing charts illustrating the operation of C element shown in FIG. 13. When C element 1a receives a pulse indicated in FIG. 14A from terminal CI, if the transfer enabling signal input at terminal RI shown in FIG. 14E is enabled, C element outputs a pulse indicated in FIG. 14D from terminal CO and also outputs a pulse shown in FIG. 14C to pipeline register 1b. In response to the pulse applied from C element 1a, pipeline register 1b holds the applied input packet data, and then outputs the held data as output packet data.
FIG. 15 is a block diagram showing an example where the data transmission paths shown in FIG. 13 are connected in sequence via predetermined logic circuits. The input packet data is processed sequentially in logic circuits 3d and 3e while sequentially being transferred along pipeline registers 3a→3b→3c. In FIG. 15, for example, when pipeline register 3a is in a data holding state while pipeline register 3b in the subsequent stage is also in the data holding state, no data is transmitted from pipeline register 3a to pipeline register 3b. 
Further, if pipeline register 3b in the subsequent stage is in non-data holding state or has come to be in the non-data holding state, data is transmitted from pipeline register 3a to logic circuit 3d, where the data is processed, and to pipeline register 3b, taking at least a preset delay time. A control which is called a self-synchronous transfer control asynchronously transmits data with at least preset delay time, in response to SEND signals input to/output from terminals CI and CO and ACK signals input to/output from terminals RI and RO, which are transmitted between adjacent pipeline registers connected as described above. A circuit controlling such data transfer is called a self-synchronous transfer control circuit.
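The transfer rule described above for FIG. 15 may be sketched as follows. This is a simplified behavioral model, not the circuit itself: the function name `step` and the `sink_ready` parameter are hypothetical, the logic circuits 3d and 3e and the preset delay time are omitted, and each pipeline register is represented by a list entry that is `None` when the register holds no data.

```python
def step(stages, sink_ready=True):
    """Advance the pipeline by one handshake round.

    stages[i] is the packet held by pipeline register i, or None if empty.
    A packet moves forward only when the next register holds no data,
    which corresponds to the subsequent stage returning an enabling ACK.
    Scanning from the output end lets a register that is emptied in this
    round accept data from its predecessor in the same round.
    """
    out = None
    if sink_ready and stages[-1] is not None:
        out, stages[-1] = stages[-1], None
    for i in range(len(stages) - 2, -1, -1):
        if stages[i] is not None and stages[i + 1] is None:
            stages[i + 1], stages[i] = stages[i], None
    return out


regs = ["A", "B", "C"]          # registers 3a, 3b, 3c all holding data
step(regs, sink_ready=False)    # nothing can move: every next stage is full
step(regs)                      # "C" leaves, then "B" and "A" each advance
```

When every register holds data and the output side is not ready, no packet moves; once the last register is emptied, the packets behind it advance one stage each.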
FIG. 16 is a detailed circuit diagram of the C element shown in FIG. 15. The C element may be, for example, the one described in Japanese Patent Laying-Open No. 6-83731. In FIG. 16, pulse input terminal CI receives a pulsed SEND signal (a transfer request signal) from a preceding stage, and transfer enabling output terminal RO outputs an ACK signal (a transfer enabling signal) to the preceding stage. Pulse output terminal CO outputs a pulsed SEND signal to a subsequent stage, and transfer enabling input terminal RI receives an ACK signal from the subsequent stage.
A master reset input terminal MR receives a master reset signal. When a pulse at a logic high or “H” level is applied to master reset input terminal MR, the pulse is inverted at an inverter 4e, and the inverted pulse resets flip-flops 4a and 4b to initialize C element. Then, an “H” level signal is output, as an initial state, from both pulse output terminal CO and transfer enabling output terminal RO. The “H” level output from transfer enabling output terminal RO indicates a transfer enabling state, whereas a logic low or an “L” level output therefrom indicates a transfer disabling state. Further, the “H” level output from pulse output terminal CO indicates the state where no data transfer is required for the subsequent stage, whereas the “L” level therefrom indicates a state where the data transfer is required for or the data is being transferred to the subsequent stage.
When an “L” level signal is input to pulse input terminal CI, i.e., when data transfer is required from the preceding stage, flip-flop 4a is set and outputs an “H” level signal to an output Q. The “H” level signal is inverted at an inverter 4d, and thus an “L” level signal is output from transfer enabling output terminal RO, which inhibits further data transfer. After a certain period of time, an “H” level signal is input to pulse input terminal CI, terminating data setting from the preceding stage to the C element. In this state, when an “H” level signal is input from transfer enabling input terminal RI, i.e., the subsequent stage permits the data transfer, and pulse output terminal CO is outputting an “H” level signal, i.e., no data is being transferred to the subsequent stage (the state where no data transfer is required for the subsequent stage), a NAND gate 4c is activated, outputting an “L” level signal.
As a result, flip-flops 4a and 4b are both reset, and flip-flop 4b outputs an “H” level signal, via a delay element 4e, from pulse output terminal CP to the pipeline register, together with a SEND signal of the “L” level, via a delay element 4f, from pulse output terminal CO to the C element in the subsequent stage. That is, data transfer is required for the subsequent stage. The C element in the subsequent stage which has received the SEND signal of the “L” level outputs an “L” level ACK signal from terminal RO, indicating transfer inhibition, such that no further data is transferred to that C element. The C element receives the “L” level ACK signal at transfer enabling input terminal RI, which sets flip-flop 4b. As a result, the “L” level signal is output, via delay element 4e, from pulse output terminal CP to the pipeline register, and the “H” level SEND signal is output, via delay element 4f, from pulse output terminal CO to the subsequent stage, terminating the data transfer.
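The sequence of terminal levels described above for the C element may be traced with the following sketch. It models only the levels at terminals CI, RO, CO, RI and CP (`True` for “H”, `False` for “L”) and is not a gate-level model of FIG. 16: the class and method names are hypothetical, the delay elements are ignored, the `pending` attribute merely stands in for the stored transfer request, and the “L” level of CP after reset is an assumption, since the text does not state it.

```python
class CElement:
    """Behavioral sketch of the C element terminals (True = "H", False = "L")."""

    def __init__(self):
        self.reset()

    def reset(self):
        # After master reset, CO and RO are both at "H" (idle, transfer enabled).
        self.ci = True
        self.ri = True
        self.ro = True
        self.co = True
        self.cp = False       # assumed initial level, not stated in the text
        self.pending = False  # a transfer request is stored and not yet passed on

    def set_ci(self, level):
        self.ci = level
        if not level:
            # "L" at CI: request from the preceding stage; RO drops to "L".
            self.pending = True
            self.ro = False
        self._try_fire()

    def set_ri(self, level):
        self.ri = level
        if not level and not self.co:
            # "L" ACK from the subsequent stage ends the ongoing transfer.
            self.co = True
            self.cp = False
        self._try_fire()

    def _try_fire(self):
        # Pass the request on only when CI is back at "H", the subsequent
        # stage enables transfer (RI "H"), and no transfer is in progress.
        if self.pending and self.ci and self.ri and self.co:
            self.pending = False
            self.ro = True    # accept further requests again
            self.cp = True    # clock pulse to the pipeline register
            self.co = False   # "L" SEND to the subsequent stage


c = CElement()
c.set_ci(False)   # request arrives: RO goes "L"
c.set_ci(True)    # request pulse ends: CP goes "H", CO goes "L"
c.set_ri(False)   # subsequent stage inhibits: CP goes "L", CO goes "H"
c.set_ri(True)    # subsequent stage enables again: idle state restored
```

In this sketch, a request that arrives while RI is at “L” is held until RI returns to “H”, which corresponds to the case where the subsequent stage is still holding data.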
FIG. 17 is a schematic block diagram of a conventional data driven processor configured including the data transfer device shown in FIG. 15. In FIG. 17, a data driven processor Pe includes a junction unit JNC, a firing control unit FC, an operation unit FP, a program storage unit PS, a branch unit BRN, a plurality of pipeline registers 3a to 3c, and a plurality of C elements 2a to 2c. Each of C elements 2a to 2c controls packet transfer for a corresponding processing unit (FC, FP or PS) by exchanging packet transfer pulses (signals at CI, CO, RI and RO) with C elements in preceding and subsequent stages. In response to pulse inputs from the corresponding C elements 2a to 2c, pipeline registers 3a to 3c each take in the data input from the preceding processing unit, hold the data, and deliver it to the output stage, where the data is held until the next pulse is input.
In FIG. 17, when the data packet shown in FIG. 12 is input to processor Pe, the input packet first passes through junction unit JNC and is transferred to firing control unit FC, where pair data is formed from packets that match in destination node number ND# and generation number GN#. That is, two different data packets having identical node number ND# and generation number GN# are detected, the data in one of the data packets is additionally stored in data area F4 (FIG. 12) of the other data packet, and the other data packet is output. The packet in which the pair data (a set of data) is stored in data area F4 is subsequently transmitted to operation unit FP. Operation unit FP receives the transmitted data packet, executes a predetermined operation on the content of the input packet based on instruction code OPC of the input packet, and stores the operation result in data area F4 of the input packet. The input packet is subsequently transmitted to program storage unit PS.
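The pair detection performed in firing control unit FC may be sketched as follows. This is a minimal illustration under stated assumptions: the function name `fire`, the use of a dictionary as the waiting store, the dictionary-based packet representation, and the ordering of the paired data (first arrival first) are all hypothetical and are not taken from the description.

```python
def fire(waiting, packet):
    """Return a packet carrying pair data, or None if the partner has not arrived.

    waiting maps (ND#, GN#) to the packet that arrived first.
    """
    key = (packet["nd"], packet["gn"])
    partner = waiting.pop(key, None)
    if partner is None:
        # No packet with the same ND# and GN# yet: hold this one.
        waiting[key] = packet
        return None
    # Store the data of one packet additionally in data area F4 of the other.
    # Placing the earlier arrival first is an assumption of this sketch.
    return {**partner, "data": partner["data"] + packet["data"]}


store = {}
fire(store, {"nd": 1, "gn": 0, "opc": "MUL", "data": (3,)})
paired = fire(store, {"nd": 1, "gn": 0, "opc": "MUL", "data": (4,)})
```

After the second call, `paired` carries both data words and the waiting store is empty again, so the packet can proceed to operation unit FP.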
Program storage unit PS receives the transmitted data packet, and reads, from the program memory in program storage unit PS, node information (node number ND#) to which the packet should go next, instruction information (instruction code OPC) to be subsequently executed, and a copy flag CPY. The read destination node number ND# and instruction code OPC are then stored respectively in destination node number area F1 and instruction code area F3 of the input packet. Further, if the read copy flag CPY is “1”, the subsequent address in the program memory is determined also to be valid, and thus a packet storing destination node number ND# and instruction code OPC stored in the next address will also be generated.
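The fetch operation of program storage unit PS may be sketched as follows. The function name `fetch`, the contents of `PROGRAM`, and the dictionary-based packet are hypothetical. The sketch further assumes that the program memory is addressed by the destination node number of the input packet and that the copy flag read at the next address is not followed, so that one pass yields at most two packets; neither point is stated explicitly in the description.

```python
# Hypothetical program memory: address -> (next ND#, next OPC, copy flag CPY).
PROGRAM = {
    0: (5, "OPC1", 1),
    1: (6, "OPC2", 0),
    2: (7, "OPC3", 0),
}


def fetch(packet, program):
    """Return the packet or packets generated by one pass through PS."""
    addr = packet["nd"]  # assumption: memory is addressed by the current ND#
    nd, opc, cpy = program[addr]
    out = [{**packet, "nd": nd, "opc": opc}]
    if cpy == 1:
        # The next address is also valid: generate a second packet from it.
        nd2, opc2, _ = program[addr + 1]
        out.append({**packet, "nd": nd2, "opc": opc2})
    return out


copies = fetch({"nd": 0, "gn": 2, "opc": "NOP", "data": (7,)}, PROGRAM)
```

Here `copies` holds two packets with the same generation number and data but different destination node numbers and instruction codes.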
The packet output from program storage unit PS is transmitted to branch unit BRN, and is output based on its destination node number ND#, or is returned again into the processor. To make three copies of identical data, the packet returned to the processor will be used for the copying process. Thus, to make a plurality of copies of the identical data, the packet must be returned to the processor a plurality of times for the copying process.
FIG. 4A is a data flow diagram showing an example where four copies of the input data are made. An NOP (copying without operation) instruction 16a is executed for the input data to output data 16h and 16i. Data 16i is executed as an OPC1 instruction 16d corresponding to instruction code OPC of the packet shown in FIG. 12, and data 16h is executed as an NOP instruction 16b. In NOP instruction 16b, copying is executed to output data 16j and 16k. Data 16k is executed as an OPC2 instruction 16e, and data 16j is executed as an NOP instruction 16c. In NOP instruction 16c, copying is executed to output data 16l and 16m. Data 16m is executed as an OPC3 instruction 16f, and data 16l is executed as an OPC4 instruction 16g. Thus, to make four copies of data, an instruction that copies one packet into two packets must be executed three times.
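The copy chain of FIG. 4A can be counted with the following sketch. The function name `expand` and the tuple layout of the returned schedule are hypothetical. The sketch only reproduces the chaining described above, in which each NOP instruction outputs two packets, one for a consuming instruction and one passed on to the next NOP instruction.

```python
def expand(consumers):
    """Chain two-output NOP instructions until every consumer has a copy.

    Returns (number of NOP executions, list of (instruction, output1, output2)).
    """
    remaining = list(consumers)
    schedule = []
    nops = 0
    while len(remaining) > 1:
        nops += 1
        if len(remaining) == 2:
            # The last NOP feeds the final two consumers directly.
            schedule.append(("NOP", remaining[0], remaining[1]))
            remaining = []
        else:
            # One copy goes to a consumer, the other to the next NOP.
            schedule.append(("NOP", remaining[0], "NOP"))
            remaining = remaining[1:]
    return nops, schedule


nops, schedule = expand(["OPC1", "OPC2", "OPC3", "OPC4"])
# nops == 3, matching the three NOP instructions 16a, 16b and 16c
```

In general, supplying n consumers with the same data in this manner takes n - 1 NOP executions.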
FIG. 18 is a diagram showing an example where a conventional data driven processor is used to execute a multiplication instruction. In FIG. 18, a multiplier 3f and a shifter 3g are provided as logic circuits 3d and 3e shown in FIG. 15 described earlier. For example, when 12-bit data is multiplied by another 12-bit data, the operation result will be 24-bit data. However, the data to be stored in the data field of the packet format is limited to 12 bits as shown in FIG. 12, and therefore the resulting 24-bit data must be divided into higher 12 bits and lower 12 bits for operation. Thus, shifter 3g has been used to execute two instructions, namely an instruction outputting a packet including the higher 12-bit data and an instruction outputting a packet including the lower 12-bit data, to realize the operation.
As described above, when a process is to be executed in the conventional data driven processor such that a plurality of copies of a packet are made or a plurality of identical data are required, the process can be realized by executing the NOP instruction a plurality of times. This, however, generates a useless go-around packet for executing the NOP instruction, i.e., a go-around packet returned from the packet output to the packet input as shown in FIG. 17. This has made it difficult to enhance the performance of program execution.
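The division of the 24-bit product into two 12-bit words in the multiplication example of FIG. 18 can be illustrated as follows. This is a sketch only: the function name `multiply_split` is hypothetical, and unsigned operands are assumed, since the description does not state how signed data is handled.

```python
WORD_BITS = 12
MASK = (1 << WORD_BITS) - 1  # 0xFFF


def multiply_split(a, b):
    """Multiply two unsigned 12-bit words and return (higher 12 bits, lower 12 bits)."""
    product = (a & MASK) * (b & MASK)
    high = (product >> WORD_BITS) & MASK  # word selected by shifting right
    low = product & MASK
    return high, low


high, low = multiply_split(0xFFF, 0xFFF)
# 0xFFF * 0xFFF = 0xFFE001, so high == 0xFFE and low == 0x001
```

In the conventional processor each of the two words is output by a separate instruction, so a single multiplication produces its full result only after two instructions have been executed.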