1. Field of the Invention
The present invention relates to an execution control apparatus of a data driven information processor. More particularly, it relates to an execution control apparatus of a data driven information processor that readily improves performance, efficiency, operational flexibility, and cost effectiveness in image processing and motion picture processing through the design of the packet waiting mechanism and the packet structure.
2. Description of the Background Art
The data driven principle is considered to be an inherently natural information processing scheme. A data driven information processor is structured upon the basis of the data driven principle. The term “data driven processor” generally refers to a set of information processors developed from research into an effective mechanism for executing a target program directly converted from an executable description of a high-level specification.
The data driven principle will be discussed below. A program includes a plurality of instructions. Each instruction will be ready to execute when all the argument data required for its execution are present in a token (data packet) format. The instruction ready to execute is sent to the operation processing mechanism together with the argument data and the destination of the executed result.
The device that determines whether an instruction has become ready to execute and, if so, transmits the executable instruction together with the argument data and the destination of the executed result to the operation processing mechanism is called a “firing control mechanism”.
At the operation processing mechanism, the instruction is executed. The executed result is transferred to the destination in the form of a token as the argument data of the next instruction to be executed.
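The firing rule described above can be illustrated with a minimal sketch. The program table, the node names, and the port numbering below are hypothetical and serve only to show that an instruction executes once all of its argument tokens are present and that its result is forwarded as a token to the destination instruction.

```python
from collections import defaultdict
import operator

# Hypothetical program: node name -> (operation, number of arguments, destination node).
# A destination of None means the result leaves the program.
PROGRAM = {
    "add": (operator.add, 2, "neg"),
    "neg": (operator.neg, 1, None),
}

def run(tokens):
    """Process (node, port, value) tokens and return the final results."""
    waiting = defaultdict(dict)   # node -> {port: value} for arguments seen so far
    queue = list(tokens)
    results = []
    while queue:
        node, port, value = queue.pop(0)
        op, arity, dest = PROGRAM[node]
        waiting[node][port] = value
        if len(waiting[node]) < arity:
            continue                      # not all arguments present: keep waiting
        args = [waiting[node][p] for p in range(arity)]
        del waiting[node]                 # the instruction fires; its arguments are consumed
        out = op(*args)
        if dest is None:
            results.append(out)
        else:
            queue.append((dest, 0, out))  # the result becomes a token for the next instruction
    return results

print(run([("add", 0, 2), ("add", 1, 3)]))  # [-5]
```

The result does not depend on the order in which the two argument tokens arrive, which is the essential property of the data driven principle.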
FIG. 1 shows a structure of a system employing a general data driven information processor in image processing. Referring to FIG. 1, this system includes a data driven processor 1 having input ports IA and IB connected to data transfer paths 4 and 5, respectively, output ports OA and OB connected to data transfer paths 6 and 7, respectively, and an output port OV and an input port IV connected to data transfer paths 8 and 9, respectively, a memory interface 2 connected to data driven processor 1 through data transfer paths 8 and 9, and an image memory 3 interconnected with memory interface 2 through a memory access control line 1A.
Data driven processor 1 receives in time series through data transfer path 4 or 5 an input data packet having an identifier referred to as “generation number” assigned according to the input time sequence. Preset processing contents are stored in data driven processor 1. In response to the input of an input data packet, data driven processor 1 implements a process according to the processing contents stored therein. When access to image memory 3 is to be effected during processing, data driven processor 1 sends out on data transfer path 8 via output port OV an access request (reference/update and the like of the contents in image memory 3). Upon receiving this access request, memory interface 2 accesses image memory 3 via memory access control line 1A. The result is returned to data driven processor 1 via data transfer path 9 and input port IV.
Upon completion of the process by data driven processor 1 with respect to the input data packet, data driven processor 1 sends out via output port OA or OB an output data packet to data transfer path 6 or 7.
FIG. 2 shows the basic structure of a data driven information processor. FIG. 3 shows the basic structure of the input/output packet of a data driven information processor.
Referring to FIG. 2, data driven processor 1 includes a merging unit 10 that combines an externally applied input data packet and a data packet flowing in data driven processor 1. Data driven processor 1 further includes a firing control unit 11 that receives a data packet from merging unit 10, determines whether all the data required for instruction execution are available or not, and outputs a data packet including all the data. Data driven processor 1 further includes a functional unit 12 that performs an operation determined by the operation code on the data in the data packet from firing control unit 11, and reassembles and outputs the result as a data packet including the node number of the next instruction. Data driven processor 1 additionally includes a program storage unit 13 where instructions are prestored, for receiving a data packet from functional unit 12, reading out the code of the instruction to be executed next and the next node number, assembling a data packet with the node number included in the relevant data packet as the destination, and outputting the data packet. Finally, data driven processor 1 includes a branch unit 14 for receiving a data packet from program storage unit 13 and sending the data packet outside or towards merging unit 10 according to the destination.
The data packet shown in FIG. 3 is applied to merging unit 10. This data packet is further sent to firing control unit 11. As shown in FIG. 2, firing control unit 11 includes a waiting storage region and a constant storage region, as disclosed in U.S. Pat. No. 5,640,525. The data packet applied to firing control unit 11 is stored in a waiting storage region 11a of firing control unit 11 until the counterpart data packet to form a pair to be processed arrives. When the data packet pair is available, a data packet including the operation code and the companion data is reassembled and is sent to functional unit 12. When the counterpart to be processed is not a data packet but constant data, the waiting at firing control unit 11 is not effected. In such a case, a data packet is reassembled by obtaining constant data from constant storage region 11b. 
Functional unit 12 processes the data in the received packet in accordance with the operation code in the packet. The resultant data packet is reassembled and is output.
Program storage unit 13 receives the data packet from functional unit 12, accesses its program memory at an address determined by the node number included in the received data packet, retrieves the operation code to be executed next as well as the next node number, reassembles a data packet, and outputs it. The data packet output from program storage unit 13 is selectively sent outside data driven processor 1 or to merging unit 10 by branch unit 14 according to the included node number.
Referring to FIG. 3, the data packet applied to data driven processor 1 includes a PE number 15 (PE stands for “processing element”), a node number 16, a generation number 17, and data 18.
PE number 15 is an identifier that distinguishes data driven information processors structured as shown in FIG. 12 from each other in a system where a plurality of the data driven information processors are interconnected via a plurality of input/output control units (merging unit 10 and branch unit 14).
Node number 16 is used as an address to retrieve an operation code stored in program storage unit 13 of FIG. 2.
Generation number 17 is an identifier of a data packet input in a time series, and is also used as the address of image memory 3. Generation number 17, as an address in image memory 3, includes a field number FD, a line number LN, and a pixel number PX, as shown in FIG. 3. The relationship among the image, field number FD, line number LN and pixel number PX is as shown in FIG. 4.
The structure of the data packet applied to firing control unit 11 of FIG. 2 is shown in FIG. 5.
Referring to FIG. 5, this data packet has a structure similar to that of the data packet (FIG. 3) input to merging unit 10, and includes an operation code 19, a node number 20, a generation number 21, and data 22. It is to be noted that an operation code 19 is provided instead of PE number 15 of FIG. 3.
FIG. 6 shows the format of data stored in constant storage region 11b of firing control unit 11. Constant storage region 11b stores constant data to be used when the data companion to a data packet is a constant. Referring to FIG. 6, each record includes a VCD 23 (VCD is a flag indicating whether the constant data is valid (1) or invalid (0)), and constant data 24.
Referring to FIG. 7, waiting storage region 11a includes a hash overflow address 25 that stores the portion of the waiting data packet not used in calculating the hash address, a PRE 26 which is a flag indicating the presence of waiting data, i.e. indicating that a data packet is waiting (“1” when valid and “0” when invalid), and waiting data 27. Hash overflow address 25 stores the portion of node number 16 and a part of generation number 17 that is not used in calculating the hash address of the waiting data packet.
The format of the data packet output from program storage unit 13 is as shown in FIG. 5. When this data packet circulates through branch unit 14 and merging unit 10 to enter firing control unit 11, the constant data and VCD are obtained from constant storage region 11b of FIG. 6. Then, determination is made as to which of storing/firing/passing is to be selected as the operation for waiting storage region 11a of FIG. 7. This determination is based on the combination of operation code 19, the VCD obtained from constant storage region 11b and PRE 26 in waiting storage region 11a.
First, the operation code in the data packet is decoded to determine whether the instruction to be executed by functional unit 12 is a 1-input instruction or a 2-input instruction. In the case of a 1-input instruction, that data packet will be “passed”. If the instruction is a 2-input instruction and VCD=1, the data packet is likewise “passed” since a counterpart data packet is not required as the data pair.
In other cases, i.e., in the case of a 2-input instruction and VCD=0, waiting for a paired data packet is required. Therefore, access to waiting storage region 11a is effected. Here, the hash address is calculated. If the PRE of the relevant address is “0”, the paired data packet has not yet arrived. Therefore, the process with respect to this data packet is “storing”. The data and the hash overflow address are stored in the relevant region, and PRE is updated to 1. Conversely, if the PRE of the relevant address is “1”, the paired data packet is already present in waiting storage region 11a. Therefore, the data required for the operation are now available together with the input data packet. Thus, the process is “firing”. The waiting data is output from waiting storage region 11a, the packet is reconfigured, and PRE is updated to 0. The reconfigured data packet has the structure shown in FIG. 8.
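The storing/firing/passing determination described above may be sketched as follows. This is an illustration under stated assumptions, not the circuit itself: the hash address calculation and the hash overflow address are omitted (the relevant entry is assumed to have been selected already), and the names Slot and firing_control, as well as the ordering of the returned operands, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Slot:
    """One entry of the waiting storage region (hash lookup omitted)."""
    pre: int = 0                 # PRE flag: 1 while a packet is waiting in this entry
    data: Optional[int] = None   # waiting data

def firing_control(two_input: bool, vcd: int, constant: Optional[int],
                   slot: Slot, data: int) -> Tuple[str, tuple]:
    """Return (action, operands) for one arriving data packet."""
    if not two_input:
        return "pass", (data,)             # 1-input instruction: no counterpart needed
    if vcd == 1:
        return "pass", (data, constant)    # counterpart is a valid constant
    if slot.pre == 0:
        slot.pre, slot.data = 1, data      # counterpart not yet arrived: store and wait
        return "store", ()
    partner = slot.data
    slot.pre, slot.data = 0, None          # counterpart was waiting: fire and free the entry
    return "fire", (partner, data)

slot = Slot()
print(firing_control(True, 0, None, slot, 3))  # ('store', ())
print(firing_control(True, 0, None, slot, 4))  # ('fire', (3, 4))
```

In this sketch the first packet of a pair leaves PRE at 1 and the second packet returns it to 0, matching the sequence of updates described above.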
Referring to FIG. 8, this data packet includes a VCD 28, an operation code 29, a node number 30, a generation number 31, left data 32, and right data 33. Left and right data 32 and 33 are the data required for the operation of a 2-input instruction, and are obtained from the waiting data packet and the input data packet. This data packet is sent to functional unit 12.
By repeating the above-described process sequentially, data is processed while a data packet circulates through the data driven information processor.
In order to maintain throughput of the access to waiting storage region 11a, firing control unit 11 may be interleaved as disclosed in U.S. Pat. No. 5,640,525. More specifically, waiting storage region 11a includes a block 34 and two regions (memory 35 and memory 36 ), as shown in FIG. 9. Access is selectively distributed to either memory by the sorting of the data packet through block 34. Specification of which region is to be accessed is set by a bit called “interleave bit (IB)” in a register provided in block 34. IB is set by loading a parameter called “FIS parameter” (3 bits) to the aforementioned register in block 34.
The relationship between FIS and IB is summarized in the following Table 1. The FIS parameter also serves to specify the type of the hash function in calculating the hash address.
TABLE 1
FIS   000  001  010  011  100  101  110  111
IB    G1   G2   G3   G4   N0   N1   N2   N3
Gn: n-th bit of the Generation Number
Nn: n-th bit of the Node Number
It is to be noted that the generation number includes 5 bits, i.e., G0–G4. Therefore, G1, for example, represents the second least significant bit of the generation number.
As mentioned before, the waiting storage is realized by an interleave structure in the conventional art shown in FIG. 9. A margin in capacity must be provided for each memory. As a result, the memory cost is increased. Furthermore, there is a possibility of the access being concentrated on one of memories 35 and 36 depending upon the sequence of the data packets arriving in a stream. The intended throughput cannot be maintained sufficiently in such cases.
For example, it is assumed that FIS=‘000’ in Table 1. Here, IB is determined by bit G1 (the second least significant bit) of generation number 31. Assuming that data packets whose generation numbers are 0(0b0 . . . 000000), 4(0b0 . . . 000100), 8(0b0 . . . 001000), 12(0b0 . . . 001100), . . . sequentially arrive at firing control unit 11, IB=0 is constantly established because bit G1 is 0 in every one of these generation numbers. As a result, access is concentrated on memory 35, so that sufficient throughput cannot be achieved.
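The selection of IB according to Table 1 and the concentration of access in the above example can be reproduced with a short sketch. The function name interleave_bit and the dictionary layout are hypothetical; bit 0 is taken as the least significant bit, consistent with the description of G1 given above.

```python
# Table 1 as a lookup: FIS value -> (source field, bit index).
# "G" selects a bit of the generation number, "N" a bit of the node number.
FIS_TO_IB = {
    0b000: ("G", 1), 0b001: ("G", 2), 0b010: ("G", 3), 0b011: ("G", 4),
    0b100: ("N", 0), 0b101: ("N", 1), 0b110: ("N", 2), 0b111: ("N", 3),
}

def interleave_bit(fis: int, generation: int, node: int) -> int:
    """Return IB for a packet under the given FIS parameter."""
    field, bit = FIS_TO_IB[fis]
    value = generation if field == "G" else node
    return (value >> bit) & 1

generations = [0, 4, 8, 12]
print([interleave_bit(0b000, g, node=0) for g in generations])  # [0, 0, 0, 0]
print([interleave_bit(0b001, g, node=0) for g in generations])  # [0, 1, 0, 1]
```

With FIS=‘000’ every packet in this sequence yields IB=0, whereas FIS=‘001’ would alternate between the two memories for the same sequence. The balance therefore depends on the match between the chosen FIS parameter and the arrival pattern.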
Furthermore, the data packet in the data driven information processor is transferred in a pipelined manner by transfer between latches that are sequentially coupled by means of a self-timed transfer control circuit (C element). Reading the PRE takes one clock cycle necessary for the transfer between C elements. Another clock cycle is required for update value detection and update. An execution control apparatus of a data driven information processor that can avoid such throughput degradation depending upon the arriving order of data packets is desired.
The demands placed on the processing performance in image processing and video processing are ever increasing. The data driven information processor has an architecture suitable for such image processing and video processing. Therefore, an execution control apparatus of a data driven information processor is desired that can process data more speedily and with higher throughput than other architectures, and that allows flexible programming.
Since the demanded operations are becoming more complicated, highly complex instructions will be necessary. However, the prior art implements instructions with at most two inputs. There is a case where one wishes to obtain the resultant output from three or more equal inputs. Since one node can perform only a 2-input process as shown in FIG. 10(A) in the conventional data driven information processor, the only possible approach to realize a 3-input instruction is to combine nodes 38 and 39 as shown in FIG. 10(B) through programming. This requires additional response time in comparison to the case where one node can perform a 3-input process. Accordingly, execution becomes inefficient.
Particularly in the structure shown in FIG. 10(B), it is to be noted that the operation between inputs A and B is carried out at node 38, and then the operation of node 39 is carried out between the resultant output X and an input C. This means that input C must be in waiting storage region 11a until the process of node 38 is completed and X arrives at firing control unit 11. As a result, waiting storage region 11a will be used inefficiently. It is desired to effectively utilize the waiting storage region and improve the processing efficiency by carrying out the process of three or more inputs with one node.
As the instructions become more complex, not only scalar data but also vector data will be more frequently processed. Correspondingly, the need will arise to handle the constants stored in constant storage region 11b as vector data. In this case, in order to allow a complex instruction to be handled more flexibly, it is preferable to selectively use the constants, not fixed to one type of either scalar or vector, but switched therebetween for the same instruction.