1. Field of the Invention
The present invention relates to a data processor for multiprocessing a data-driven program and a control-driven program, in the same pipeline, on an instruction-by-instruction basis.
2. Description of the Related Art
In a data-driven processor, instructions are issued and executed according to their data dependencies. That is, any given instruction is issued when the result of an operation is passed from an instruction that generates its source operand. Then, when all source operands necessary for an operation become available, the instruction is fired, and the operation is performed. Further, based on the dynamic data-driven principle, multiprocessing free from the overhead associated with context switching is made possible on an instruction-by-instruction basis by assigning a unique identifier (color) for each program being executed. In this specification, each individual program in execution, identified by a color, is called a process. Further, in this specification, the data-driven processor refers to a processor that employs an architecture based on the dynamic data-driven principle.
As shown in FIG. 1, the data-driven processor comprises a plurality of processing elements (PEs) and a packet transfer switch (SW) interconnecting the PEs. Each PE in the data-driven processor comprises three functional blocks, i.e., a firing control unit (FC), an execution unit (EX), and a program storage unit (PS). Each functional block can be divided up into a plurality of pipeline stages. A matching memory (MM) is connected to the firing control unit (FC), while an instruction memory (IM) is connected to the program storage unit (PS). Further, a data memory (DM) is connected to the execution unit (EX). With each instruction, as a packet, cycling through the pipeline, a program is executed. Therefore, the pipeline shown in FIG. 1 is called a cyclic pipeline.
First, a packet is input into a PE via the packet transfer switch (SW). Here, the packet carries the color, operation code, operand, and the instruction number that uniquely identifies the instruction within the program. The firing control unit (FC) refers to the matching memory (MM) by using the instruction number and the color of the input packet as keys, and detects whether the firing condition is satisfied or not, that is, whether all operands necessary for the execution of the instruction are available or not. If the firing condition is not satisfied, the operand carried in the packet is stored in the matching memory (MM). On the other hand, when the firing condition is satisfied, a packet carrying the operand pair is generated and transferred to the execution unit (EX).
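The firing check described above can be sketched as follows. This is only an illustrative model, assuming two-operand instructions; the names `Packet`, `FiringControl`, `accept`, and the `port` field (which marks an operand as left or right) are hypothetical and do not come from the specification.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    color: int      # identifies the process the packet belongs to
    instr_no: int   # identifies the instruction within the program
    opcode: str
    operand: int
    port: int       # assumption: 0 = left operand, 1 = right operand


class FiringControl:
    """Toy model of the FC stage with its matching memory (MM)."""

    def __init__(self) -> None:
        # MM is keyed by (color, instruction number), as in the text.
        self.matching_memory: Dict[Tuple[int, int], Packet] = {}

    def accept(self, pkt: Packet) -> Optional[Tuple[int, int, str, int, int]]:
        key = (pkt.color, pkt.instr_no)
        waiting = self.matching_memory.pop(key, None)
        if waiting is None:
            # Firing condition not satisfied: store the operand and wait.
            self.matching_memory[key] = pkt
            return None
        # Firing condition satisfied: emit a packet carrying the operand pair.
        left, right = (waiting, pkt) if waiting.port == 0 else (pkt, waiting)
        return (pkt.color, pkt.instr_no, pkt.opcode, left.operand, right.operand)
```

Because the color is part of the key, operands of the same instruction number belonging to different processes never match each other, which is what allows packets from several processes to be interleaved in one pipeline.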
The execution unit (EX) executes the instruction based on the operation code. Next, using the current instruction number as the address, the program storage unit (PS) fetches from the instruction memory (IM) the instruction number and operation code of the instruction that consumes the result of the operation. The packet output from the program storage unit (PS) is transferred via the packet transfer switch (SW) to the PE where the fetched instruction is to be executed.
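The execute and fetch steps above can be illustrated with a small sketch. It assumes each instruction has a single destination and that the instruction memory also records which operand port of the destination receives the result; the names `OPS`, `INSTRUCTION_MEMORY`, `execute`, and `program_storage_fetch`, as well as the table contents, are hypothetical.

```python
import operator
from typing import Dict, Tuple

# Assumed operation set, for illustration only.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

# Hypothetical instruction memory (IM):
# current instruction number -> (destination instruction number,
#                                destination opcode, destination port)
INSTRUCTION_MEMORY: Dict[int, Tuple[int, str, int]] = {
    0: (2, "mul", 0),
    1: (2, "mul", 1),
}


def execute(opcode: str, left: int, right: int) -> int:
    """EX stage: perform the operation selected by the operation code."""
    return OPS[opcode](left, right)


def program_storage_fetch(color: int, instr_no: int, result: int,
                          im: Dict[int, Tuple[int, str, int]]) -> dict:
    """PS stage: look up the consumer of the result, using the current
    instruction number as the address, and build the outgoing packet."""
    next_no, next_op, port = im[instr_no]
    return {"color": color, "instr_no": next_no, "opcode": next_op,
            "operand": result, "port": port}
```

The packet returned by `program_storage_fetch` corresponds to the one that would be handed to the packet transfer switch (SW) for delivery to the PE holding the destination instruction.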
The data-driven processor has advantages such as being able to automatically extract the various granularities of parallelism residing in a problem and to perform multiprocessing without incurring context-switching overhead.
In the data-driven processor shown in FIG. 1, an instruction is not issued until the instruction that generates its source operand has completed execution. As a result, instructions having data dependencies on each other cannot be overlapped in the pipeline. Accordingly, a delay equal to one full cycle of the cyclic pipeline occurs before an instruction that depends on the result of another can be executed. When the number of pipeline stages is denoted by N, and the degree of parallelism of the program by c, the CPI (cycles per instruction) is max(1, N/c). Therefore, when the degree of parallelism of the program is smaller than the number of pipeline stages, the efficiency of pipelining decreases.
In this way, in the prior art data-driven processor, because an instruction is not issued until the instruction that generates its source operand has completed execution, a delay equal to the number of pipeline stages occurs before an instruction having a data dependency on another is executed and, as a result, a sequential processing part of the program becomes a bottleneck.
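As a quick numeric illustration of the CPI expression max(1, N/c), the following sketch evaluates it for a few values. The function name `cpi` and the sample values of N and c are chosen only for illustration.

```python
def cpi(n_stages: int, parallelism: int) -> float:
    """Cycles per instruction of the cyclic pipeline: max(1, N/c)."""
    if n_stages < 1 or parallelism < 1:
        raise ValueError("N and c must be positive")
    return max(1.0, n_stages / parallelism)


# Purely sequential code (c = 1) on an 8-stage pipeline costs 8 cycles
# per instruction; the pipeline is fully used only once c reaches N.
for c in (1, 2, 8, 16):
    print(f"N=8, c={c}: CPI = {cpi(8, c)}")
```

With N = 8, the CPI falls from 8.0 at c = 1 to 4.0 at c = 2 and stays at 1.0 for any c of 8 or more, which shows why a sequential part of a program runs far below the peak rate of the pipeline.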
In the prior art, attempts have been made to improve the performance of sequential processing by introducing control-driven processing into data-driven processing. In a strongly connected arc model, when all inputs to a subprogram called a strongly connected block become available, the strongly connected block is executed by monopolizing the pipeline. Because the execution of instructions outside the strongly connected block is excluded, advanced control of instructions is facilitated. Furthermore, because tokens in the strongly connected block can be stored using registers, the overhead associated with matching and with the copying of operation results can be reduced. On the other hand, this impairs the advantages of the prior art data-driven processor, namely, the hiding of latency by multiprocessing on an instruction-by-instruction basis and the preservation of the responsiveness of each individual process.