The present invention generally relates to the art of scheduling and more particularly to the scheduling method employed for automatically developing a hardware pattern of integrated circuits (IC's) that form a processor. More specifically, the present invention relates to the automatic development of IC hardware in which a number of arithmetic units such as multipliers and adders are provided together with a control unit that controls each of the arithmetic units independently.
When designing an LSI (large-scale integrated circuit), particularly an ASIC (application-specific integrated circuit) that carries out a specific predetermined program, the designer analyzes the input program or software algorithm and extracts (identifies) the control flow and the data flow. The data flow describes the order in which data are processed and can be represented by a data flow graph such as the one shown in FIG. 1, wherein the flow of data is represented by a directed graph. In FIG. 1, an arrow represents a dependency while a capital letter represents data. Further, an operator is represented by a circle. The small capital attached to an operator identifies the operator. Operators that have the same designation are the same operator.
The data flow shown in FIG. 1 is analyzed such that the dependencies between the operators are observed. When the sequence of behavior (control), that is, the sequence of operations and data transfers, has been determined, a scheduling result such as the one represented by the directed graph of FIG. 2 is obtained. As shown in the figure, a number is assigned to each control step, and the operators associated with the same control step are activated at the same time. The act of scheduling determines at which control step an operation or a data transfer specified in a behavior specification is executed.
A node in the directed graph of FIG. 2 indicates an operator, and the symbol associated with the node indicates the type of operation that is effected by the operator. An arrow indicates the dependency between operators. It will be noted in FIG. 2 that the operations b and c are executed after the execution of the operation a, and that the operation d is executed after the execution of the operations b and c. The dependencies between the operators obtained as a result of the scheduling are derived directly from the dependencies, shown in FIG. 1, between the operations in the input program. Scheduling is a process for allocating the operators to the control steps such that the dependencies indicated in the directed graph are observed in determining the execution sequence of the operators.
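The allocation of operators to control steps can be illustrated with a minimal sketch. It assumes an as-soon-as-possible (ASAP) placement, an acyclic dependency graph, and one control step per operation; none of these is stated in the text, and the function name `asap_schedule` and the dictionary `FIG2_DEPS` are hypothetical. Only the dependencies among a, b, c and d are taken from the description of FIG. 2.

```python
def asap_schedule(deps):
    """Assign each operator the earliest control step that respects its dependencies.

    deps maps an operator name to the list of operators whose results it needs.
    Assumes the dependency graph is acyclic and every operation takes one step.
    """
    step = {}

    def place(op):
        if op not in step:
            # An operator runs one step after its latest predecessor;
            # operators with no predecessors run in control step 1.
            step[op] = 1 + max((place(p) for p in deps[op]), default=0)
        return step[op]

    for op in deps:
        place(op)
    return step


# Dependencies as described for FIG. 2: b and c follow a; d follows b and c.
FIG2_DEPS = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

schedule = asap_schedule(FIG2_DEPS)
print(schedule)
```

Under these assumptions the sketch places a in control step 1, b and c in control step 2, and d in control step 3, which agrees with the ordering described for FIG. 2.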
If a scheduling is executed such that the number of operators is minimized over the entirety of the steps, the total number of required arithmetic units is minimized. The number of arithmetic units required for each control step is determined by the number of nodes. In the example of FIG. 2, the control step 1 requires one arithmetic unit, the control step 2 requires two arithmetic units, and the control step 3 requires one arithmetic unit. Hence, it is found that a maximum of two arithmetic units is required.
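The counting rule in this paragraph can be sketched as follows. The sketch assumes, as a simplification not made in the text, that any arithmetic unit can execute any operation, so that the operation type (multiplier, adder) is ignored. The names `units_per_step`, `units_required` and `FIG2_SCHEDULE` are hypothetical; the step assignments are those described for FIG. 2.

```python
from collections import Counter


def units_per_step(schedule):
    """Count how many operators (nodes) are placed in each control step."""
    return Counter(schedule.values())


def units_required(schedule):
    """The busiest control step determines how many arithmetic units are needed."""
    return max(units_per_step(schedule).values(), default=0)


# Control steps as described for FIG. 2: a in step 1, b and c in step 2, d in step 3.
FIG2_SCHEDULE = {"a": 1, "b": 2, "c": 2, "d": 3}

per_step = units_per_step(FIG2_SCHEDULE)
required = units_required(FIG2_SCHEDULE)
print(dict(per_step), required)
```

For this schedule the per-step counts are one, two and one, so the maximum, two, is the number of arithmetic units required.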
On the basis of the scheduling result as shown in FIG. 2, the type and number of the data bus components are determined as shown in FIG. 3. In other words, the scheduling result of FIG. 2 is translated into the allocation of arithmetic units, registers and memory devices as shown in FIG. 3.
It is desirable that a circuit having a minimum area and a high processing speed be synthesized based on the allocation result. In order to achieve this goal, scheduling should be executed such that the total number of control steps is minimized, and such that the number of operations executed concurrently in each control step (that is, the number of arithmetic units operating at the same time) is minimized over the entirety of the steps.
An art of "extracting parallelism" (described later in detail) in a program is introduced in order to achieve a high processing speed and an effective utilization of resources.
Conventionally, the extraction of parallelism in an input program including conditional branches has been independently carried out in each of the basic blocks that constitute the input program. For example, a statement including a conditional branch such as IF A THEN B ELSE C is conventionally scheduled such that the operators of the block B and those of the block C are independently subjected to scheduling. Until now it has been thought very difficult to perform an inter-block scheduling whereby parallelism is extracted across the blocks.
The conventional scheduling has the following characteristics.
The block structure in accordance with which the program has been written is faithfully observed. A branch testing is performed at the head of the branch, and operations are executed independently in accordance with the result of the testing. For example, in the case of a statement including a conditional branch such as IF A THEN B ELSE C, the scheduling is performed such that the operations of the block B and those of the block C are separately scheduled.
A description will now be given of the issues to be addressed by the art of extracting parallelism.
The extraction of parallelism, which is the key technology when scheduling a program run in a computer including a plurality of simultaneously operating arithmetic units, is a process for extracting operations that can be executed simultaneously. That is, it is a process for determining how a large number of arithmetic units (corresponding to operators) can best be utilized in a parallel manner. The requirements for operations that can be executed in a parallel manner are as follows.
1. Operations that can be executed in a parallel manner do not share arithmetic units or data among each other.
2. Operations that can be executed in a parallel manner do not require as input the result of any other operations.
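The two requirements above can be expressed as a pairwise test. This is a minimal sketch under stated assumptions: "sharing data" is read literally as having a common input or a common output, only direct dependencies between the two operations are checked (not transitive ones), and the `Operation` record, its fields, and the unit labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    unit: str            # arithmetic unit assigned to the operation (hypothetical label)
    inputs: frozenset    # names of the data items read
    output: str          # name of the data item produced


def can_run_in_parallel(p, q):
    """Return True if p and q satisfy both requirements with respect to each other."""
    # Requirement 1: no shared arithmetic unit and no shared data.
    shares_unit = p.unit == q.unit
    shares_data = bool(p.inputs & q.inputs) or p.output == q.output
    # Requirement 2: neither operation needs the other's result as input.
    depends = p.output in q.inputs or q.output in p.inputs
    return not (shares_unit or shares_data or depends)


mul = Operation("mul", "multiplier0", frozenset({"A", "B"}), "X")
add = Operation("add", "adder0", frozenset({"C", "D"}), "Y")
use = Operation("use", "adder1", frozenset({"X", "E"}), "Z")

print(can_run_in_parallel(mul, add))  # independent units and data
print(can_run_in_parallel(mul, use))  # use reads X, which mul produces
```

In this example `mul` and `add` qualify for parallel execution, whereas `mul` and `use` do not, because `use` requires the result of `mul`.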
The conventional art of extracting parallelism in a program including conditional branches has a drawback in that the extraction is not thorough enough. This is because there is no dependency between operators from blocks branching from the same node block.
Referring to the example of branches shown in FIG. 4, there is no dependency between B1 and B2, to which A branches; that is, either B1 or B2 can be executed first. Therefore, the conventional art of extracting parallelism, which depends for its effectiveness upon the extraction of explicit dependencies, cannot extend beyond the branch nodes.
An inter-block extraction of parallelism has been difficult to achieve because it has to accommodate various types of parallel processing.
Other requirements for the inter-block extraction of parallelism are that: it has to proceed under the condition that the result of extraction does not depend on which destination block a given block branches to; and that, if an operator originally belonging to a block like the block A or B is to be transferred to one of the branch blocks like the block B1, B2 . . . or Bn, an arrangement is necessary whereby that particular operator is to be activated in all of the branches, that is, the operation thereof should be executed whichever branch is chosen.
The objectives to be achieved by the scheduling include: automatic synthesis of a circuit having a high processing speed and a minimum area; synthesis of a circuit having a minimum number of arithmetic units (the requirement for the production of a chip having a minimum area can be met by allowing as many arithmetic units as possible to be shared so that the number of arithmetic units is minimized).
The number of required arithmetic units is equal to the maximum, taken over all control steps, of the number of arithmetic operations needed in a control step. Thus, when the number of operations required in one control step exceeds the number of operations required in another control step, the number of arithmetic units required is determined by the greater of the two numbers. Accordingly, it is required not only that the overall processing speed be improved (the number of steps be minimized) but also that the number of processes concurrently executed in each control step be minimized.
The specific drawbacks of the conventional art of scheduling will now be described.
In the conventional scheduling method, each basic block (explained later) is scheduled independently; that is, no scheduling is performed in which the operations within a branch are executed in advance of the branch testing. Hence, the arrangement of an input program into parallel processes cannot be performed on an extensive scale. An imbalance is created between a block where a large amount of resources is required and a block where a smaller amount of resources is required. That is, there arise many instances where the resources that are needed in one basic block are not needed in another block, resulting in an inefficient use of chip area.
When the number of arithmetic operations required in one block exceeds the number of arithmetic operations required in another block, the number of arithmetic units is determined by the greater of the two numbers. It is assumed in this type of scheduling that an operator in one block is activated only within that block, that is, only in concurrence with the other operators in that block. No measure is taken to relieve blocks that contain a large number of operations.
FIG. 5 shows an example of the result of the conventional scheduling, obtained when a program IF A THEN B1 ELSE B2; C; is scheduled. That is, the blocks A, B1, B2, C are independently (separately) scheduled. It will be noted from the figure that the number of arithmetic units required is five (see a step B21). Clearly, there is room for reducing the required area by reducing the number of arithmetic units.
Another scheduling method, which has the following characteristics, has also been proposed.
All the operations in the branches are executed, and only the correct results obtained through the branch operations are used. Since the branch operations are independent of each other, the resources used in the branch operations are shared within the blocks.
In this scheduling, since all the processes in the branch blocks are executed, unnecessary steps to execute unnecessary processes are performed, and unnecessary arithmetic units are needed.
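The cost of executing every branch can be illustrated with a small comparison. This is a sketch only: the two functions, the branch contents, and the operation counts are hypothetical and are not taken from the figures; each branch is modeled as a list of zero-argument callables.

```python
def run_taken_branch_only(cond, then_ops, else_ops):
    """Test the branch first, then execute only the selected block."""
    ops = then_ops if cond else else_ops
    results = [op() for op in ops]
    return results, len(ops)


def run_all_branches(cond, then_ops, else_ops):
    """Execute both blocks, then keep only the results of the correct one."""
    then_results = [op() for op in then_ops]
    else_results = [op() for op in else_ops]
    executed = len(then_ops) + len(else_ops)
    return (then_results if cond else else_results), executed


# Hypothetical branch contents: one operation in the THEN block, two in the ELSE block.
then_ops = [lambda: 2 * 3]
else_ops = [lambda: 2 + 3, lambda: 2 - 3]

selected, n_selected = run_taken_branch_only(True, then_ops, else_ops)
speculative, n_all = run_all_branches(True, then_ops, else_ops)
print(selected, n_selected, speculative, n_all)
```

Both approaches yield the same result for the taken branch, but executing all branches performs three operations here where only one was needed.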