1. Field of the Invention
The present invention provides a method for improving the processing efficiency of a pipeline architecture, and more particularly, a method for determining an executing time period of a current calculation task according to a previous calculation task.
2. Description of the Prior Art
Pipeline architecture is one of the most widely used calculation architectures for microprocessor systems. Pipeline architecture utilizes a time pulse to control a register file that stores results from executing a series of calculation tasks with a plurality of functional units, and the stored results are then transmitted to a functional unit to execute a next calculation task. An advantage of pipeline architecture is its capability of simultaneous control. Because the functional units usually execute various calculation tasks with different complexities, the executing time periods of the various calculation tasks differ in length. Therefore, when pipeline architecture is utilized, a very complicated and long calculation task easily produces a wrong result because data cannot be synchronized at some point in time, especially when there is complicated data dependency between the functional units. Thus, utilizing pipeline architecture cannot suitably divide a series of complicated calculation tasks or help to simplify the complexities of simultaneous control.
Please refer to FIG. 1. FIG. 1 is a functional diagram of a processor 10 of pipeline architecture. The processor 10 comprises a first functional unit 12, a second functional unit 14, and a control unit 16. The first functional unit 12 is for executing a calculation task. The second functional unit 14 is for executing another calculation task. The control unit 16 is electrically connected to the first and second functional units 12, 14 and generates a plurality of control signals to control them. According to the desired calculation tasks, the control unit 16 controls the first and second functional units 12, 14 in order to execute the calculation tasks. At the same time, depending upon the desired calculation tasks, the control unit 16 controls the data input to the first and second functional units 12, 14 (IN1 and IN2 in FIG. 1) and outputs the results of the calculation tasks executed by the first and second functional units 12, 14 (OUT1 and OUT2 in FIG. 1).
Next, the first functional unit 12 is assumed to be an Arithmetic and Logic Unit (ALU), and the second functional unit 14 is assumed to be a Multiplication and Accumulation Unit (MAU). Because the MAU executes a more complicated calculation task than the ALU, the second functional unit 14 needs a longer executing time period than the first functional unit 12. For example, the executing time period of the first functional unit 12 is one instruction cycle and the executing time period of the second functional unit 14 is two instruction cycles.
Please refer to FIG. 2. FIG. 2 is a timing diagram of the processor 10 executing a calculation task. FIG. 2 shows how the processor 10 utilizes the control unit 16 to generate a control signal to control a functional unit for executing a calculation task, according to timing sequence levels such as fetch instruction (level F), decode (level D), read register (level R), execution (levels E1 and E2), and write back (level W). Each of these levels takes one instruction cycle. Please note that the execution part of a calculation task is the part that utilizes the calculation capacity of a functional unit. Here the execution part consists only of the levels E1 and E2, which matches the longest executing time period among the functional units of the processor 10 (that of the second functional unit 14). If the processor 10 comprises functional units needing longer executing time periods, the number of execution levels can be increased as required.
Please refer to FIG. 3. FIG. 3 is a timing diagram of the processor 10 executing a series of calculation tasks. Please note that FIG. 3 shows an ideal timing that does not consider data dependency between the calculation tasks. In FIG. 3, according to the desired calculation tasks, the control unit 16 utilizes control signals to control the first and second functional units 12, 14 in order to execute a series of calculation tasks (the first, second, third, and fourth calculation tasks in FIG. 3), wherein each calculation task starts one instruction cycle after the preceding one. Under such an arrangement, during a specific time period of calculation processing (the part within the dotted line in FIG. 3), different calculation tasks lie in different levels. Therefore, the different calculation tasks can be executed simultaneously because they utilize different system resources. Please note that, as mentioned above, the executing time period of the first functional unit 12 is only one instruction cycle. Therefore, the prior art designates either the level E1 or the level E2 for the real calculation task of the first functional unit 12 and leaves the other level unused. Next, please refer to FIG. 4 and FIG. 5, which illustrate data dependency between different calculation tasks and the timing of the processor 10 when executing a series of calculation tasks.
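The ideal timing described for FIG. 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names STAGES, ideal_schedule, and levels_in_cycle are hypothetical, cycles are numbered from 0, and each calculation task is assumed to enter level F exactly one instruction cycle after the preceding one, with no data dependency.

```python
# Levels named in the description of FIG. 2; each takes one instruction cycle.
STAGES = ["F", "D", "R", "E1", "E2", "W"]


def ideal_schedule(num_tasks):
    """Return, per task, a mapping {level: cycle}; task i enters level F at cycle i."""
    return [
        {level: task + offset for offset, level in enumerate(STAGES)}
        for task in range(num_tasks)
    ]


def levels_in_cycle(schedule, cycle):
    """Return the level each task occupies in the given cycle (None if it occupies none)."""
    occupied = []
    for task in schedule:
        match = [level for level, c in task.items() if c == cycle]
        occupied.append(match[0] if match else None)
    return occupied


schedule = ideal_schedule(4)
# In cycle 3 the four tasks lie in four different levels.
print(levels_in_cycle(schedule, 3))  # ['E1', 'R', 'D', 'F']
```

Because no two tasks share a level in any cycle, each task uses a different system resource at that moment, which is the property the ideal timing relies on.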
FIG. 4 is a timing diagram of the processor 10 executing a calculation task of r3=(r1*r2)+r4. In FIG. 4, it is assumed that r0=r1*r2 and that the first functional unit 12 executes its real calculation task in the level E1 (as shown within the dotted line in FIG. 4). In the first calculation task, the processor 10 utilizes the second functional unit 14 (MAU) to execute r0=r1*r2. After the result of the first calculation task comes out, the result is transmitted to the first functional unit 12 (ALU) to execute the second calculation task (r3=r0+r4). However, the second functional unit 14 needs two instruction cycles, namely the level E1 and the level E2, to complete its calculation task, and the second calculation task has data dependency on the result of the first calculation task (shown as an arrow 18 in FIG. 4). Thus, the second calculation task cannot be executed one instruction cycle after the first calculation task as shown in FIG. 3. Instead, as shown in FIG. 4, the second calculation task is executed two instruction cycles after the first calculation task (the second calculation task stalls for one instruction cycle); otherwise the result of the second calculation task will be wrong because of incorrect input data. If, in the present example, the first functional unit 12 is instead assumed to execute its real calculation task in the level E2, the timing in FIG. 3 does not affect the correctness of the result despite the data dependency, and the stall mentioned above does not occur.
Another example is shown in FIG. 5, which is a timing diagram of the processor 10 executing a calculation task of r3=(r1+r2)*r4. In FIG. 5, it is assumed that r0=r1+r2 and that the first functional unit 12 executes its real calculation task in the level E2 (as shown within the dotted line in FIG. 5). In the first calculation task, the processor 10 utilizes the first functional unit 12 (ALU) to execute r0=r1+r2. After the result of the first calculation task comes out, the result is transmitted to the second functional unit 14 (MAU) to execute the second calculation task (r3=r0*r4). However, because the first functional unit 12 starts its real calculation in the level E2, the second calculation task, which has data dependency on the result of the first calculation task (shown as an arrow 20 in FIG. 5), cannot be executed one instruction cycle after the first calculation task as shown in FIG. 3. Instead, as shown in FIG. 5, the second calculation task starts to execute two instruction cycles after the first calculation task (the second calculation task stalls for one instruction cycle); otherwise the result of the second calculation task will be wrong because of incorrect input data. If, in the present example, the first functional unit 12 is instead assumed to execute its real calculation task in the level E1, the timing in FIG. 3 does not affect the correctness of the result despite the data dependency, and the stall mentioned above does not occur.
Taking FIG. 4 and FIG. 5 together, in the prior art a stall of one instruction cycle may occur no matter whether the first functional unit 12 executes its real calculation task in the level E1 or in the level E2. For example, consider a calculation of r5=abs((r1+r2)*r3), where abs is the absolute value and is executed by the ALU. Here the ALU both supplies an operand to the MAU and consumes the result of the MAU, so a stall of one instruction cycle occurs whether the first functional unit 12 executes its real calculation task in the level E1 or in the level E2. In more complicated calculations, stalls occur frequently.
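The stall behavior described for FIG. 4, FIG. 5, and the abs example can be checked with a small Python model. It is a sketch under assumptions that go beyond the text: the function name count_stalls is hypothetical, each task depends only on the task immediately before it, the MAU reads its operands at the start of level E1 and delivers its result at the end of level E2, the ALU reads and delivers within its single assigned level, and a result can be used by the next functional unit in the instruction cycle after it is produced.

```python
def count_stalls(units, alu_level):
    """Count stalled instruction cycles for a chain of dependent calculation tasks.

    units: functional unit used by each task, "ALU" or "MAU"; every task after
           the first reads the result of the task before it.
    alu_level: 1 if the ALU does its real calculation in level E1, 2 if in level E2.
    """
    def read_offset(unit):
        # Cycle, relative to the task's own E1, in which the unit reads operands.
        return 0 if unit == "MAU" else alu_level - 1

    def ready_offset(unit):
        # Cycle, relative to the task's own E1, at whose end the result exists.
        return 1 if unit == "MAU" else alu_level - 1

    stalls = 0
    e1 = 0  # cycle in which the first task is in level E1
    for producer, consumer in zip(units, units[1:]):
        ready = e1 + ready_offset(producer)
        ideal_e1 = e1 + 1  # spacing of the ideal timing in FIG. 3
        needed_e1 = ready + 1 - read_offset(consumer)  # operand must already exist
        e1 = max(ideal_e1, needed_e1)
        stalls += e1 - ideal_e1
    return stalls


examples = {
    "r3=(r1*r2)+r4": ["MAU", "ALU"],            # FIG. 4
    "r3=(r1+r2)*r4": ["ALU", "MAU"],            # FIG. 5
    "r5=abs((r1+r2)*r3)": ["ALU", "MAU", "ALU"],
}
for name, units in examples.items():
    # Stalls with the ALU assigned to E1, then to E2.
    print(name, [count_stalls(units, level) for level in (1, 2)])
# r3=(r1*r2)+r4 [1, 0]
# r3=(r1+r2)*r4 [0, 1]
# r5=abs((r1+r2)*r3) [1, 1]
```

Under these assumptions, a fixed choice of E1 or E2 for the ALU removes the stall in one of the two-task examples but not the other, and cannot remove it in the abs example.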
In the prior art, a stall severely damages the processing efficiency of pipeline architecture. When a stall occurs, the timing interval between the write-backs of two adjacent calculation tasks (that is, the cycle of the time pulse controlling the register file) is prolonged. Under a fixed time-pulse cycle, the stall makes the time period of the calculation task exceed the time-pulse cycle, so the pipeline architecture delays the whole calculation by one time-pulse cycle, and processing efficiency declines. This problem becomes more obvious under a Very-Long Instruction Word (VLIW) condition, because calculation tasks under the VLIW condition are executed in units of a plurality of time-pulse cycles. If one calculation task is affected by a stall, the whole execution package of calculation tasks is delayed, resulting in more damage to the processing efficiency of pipeline architecture.