1. Field of the Invention
This invention relates to the field of Optimizing Compilers for computer systems; specifically, it relates to a method for collapsing the prolog and epilog of software pipelined loops.
2. Description of the Related Art
Software pipelining is key to achieving good performance software architectures that support Instruction Level Parallelism (ILP architectures), which are generally architectures that are capable of issuing parallel instructions. Instructions may be considered to be xe2x80x9cparallel operationsxe2x80x9d if their operations overlap in time, such as when an instruction has delay slots.
Software pipelining improves loop performance by exposing and exploiting instruction-level parallelism between consecutive loop iterations. For example, a loop body may consist of three instructions (ins1, ins2, and ins3), a decrement, and a conditional branch back to the beginning. In the absence of software pipelining and assuming dependence constraints are met, a possible xe2x80x9cschedulexe2x80x9d for this code on a VLIW processor might be as follows:
Software pipelining overlaps multiple consecutive iterations of the loop to improve throughput, and therefore performance. For instance, assuming that all dependence constraints are met, a possible pipelined version of the loop in the example above might look as follows:
In the pipelined code above, the 3 cycle loop becomes a 1 cycle loop by parallelizing 3 consecutive iterations of the loop. The kernel of the loop acts as a pipeline, processing one xe2x80x9cstagexe2x80x9d of each of the iterations in parallel. The pipeline is primed and drained through the xe2x80x9cprologxe2x80x9d and xe2x80x9cepilogxe2x80x9d code that surrounds the kernel.
In general, all prolog, epilog, and kernel stages consist of II cycles, where II is the xe2x80x9cinitiation interval.xe2x80x9d In the example above, II=1. In other cases, II might be greater than 1.
In some cases, each stage may consist of multiple cycles. For instance, a kernel may be more than one cycle in length. For example, this may be due to hardware restrictions, such as the need to perform three multiplication operations when there are only two multipliers available. To accomplish this, two multiplications would be performed in parallel in one cycle of the kernel, and the third multiplication would be performed in the other cycle.
The kernel size may also be increased because of loop carried data dependencies in the loop being software pipelined. Future loop iterations cannot start until the current iteration completes the computation of a result required by the future iteration.
Therefore, a need has arisen for a system and method for collapsing the prolog and the epilog of software pipelined loops.
In accordance with one embodiment of the present invention, software-based techniques which reduce code expansion by rolling some or all of the prolog and/or epilog back into kernel are applied. This may be accomplished via xe2x80x9cprolog collapsingxe2x80x9d and xe2x80x9cepilog collapsing.xe2x80x9d
According to one embodiment of the present invention, a method for reducing the code size of a software pipelined loop having a prolog, a kernel, and an epilog is disclosed. This method involves the collapsing of an epilog and/or a prolog. Stages are processed inside-outxe2x80x94that is, starting with the stage closest to the kernel, and working out from the kernel. A stage can be collapsed (i.e., rolled into the kernel) if instructions that are present in either a previous stage, or in the kernel, can be either speculated or predicated. If a stage is encountered that cannot be completely collapsed, the process is complete.
According to another embodiment of the present invention, a method for reducing a code size of a software pipelined loop having a kernel, an epilog, and optionally, a prolog includes the following steps. The stages of the epilog may be evaluated inside-out. Instructions that are present in a reference stage, which may be the kernel or a previously evaluated stage of the epilog, but not in the selected stage, are evaluated. If the identified instructions can be speculated, they are noted as capable of being speculated. If the instructions are not capable of being speculated, it is determined if the instructions can be predicated. If the instructions can be predicated, they are marked as capable of being predicated. If instructions cannot be speculated or predicated, the stage cannot be collapsed. The method is repeated for all stages of the epilog until the epilog cannot be collapsed.
According to another embodiment of the present invention, a method for reducing a code size of a software pipelined loop having a prolog, a kernel, and, optionally, an epilog ,includes the following steps. The stages of the prolog may be evaluated inside-out. Instructions that are present in a reference stage, which may be the kernel or a previously evaluated stage of the prolog, but not in the selected stage, are evaluated. If the identified instructions can be speculated, they are noted as capable of being speculated. If the instructions are not capable of being speculated, it is determined if the instructions can be predicated. If the instructions can be predicated, they are marked as capable of being predicated. If instructions cannot be speculated or predicated, the stage cannot be collapsed. The method is repeated for all stages of the prolog until the prolog cannot be collapsed.
According to another embodiment of the present invention, a method for reducing a code size of a software pipelined loop having a prolog and a kernel, and, optionally, an epilog, the kernel having a plurality of cycles, includes the following steps. Stages may be processed from the inside-out. The innermost unprocessed cycle of a candidate stage is identified, and instructions that are present in a reference stage, which may be the kernel or a previously evaluated stage of the prolog, but not in the identified stage, are evaluated. If the identified instructions can be speculated, they are noted as capable of being speculated. If the instructions are not capable of being speculated, it is determined if the instructions can be predicated. If the instructions can be predicated, they are marked as capable of being predicated. If instructions cannot be speculated or predicated, the stage cannot be completely collapsed. The method is repeated for all cycles of all stages of the prolog until the prolog cannot be completely collapsed.
Consider the cycle on which the process got stuck. This becomes the current cycle. If this is not the innermost cycle of the current stage, it is determined whether a branch can be inserted. If so, the candidate stage is partially collapsed. If not, the next innermost cycle becomes the current cycle. The process is repeated until a branch can be inserted and a stage can be partially collapsed or an innermost cycle of a current stage is encountered.
A technical advantage of the present invention is that a method for collapsing the prolog and epilog of software pipelined loops is disclosed. Another technical advantage of the present invention is that code size is reduced. Another technical advantage of the present invention is that the method of the present invention makes code more efficient.