A microprocessor is a circuit that combines the instruction-handling, arithmetic, and logical operations of a computer on a single chip. A digital signal processor (DSP) is a microprocessor optimized to handle large volumes of data efficiently. Such processors are central to the operation of many of today's electronic products, such as high-speed modems, high-density disk drives, digital cellular phones, and complex automotive systems, and will enable a wide variety of other digital systems in the future. The demands placed upon DSPs in these environments continue to grow as consumers seek increased performance from their digital products.
Designers have succeeded in increasing the performance of DSPs generally by increasing clock frequencies, by removing architectural bottlenecks in DSP circuit design, by incorporating multiple execution units on a single processor circuit, and by developing optimizing compilers that schedule operations to be executed by the processor in an efficient manner. As further increases in clock frequency become more difficult to achieve, designers have embraced the multiple execution unit processor as a means of achieving enhanced DSP performance. For example, FIG. 1 shows a block diagram of a DSP execution unit and register structure having eight execution units, L1, S1, M1, D1, L2, S2, M2, and D2. These execution units operate in parallel to perform multiple operations, such as addition, multiplication, addressing, logic functions, and data storage and retrieval, simultaneously.
Theoretically, the performance of a multiple execution unit processor is proportional to the number of execution units available. However, utilization of this performance advantage depends on the efficient scheduling of operations so that most of the execution units have a task to perform each clock cycle. Efficient scheduling is particularly important for looped instructions, since the processor will typically spend the majority of its time in loop execution.
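The dependence on scheduling can be made concrete with a small utilization calculation. The following is a minimal sketch, not part of the original disclosure: the function name `utilization`, the definition of utilization as issued operations divided by available issue slots, and the per-cycle operation counts are all illustrative assumptions.

```python
def utilization(ops_per_cycle, num_units):
    """Fraction of execution-unit issue slots that performed an operation.

    ops_per_cycle: number of operations issued in each clock cycle.
    num_units: number of execution units available each cycle.
    """
    if num_units <= 0:
        raise ValueError("num_units must be positive")
    if any(n < 0 or n > num_units for n in ops_per_cycle):
        raise ValueError("each cycle can issue between 0 and num_units operations")
    total_slots = num_units * len(ops_per_cycle)
    return sum(ops_per_cycle) / total_slots if total_slots else 0.0


# Hypothetical 4-cycle schedules on an 8-unit machine (numbers are illustrative).
well_packed = utilization([8, 8, 7, 8], num_units=8)    # 31 of 32 slots used
poorly_packed = utilization([1, 2, 1, 1], num_units=8)  # 5 of 32 slots used
```

Under these assumed numbers, the same eight-unit hardware delivers very different effective throughput depending only on how the operations are scheduled.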
One effective way in which looped instructions can be arranged to take advantage of multiple execution units is with a software pipelined loop. In a conventional scalar loop, all instructions execute for a single iteration before any instructions execute for following iterations. In a software pipelined loop, the order of operations is rescheduled such that one or more iterations of the original loop begin execution before the preceding iteration has finished.
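The benefit of overlapping iterations can be estimated with a simple cycle count. The sketch below rests on simplifying assumptions that are not stated in the text above: every operation takes one clock cycle, the operations within one iteration execute on successive cycles, a new iteration can start every cycle, and enough execution units exist to run the overlapped operations in parallel. The function names are hypothetical.

```python
def scalar_cycles(iterations, ops_per_iteration):
    """Cycles for a scalar loop: each iteration finishes before the next starts."""
    return iterations * ops_per_iteration


def pipelined_cycles(iterations, ops_per_iteration):
    """Cycles for a software pipelined loop that starts one iteration per cycle.

    The last iteration starts on cycle `iterations` and needs
    `ops_per_iteration - 1` further cycles to finish.
    """
    if iterations == 0:
        return 0
    return iterations + ops_per_iteration - 1


# Example: 7 iterations of a 3-operation loop body.
scalar = scalar_cycles(7, 3)        # 21 cycles
pipelined = pipelined_cycles(7, 3)  # 9 cycles
```

Real schedules are further constrained by operation latencies, data dependencies, and resource conflicts, so these counts are best read as an idealized comparison.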
Referring to FIG. 2a, a simple loop containing 7 iterations of the operations A, B, and C is shown. FIG. 2b depicts an alternative execution schedule for the loop of FIG. 2a, where a new iteration of the original loop is begun each clock cycle. For clock cycles I_3 through I_7, the same instruction (A_n, B_{n-1}, C_{n-2}) is executed each clock cycle in this schedule; if multiple execution units are available to execute these operations in parallel, the code can be restructured to perform this repeated instruction in a loop. The repeating pattern of A, B, C (along with loop control operations) thus forms the loop kernel of a new, software pipelined loop that executes the instructions at clock cycles I_3 through I_7 in 5 loop iterations. FIG. 2c depicts such a loop. The instructions executed at clock cycles I_1 and I_2 of FIG. 2b must still be executed first in order to properly "fill" the software pipelined loop; these instructions are referred to as the loop prolog. Likewise, the instructions executed at clock cycles I_8 and I_9 of FIG. 2b must still be executed in order to properly "drain" the software pipeline; these instructions are referred to as the loop epilog. Note that in many situations the loop epilog may be deleted through a technique known as speculative execution.
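The schedule described above can be reproduced with a short script. This is a minimal sketch under stated assumptions, not the method of the disclosure: each operation takes one cycle, a new iteration starts every cycle, iterations are numbered from 1, and the number of iterations is at least the number of operations per iteration. The names `pipelined_schedule` and `split_schedule` are hypothetical.

```python
def pipelined_schedule(ops, iterations):
    """Return one list per clock cycle of (operation, iteration) pairs.

    Operation number `stage` of iteration `it` issues on cycle `it + stage`.
    """
    stages = len(ops)
    cycles = []
    for cycle in range(1, iterations + stages):
        issued = []
        for stage, op in enumerate(ops):
            it = cycle - stage
            if 1 <= it <= iterations:
                issued.append((op, it))
        cycles.append(issued)
    return cycles


def split_schedule(cycles, stages):
    """Split a schedule into (prolog, kernel, epilog).

    Assumes the loop has at least `stages` iterations, so the pipeline
    fills completely before it starts to drain.
    """
    fill = stages - 1
    return cycles[:fill], cycles[fill:len(cycles) - fill], cycles[len(cycles) - fill:]


ops = ["A", "B", "C"]
schedule = pipelined_schedule(ops, iterations=7)
prolog, kernel, epilog = split_schedule(schedule, stages=len(ops))

for number, issued in enumerate(schedule, start=1):
    print(number, " ".join(f"{op}{it}" for op, it in issued))
```

For the 7-iteration example, the prolog covers the first two cycles, the kernel covers the five cycles in which A, B, and C all issue together, and the epilog covers the last two cycles.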
The simple example of FIGS. 2a-2c illustrates the basic principles of software pipelining, but other considerations, such as dependencies and conflicts, may constrain a particular scheduling solution. For a more detailed explanation of software pipelining, see Vicki H. Allan et al., Software Pipelining, 27 ACM Computing Surveys 367 (1995).