A microprocessor is a circuit that combines the instruction-handling, arithmetic, and logical operations of a computer on a single chip. A digital signal processor (DSP) is a microprocessor optimized to handle large volumes of data efficiently. Such processors are central to the operation of many of today's electronic products, such as high-speed modems, high-density disk drives, digital cellular phones, and complex automotive systems, and will enable a wide variety of other digital systems in the future. The demands placed upon DSPs in these environments continue to grow as consumers seek increased performance from their digital products.
Designers have succeeded in increasing the performance of DSPs generally by increasing clock frequencies, by removing architectural bottlenecks in DSP circuit design, by incorporating multiple execution units on a single processor circuit, and by developing optimizing compilers that schedule operations to be executed by the processor in an efficient manner. As further increases in clock frequency become more difficult to achieve, designers have embraced the multiple execution unit processor as a means of achieving enhanced DSP performance. For example, FIG. 1 shows a block diagram of a DSP having eight execution units, L1, S1, M1, D1, L2, S2, M2, and D2. These execution units operate in parallel to perform multiple operations, such as addition, multiplication, addressing, logic functions, and data storage and retrieval, simultaneously.
Theoretically, the performance of a multiple execution unit processor is proportional to the number of execution units available. However, utilization of this performance advantage depends on the efficient scheduling of operations so that most of the execution units have a task to perform each clock cycle. Efficient scheduling is particularly important for looped instructions, since in a typical runtime application the processor will spend the majority of its time in loop execution.
One effective way in which looped instructions can be arranged to take advantage of multiple execution units is with a software pipelined loop. In a conventional scalar loop, all instructions execute for a single iteration before any instructions execute for following iterations. In a software pipelined loop, the order of operations is rescheduled such that one or more iterations of the original loop begin execution before the preceding iteration has finished. FIG. 2a shows a simple loop containing 7 iterations of the operations A, B, and C. FIG. 2b depicts an alternative execution schedule for the loop of FIG. 2a, where a new iteration of the original loop is begun each clock cycle. For clock cycles I₃-I₇, the same instruction (Aₙ, Bₙ₋₁, Cₙ₋₂) is executed each clock cycle in this schedule; if multiple execution units are available to execute these operations in parallel, the code can be restructured to perform this repeated instruction in a loop. The repeating pattern of A, B, C (along with loop control operations) thus forms the loop kernel of a new, software pipelined loop that executes the instructions at clock cycles I₃-I₇ in 5 iterations. FIG. 2c depicts such a loop. The instructions executed at clock cycles I₁ and I₂ of FIG. 2b must still be executed first in order to properly "fill" the software pipelined loop; these instructions are referred to as the loop prolog. Likewise, the instructions executed at clock cycles I₈ and I₉ of FIG. 2b must still be executed in order to properly "drain" the software pipeline; these instructions are referred to as the loop epilog (note that in many situations the loop epilog may be deleted through a technique known as speculative execution).
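The schedule described for FIG. 2b can be reproduced with a short sketch. The following Python fragment is illustrative only: it assumes each of the operations A, B, and C takes one clock cycle and that a new iteration starts every cycle, and the function names `pipelined_schedule` and `split_phases` are hypothetical helpers introduced here, not part of any described apparatus.

```python
def pipelined_schedule(stages, iterations):
    """Return a list of cycles; each cycle is a list of (stage, iteration) ops.

    Assumes every stage takes one clock cycle and a new iteration starts
    each cycle, as in the schedule described for FIG. 2b.
    """
    depth = len(stages)
    total_cycles = iterations + depth - 1
    schedule = []
    for cycle in range(1, total_cycles + 1):
        ops = []
        for s, name in enumerate(stages):
            it = cycle - s  # iteration whose stage s runs in this cycle
            if 1 <= it <= iterations:
                ops.append((name, it))
        schedule.append(ops)
    return schedule


def split_phases(schedule, depth):
    """Split a schedule into prolog (fill), kernel, and epilog (drain)."""
    fill = depth - 1
    prolog = schedule[:fill]
    kernel = schedule[fill:len(schedule) - fill]
    epilog = schedule[len(schedule) - fill:]
    return prolog, kernel, epilog


schedule = pipelined_schedule(["A", "B", "C"], 7)
prolog, kernel, epilog = split_phases(schedule, 3)

for cycle, ops in enumerate(schedule, start=1):
    print(f"I{cycle}: " + ", ".join(f"{name}{it}" for name, it in ops))
```

Under these assumptions the sketch yields nine clock cycles: two prolog cycles, five kernel cycles that each issue one A, one B, and one C from three consecutive iterations, and two epilog cycles.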
The simple example of FIGS. 2a-2c illustrates the basic principles of software pipelining, but other considerations such as dependencies and conflicts may constrain a particular scheduling solution. For an explanation of software pipelining in more detail, see Vicki H. Allan, Software Pipelining, 27 ACM Computing Surveys 367 (1995).
Another technique commonly used with multiple execution unit processors and software pipelined loops is multiple assignment of registers. Referring again to FIG. 1, registers A0-A15 and B0-B15 are connected to execution units L1, S1, M1, D1, L2, S2, M2, and D2. These registers are used to store, e.g., operands, results of operations, counter values, and conditional values during execution. Typically, single assignment of such registers is preferred; in single assignment, once an execution unit utilizes a register in an operation, that register may not be re-used until the original operation completes. With multiple assignment of registers, however, registers may be re-used according to known hardware delays. For example, a first operation may begin execution on unit D1 to load a value into register A4 from memory, an operation that requires five clock cycles to complete (if an operation requires more than one clock cycle to complete, the additional clock cycles are typically referred to as delay slots). With single assignment, A4 could not be used for any other purpose during those five clock cycles, although the value loaded from memory will not actually appear in register A4 until the end of the fifth cycle. With multiple assignment, A4 could be used for other purposes during the delay slots of the load from memory operation, as long as the register is freed prior to completion of the load operation.
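The A4 example above can be illustrated with a toy model. The following Python sketch is a simplification under stated assumptions, not a model of any particular processor: the `RegisterFile` class and its methods are hypothetical, a load is assumed to deposit its value at the end of the fifth cycle counting the issue cycle, and an ordinary register write is assumed to complete at the end of the cycle in which it issues.

```python
class RegisterFile:
    """Toy register file in which results land at the end of a known cycle."""

    def __init__(self):
        self.regs = {}
        self.cycle = 1
        self.pending = []  # entries are (landing_cycle, register, value)

    def issue_load(self, reg, value, latency=5):
        # The result lands at the end of the latency-th cycle, counting the
        # issue cycle; the cycles in between are the delay slots.
        self.pending.append((self.cycle + latency - 1, reg, value))

    def write(self, reg, value):
        # Single-cycle operation: the result lands at the end of this cycle.
        self.pending.append((self.cycle, reg, value))

    def read(self, reg):
        return self.regs[reg]

    def end_cycle(self):
        # Commit every result due this cycle, then advance the clock.
        for landing, reg, value in self.pending:
            if landing == self.cycle:
                self.regs[reg] = value
        self.pending = [p for p in self.pending if p[0] != self.cycle]
        self.cycle += 1


rf = RegisterFile()
rf.issue_load("A4", 99)   # cycle 1: load begins; value due at end of cycle 5
rf.end_cycle()
rf.write("A4", 7)         # cycle 2: A4 re-used as a temporary in a delay slot
rf.end_cycle()
temp = rf.read("A4")      # cycle 3: the temporary is consumed
for _ in range(3):        # cycles 3, 4, and 5 elapse; the load lands
    rf.end_cycle()
loaded = rf.read("A4")    # cycle 6: the loaded value is now visible
print(temp, loaded)
```

In this sketch A4 holds the temporary value during the delay slots and the loaded value afterward, which is the re-use that single assignment would forbid.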
Multiple assignment of registers is often desirable in pipelined loops, where the overlapping of many multiple-clock cycle instructions such as loads, multiplies, and branches is required. Single assignment of registers under these conditions may require that parallelism be decreased, as some schedules will be impossible to implement because of the large number of registers required by single assignment. Multiple assignment will often result in reduced register pressure and increased parallelism under these conditions. The main drawback of multiple assignment is that it requires that the instructions execute in an expected order. If an unexpected order of execution is encountered, there is a high possibility that register data will be corrupted.
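The ordering hazard can be sketched as follows. This Python fragment is a hypothetical illustration: the `run` function, the tuple-based program format, and the `stall_before` and `stall_cycles` parameters are assumptions introduced here. It assumes that a disturbance pauses instruction issue while a load already in flight continues to completion, which is one way an unexpected order of execution could arise.

```python
LOAD_LATENCY = 5  # cycles, matching the five-cycle load used as an example


def run(program, stall_before=None, stall_cycles=0):
    """Issue one instruction per cycle and return the values seen by reads.

    stall_before and stall_cycles model a hypothetical disturbance: issue
    pauses before the given slot index while in-flight loads keep running.
    """
    regs, pending, reads = {}, [], []
    cycle = 1

    def end_cycle():
        nonlocal cycle
        # Commit results due at the end of this cycle, then advance.
        for landing, reg, value in pending:
            if landing == cycle:
                regs[reg] = value
        pending[:] = [p for p in pending if p[0] != cycle]
        cycle += 1

    for index, (op, reg, value) in enumerate(program):
        if index == stall_before:
            for _ in range(stall_cycles):
                end_cycle()
        if op == "load":
            pending.append((cycle + LOAD_LATENCY - 1, reg, value))
        elif op == "write":
            pending.append((cycle, reg, value))
        elif op == "read":
            reads.append(regs.get(reg))
        end_cycle()
    return reads


program = [
    ("load", "A4", 99),    # load issued; A4 is multiply assigned below
    ("write", "A4", 7),    # temporary placed in A4 during a delay slot
    ("read", "A4", None),  # schedule expects the temporary, 7
    ("nop", None, None),
    ("nop", None, None),
    ("read", "A4", None),  # schedule expects the loaded value, 99
]

expected = run(program)
disturbed = run(program, stall_before=2, stall_cycles=4)
print(expected, disturbed)
```

In the undisturbed run the first read sees the temporary and the second sees the loaded value. In the disturbed run the load lands during the stall, so the first read sees the loaded value instead of the temporary, and the result that depended on the temporary is corrupted.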