The present invention relates generally to compilers for use in processing source code in digital signal processors, microprocessors, computer systems and other processing systems and devices, and more particularly to optimization techniques to exploit zero-overhead loop mechanisms.
FIG. 1 is a simplified block diagram illustrating the operation of a conventional processing system 10. In the system 10, a program 12 provides source code as an input to a preprocessor 14. The preprocessor 14 performs tasks such as processing directives in the source code that cause the inclusion of files, and substituting expressions for defined constants or macros. The source code generally includes instructions configured in accordance with high level language titles, such as those associated with the C programming language. A compiler 15 receives the output of the preprocessor 14, and uses a set of optimization rules 16 to generate, from the preprocessed source code of program 102, corresponding object code/executable code which may be executed by a processor 18. Other implementations of the system 10 may combine the preprocessor 14 into the compiler 15. Conventional operations performed by the preprocessor 14 and compiler 15 are described in, for example, A. Aho et al., Compilers: Principles, Techniques and Tools, Addison-Wesley, 1988, which is incorporated by reference herein. Processor 18 may be a pipelined processor, or any other suitable processor.
In order to improve the performance of the system 10 in executing the program 12, various optimization techniques may be used. For many applications, a large percentage of the execution time is spent in the innermost loops of a program. The execution of these loops incur significant overhead, which is due to the execution of increment and branch instructions to initiate a new iteration of a loop. A number of hardware and software techniques have been used to minimize the loop overhead. Commonly used hardware techniques include, for example, hardware branch prediction, speculative execution, and minimizing branch latencies. Software techniques which may be implemented in an optimizing compiler include, for example, compile-time branch prediction, loop strength reduction, loop induction variable elimination and loop unrolling.
A conventional optimizing compiler is illustrated in greater detail in conjunction with FIGS. 2, 3 and 4. FIG. 2 shows the phases in a conventional optimizing compiler, which may be compiler 15 in the system 10 of FIG. 1. The compiler 15 includes a scanner 20, a parser 22, a code generator 24, an optimizer 26 and an assembly code generator 28. FIG. 3 shows conventional optimizations that may be applied by the optimizer 26 in the optimizing compiler 15. These optimizations include branch optimization 30, common subexpression elimination 32, constant propagation 34, loop optimizations 35, function inlining 36, and instruction scheduling 38. These optimizations may be repeated multiple times as required when compiling a function in a given program. FIG. 4 shows conventional loop optimizations that may be utilized in the optimizer as illustrated in FIG. 4. These loop optimizations include loop code motion 40, strength reduction 42, induction variable elimination 44, loop unrolling 46 and software pipelining 48. Like the optimizations of FIG. 3, the loop optimizations may be repeated for one or more additional loops as required. Details regarding these and other optimization techniques may be found in the above-cited Aho et al. reference.
Many code improving transformations and architectural features improve execution times at the expense of substantial code growth and more power consumption. For instance, the above-noted loop unrolling is a popular technique to decrease loop overhead. Yet, this approach often requires a significant increase in code size. DSP processors are typically used for applications in embedded systems that have strict code size and power limitations. Space increasing transformations, such as loop unrolling, are often unacceptable for many DSP applications due to these limitations.
A zero overhead loop buffer (ZOLB) is an architectural feature that is commonly found on DSP processors. This type of buffer can be used to increase the speed of applications with no increase in code size and often with reduced power consumption. A ZOLB is simply a buffer that contains a limited number of instructions. There are mechanisms to specify the number of times that the instructions in the buffer should be executed. Due to addressing complications, transfers of control instructions are not typically allowed in such buffers. Thus, a compiler or assembly writer attempts to execute many of the innermost loops of programs from this buffer. Unlike compiler techniques such as loop unrolling, a loop buffer can be used to efficiently reduce loop overhead without the penalty of increasing code size. This buffer can also be viewed as a compiler-managed cache that contains a sequence of instructions that will be executed a specified number of times. In addition, a ZOLB also requires very little power and space, which are both important considerations for most DSP applications.
Various techniques have been developed based on ZOLBs and other similar types of zero overhead loop mechanisms (ZOLMs). Some of these techniques are described in, e.g., P. Lapsley, J. Bier, A. Shoham and E. Lee, xe2x80x9cDSP Processor Fundamentals - Architecture and Features,xe2x80x9d IEEE Press, 1996. Another known technique, a repeat-bit based system and method for executing zero overhead loops that does not require a repeat end register or a dedicated comparator, is described in detail in U.S. Pat. No. 5,727,194 issued to S. Avadhani and K. Nitta.
Techniques for generating code to take advantage of ZOLMs are described in U.S. Pat. application Ser. No. 09/200,580 filed Nov. 27, 1998 in the name of inventors Vincent Cao et al. and entitled xe2x80x9cCompiler Optimization Techniques For Exploiting Zero Overhead Loop Mechanisms.xe2x80x9d For example, this application discloses code generation strategies for transforming loops so that the loops can exploit a ZOLB. Although the techniques described in this application provide significant improvements over prior techniques, further improvements are needed, particularly with respect to application of high performance compiler optimizations to exploitation of ZOLBs and other ZOLMs.
The invention discloses compiler optimization techniques designed to better exploit zero overhead loop buffers (ZOLBs) or other zero overhead loop mechanisms (ZOLMs) in a DSP, microprocessor or other processing device or system. In an illustrative embodiment, a compiler generates a first set of code, e.g., a set of assembly code from a corresponding set of source code, and then applies optimizations to the first set of code so as to generate a second set of code, e.g., an improved assembly code file, configured to operate efficiently with the ZOLB. The optimizations are designed to increase the number of loops of the first set of code that can be accommodated in the ZOLB, to further reduce the overhead of the loops placed in the ZOLB, and to eliminate redundant loading of the ZOLB. Optimizations for increasing the number of loops that can be accommodated in the ZOLB include, e.g., conditional instructions, loop splitting and function inlining. Optimizations for further reductions in loop overhead include, e.g., loop collapsing and loop interchange. Data flow analysis and loop peeling may be used to avoid redundant loading of the ZOLB.
The invention provides optimization techniques which ensure that output code generated by a compiler is configured to exploit ZOLMs of a given processor. The invention thus provides improved code execution time relative to conventional techniques which are generally unable to make use of ZOLMs. The invention can be applied to a variety of different types of DSPs, microprocessors, computer systems and other types of processing systems and devices. These and other features and advantages of the present invention will become more apparent from the accompanying drawings and the following detailed description.