Presently, software development focuses on two separate tasks in the process of generating a computer program: the compiling of the program into an executable object code file to be run on a computer processing system and the debugging of that executable file as it is being executed by the computer processing system. In general, a separate version of a compiler is created for each programming language and each computer processing system. Similarly, a separate debugger is created to debug the executable object code on each computer processing system. As a result of the independent creation of present compilers and debuggers, most prior art software development systems are a collection of separate tools where each of the tools knows little or nothing about the other tools in the development system.
The design and construction of compilers is well known in the art, e.g., Aho, Sethi and Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley (1986); and Waite et al., Compiler Construction, Springer-Verlag (1984). Compilers convert a given computer source language, such as FORTRAN, into code executable by a given computer processing system (i.e., the target machine). Compilation of a computer source language is accomplished through a series of transformations. First, the strings of symbols that comprise the source code are lexically analyzed to ascertain the atomic units or words for translation. Then, the strings of symbols are syntactically analyzed to ascertain the grammatical relations among the words. Conceptually, the output of this analysis is expressed in the form of a parse tree, which is transformed into an intermediate language representation of the source code. In practice, most compilers do not generate a parse tree explicitly, but form the intermediate code as the syntactic analysis takes place. Optimization is then applied to the intermediate code, after which the target machine-executable or object code is generated. Examples of optimizing compilers for present high performance computer processing systems include the compilers for the Hitachi S-810 supercomputer (e.g., U.S. Pat. Nos. 4,773,007, 4,807,126, 4,821,181, 4,833,606, 4,843,545 and 4,853,872), the compilers for the Cray-1 supercomputer (e.g., Cray Research Publication number SR-0018) and the compilers for the IBM mainframe computers (e.g., U.S. Pat. Nos. 4,782,444, 4,791,558 and 4,802,091).
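By way of illustration only, the series of transformations described above can be sketched for a single assignment statement. The sketch below is not drawn from any of the cited compilers. The tuple-based three-address intermediate code, the `%t` naming of temporaries, and the names `tokenize`, `Translator` and `fold_constants` are all assumptions made for this example, and the final code generation step is omitted.

```python
import operator
import re

# One token per match: a number, an identifier, or any single other character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def tokenize(source):
    """Lexical analysis: split the source string into (kind, text) tokens."""
    tokens = []
    for number, name, other in TOKEN_RE.findall(source):
        if number:
            tokens.append(("num", number))
        elif name:
            tokens.append(("id", name))
        elif other.strip():              # skip stray whitespace
            tokens.append(("op", other))
    return tokens

class Translator:
    """Syntactic analysis that emits intermediate code while parsing,
    without building an explicit parse tree."""

    def __init__(self, tokens):
        self.tokens, self.pos, self.code, self.temps = tokens, 0, [], 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def binary(self, operand, operators):
        """Parse operand (op operand)*, emitting one instruction per operator."""
        left = operand()
        while self.peek() in [("op", o) for o in operators]:
            op = self.advance()[1]
            right = operand()
            self.temps += 1
            temp = f"%t{self.temps}"     # '%' cannot start a source identifier
            self.code.append((temp, op, left, right))
            left = temp
        return left

    def assignment(self):
        kind, target = self.advance()
        if kind != "id" or self.advance() != ("op", "="):
            raise SyntaxError("expected: name = expression")
        self.code.append((target, "copy", self.expression(), None))
        return self.code

    def expression(self):
        return self.binary(self.term, "+-")

    def term(self):
        return self.binary(self.factor, "*/")

    def factor(self):
        kind, text = self.advance()
        if kind in ("num", "id"):
            return text
        if (kind, text) == ("op", "("):
            value = self.expression()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {text!r}")

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Optimization: evaluate operations whose operands are both literals."""
    known, result = {}, []               # known: temporaries holding a constant
    for dest, op, left, right in code:
        left, right = known.get(left, left), known.get(right, right)
        if op in FOLDABLE and left.isdigit() and right.isdigit():
            known[dest] = str(FOLDABLE[op](int(left), int(right)))
        else:
            result.append((dest, op, left, right))
    return result

intermediate = Translator(tokenize("a = b + 2 * 3")).assignment()
optimized = fold_constants(intermediate)
for instruction in optimized:
    print(instruction)
```

Even in this small case the optimized intermediate code no longer contains the multiplication that appears in the source statement, which hints at why relating optimized code back to source code is difficult.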
The design and construction of debuggers is also well known in the art. Debuggers assist programmers in creating executable code by identifying errors in the execution of the object code file and helping to trace the source of the error as manifested in the executable object code file back to the source code program. Most debuggers are particular to a computer processing system because of the inherent relationship between the hardware features of a computer processing system and the execution of object code files on that computer processing system. While the debugging process may be relatively straightforward for a given programming language executing on a given computer processing system, the challenge for present debuggers is to provide effective identification of errors in executable code produced by an optimizing compiler that is, for example, part of a software development system for a high-performance computer processing system. The difficulties of debugging executable code produced by an optimizing compiler are further compounded when the compiler produces code capable of executing on more than one processor in a multiprocessor system.
Optimizations are frequently performed for programs to be executed on a high-performance computer processing system, including multiprocessor systems. The objectives of the optimizing portion of a compilation system are to (a) increase the execution speed of the program, (b) reduce the size of the executable code, and (c) minimize processing costs through efficient resource allocation. Optimizations that are frequently employed in optimizing compilers can be divided into two classes, which are commonly known as "local" and "global" optimizations. Local optimizations are those that are based on an analysis of a relatively small region of the program, such as a "basic block", or perhaps only two adjacent machine instructions. Global optimizations are those that are based on an analysis of more than a single basic block. Examples of global optimizations are "code motion" (moving code out of loops) and "global common subexpression elimination." Although many types of local and global optimizations are presently used in compilation systems, all of these optimizations affect the execution of the program in ways that are not obvious from the organization and structure of the source code program and, consequently, increase the problems associated with effectively debugging the program. These problems are further compounded in multiprocessor systems where more than one processor may be executing portions of the executable code file for a given program.
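As a purely illustrative sketch of the "code motion" optimization mentioned above, the following moves loop-invariant instructions out of a loop body. The instruction format (tuples of destination, operator and two operands) and the function name `hoist_invariants` are assumptions of this example. The sketch further assumes that every name is defined before it is used within the loop, that the operations have no side effects, and that the loop executes at least once. A production optimizer would have to establish these conditions by analysis.

```python
def hoist_invariants(loop_body):
    """Split a loop body into (hoisted, remaining) instruction lists.

    Each instruction is (dest, op, left, right). An instruction is treated as
    loop-invariant when neither operand is assigned inside the loop (or is
    assigned only by an already hoisted instruction) and its destination is
    assigned exactly once in the loop.
    """
    assigned = [dest for dest, _, _, _ in loop_body]
    variant = set(assigned)      # names whose value may change per iteration
    hoisted, remaining = [], list(loop_body)
    changed = True
    while changed:               # repeat until no further instruction moves
        changed = False
        for instruction in list(remaining):
            dest, _, left, right = instruction
            if (left not in variant and right not in variant
                    and assigned.count(dest) == 1):
                hoisted.append(instruction)
                remaining.remove(instruction)
                variant.discard(dest)   # dest is now fixed for the whole loop
                changed = True
    return hoisted, remaining

# Hypothetical body of a loop that computes x * y + i on every iteration.
body = [
    ("t1", "*", "x", "y"),   # invariant: x and y are not assigned in the loop
    ("t2", "+", "t1", "i"),  # depends on i, which changes every iteration
    ("i", "+", "i", "1"),
]
hoisted, remaining = hoist_invariants(body)
print("before loop:", hoisted)
print("inside loop:", remaining)
```

After the transformation the multiplication executes once, before the loop, even though the source code places it inside the loop. A debugger that maps instructions to source lines without knowledge of this transformation would misreport where the computation occurs.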
Generally, compilers for different programming languages use different intermediate representations during the compilation process, while debuggers use yet another intermediate representation for the debugging process. Because the debugger has no knowledge of the intermediate representations used by the various compilers, the debugger has no way of relating the optimized executable code back to the original source code and, as a result, the debugging of optimized code is very difficult. Also, for compilers that use different intermediate representations, inter-language inlining is impossible. Because most prior art assemblers do not use a common intermediate representation, assembly language programs must use different debuggers from those used for high level language programs. In addition, little optimization of assembly language programs has been attempted in the past. This is partly because of an assumption that an assembly language program is written exactly the way the programmer wanted it to be written and partly because of the cost of developing an optimizer specifically for assembly language programs.
More recent software development systems such as the Ada Programming Support Environment (APSE) for the Ada programming language use a common intermediate representation (CIR) shared by many of the components in the compilation system in an effort to solve some of the problems mentioned above. Unfortunately, the common intermediate representation, known as DIANA, is specific only to the Ada programming language. Thus, mixing of languages at the intermediate level in the compilation system is impossible. Additionally, DIANA is not in itself capable of representing the transformations performed by optimizers on the source program. For this reason, debugging an optimized program in the Ada environment is difficult. For example, the Ada debugger does not know where to find the value of a variable if the compiler decides to keep that variable in a register, rather than in a memory location. Also, DIANA does not represent machine level instructions, so use of DIANA for assisting in the optimization of assembly language programs is impossible.
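The register example above can be made concrete with a small sketch of the kind of information a debugger would need in order to find a variable whose location changes during execution. This is an illustration only and does not describe DIANA, APSE, or any particular debugger. The `LocationRange` record, the `locate` function, and the location names such as "reg:r3" are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocationRange:
    start: int   # first instruction address covered (inclusive)
    end: int     # end of the covered address range (exclusive)
    place: str   # hypothetical location name, e.g. "reg:r3" or "mem:0x1000"

def locate(ranges, pc):
    """Return where a variable lives at program counter pc, or None."""
    for entry in ranges:
        if entry.start <= pc < entry.end:
            return entry.place
    return None

# Hypothetical table recorded by a compiler: the variable starts in memory
# and is later kept in a register by the optimizer.
locations = {
    "count": [
        LocationRange(0, 8, "mem:0x1000"),
        LocationRange(8, 20, "reg:r3"),
    ],
}
print(locate(locations["count"], 4))
print(locate(locations["count"], 10))
```

Without a record of this kind passed from the compiler to the debugger, a debugger that always reads the memory location would report a stale value whenever the variable is held in a register.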
Another recent compiler system (U.S. Pat. No. 4,667,290) defines multiple front ends for different programming languages that produce the same common intermediate representation. While this approach solves some of the problems presented by earlier software development systems, several problems still remain. First, the sequential nature of the CIR produced by this prior art software development system fails to represent transformations performed by an optimizer on the source program. Second, the debugger is not closely integrated with the development system. Because of this, the debugger cannot know the kinds of transformations performed by the compiler, hence the debugging of optimized code is difficult. Third, because the assembler in this prior art software development system produces relocatable object code rather than some form of a common intermediate representation, the compiler cannot be used to optimize the assembly language program. Thus, only primitive optimizations such as peephole optimizations can be performed on a machine dependent level, that is to say on the level of code that can only run on a specific target machine. Fourth, because the debugger in this prior art software development system is designed to operate on the CIR generated by the compiler, it is unsuitable for the source-level debugging of assembly language programs. In other prior art systems, this problem is solved by providing primitive debuggers for assembly language programs; however, this requires users to learn two different debuggers, one for high level language debugging and another for assembly language debugging.
Even if a unified and integrated intermediate representation for compilers, assemblers and debuggers were available, the present methods and systems do not represent the information in a form that is most suited for optimization. The various types of common intermediate representations utilized in the prior art software development systems are essentially simple linear representations of information concerning only the actual programming statements in the source code. The common intermediate representations of prior art software development systems have no mechanism for preserving important context and optimization information about the compiled program. Most importantly, the actual structure of present common intermediate representations does not allow for efficient optimizations because the structure of the representation does not expose many of the relationships among the components of the source code program.
Although present software development systems can produce efficient and effective executable object code files for a given source code program, there is no completely integrated software development system that allows for common representation of all types of information about the source code and optimized object code program. Consequently, there is a need for an integrated software development system that allows for a common intermediate representation to be effectively utilized by all components of the software development system and that is capable of representing additional information about the program for purposes of optimization and debugging, particularly in a high performance multiprocessor environment. In addition, there is a continuing need to provide better methods and structures for representing this common intermediate representation that are more suitable for performing a variety of optimization techniques during software development.