A compiler is a computer program that reads source files of another program to produce a binary file, which is required for execution by a computer. The source files describe the program using a computer language such as C, C++, COBOL or the like. The binary file produced by the compiler contains a series of binary machine instructions for a particular type of computer. Moreover, the compiler generates diagnostic messages when it detects errors in the source files. A compiler is distinguished from an assembler by the fact that each input statement does not, in general, correspond to a single machine instruction or fixed sequence of instructions. A compiler may support such features as automatic allocation of variables, arbitrary arithmetic expressions, control structures such as FOR and WHILE loops, variable scope, input/output operations, higher-order functions and portability of source code.
A source file can contain compiler directives that cause other source files to be included. A compilation unit is a single source program file given to the compiler, plus all the source program files included directly or indirectly by that file. A binary file can contain machine instructions from one or more compilation units, and a compilation unit can come from multiple source files. Sometimes the machine instructions of a single compilation unit are saved in a separate binary file, called an object file. Object files are then combined by a linker to create a final binary file.
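The relationship between a source file and its compilation unit can be illustrated with a minimal sketch. It assumes C-style `#include "name"` directives and a hypothetical in-memory table of source texts named `SOURCES`; the file names and the helper `compilation_unit` are illustrative only and do not describe any particular compiler.

```python
import re

# Hypothetical in-memory "file system": file name -> source text.
SOURCES = {
    "main.c": '#include "util.h"\n#include "io.h"\nint main(void) { return 0; }\n',
    "util.h": '#include "types.h"\nint helper(void);\n',
    "io.h": '#include "types.h"\nvoid print(int);\n',
    "types.h": "typedef int word;\n",
    "unused.h": "int never_included;\n",
}

# Matches a quoted include directive at the start of a line (assumed syntax).
INCLUDE = re.compile(r'^\s*#include\s+"([^"]+)"', re.MULTILINE)


def compilation_unit(root, sources):
    """Return the root file plus every file it includes, directly or indirectly."""
    seen = []
    pending = [root]
    while pending:
        name = pending.pop()
        if name in seen:
            continue  # already part of the unit
        seen.append(name)
        # Queue every file named by an include directive in this file.
        pending.extend(INCLUDE.findall(sources[name]))
    return seen


unit = compilation_unit("main.c", SOURCES)
print(sorted(unit))
```

In this sketch the file `unused.h` is not part of the compilation unit because no file reachable from `main.c` includes it.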
A compiler programmed with objects must relate those objects, which represent the program being compiled, to locations within the source files. The parsing phase of compilation creates objects representing program elements, such as functions, statements and expressions. The code generation phase of compilation generates machine instruction objects from the program element objects. Locations in the source files must be captured and maintained for the program element objects and then passed on to the corresponding machine instruction objects. A source location usually consists of a source file name and a line number within that source file.
The compiler uses the source locations of its objects in at least two cases. First, the compiler shows a source location when issuing a diagnostic message, to inform the compiler's user where an error occurred. Second, the compiler places a table in the binary file, alongside the machine instructions, that maps the instructions to their corresponding source locations. This table is used for debugging when the machine instructions are loaded from the binary file into a computer system's memory and executed. If execution of the machine instructions is interrupted, a debugger or other diagnostic software can use the table to find the source location that corresponds to the current point of execution. If call instructions are used, the debugger or other diagnostic software can also use the table to find the source locations of the series of calls that led to the interrupted machine instruction. The list of source locations, starting with the point of interruption and followed by the source locations of the calls that led there, ordered from most recent call to first call, is referred to as a call history.
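The table lookup and the construction of a call history can be sketched as follows. This is an illustration under stated assumptions, not the format used by any particular binary file: the table `LINE_TABLE`, its addresses, and the functions `source_location` and `call_history` are hypothetical, and the addresses of the pending call instructions are assumed to have already been recovered from the call stack.

```python
from bisect import bisect_right

# Hypothetical table: (starting address, source file, line), sorted by address.
# Each entry covers the addresses from its start up to the next entry's start.
LINE_TABLE = [
    (0x00, "main.c", 10),
    (0x08, "main.c", 11),  # contains a call instruction
    (0x10, "main.c", 12),
    (0x20, "calc.c", 3),
    (0x28, "calc.c", 4),   # contains a call instruction
    (0x30, "calc.c", 5),
    (0x40, "calc.c", 20),
    (0x48, "calc.c", 21),
]
STARTS = [entry[0] for entry in LINE_TABLE]


def source_location(address):
    """Map a machine instruction address to its (file, line) source location."""
    index = bisect_right(STARTS, address) - 1
    if index < 0:
        raise ValueError("address precedes the first table entry")
    _, file_name, line = LINE_TABLE[index]
    return (file_name, line)


def call_history(interrupted_at, call_addresses):
    """Build a call history: the point of interruption first, then the
    call instructions that led there, most recent call first."""
    return [source_location(a) for a in [interrupted_at, *call_addresses]]


for file_name, line in call_history(0x4A, [0x2C, 0x0C]):
    print(f"{file_name}:{line}")
```

Each call instruction contributes exactly one entry to the history, so the history is complete only as long as every call is still present in the machine instructions.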
One type of prior art compiler processed a source file in a single pass, reading the source file and generating machine instructions at the same time. This type of one-pass compiler typically includes running variables holding the current source file name and line number, which are used to correlate the generated binary code with the original source file. Such a straightforward correlation is adequate for a one-pass compiler but is too simplistic to meet most compiler requirements of today.
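The running-variable approach of a one-pass compiler can be sketched with a toy example. The two-statement input language (`push N` and `add`), the instruction names, and the function `one_pass_compile` are all hypothetical and serve only to show how the current file name and line number are attached to each instruction and diagnostic as it is produced.

```python
def one_pass_compile(file_name, text):
    """Read the source once, emitting instructions and diagnostics as it goes."""
    current_file = file_name  # running variable: current source file name
    current_line = 0          # running variable: current line number
    instructions = []         # (instruction, file, line) triples
    diagnostics = []

    for raw in text.splitlines():
        current_line += 1
        statement = raw.strip()
        if not statement:
            continue  # blank lines produce no instructions
        if statement.startswith("push ") and statement[5:].isdigit():
            instructions.append((f"PUSH {statement[5:]}", current_file, current_line))
        elif statement == "add":
            instructions.append(("ADD", current_file, current_line))
        else:
            # The running variables also supply the location for diagnostics.
            diagnostics.append(
                f"{current_file}:{current_line}: error: unknown statement '{statement}'"
            )
    return instructions, diagnostics


code, errors = one_pass_compile("demo.src", "push 1\npush 2\n\nadd\nmul\n")
print(code)
print(errors)
```

Because each instruction is emitted while its source line is being read, no location needs to be stored in any intermediate object.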
Many compilers today scan source files to create objects representing program elements. The compiler then makes multiple passes over the objects in order to verify correctness, find optimization opportunities and generate machine instructions. Some compilers then make one or more additional passes over the machine instructions to find still more optimization opportunities. Optimizations sometimes reorder objects and sometimes replace them with new objects. These relocations and replacements happen to both program element objects and machine instruction objects. The prior art for relating machine instructions to source locations in typical multipass compilers uses two instance variables in each object. The first variable identifies a source file name using either a memory address or an index into a table of names. The second variable holds a line number within the named file. These two variables must be set as objects are created while scanning source files, and then they must be copied to other objects created in later passes, such as those for optimization and code generation. In some compilers the two variables are combined into one variable that holds an index into a list of line-number ranges, each associated with a source file.
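The two-variable scheme can be sketched as follows, assuming the variant in which the first variable is an index into a table of names. The class names `ProgramElement` and `MachineInstruction`, the table `FILE_NAMES`, the opcodes, and the function `generate` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical table of source file names; objects refer to it by index.
FILE_NAMES = ["main.c", "util.h"]


@dataclass
class ProgramElement:
    kind: str
    file_index: int  # first variable: index into FILE_NAMES
    line: int        # second variable: line number within that file


@dataclass
class MachineInstruction:
    opcode: str
    file_index: int  # copied from the originating program element
    line: int        # copied from the originating program element


def generate(element):
    """Code generation pass: create instruction objects for one element,
    copying both location variables into every new object."""
    opcodes = {"assign": ["LOAD", "STORE"], "call": ["CALL"]}[element.kind]
    return [MachineInstruction(op, element.file_index, element.line) for op in opcodes]


def describe(instruction):
    """Render an instruction's single source location as 'file:line'."""
    return f"{FILE_NAMES[instruction.file_index]}:{instruction.line}"


elements = [ProgramElement("assign", 0, 10), ProgramElement("call", 0, 11)]
instructions = [ins for element in elements for ins in generate(element)]
for ins in instructions:
    print(ins.opcode, describe(ins))
```

Every pass that creates new objects must repeat this copying step, and each object can carry only the one location held by its two variables.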
A common and important optimization called "inlining" causes a major problem with the way source locations are managed by the prior art compilers. The term "inlining" as used herein shall mean the replacing of a call to a function with an instance of the function's body. When a compiled program is interrupted, the locations of call instructions are used to look up source locations to build a complete call history. Inlining causes call instructions to be removed. A compiler copies the objects representing the body of an inlined function in place of a call to the function. Using the method of the prior art, each copied object can be related to only one source location. The compiler can therefore preserve either the location of the call or the location within the inlined function, but not both. The result is that a call history reported by a debugger or other diagnostic software is incomplete. Gaps in the call history resulting from inlining cause confusion and produce misleading results.
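The loss of information caused by inlining under the single-location scheme can be shown with a minimal sketch. The class `Element`, the table `FUNCTIONS`, the function `inline` and its `keep` parameter are hypothetical; the sketch inlines one level of calls only and is meant to illustrate the problem, not any particular compiler's optimizer.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Element:
    kind: str
    file_name: str  # each object holds exactly one source location
    line: int
    callee: str = ""


# Hypothetical program: main calls helper at main.c:11.
FUNCTIONS = {
    "helper": [Element("assign", "util.c", 3), Element("divide", "util.c", 4)],
    "main": [Element("assign", "main.c", 10), Element("call", "main.c", 11, "helper")],
}


def inline(body, keep):
    """Replace each call with a copy of the callee's body.

    keep="callee" leaves each copied object at its location inside the
    inlined function; keep="call_site" overwrites it with the location of
    the removed call. A single location cannot record both.
    """
    result = []
    for element in body:
        if element.kind != "call":
            result.append(element)
            continue
        for copied in FUNCTIONS[element.callee]:
            if keep == "call_site":
                copied = replace(copied, file_name=element.file_name, line=element.line)
            result.append(copied)
    return result


for policy in ("callee", "call_site"):
    last = inline(FUNCTIONS["main"], policy)[-1]
    print(policy, f"{last.kind} at {last.file_name}:{last.line}")
```

If the program is interrupted at the copied `divide` element, the first policy reports only a location in `util.c` with no record of the call at `main.c:11`, while the second reports only `main.c:11` with no record of the line inside the inlined function. A complete call history would need both.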