Computers are being used today to perform a wide variety of tasks. Many different areas of business, industry, government, education, entertainment, and most recently, the home, are tapping into the enormous and rapidly growing list of applications developed for today's increasingly powerful computer devices. Computers have also become a key technology for communicating ideas, data, and trends between and among business professionals. These devices have become so useful and ubiquitous that it would be hard to imagine today's society functioning without them.
Computers operate by executing programs, that is, series of instructions stored in memory. These programs, and their series of instructions, are collectively referred to as software. Software is key to the utility of computers. Software is what makes computer devices function and perform useful tasks. Good software makes for effective machines, while poor software makes for machines that are difficult to use and less effective. Thus, the utility of a computer device often hinges upon the utility of the software written for the device.
Software is written by professionals referred to as programmers or software engineers. As programs have become larger and more complex, the task of writing software has become correspondingly more difficult. As a result, programmers typically code in "high level languages" to improve productivity. The use of a high level language makes the task of writing extremely long and complex programs more manageable. The completed program, however, must be translated into machine executable language in order to run on a computer. Programmers rely upon compilers to translate their programs written in a high level language into programs composed of machine executable code, or machine language.
Compiler efficiency and sophistication are directly related to the speed and reliability of the machine executable code. The process of translating the program written in high level language into a program written in machine language is referred to as compiling. The actual translation is performed by a software program referred to as a compiler. The compiler operates on the program written in high level language, which is referred to as source code, and translates the source code into machine executable code. Ultimately, it is the machine executable code which runs on the computer. Thus, the speed and reliability of the executable code depend upon the performance of the compiler. Where the compiler is inefficient, the executable code will be larger than necessary, and other attributes, such as execution speed and reliability, may also be affected. Where the compiler lacks sophistication, many optimizations, or efficiency enhancing modifications, to the executable code will be lost. It is therefore critical to the speed and efficiency of the program that the compiler thoroughly optimize the executable code during the translation process.
The process of optimization, on a rudimentary level, often involves moving code from one location in the program to another location. Code is moved in order to enhance speed and efficiency. For example, the compiler may seek to place code in locations such that a particular calculation called for in the code can be performed once instead of perhaps hundreds, or even millions, of times. This could involve moving code for performing a calculation out of a program loop, rather than leaving it inside the loop, where it must be performed each time the loop is executed.
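As an illustration of the loop transformation just described, the following Python sketch computes the same result two ways: once recomputing a loop-invariant product on every iteration, and once with that product moved ahead of the loop. The function names and the evaluation counter are hypothetical, chosen only to make the saving visible; a real compiler performs this motion on its intermediate code, not on source functions.

```python
def scale_all_naive(values, a, b):
    """Recompute the loop-invariant product a * b on every iteration."""
    result = []
    invariant_evaluations = 0
    for v in values:
        factor = a * b               # does not depend on v
        invariant_evaluations += 1
        result.append(v * factor)
    return result, invariant_evaluations


def scale_all_hoisted(values, a, b):
    """Same computation with the invariant product moved out of the loop."""
    factor = a * b                   # evaluated once, before the loop
    invariant_evaluations = 1
    result = [v * factor for v in values]
    return result, invariant_evaluations
```

Both functions return the same scaled list; only the number of times the invariant product is evaluated differs (once per element versus once in total).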
In order to move code, the compiler must make sure the code relocation does not adversely affect other units of code which refer to or rely upon the code in some way. This is especially true for code containing memory references, i.e., instructions that access data in memory (e.g., loads, stores, and the like). The compiler must make sure not to move certain units of code, especially memory references, past other units of code which refer to the same memory locations. For example, a compiler may wish to move a load of a variable from inside a loop to outside the loop. This would allow the program to execute the load once, as opposed to as many times as the program executes the loop. There may be other stores inside the loop, so the compiler must ensure that no store inside the loop writes to the same memory location. If a store does write to that location and the compiler moves the load of the variable ahead of the store, the program will erroneously load the earlier, stale value at that location instead of the value written by the store.
Referring now to FIGS. 1A, 1B, and 1C, an example of the above described optimization problem is shown. The figures collectively describe the example of moving a "load value" out of a loop. In FIG. 1A, the load instruction is inside the code loop, at line 101. To execute the code, the computer must execute the load instruction each time it passes through the loop. Depending upon the particular program, the computer may execute the loop anywhere from a few dozen to millions of times. If the compiler can structure the program such that the load instruction is executed just once, outside of the loop, the loop will execute correspondingly faster. In FIG. 1B, the load instruction is moved outside the code loop, to line 111. In this position, the load instruction is executed only once, before the loop, saving the execution time of the load instruction on every iteration of the loop.
The problem is that at program compile time, the typical compiler cannot tell whether Addr1 and Addr2 will point to the same location in memory at program run time. If Addr1 and Addr2 do point to the same location, the program will erroneously pick up an earlier version of the data pointed to by Addr2 at line 111, in the manner described above. This could lead to incorrect program behavior.
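The hazard described for FIGS. 1A and 1B can be reproduced with a toy memory model. The Python sketch below is an illustration under stated assumptions, not a reproduction of the figures: memory is modeled as a list, Addr1 and Addr2 as list indices, and the loop is assumed to store through Addr1 before loading through Addr2 on each iteration. The function names are hypothetical.

```python
def loop_original(mem, addr1, addr2, n):
    """Each iteration stores through addr1, then loads through addr2."""
    total = 0
    for i in range(n):
        mem[addr1] = i        # store inside the loop
        x = mem[addr2]        # load inside the loop, re-executed every time
        total += x
    return total


def loop_hoisted(mem, addr1, addr2, n):
    """The load through addr2 moved ahead of the loop, and therefore ahead
    of every store through addr1."""
    total = 0
    x = mem[addr2]            # load executed once, before any store
    for i in range(n):
        mem[addr1] = i
        total += x
    return total
```

For example, with a fresh memory list `[7, 0]`, Addr1 = 1 and Addr2 = 0, both versions return 28. With Addr1 = Addr2 = 0, the original loop returns 6, because it sees each newly stored value 0, 1, 2, 3, while the hoisted version still returns 28 because it read the stale value 7 once before the loop.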
In FIG. 1C, another version of this problem is illustrated. Here, V is a global variable, P is a pointer variable that contains an address, and *P is the data contained in the memory location pointed to by P. Line 122 loads the value of P into register y. Line 107 loads the data pointed to by P into a register X. Line 124 uses X, that is, the loaded data. As described above, the problem here is that at compile time, the typical compiler cannot tell whether *P and V are the same memory location, that is, whether P contains the address of V. The compiler does not know what address P will hold. A good compiler may, however, perform some analysis to determine what the possibilities are. The compiler will scan the program to determine whether V's address was ever taken (i.e., whether the program ever loaded the address of V). The compiler cannot generally determine whether the address contained in P, in particular, points to V or ever did point to V. The compiler can, however, determine whether V's address was ever taken. If V's address was ever taken, there is some possibility that V's address ended up in P. This is what the compiler must guard against while performing optimizations. If there is a possibility of P pointing to V, the load instruction 123 cannot be moved outside of the loop without possibly causing an error. If V's address cannot be in P, the compiler can proceed with the optimization and safely move the load instruction outside the loop.
This is the manner in which typical compilers deal with this problem. They scan all of the variables in the program to determine those that have ever had their addresses taken. These variables are candidates to have their addresses held in pointers such as P. None of these candidates is then optimized. Although this approach avoids introducing errors into the executable code, it is overly conservative. In FORTRAN in particular, and in C to a lesser extent, when programs pass arguments (e.g., X) to subroutines, the compiler takes the address of X and places it into a register. That address is passed to the subroutine. This is often done in a context in which the program never uses the argument address in a way that would interfere with the optimization process described above. The simple-minded approach of the typical compiler (i.e., just scanning through the code and looking for variables that have their addresses taken) results in many lost optimization opportunities. The typical compiler will conclude that every variable whose address is taken can be pointed to by a pointer variable, and none of these variables will be optimized. This general problem of determining whether two memory references can refer to the same memory location is called disambiguation.
A second, even more serious, aspect of the above described problem arises where V is a global variable declared in such a way that it could be defined in another module or another file, or is otherwise externally visible. In such instances, the compiler must always assume that V's address is taken, because code in the other file or module can take V's address. Therefore, the compiler must assume that all such global data is always address-taken. Consequently, the disambiguation of memory references required by the optimization process is disabled both for variables whose addresses have been taken and for variables that are global or otherwise externally visible.
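The conventional, conservative rule described in the two preceding paragraphs can be sketched as a small pass over a toy instruction list. Everything here is hypothetical: the `Var` and `Instr` classes, the opcode names, and the helper functions are invented for illustration and model the prior-art approach criticized above, not the present invention. Any variable whose address is taken, and any externally visible global, is treated as possibly reachable through a pointer such as P, so a load through P is never moved above a store to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A program variable in a hypothetical toy IR."""
    name: str
    is_external: bool = False  # global visible to other modules or files


@dataclass(frozen=True)
class Instr:
    """A hypothetical toy instruction: an opcode plus its operands."""
    op: str          # e.g. "addr_of", "load", "store", "call"
    operands: tuple  # Var objects and/or constants


def address_taken(program):
    """Return every variable the conservative scan treats as address-taken."""
    taken = set()
    for instr in program:
        # Externally visible globals: another module could take their address.
        for operand in instr.operands:
            if isinstance(operand, Var) and operand.is_external:
                taken.add(operand)
        # Explicit address-of: operands are (destination, variable).
        if instr.op == "addr_of":
            taken.add(instr.operands[1])
    return taken


def may_hoist_load_through_pointer(program, stored_var):
    """A load through an unknown pointer (such as P) may be moved above a
    store to stored_var only if stored_var is not considered address-taken."""
    return stored_var not in address_taken(program)


# Example program in the toy IR.
V = Var("V")                       # global, but not externally visible here
X = Var("X")                       # argument passed to a subroutine by address
G = Var("G", is_external=True)     # externally visible global
P, t, y = Var("P"), Var("t"), Var("y")

program = [
    Instr("addr_of", (t, X)),      # t = &X, to pass X to a subroutine
    Instr("call", ("sub", t)),     # sub(t); "sub" is a placeholder name
    Instr("store", (V, 1)),        # V = 1
    Instr("load", (y, P)),         # y = *P
]
```

In this example, the scan reports only X as address-taken, so a load through P may be hoisted above the store to V. X's address is taken merely to pass it to a subroutine, yet the scan still blocks any such motion past stores to X; this is the lost opportunity described above. Adding a store to the external global G would likewise mark G as address-taken even though this program never takes its address.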
Thus, what is desired is a method to fully enable the disambiguation of memory references as required by the optimization process. The method should provide for the optimization of all variables where address-taken conditions would not induce errors. What is also desired is an apparatus and method for compiler identification of the address data of variables, such that those variables which cannot be optimized without inducing errors are particularly identified. The present invention provides a solution to the problems described above.