Whole program analysis enables an aggressive form of optimization that is applied on a full-program basis. The goal of whole program analysis is to analyze substantially the entire program during the compilation phase to obtain the most effective optimization possible. One difficulty with whole program analysis is that the compiler used to compile the program normally does not have access to the entire program and therefore lacks some of the information it needs to optimize the program. Instead, the compiler typically only “sees” the program files that the programmer (i.e., the user) provides to it. Accordingly, the compiler normally cannot take into account any information contained in, for example, previously compiled object files of a library or a separate load module. Without access to this information, the compiler cannot identify all of the relationships between the various portions of the program and therefore cannot perform the most effective optimization. Hence, optimization can be based only on the information gleaned from the source files provided to the compiler, as opposed to the whole program.
One specific type of optimization that can be performed is short data optimization. As is known in the art, the compiler designates the global program data as either short data or long data. Short data have shorter addressing sequences and therefore can be accessed by a processor more directly during program execution. Long data, on the other hand, can only be accessed by first referring to a data linkage table stored in the short data area to obtain the address of the long data within the long data area. Accordingly, accessing long data involves an extra indirection that slows program execution. In view of this fact, it is desirable to designate as much data as possible as short data to increase execution speed.
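The difference between the two access paths can be illustrated with a toy model. The following is only a sketch under stated assumptions: the dictionaries stand in for memory regions, and the names `SHORT_AREA`, `LONG_AREA`, `LINKAGE_TABLE`, `load_short`, and `load_long`, as well as the address `0x9000`, are hypothetical and do not correspond to any particular architecture or toolchain. Each function returns the value together with the number of memory reads it needed.

```python
# Toy model (all names and addresses are illustrative assumptions).
SHORT_AREA = {"counter": 7}            # value stored directly in the short data area
LONG_AREA = {0x9000: 42}               # long data, keyed by a hypothetical address
LINKAGE_TABLE = {"big_table": 0x9000}  # kept in the short data area; holds addresses


def load_short(name):
    """Short data: one read at a known offset in the short data area."""
    return SHORT_AREA[name], 1


def load_long(name):
    """Long data: read the address from the linkage table, then the value."""
    address = LINKAGE_TABLE[name]  # read 1: the extra indirection
    value = LONG_AREA[address]     # read 2: the data itself
    return value, 2
```

In this model every long data access costs one more read than a short data access, which is the overhead the paragraph above describes.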
Although greater performance can be obtained by placing more data in the short data area, any given system architecture limits how much data can be designated as short data. In particular, data references are encoded in program instructions using offsets. Because only a limited number of bits is available to encode an offset, if the location of a given piece of data requires more bits than a single instruction (e.g., a 32-bit instruction) can hold, multiple instructions are required to refer to that data, thereby reducing program performance by requiring execution of more instructions. To avoid this situation, an indirection is used to identify the location of the sought data. The size limitations of the short data area translate into a limited amount of data that may be designated as short data. By way of example, only 4 megabytes (MB) of data may be allocated to the short data area; exceeding that amount overflows the short data area and generates a link-time error.
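The relationship between offset width and area size is simple arithmetic, sketched below. The 22-bit width used in the usage note is an assumption chosen only because it reproduces the 4 MB figure given above; the text does not state the actual field width, and the function names are hypothetical. The offset is modeled as a signed two's-complement field.

```python
def reachable_bytes(offset_bits: int) -> int:
    """Number of distinct byte addresses an offset field of this width can select."""
    return 1 << offset_bits


def fits_in_one_instruction(offset: int, offset_bits: int) -> bool:
    """True if a signed offset can be encoded in a field of offset_bits bits."""
    low = -(1 << (offset_bits - 1))
    high = (1 << (offset_bits - 1)) - 1
    return low <= offset <= high
```

Under the assumed width, `reachable_bytes(22)` equals `4 * 1024 * 1024`, i.e. 4 MB, and an offset of `2**21` no longer fits in one instruction, which is the case where an indirection is used instead.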
In conventional systems, short data area overflow is normally avoided by arbitrarily designating all data having a size below a given threshold as short data. For example, any piece of data of 8 bytes or less may be designated as short data and allocated to the short data area. Although this approach typically is effective in staying within the constraints of the short data area, it often results in underutilization of the available short data area, i.e., less data is designated as short data than is possible. By way of example, this approach may result in only approximately 1 MB of short data. This, in turn, results in more data being designated as long data and therefore slows execution of the compiled program. To utilize the short data area more effectively, the whole program must be considered. In particular, the size of each piece of data of the program, as well as the size of any tables to be stored within the short data area, must be considered.
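The accounting described above can be sketched as follows. This is a minimal illustration, not a description of any real compiler or linker: the function name `short_area_usage`, the 8-byte linkage table entry size, and the sample data sizes are all assumptions. The sketch charges the short data area for every item at or below the threshold, plus one linkage table entry for every item designated as long data.

```python
LINKAGE_ENTRY_SIZE = 8  # assumption: one 8-byte address per long data item


def short_area_usage(sizes, threshold, entry_size=LINKAGE_ENTRY_SIZE):
    """Bytes of short data area consumed under a fixed size threshold.

    Items of at most `threshold` bytes are placed in the short data area;
    every larger item costs one linkage table entry there instead.
    """
    short_bytes = sum(size for size in sizes if size <= threshold)
    long_items = sum(1 for size in sizes if size > threshold)
    return short_bytes + long_items * entry_size


# Hypothetical program data sizes, in bytes.
sizes = [4, 8, 8, 64, 4096]
usage_at_8 = short_area_usage(sizes, threshold=8)    # 20 + 2 * 8 = 36
usage_at_64 = short_area_usage(sizes, threshold=64)  # 84 + 1 * 8 = 92
```

With a fixed 8-byte threshold the example uses only 36 bytes of the short data area, however large that area is, which mirrors the underutilization noted above. Computing the usage exactly requires the sizes of all data in the program, which is why the whole program must be considered.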
In recognition of the limited amount of optimization that is obtainable using conventional techniques, several solutions have been proposed. In one such solution, aggressive assumptions are made as to the nature of the program that is to be compiled and are applied by the compiler during the compilation process. The problem with this approach, however, is that it is only as accurate as the assumptions that are made. Accordingly, if the assumptions are wrong, the program may not be optimized to its greatest extent or, in some cases, compilation errors will be encountered.
In another solution, attempts are made to approximate whole program analysis by creating a database for various libraries that contain object files. The compiler is configured to query the database for information about the object files and, presumably, uses this information to optimize the program. This approach fails to provide true whole program analysis, however, in that the database is built when the various program libraries are built and therefore can only provide information about known system libraries. Accordingly, the approach is ineffective for gathering information contained in user-provided libraries. Moreover, problems exist with regard to how to build the database and keep it up to date.
With particular regard to short data optimization, trial and error may be used by arbitrarily designating all data of a given size as short data. However, this approach is inefficient in that several attempts at compilation and linking may be necessary to fully optimize the available short data area without exceeding the short data area limitations.
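The cost of the trial-and-error approach can be illustrated with a small simulation. It is a sketch under stated assumptions: each loop iteration stands for one full compile-and-link attempt, the threshold is doubled after every successful attempt (the text does not specify how successive thresholds are chosen), and the function name, the 8-byte linkage table entry size, and the sample sizes are hypothetical.

```python
def trial_and_error_threshold(sizes, limit, entry_size=8, start=8):
    """Raise the size threshold until the short data area would overflow.

    Returns (largest threshold that fit, number of attempts made). Each
    attempt models one complete compilation and link of the program.
    """
    largest = max(sizes, default=0)
    threshold = start
    best = None
    attempts = 0
    while True:
        attempts += 1
        short_bytes = sum(s for s in sizes if s <= threshold)
        long_items = sum(1 for s in sizes if s > threshold)
        usage = short_bytes + long_items * entry_size
        if usage > limit:
            break  # models the link-time overflow error
        best = threshold
        if threshold >= largest:
            break  # every item is already short data
        threshold *= 2
    return best, attempts
```

For the hypothetical sizes `[4, 8, 8, 64, 4096]` and a 100-byte limit, the loop runs ten times before the overflow is detected, which illustrates why repeated compilation and linking is an inefficient way to find a workable threshold.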