The present invention relates generally to the field of development systems for computers and, more particularly, to systems and methods for compiling source programs into object modules and linking those modules into programs executable by computers.
Before a digital computer may accomplish a desired task, it must receive an appropriate set of instructions. Executed by the computer's microprocessor, these instructions, collectively referred to as a "computer program," direct the operation of the computer. Expectedly, the computer must understand the instructions which it receives before it may undertake the specified activity.
Owing to their digital nature, computers essentially only understand "machine code," i.e., the low-level, minute instructions for performing specific tasks--the sequence of ones and zeros that are interpreted as specific instructions by the computer's microprocessor. Since machine language or machine code is the only language computers actually understand, all other programming languages represent ways of structuring human language so that humans can get computers to perform specific tasks.
While it is possible for humans to compose meaningful programs in machine code, practically all software development today employs one or more of the available programming languages. The most widely used programming languages are the "high-level" languages, such as C or Pascal. Most of the high-level languages currently used for program development exploit the concept of modularity whereby a commonly required set of operations can be encapsulated in a separately named subroutine, procedure, or function; these terms will be used interchangeably herein to represent any type of discrete code objects. Once coded, such subroutines can be reused by "calling" them from any point in the main program. Further, a subroutine may call a subsubroutine, and so on, so that in most cases an executing program is seldom a linear sequence of instructions.
In the C language, for example, a main() program is written which calls a sequence of functions, each of which can call functions, and so on. The essence of a function call is that the calling function (caller) passes relevant data as arguments (or parameters) to the target function (callee), transfers control to the memory section holding the function's executable code, returns the result of the call, and at the same time, stores sufficient information to ensure that subsequent execution resumes immediately after the point where the original function call was made. This approach allows developers to express procedural instructions in a style of writing which is easily read and understood by fellow programmers.
A program called a "compiler" translates these instructions into the requisite machine language. In the context of this translation, the program written in the high-level language is called the "source code" or source program. The ultimate output of the compiler is an "object module," which includes instructions for execution by a target processor. Although an object module includes code for instructing the operation of a computer, the object module itself is not in a form which may be directly executed by a computer. Instead, it must undergo a "linking" operation before the final executable program is created.
Linking may be thought of as the general process of combining or linking together one or more compiled object modules to create an executable program. This task usually falls to a program called a "linker." In typical operation, a linker receives, either from the user or from an integrated compiler, a list of object modules desired to be included in the link operation. The linker scans the object modules from the object and library files specified. After resolving interconnecting references as needed, the linker constructs an executable image by organizing the object code from the modules of the program in a format understood by the operating system program loader. The end result of linking is executable code (typically an .EXE file) which, after testing and quality assurance, is passed to the user with appropriate installation and usage instructions.
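By way of illustration only, the reference-resolution step described above may be sketched as follows. The module layout (a `size`, a `defines` table, and a `refs` list) and the `link` function are hypothetical simplifications introduced here; real object-file formats and linkers carry considerably more information.

```python
# Toy model of symbol resolution; the module layout is hypothetical.
def link(modules):
    """Place modules back to back and resolve every symbol reference.

    Each module is a dict with keys:
      name    - module name
      size    - code size in bytes
      defines - {symbol: offset within the module}
      refs    - symbols the module refers to
    Returns {symbol: absolute address}.
    """
    addresses = {}
    base = 0
    # First pass: assign an absolute address to every defined symbol.
    for mod in modules:
        for symbol, offset in mod["defines"].items():
            if symbol in addresses:
                raise ValueError(f"duplicate symbol: {symbol}")
            addresses[symbol] = base + offset
        base += mod["size"]
    # Second pass: verify that every reference has a definition.
    for mod in modules:
        for symbol in mod["refs"]:
            if symbol not in addresses:
                raise ValueError(f"unresolved symbol {symbol} in {mod['name']}")
    return addresses


modules = [
    {"name": "main.obj", "size": 64, "defines": {"main": 0}, "refs": ["draw", "save"]},
    {"name": "gfx.obj", "size": 128, "defines": {"draw": 0, "clear": 40}, "refs": []},
    {"name": "io.obj", "size": 32, "defines": {"save": 8}, "refs": ["clear"]},
]
symbol_table = link(modules)
print(symbol_table)
```

The two-pass structure reflects that a module may refer to a symbol defined in a module appearing later in the link list.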
Ideally, when a compiler/linker development system translates a description of a program and maps it onto the underlying machine-level instruction set of a target processor, the resulting code should be at least as good as can be written by hand. In reality, code created by straightforward compilation and linking rarely achieves this goal. Instead, tradeoffs of slower performance and/or increased size of the executing application are often incurred. Thus while development systems simplify the task of creating meaningful programs, they rarely produce machine code which is both the most compact in size and the fastest in execution.
One approach for improving the machine-level code generated for a program is to employ an execution profiler for analyzing the code, looking for any significant performance bottlenecks. Using a profiler, a developer can determine how many times a particular section of code is executed (i.e., function is called, loop is iterated, and the like) and how long it takes to execute a particular passage of code. A passage executed a million times during operation of a program deserves more attention than one executed only once or twice. Improvements in the former typically have a profound effect on overall program performance, while improvements in the latter probably would yield only marginal improvements.
Profilers typically employ one of two approaches for analyzing a program. In the first approach, the profiler periodically interrupts the program's operation and checks the current location of the program counter. The results are scored using statistical methodology. Although the approach is not difficult to implement, the results are often imprecise. For instance, sections of code which may be of interest might be too small to be sampled accurately. Also, the approach cannot tell reliably how many times a passage was executed. The second approach is to start a system timer when the program reaches a passage of interest and stop the timer when the program leaves the passage. The approach is harder to implement but generally leads to more accurate analysis of the program.
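The second, timer-based approach may be sketched as follows. This is a minimal illustration only: the `profiled` wrapper, the two tables, and the sample function `sum_of_squares` are hypothetical names introduced here, and the sketch does not account for timer overhead or for time spent in nested profiled calls.

```python
import time
from collections import defaultdict
from functools import wraps

call_counts = defaultdict(int)        # passage name -> number of executions
elapsed_seconds = defaultdict(float)  # passage name -> accumulated time


def profiled(func):
    """Start a timer on entry to func and stop it on exit."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on normal return and on exceptions alike.
            elapsed_seconds[func.__name__] += time.perf_counter() - start
            call_counts[func.__name__] += 1
    return wrapper


@profiled
def sum_of_squares(n):
    return sum(i * i for i in range(n))


results = [sum_of_squares(n) for n in (10, 100, 1000)]
for name, count in call_counts.items():
    print(f"{name}: {count} calls, {elapsed_seconds[name]:.6f} s total")
```

Unlike periodic sampling, this form of instrumentation yields an exact execution count for each passage, at the cost of modifying every passage of interest.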
Another avenue for improving performance of a program, one which is of particular interest to the present invention, is optimization of the ordering or layout of various procedures which comprise the executable program. Consider, for instance, programs running under Microsoft MS-DOS on an Intel 80x86-class computer. In the Intel architecture, program code is organized into discrete blocks of executable code or "code segments." A call from a procedure (subroutine, function, or the like) in one code segment to another procedure residing in a different code segment ("far" call) is computationally more expensive than an intrasegment or "near" call (since the CPU must load new segment descriptor information). A program with its procedures arranged within a few large segments requires fewer far calls than if the same program were implemented with many small segments and, thus, generally performs better. Thus, developers of MS-DOS programs may optimize performance by minimizing the number of segments in their programs, in order to minimize the number of intersegment or "far" calls.
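The effect of segment layout on the mix of near and far calls can be illustrated with a short sketch. The function names, call counts, and segment assignments below are invented for illustration, and `count_far_calls` is a hypothetical helper rather than part of any development system.

```python
def count_far_calls(segment_of, call_counts):
    """Return (near, far) totals for a given function-to-segment layout.

    segment_of  - {function name: segment id}
    call_counts - {(caller, callee): number of calls observed}
    """
    near = far = 0
    for (caller, callee), count in call_counts.items():
        if segment_of[caller] == segment_of[callee]:
            near += count   # same segment: near call
        else:
            far += count    # different segments: far call
    return near, far


# Hypothetical call counts for a small program.
call_counts = {("main", "parse"): 10, ("parse", "lex"): 500, ("main", "report"): 1}

many_small = {"main": 0, "parse": 1, "lex": 2, "report": 3}  # one function per segment
few_large = {"main": 0, "parse": 0, "lex": 0, "report": 1}   # hot path shares a segment

print(count_far_calls(many_small, call_counts))
print(count_far_calls(few_large, call_counts))
```

In this example, placing the frequently executed path in a single segment converts all but one of the observed calls from far calls to near calls.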
Consider, in contrast, programs running under Microsoft Windows on an Intel 80x86-class computer. In a Windows environment, code segments can be dynamically loaded, moved, linked, and discarded. Developers tend to create programs comprising many smaller code segments. Such a program requires less memory than a corresponding version of that program with fewer large segments (as it is easier for the operating system to discard or swap out unused portions of the program). As code segments are needed, they are read from the storage device. When available memory gets low, on the other hand, code segments which are no longer needed are discarded to free up memory.
The foregoing approach is not without drawbacks, however. As more code segments are employed, the likelihood increases that a function or procedure call will be a far call. Moreover, the likelihood increases that a call will be to a procedure which is not yet in memory. Most users of Windows programs are familiar with "thrashing"--a condition in which a program repetitively accesses a storage device before completion of its current task (e.g., storing a spreadsheet file). Here, the current execution path requires execution of procedures strewn across various code segments, ones which cannot all be loaded into available memory. The program must waste time reading, discarding, and re-reading different code segments from the storage device before it executes all the procedures which complete the current task.
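The thrashing condition can be reproduced in a simple simulation. The sketch below assumes a least-recently-used discard policy and a fixed number of resident segments, and it ignores segment sizes; these are simplifying assumptions made here, not a description of how Windows manages memory. The function and segment names are hypothetical.

```python
from collections import OrderedDict


def count_segment_loads(call_trace, segment_of, resident_limit):
    """Count reads from storage needed to execute call_trace.

    At most resident_limit segments stay in memory; when a new segment
    is needed, the least recently used one is discarded.
    """
    resident = OrderedDict()  # segment ids, least recently used first
    loads = 0
    for func in call_trace:
        seg = segment_of[func]
        if seg in resident:
            resident.move_to_end(seg)         # mark as most recently used
        else:
            loads += 1                        # segment must be read from storage
            if len(resident) >= resident_limit:
                resident.popitem(last=False)  # discard least recently used
            resident[seg] = None
    return loads


# A task that repeatedly calls three procedures in turn.
trace = ["open", "format", "write"] * 100

scattered = {"open": "A", "format": "B", "write": "C"}  # three segments
clustered = {"open": "A", "format": "A", "write": "A"}  # one segment

print(count_segment_loads(trace, scattered, resident_limit=2))
print(count_segment_loads(trace, clustered, resident_limit=2))
```

With room for only two of the three scattered segments, every call in the loop forces a read from storage, whereas the clustered layout requires a single read.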
Given that programs typically comprise a plurality of interdependent procedures or functions, programs may be optimized by packing together or "clustering" groups of related functions. In this manner, interdependent procedures of a program are more readily available to one another (e.g., stored in the same memory page) at runtime. Although the task of organizing code segments for a program may be done by hand, such an approach is tedious and prone to error. Given the complexity of programs today, it is unlikely that a single developer will understand all of the parts of a large program well enough to do a good job. It is quite easy to optimize one part of a large program at the expense of poor performance in other parts. Accordingly, it is desirable to automate the task of optimizing the ordering of procedures.
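One simple way such clustering might be automated is a greedy merge over a weighted call graph, sketched below. This is an illustrative heuristic only; it is not presented as the method of any particular product or of the present invention. The function sizes, call weights, and `segment_limit` are hypothetical inputs.

```python
def cluster_functions(sizes, call_weights, segment_limit):
    """Greedily pack functions that call each other often into one segment.

    sizes         - {function name: code size in bytes}
    call_weights  - {(caller, callee): call frequency}
    segment_limit - maximum total size of one segment
    Returns a list of segments, each a list of function names.
    """
    cluster_of = {name: name for name in sizes}   # function -> cluster id
    members = {name: [name] for name in sizes}    # cluster id -> functions
    cluster_size = dict(sizes)                    # cluster id -> total size

    # Visit the most frequently used call edges first.
    edges = sorted(call_weights.items(), key=lambda item: (-item[1], item[0]))
    for (caller, callee), _weight in edges:
        a, b = cluster_of[caller], cluster_of[callee]
        if a == b or cluster_size[a] + cluster_size[b] > segment_limit:
            continue  # already together, or the merged segment would be too large
        for name in members[b]:
            cluster_of[name] = a
        members[a].extend(members[b])
        cluster_size[a] += cluster_size[b]
        del members[b], cluster_size[b]
    return list(members.values())


sizes = {"main": 300, "parse": 400, "lex": 500, "report": 600, "print": 200}
call_weights = {
    ("parse", "lex"): 500,
    ("report", "print"): 50,
    ("main", "parse"): 10,
    ("main", "report"): 1,
}
segments = cluster_functions(sizes, call_weights, segment_limit=1200)
print(segments)
```

A greedy pass of this kind is fast but makes no claim to finding the best possible layout; it merely keeps the heaviest call edges inside a single segment where the size limit permits.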
A conventional system for optimizing code packing is "Segmentor™," by Microquill. The Segmentor system employs an iterative process to determine the optimal organization of segments in a Windows program. The process entails creating a database of function calls, updating the database with runtime profile data, and analyzing the data with an optimizer. The database stores segmentation data about which functions call which. Initially, the database stores static call information, based on source files and link data. Dynamic profile data may later be added to fine-tune the segmentation data. At that time, weights are assigned to function calls, based on data gathered during runtime. Using an optimizer, Segmentor™ searches for an optimal segment layout. The results are then provided to a development system (e.g., in the form of compiler/linker directives or "pragma" statements). These statements control the placement of functions in segments by the compiler/linker.
Although a good first approach, the code packing optimization method employed by Segmentor is, using presently available equipment, particularly time consuming. Optimizing the code packing of a program of even a modest size may require several hours or more. Given the time pressures of modern-day software development, there is little room in the development schedule for use of such a time-intensive tool. Yet given the potential performance benefits of code packing, there remains great interest in developing optimization techniques which do not incur a substantial time penalty in the development cycle.