The present invention relates generally to the field of development systems for computers and, more particularly, to systems and methods for profiling software programs executable by computers.
Before a digital computer may accomplish a desired task, it must receive an appropriate set of instructions. Executed by the computer's microprocessor, these instructions, collectively referred to as a "computer program," direct the operation of the computer. Expectedly, the computer must understand the instructions which it receives before it may undertake the specified activity.
Owing to their digital nature, computers essentially only understand "machine code," i.e., the low-level, minute instructions for performing specific tasks--the sequence of ones and zeros that are interpreted as specific instructions by the computer's microprocessor. Since machine language or machine code is the only language computers actually understand, all other programming languages represent ways of structuring human language so that humans can get computers to perform specific tasks.
While it is possible for humans to compose meaningful programs in machine code, practically all software development today employs one or more of the available programming languages. The most widely used programming languages are the "high-level" languages, such as C or Pascal. Most of the high-level languages currently used for program development exploit the concept of modularity whereby a commonly required set of operations can be encapsulated in a separately named subroutine, procedure, or function; these terms will be used interchangeably herein to represent any type of discrete code objects. Once coded, such subroutines can be reused by "calling" them from any point in the main program. Further, a subroutine may call a subsubroutine, and so on, so that in most cases an executing program is seldom a linear sequence of instructions.
In the C language, for example, a main() program is written which calls a sequence of functions, each of which can itself call functions, and so on. The essence of a function call is that the calling function (caller) passes relevant data as arguments (or parameters) to the target function (callee) and transfers control to the memory section holding the function's executable code, while storing sufficient information to ensure that execution resumes immediately after the point where the original call was made; the callee then returns the result of the call to the caller. This approach allows developers to express procedural instructions in a style of writing which is easily read and understood by fellow programmers.
A program called a "compiler" translates these instructions into the requisite machine language. In the context of this translation, the program written in the high-level language is called the "source code" or source program. The ultimate output of the compiler is an "object module," which includes instructions for execution by a target processor. Although an object module includes code for instructing the operation of a computer, the object module itself is not in a form which may be directly executed by a computer. Instead, it must undergo a "linking" operation before the final executable program is created.
Linking may be thought of as the general process of combining or linking together one or more compiled object modules to create an executable program. This task usually falls to a program called a "linker." In typical operation, a linker receives, either from the user or from an integrated compiler, a list of object modules desired to be included in the link operation. The linker scans the object modules from the object and library files specified. After resolving interconnecting references as needed, the linker constructs an executable image by organizing the object code from the modules of the program in a format understood by the operating system program loader. The end result of linking is executable code (typically an .EXE file) which, after testing and quality assurance, is passed to the user with appropriate installation and usage instructions.
Ideally, when a compiler/linker development system translates a description of a program and maps it onto the underlying machine-level instruction set of a target processor, the resulting code should be at least as good as can be written by hand. In reality, code created by straightforward compilation and linking rarely achieves its goal. Instead, tradeoffs of slower performance and/or increased size of the executing application are often incurred. Thus while development systems simplify the task of creating meaningful programs, they rarely produce machine code which is not only the most efficient (smallest) in size but also executes the fastest.
One approach for improving the machine-level code generated for a program is to employ an execution "profiler" for analyzing the code, including looking at program performance for detecting any significant performance bottlenecks. Other analyses include detecting invalid API (Application Programming Interface) usage and memory leaks, as well as performing working set and coverage analysis.
Using a profiler, a developer can determine how many times a particular section of code is executed (i.e., a function is called, a loop is iterated, and the like) and how long it takes to execute a particular passage of code. A passage executed a million times during operation of a program deserves more attention than one executed only once or twice. Improvements in the former typically have a profound effect on overall program performance, while improvements in the latter probably would yield only marginal improvements.
Profilers typically employ one of two approaches for analyzing a program. In the first approach, the profiler periodically interrupts the program's operation and checks the current location of the program counter. The results are scored using statistical methodology. Although the approach is not difficult to implement, the results are not particularly good. For instance, sections of code which may be of interest might be too small to be sampled accurately. Also, the approach cannot tell reliably how many times a passage was employed. The second approach is to start a system timer when the program reaches a passage of interest and stop the timer when the program leaves the passage. The approach is harder to implement but generally leads to more accurate analysis of the program.
A particular disadvantage common to many instrumentation approaches is that a special version or "build" of the application being profiled must be created--one with added code to monitor the calls of functions/subroutines in the software. Here, the typical approach employed is to use special compile options that insert extra entry/exit function calls at the entry and exit of each function. In addition to these added calls, the approach requires a special executable link operation to bind in the function entry/exit calls and the required runtime support for them. Using special builds of a software application that produce extra function entry/exit calls is undesirable. Not only must extra steps be performed by the developer, but the resulting executable itself cannot be used for delivery to an end user, as it is too big and too slow for practical use. Further, the build may require extra analysis/monitoring code.
An alternative technique to creating a special build of an application is to apply "code patching" instead. On Intel platforms, a relative 32-bit jump instruction requires five bytes of machine code. With code patching, one copies the first five bytes of a function to a dynamically-allocated stub (additional bytes may have to be copied if an instruction crosses the five-byte boundary). The first five bytes of the function are then replaced with a five-byte relative jump instruction to the beginning of the stub. Another jump instruction is then placed at the end of the stub to jump back into the patched function, targeting the first instruction following the bytes that were copied to the stub.
The approach is problematic, however. Many functions have jump/loop instructions that branch into their first five bytes, and one cannot guarantee that at least five bytes of instructions exist at the beginning of a function. Accordingly, a better technique is desired.
Code patching with a breakpoint instruction is another approach. In Intel-based systems, the breakpoint instruction occupies a single byte. Because so small a patch is ideal for patching function entry points, the technique is often employed by software debuggers. Despite this advantage, a problem arises with program performance: executing a breakpoint causes a hardware exception that must be handled with a context switch. The overhead incurred can be substantial, particularly when a tool is monitoring all the functions in an application.
Given the time pressures of modern-day software development, there is little room in one's development schedule for time-intensive profiling tools, particularly ones which create applications that are unusable by end users. Yet given the potential performance benefits of optimized program code, there remains great interest in developing optimization techniques which do not incur a substantial time penalty in the development cycle and do not require creation of special program builds.