In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises one or more central processing units (CPUs) and supporting hardware necessary to store, retrieve and transfer information, such as communication buses and memory. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU or CPUs are the heart of the system. They execute the instructions which comprise a computer program and direct the operation of the other system components.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing a limited set of very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Sophisticated software at multiple levels directs a computer to perform massive numbers of these simple operations, enabling the computer to perform complex tasks. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, but using software having enhanced function, along with faster hardware.
In the very early history of the digital computer, computer programs which instructed the computer to perform some task were written in a form directly executable by the computer's processor. Such programs were very difficult for a human to write, understand and maintain, even when performing relatively simple tasks. As the number and complexity of such programs grew, this method became clearly unworkable. As a result, alternate forms of creating and executing computer software were developed. In particular, a large and varied set of high-level languages was developed for supporting the creation of computer software.
High-level languages vary in their characteristics, but all such languages are intended to make it easier for a human to write a program to perform some task. Typically, high-level languages represent instructions, fixed values, variables, and other constructs in a manner readily understandable to the human programmer rather than the computer. Such programs are not directly executable by the computer's processor. In order to run on the computer, the programs must first be transformed into a form that the processor can execute.
Transforming a high-level language program into executable form requires that the human-readable program form (source code) be converted to a processor-executable form (object code). This transformation process generally results in some loss of efficiency from the standpoint of computer resource utilization. Computers are viewed as cheap resources in comparison to their human programmers. High-level languages are generally intended to make it easier for humans to write programming code, and not necessarily to improve the efficiency of the object code from the computer's standpoint. The way in which data and processes are conveniently represented in high-level languages does not necessarily correspond to the most efficient use of computer resources, but this drawback is often deemed acceptable in order to improve the performance of human programmers.
While certain inefficiencies involved in the use of high-level languages may be unavoidable, it is nevertheless desirable to develop techniques for reducing inefficiencies where practical. This has led to the use of compilers and so-called “optimizing” compilers. A compiler transforms source code to object code by looking at a stream of instructions, and attempting to use the available resources of the executing computer in the most efficient manner. For example, the compiler allocates the use of a limited number of registers in the processor based on an analysis of the instruction stream as a whole, and thus hopefully minimizes the number of load and store operations. An optimizing compiler might make even more sophisticated decisions about how a program should be encoded in object code. For example, it might determine whether to encode a called procedure in the source code as a set of in-line instructions in the object code.
Even with all the compilation and associated high-level language tools available to the programmer, there are still some types of executable programming code, typically low-level operating system kernel functions, which are of such critical importance that they are manually programmed at a much lower level to achieve greater computer resource efficiency. At these lower levels, the programmer may decide how to represent data, allocate registers, assign storage addresses, and do other tasks often performed by the compiler or optimizing compiler.
A typical program contains many places at which flow of execution may diverge or converge, and many potential paths in the flow of program execution exist. For a typical program, many of these paths are rarely if ever used, while a relatively small number of the paths are utilized frequently. Rarely used paths may exist to handle special cases or errors, or may be unintentional side effects of the way in which a program was written. A program will generally perform more efficiently if the bulk of the system's resources are allocated to the most frequently used paths. For example, variables which occur in the most frequently used paths should be given preferences in the allocation of registers over variables which occur in the rarely used paths. Unfortunately, it is difficult for a compiler or optimizing compiler to know in advance which are the frequently used paths, since whether a path is frequently used or otherwise depends on the input data. One of the reasons that programming code written by a programmer at a low level tends to outperform code which is written at a higher level and compiled to object form is that the programmer usually knows better than the compiler which paths will be most frequently used.
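As a rough illustration of the allocation preference described above, the following Python sketch ranks variables by how often the paths that use them are executed, and gives the available registers to the highest-ranked variables. The path names, frequencies, variable names, and the `pick_register_variables` helper are all hypothetical; a real register allocator is considerably more involved.

```python
from collections import Counter

def pick_register_variables(path_frequency, path_variables, num_registers):
    """Return the variables with the highest frequency-weighted use counts.

    path_frequency: hypothetical mapping of path name -> times executed.
    path_variables: hypothetical mapping of path name -> variables it uses.
    """
    weighted_uses = Counter()
    for path, frequency in path_frequency.items():
        for variable in path_variables.get(path, ()):
            weighted_uses[variable] += frequency
    # Highest weight first; ties are broken by name so the result is stable.
    ranked = sorted(weighted_uses, key=lambda v: (-weighted_uses[v], v))
    return ranked[:num_registers]

# Assumed profile: one hot loop and two rarely used paths.
path_frequency = {"main_loop": 9800, "error_exit": 3, "rare_case": 12}
path_variables = {
    "main_loop": ["i", "total"],
    "error_exit": ["errcode", "msg"],
    "rare_case": ["i", "fallback"],
}
chosen = pick_register_variables(path_frequency, path_variables, 2)
```

With only two registers available in this assumed profile, the variables of the hot loop are chosen ahead of those that appear only in the error and special-case paths.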
Generally, it is possible to produce more efficient object code, and particularly to produce more efficient object code using an optimizing compiler, if it can be known in advance what the pattern of usage of the various code paths will be.
It is possible to collect data from actual or simulated run-time execution of a computer program in order to determine experimentally the frequency of execution of the various paths of a program. Such data is referred to herein as program execution profile data, or simply profile data for short.
Commonly, collection of profile data is accomplished by inserting special instructions into the program to collect data at key points. These instructions are referred to herein as “instrumentation instructions”, or “hooks”. A hook, which could be a single instruction or a set of instructions (including a called procedure), causes some record to be made each time it is encountered during execution of a program. Typically, the hook causes a corresponding counter to be incremented, although a record could take some other form.
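A minimal sketch of such hooks, written in Python for illustration only, is shown below. The `hook` and `classify` functions and the `counters` list are hypothetical names; in practice a hook would typically be one or a few machine instructions inserted by the compiler rather than a procedure call in the source.

```python
# One counter per hook; each hook only increments its own counter.
counters = [0, 0, 0]

def hook(index):
    """Record that the hook with the given index was encountered."""
    counters[index] += 1

def classify(value):
    hook(0)                 # hook at procedure entry
    if value < 0:
        hook(1)             # hook on the "negative" path
        return "negative"
    hook(2)                 # hook on the "non-negative" path
    return "non-negative"

# Run the instrumented procedure on some assumed sample input.
for sample in [5, -1, 7, 0, 3]:
    classify(sample)
```

After this run, each counter holds the number of times the corresponding point in the procedure was reached for the sample input.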
A complete and accurate picture of the performance of a computer program requires that the frequency of taking each possible path in the flow of control be known. Because a typical computer program contains a very large number of possible paths, placing instrumentation hooks in every such path to measure flow is a significant burden. However, it is not necessary to directly measure every path. Mathematical techniques exist for determining a subset of the possible paths for instrumentation, from which the frequency of execution of the remaining unmeasured paths can be inferred. These techniques involve the construction of a control flow graph (CFG), which is a directed graph in which each node represents a basic block of code (i.e., a set of sequential instructions having only one entry point and no branches except at the end) and each arc represents a possible path for transfer of control from one block to another (by branching or by fall-through). The frequency of taking a path (arc) in the control flow graph is represented as an arc weight. It is assumed that flow in the graph is conserved, i.e., the sum of the arc weights of all arcs entering any node is equal to the sum of the arc weights of the arcs leaving the node. From a control flow graph, a spanning tree of arcs can be determined, such that the arc weight of any arc can be inferred from the weights of the arcs that are not in the spanning tree, based on the assumption that flow is conserved. Therefore, if instrumentation hooks are inserted only in the paths represented by arcs not in the spanning tree, the frequency of taking other paths can be inferred. Typically, a spanning tree can be constructed such that only 30%-40% of the arcs in the control flow graph need be instrumented, thus realizing a considerable reduction in the number of instrumentation hooks required.
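The inference step can be sketched as follows. This is an illustrative Python sketch rather than any particular profiler's implementation: the four-block graph, the arc counts, and the `infer_arc_weights` helper are assumptions, and the arc from the exit block D back to the entry block A is a virtual arc added so that flow is conserved at every node. The arcs A→C, B→D and C→D form the spanning tree, so only A→B and D→A carry hooks.

```python
def infer_arc_weights(arcs, measured):
    """Infer weights of unmeasured (spanning-tree) arcs by flow conservation.

    arcs:     list of (source, destination) pairs for every arc in the CFG.
    measured: dict mapping each instrumented arc to its counter value.
    """
    weights = dict(measured)
    unknown = [arc for arc in arcs if arc not in weights]
    nodes = {node for arc in arcs for node in arc}
    while unknown:
        progress = False
        for node in nodes:
            touching = [arc for arc in unknown if node in arc]
            if len(touching) != 1:
                continue  # need exactly one unknown arc to solve this node
            arc = touching[0]
            inflow = sum(w for (_, dst), w in weights.items() if dst == node)
            outflow = sum(w for (src, _), w in weights.items() if src == node)
            # Conservation: total inflow equals total outflow at the node.
            if arc[1] == node:
                weights[arc] = outflow - inflow   # unknown arc enters node
            else:
                weights[arc] = inflow - outflow   # unknown arc leaves node
            unknown.remove(arc)
            progress = True
        if not progress:
            raise ValueError("unmeasured arcs do not form a spanning tree")
    return weights

# Hypothetical diamond-shaped CFG with a virtual exit-to-entry arc.
arcs = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "A")]
measured = {("A", "B"): 70, ("D", "A"): 100}   # only two arcs carry hooks
weights = infer_arc_weights(arcs, measured)
```

In this example two of the five arcs are instrumented, and the remaining three weights (30 for A→C, 70 for B→D, 30 for C→D) follow from conservation alone.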
However, even 30%-40% of the possible paths in a program often represents a very large number of paths. To minimize the deleterious effect of instrumentation hooks on program performance, instrumentation code should be as simple as possible. Specifically, a given hook usually increments a single counter in memory only, without performing other operations. A separate counter is associated with each hook. Counter values are examined and used to derive additional data only after data collection from the program ceases.
Where multiple processes execute the same instrumented program code simultaneously, the simplicity of the instrumentation code can lead to errors. The multiple processes need to access and increment the same counters, yet the instrumentation code has no protection against contention. If two processes both attempt to read, increment, and write back to the same counter simultaneously, one of the increments may be lost. This effect is referred to as “counter contention”.
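The lost increment can be reproduced deterministically with a small simulation. The Python sketch below is illustrative only: it models each process as a generator that pauses between its read and its write-back, so that a particular interleaving can be forced. The function names are hypothetical, and real contention arises from hardware and scheduler timing rather than from explicit pauses.

```python
def unprotected_increment(counters, index):
    """Read-increment-write with no protection against contention."""
    value = counters[index]        # read the current counter value
    yield                          # another process may run at this point
    counters[index] = value + 1    # write back the incremented value

def run_to_completion(process):
    for _ in process:
        pass

def serial_count():
    """Two increments, one after the other: both are recorded."""
    counters = [0]
    run_to_completion(unprotected_increment(counters, 0))
    run_to_completion(unprotected_increment(counters, 0))
    return counters[0]

def interleaved_count():
    """Two increments that overlap: one of them is lost."""
    counters = [0]
    first = unprotected_increment(counters, 0)
    second = unprotected_increment(counters, 0)
    next(first)                 # first process reads 0
    next(second)                # second process also reads 0
    run_to_completion(first)    # first process writes 1
    run_to_completion(second)   # second process overwrites with 1
    return counters[0]
```

In the serial case the counter ends at 2; in the interleaved case both processes write back 1, so the counter records only one of the two passes through the hook.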
If all possible paths (control flow arcs) in a program are instrumented, the effect of counter contention is typically small. However, as explained above, instrumenting all paths is very burdensome. Where the arc weights of many paths are inferred from a smaller number of measured paths, errors in the measured paths due to counter contention can be propagated a significant distance in the graph. This may cause counter errors to propagate into code paths which are infrequently or never taken. A compiler attempting to optimize code based on such data may skew the optimization in favor of such paths, to the detriment of other areas of the programming code.
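A small worked example shows how a few lost increments can surface on a path that is never taken. The Python sketch below is illustrative and all names and counts are hypothetical: a block is entered through a measured arc, leaves through a measured hot arc or through an unmeasured arc to an error handler, and the handler has a single unmeasured exit arc. Both unmeasured weights are inferred by flow conservation.

```python
def infer_cold_arcs(entry_count, hot_count):
    """Infer the two unmeasured arcs around a hypothetical error handler."""
    # Conservation at the block: entry = hot arc + arc into the handler.
    error_arc = entry_count - hot_count
    # Conservation at the handler: its only exit carries the same flow.
    handler_exit_arc = error_arc
    return error_arc, handler_exit_arc

# True behavior: the error handler is never entered.
accurate = infer_cold_arcs(entry_count=1000, hot_count=1000)

# Same run, but four increments of the hot-arc counter were lost.
skewed = infer_cold_arcs(entry_count=1000, hot_count=996)
```

With accurate counters both inferred weights are 0. With four lost increments on the hot arc, the error handler and its exit arc each appear to have executed 4 times, although neither was ever reached.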
A need exists for a method and apparatus for obtaining more accurate profile data, without the burden of overly complex instrumentation code or larger numbers of instrumentation hooks.