In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises one or more central processing units (CPUs) and supporting hardware necessary to store, retrieve and transfer information, such as communication buses and memory. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU or CPUs are the heart of the system. They execute the instructions which comprise a computer program and direct the operation of the other system components.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing a limited set of very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Sophisticated software at multiple levels directs a computer to perform massive numbers of these simple operations, enabling the computer to perform complex tasks. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, but using software having enhanced function, along with faster hardware.
In the very early history of the digital computer, computer programs which instructed the computer to perform some task were written in a form directly executable by the computer's processor. Such programs were very difficult for a human to write, understand and maintain, even when performing relatively simple tasks. As the number and complexity of such programs grew, this method became clearly unworkable. As a result, alternate forms of creating and executing computer software were developed.
The evolution of computer software has led to the creation of sophisticated software development environments. These environments typically contain a range of tools for supporting the development of software in one or more high-level languages. For example, interactive source editors support the initial generation of source code by a developer. Source databases may support collections of source modules or source objects, which serve as the component parts of software applications. Front-end compiler/debuggers perform simple semantic verification of the source and reduction to a standard form. Back-end or optimizing compilers generate machine executable object code from the standard form, and may optimize the performance of this code using any of various optimization techniques. Build utilities assemble multiple object code modules into fully functioning computer programs.
Among the tools available in many such programming development environments are a range of diagnostic and debug tools. Although compilers and debuggers used during the initial creation and compilation phases of development can identify certain obvious inconsistencies in source code and produce object code conforming to the source, they cannot verify the logic of a program itself, or that the program makes use of available resources in an efficient manner. Such verification is generally accomplished by observing the behavior of the program at “run-time”, i.e., when executed under real or simulated input conditions. Various trace tools exist which collect data concerning the run-time behavior of a computer program. Analytical tools assist the programmer in analyzing the trace data to find logical errors, inefficiencies, or other problems with the code.
Data collected by trace tools might include various program parameters, such as code paths taken, procedures and functions called, values of key variables, storage accesses, and so forth. Among the data often collected is a record of instances of memory allocations. That is, programs cause portions of a virtual memory space (which may be a virtual memory space associated with the program, or associated with some other entity, or a universal virtual memory space) to be allocated and deallocated during execution from a dynamic memory allocation area known as the “heap”. Allocation may be for user-specified data structures such as “objects” in an object-oriented programming environment, or other records or arrays, or for stacks or other program constructs. Object allocation is particularly of interest, since many object-oriented languages tend to allocate a large number of short-lived objects from the heap, and the use and reuse of memory space on the heap significantly affects program performance.
Conventionally, a trace tool recording instances of memory allocations generates a sequential list of all the memory allocations and deallocations during program execution, each list entry including the address range (such as a starting address and length of the allocation) and a sequence indicator showing where in the execution of the program the allocation occurred. The sequence indicator is commonly a timestamp, but could conceivably be some other indicator tied to program instructions or other indicia of sequential execution. The collected memory allocation trace data may be analyzed after program execution to determine the state of memory allocations at any given time (sequence) in the program execution.
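By way of illustration only, the conventional allocation trace described above might be sketched as follows in Python. The record layout and the names AllocEvent and live_allocations are hypothetical and are not taken from any particular trace tool; the sequence indicator is assumed to be an integer that increases with execution order, and a deallocation is assumed to identify its allocation by starting address.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AllocEvent:
    """One entry in a sequential allocation trace (hypothetical layout)."""
    seq: int      # sequence indicator, e.g. a timestamp
    kind: str     # "alloc" or "free"
    start: int    # starting address of the range
    length: int   # length of the range in bytes


def live_allocations(trace: List[AllocEvent], at_seq: int) -> Dict[int, int]:
    """Replay the trace up to and including at_seq.

    Returns a mapping {start address: length} of the allocations that are
    live at that point. Assumes the trace is ordered by seq.
    """
    live: Dict[int, int] = {}
    for ev in trace:
        if ev.seq > at_seq:
            break  # later events do not affect the state at at_seq
        if ev.kind == "alloc":
            live[ev.start] = ev.length
        elif ev.kind == "free":
            live.pop(ev.start, None)  # ignore a free with no matching alloc
    return live


# Example trace: the range at 0x1000 is allocated, freed, then reused.
example_trace = [
    AllocEvent(1, "alloc", 0x1000, 64),
    AllocEvent(2, "alloc", 0x2000, 128),
    AllocEvent(3, "free", 0x1000, 64),
    AllocEvent(4, "alloc", 0x1000, 32),
]
```

Replaying the example trace to sequence 2 yields two live allocations, while replaying to sequence 4 shows the same starting address 0x1000 occupied by a different, shorter allocation. This reuse of address ranges is why an address alone does not identify an allocation without the sequence indicator.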
A trace tool may also collect trace data for each reference to memory during execution, in the form of a list of memory reference addresses and corresponding sequence indicators. When a program makes a reference to memory (such as a load or store), it is useful for diagnostic purposes to know to which instance of allocated memory the memory reference pertains. Because memory allocations may be of variable sizes and variable lifetimes, determining the applicable memory allocation for a given memory reference usually involves scanning the memory allocations until the applicable allocation is found, an operation which is performed separately for each memory reference. Thus, where there are N memory allocations and M references to be analyzed, the scope of the task of matching memory allocations to memory references is on the order of N*M.
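The conventional scanning approach just described might be sketched as follows in Python, purely as an illustration of why the work grows as N*M. The names Allocation, MemRef and match_naive are hypothetical, and each allocation is assumed to have been paired in advance with the sequence at which it was freed (or None if it was never freed).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Allocation:
    start: int                # starting address
    length: int               # length in bytes
    alloc_seq: int            # sequence at which the range was allocated
    free_seq: Optional[int]   # sequence at which it was freed, None if never


@dataclass(frozen=True)
class MemRef:
    addr: int   # referenced address (load or store)
    seq: int    # sequence at which the reference occurred


def covers(a: Allocation, r: MemRef) -> bool:
    """True if reference r falls inside allocation a while a is live."""
    in_range = a.start <= r.addr < a.start + a.length
    alive = a.alloc_seq <= r.seq and (a.free_seq is None or r.seq < a.free_seq)
    return in_range and alive


def match_naive(allocs: List[Allocation],
                refs: List[MemRef]) -> List[Optional[int]]:
    """For each reference, scan the allocations until one covers it.

    Returns, per reference, the index of the matching allocation or None.
    The nested loops perform up to len(allocs) * len(refs) comparisons.
    """
    result: List[Optional[int]] = []
    for r in refs:                        # M references
        found = None
        for i, a in enumerate(allocs):    # up to N allocations each
            if covers(a, r):
                found = i
                break
        result.append(found)
    return result
```

Because each reference restarts the scan from the beginning of the allocation list, the cost of one lookup does not benefit from any previous lookup, which is the inefficiency the passage above identifies.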
As programs grow in size and complexity, and particularly as the use of object-oriented programming techniques grows, the task of analyzing memory allocations from trace data mushrooms. A need exists for a more efficient algorithm for analyzing memory allocations from trace data in a computer system.