As defined by Microsoft® Computer Dictionary, Fourth Edition, Microsoft Press (1999), the heap is a portion of memory in a computer that is reserved for a program to use for the temporary storage of data structures whose existence or size cannot be determined until the program is running. To build and use such elements, programming languages such as C and Pascal include functions and procedures for requesting free memory from the heap, accessing it, and freeing it when it is no longer needed. In contrast to stack memory, heap memory blocks are not freed in reverse of the order in which they were allocated, so free blocks may be interspersed with blocks that are in use. As the program continues running, the blocks may have to be moved around so that small free blocks can be merged together into larger ones to meet the program's needs.
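By way of illustration only, the interspersing of free and in-use blocks described above can be sketched with a toy model. The sketch assumes a heap of eight fixed-size cells and a first-fit allocator; the names HEAP_SIZE, allocate, release, and compact are hypothetical and are not taken from any particular run-time environment.

```python
# Toy model of a heap as a list of cells; None marks a free cell.
# All names here are hypothetical and chosen only for illustration.

HEAP_SIZE = 8
heap = [None] * HEAP_SIZE


def allocate(tag, size):
    """First-fit: claim the first run of `size` contiguous free cells.

    Returns the start index, or None if no run is large enough.
    """
    run_start, run_len = 0, 0
    for i, cell in enumerate(heap):
        if cell is None:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == size:
                for j in range(run_start, run_start + size):
                    heap[j] = tag
                return run_start
        else:
            run_len = 0
    return None


def release(tag):
    """Free every cell owned by the block labelled `tag`."""
    for i, cell in enumerate(heap):
        if cell == tag:
            heap[i] = None


def compact():
    """Slide in-use cells to the front so the free cells merge into one run.

    A real system would also have to update every pointer to a moved block.
    """
    heap[:] = sorted(heap, key=lambda cell: cell is None)


# Allocate four 2-cell blocks, then free two of them out of stack order.
for name in ("a", "b", "c", "d"):
    allocate(name, 2)
release("a")
release("c")

free_cells = heap.count(None)          # total number of free cells
before_compaction = allocate("e", 4)   # fails: free cells are not contiguous
compact()
after_compaction = allocate("e", 4)    # succeeds once free cells are merged
```

In this sketch the request for a four-cell block fails even though four cells are free, because the free cells lie in two separate two-cell runs; the request succeeds only after the in-use blocks have been moved together.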
Microsoft® Computer Dictionary, Fourth Edition, Microsoft Press (1999) further defines garbage collection as, “a process for automatic recovery of heap memory. Blocks of memory that had been allocated but are no longer in use are freed, and blocks of memory still in use may be moved to consolidate the free memory into larger blocks. Some programming languages require the programmer to handle garbage collection. Others, such as Java, perform this task for the programmer.”
Many currently available programming language run-time environments provide a garbage collector to actively and automatically manage heap memory. Examples include the run-time environments for the Java programming language and the C# programming language, and Microsoft Corporation's .NET Common Language Runtime environment. The garbage collector periodically traverses the objects in heap memory to identify objects that are no longer in use, so that the memory occupied by such dead objects or “garbage” can then be reclaimed. Although garbage collectors vary in design, they generally operate by tracing through the live objects, following pointers from a root object or objects of a program in the heap. Those objects still reachable by tracing pointers from the root object(s) are considered “live,” whereas any of the program's objects that can no longer be reached are dead or garbage. The garbage collector then reclaims the memory occupied by such dead objects.
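The tracing step described above can be sketched as a simple reachability computation. This is a minimal illustration under stated assumptions, not the collector of any particular run-time environment: the heap is assumed to be given as a mapping from object identifiers to the identifiers they point to, and the names reachable, collect, and heap_graph are hypothetical.

```python
def reachable(roots, references):
    """Return the set of object ids reachable from `roots`.

    `references` maps each object id to the ids it points to.
    """
    live = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in live:
            continue  # already traced; this also terminates cycles
        live.add(obj)
        worklist.extend(references.get(obj, ()))
    return live


def collect(roots, references):
    """Split the heap into (live, garbage) sets of object ids."""
    live = reachable(roots, references)
    garbage = set(references) - live
    return live, garbage


# Hypothetical heap: "d" and "e" point at each other, but no root reaches them.
heap_graph = {
    "root": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": ["a"],
    "d": ["e"],
    "e": ["d"],
}
live, garbage = collect(["root"], heap_graph)
```

Here the objects "d" and "e" are classified as garbage even though each is still referenced by the other, because neither can be reached by following pointers from the root.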
Modern software packages allocate and manage a vast amount of information on the heap. Object-oriented languages such as Java and C# use the heap almost exclusively to represent and manipulate complex data structures. The growing importance of the heap makes it necessary to detect and eliminate heap-based bugs. These bugs manifest themselves in different forms, such as dangling pointers, memory leaks, and inconsistent data structures.
Unfortunately, heap-based bugs are hard to detect. Their effect is often delayed, and may become apparent only after significant damage has been done to the heap. In some cases, the effect may never become apparent. For instance, a dangling pointer bug does not crash the program unless the pointer in question is dereferenced and, on occasion, may not cause a crash even then. Consequently, software testing is not very effective at identifying heap-based bugs. Because heap-based bugs are non-deterministic, executing the buggy statement on a test run is not guaranteed to crash the program or to produce unexpected results. Moreover, because the effect is delayed, a failure observed during testing often does not reveal the root cause of the bug.
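The silent behavior of a dangling pointer can be illustrated with a toy simulation. The sketch below is an assumption-laden model rather than real memory management: the class ToyHeap and its methods alloc, free, and load are hypothetical, addresses are list indices, and freed slots are assumed to keep their old contents until they are reused.

```python
class ToyHeap:
    """Hypothetical heap model: addresses are indices into a list of slots."""

    def __init__(self):
        self.slots = []
        self.free_list = []

    def alloc(self, value):
        """Store `value`, reusing a freed slot when one is available."""
        if self.free_list:
            addr = self.free_list.pop()
            self.slots[addr] = value
        else:
            addr = len(self.slots)
            self.slots.append(value)
        return addr

    def free(self, addr):
        """Mark the slot reusable; its old contents are left in place."""
        self.free_list.append(addr)

    def load(self, addr):
        """Dereference an address without checking whether it is still valid."""
        return self.slots[addr]


toy = ToyHeap()
p = toy.alloc("record for alice")
alias = p                       # second pointer to the same block
toy.free(p)                     # alias is now dangling, with no visible symptom

stale = toy.load(alias)         # reads the stale contents, no crash
q = toy.alloc("record for bob") # the freed slot is reused
wrong = toy.load(alias)         # silently reads another object's data
```

In this model the erroneous free produces no failure at the point where it occurs. The first read through the dangling alias returns stale data, and a later read returns data belonging to an unrelated object, so any visible symptom arises far from the buggy statement.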
Static analysis techniques, such as shape analysis (see, e.g., M. Sagiv, T. W. Reps, and R. Wilhelm, “Parametric Shape Analysis Via 3-Valued Logic,” ACM Trans. Prog. Lang. Syst. (TOPLAS), 24(3):217-298, May 2002), overcome these limitations. They examine all valid code paths, and can also provide soundness guarantees about the results of the analysis. Shape analysis has enjoyed success at determining the correctness of, or finding bugs in, algorithms that manipulate heap data structures. However, in spite of recent advances (such as described by B. Hackett and R. Rugina, “Region-Based Shape Analysis With Tracked Locations,” Proc. 32nd Symp. on Princ. of Prog. Lang. (POPL), January 2005; and E. Yahav and G. Ramalingam, “Verifying Safety Properties Using Separation And Heterogeneous Abstractions,” Proc. ACM SIGPLAN Conf. On Prog. Lang. Design and Impl., pages 25-34, June 2004), shape analysis algorithms are expensive, and apply only to limited classes of data structures and of properties to be checked on them. Moreover, the results of static analysis, while sound, are often overly conservative, and overapproximate the set of possible heap configurations.
On the other hand, dynamic analysis techniques have the advantage of precisely capturing the set of heap configurations that arise. Several dynamic analysis tools have been developed to detect special classes of heap-based bugs. (See, e.g., T. M. Chilimbi and M. Hauswirth, “Low-Overhead Memory Leak Detection Using Adaptive Statistical Profiling,” Proc. 11th Intl. Conf. on Arch. Support for Prog. Lang. and Op. Sys. (ASPLOS), pages 156-164, October 2004; B. Demsky and M. Rinard, “Automatic Detection And Repair Of Errors In Data Structures,” Proc. 18th ACM SIGPLAN Conf. on Object-Oriented Prog., Systems, Lang. and Appls. (OOPSLA), pages 78-95, October 2003; R. Hastings and B. Joyce, “Purify: Fast Detection Of Memory Leaks And Access Errors,” Winter USENIX Conference, pages 125-136, January 1992; and N. Nethercote and J. Seward, “Valgrind: A Program Supervision Framework,” Elec. Notes in Theor. Comp. Sci. (ENTCS), 89(2), 2003.) However, there has been relatively little research on understanding the runtime behavior of the heap and on applying this information to bug finding.