Many programming languages, such as Java, C#, Python and Ruby, include a collection framework as part of the language runtime. Generally, collection frameworks provide the programmer with abstract data types for handling groups of data (e.g., lists, sets and maps), and hide the details of the underlying data structure implementation. Modern programs written in these languages rely heavily on collections, and choosing the appropriate collection implementation (and parameters) for every usage point in a program may be critical to program performance.
Real-world applications may allocate collections at thousands of program locations, making any attempt to manually select and tune collection implementations a time-consuming and often infeasible task. Recent studies have shown that in some production systems, the utilization of collections might be as low as 10%. In other words, 90% of the space consumed by collections in the program is overhead.
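The relationship between utilization and overhead can be made concrete with a small calculation. The following is a minimal sketch, assuming that utilization means the fraction of the space consumed by a collection that holds live data; the function name and the byte counts are hypothetical and chosen only for illustration.

```python
def utilization(live_bytes: int, total_bytes: int) -> float:
    """Fraction of a collection's consumed space that holds live data.

    Assumption: utilization = live data / total space consumed.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return live_bytes / total_bytes


# Hypothetical figures: collections consume 1000 bytes, 100 of which are live data.
u = utilization(live_bytes=100, total_bytes=1000)
overhead = 1.0 - u
print(f"utilization={u:.0%}, overhead={overhead:.0%}")
```

Under these assumed figures, a utilization of 10% corresponds to 90% overhead, which matches the relationship stated above.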
Existing profilers ignore collection semantics and memory layout, and aggregate information based on types. Offline approaches using heap snapshots (such as those described in N. Mitchell and G. Sevitsky, “Leakbot: An Automated and Lightweight Tool for Diagnosing Memory Leaks in Large Java Applications,” ECOOP 2003 – Object-Oriented Programming, 17th European Conference, vol. 2743 of Lecture Notes in Computer Science, 351-377 (2003); or N. Mitchell and G. Sevitsky, “The Causes of Bloat, the Limits of Health,” OOPSLA '07: Proc. of the 22nd annual ACM SIGPLAN Conf. on Object Oriented Programming Systems and Applications, ACM, 245-260 (2007)) lack information about access patterns, and cannot correlate heap information back to the relevant program site.
Further, existing profiling tools require the user to manually filter large amounts of irrelevant data, typically offline, in order to make an educated guess. Using several heap snapshots taken during program execution may reveal the types that are responsible for most of the space consumption. However, a heap snapshot does not correlate the heap objects to the point in the program at which they are allocated. Therefore, finding the program points that need to be modified requires significant effort, even for programmers familiar with the code. Moreover, once the point of collection allocation is found, it is not clear how to choose an alternative collection implementation.
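To illustrate what correlating heap usage with an allocation point can look like, the following minimal sketch uses Python's standard `tracemalloc` module, which groups traced memory by the source line that allocated it. It is offered only as an analogy for the missing correlation discussed above, not as the profiling tool contemplated here; the function `build_collections` is a hypothetical workload.

```python
import tracemalloc


def build_collections():
    # Hypothetical allocation site of interest: many small dictionaries.
    return [dict() for _ in range(1000)]


tracemalloc.start()
held = build_collections()            # keep references so the objects stay live
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group traced memory by allocating source line, largest first.
top = snapshot.statistics("lineno")[:3]
for stat in top:
    frame = stat.traceback[0]
    print(f"{frame.filename}:{frame.lineno} size={stat.size} count={stat.count}")
```

Even with the allocation line identified, the report says nothing about how the collections are subsequently used, so it does not by itself indicate which alternative implementation to choose.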
In particular, choosing an alternative collection implementation with lower space overhead is not always desirable. Some structures, such as hash tables, have inherent space overhead to facilitate more time-efficient operations. In order to pick an appropriate implementation, some information about the usage pattern of the collection in the particular application is required.
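The trade-off described above can be sketched as a simple selection rule driven by usage information. This is an illustrative sketch only: the `UsageProfile` fields, the `suggest_implementation` function, and the threshold of 16 elements are all assumptions introduced here, not rules taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class UsageProfile:
    """Hypothetical per-allocation-site usage summary (illustrative fields)."""
    max_size: int   # largest number of elements observed
    lookups: int    # number of lookup operations observed
    inserts: int    # number of insert operations observed


def suggest_implementation(p: UsageProfile, small_threshold: int = 16) -> str:
    # Small collections: a linear scan is cheap, so hash-table space
    # overhead is hard to justify. The threshold is an arbitrary assumption.
    if p.max_size <= small_threshold:
        return "array-backed"
    # Large, lookup-heavy collections: the space overhead buys faster lookups.
    if p.lookups > p.inserts:
        return "hash-based"
    # Large but rarely queried: prefer the lower space overhead.
    return "array-backed"


print(suggest_implementation(UsageProfile(max_size=4, lookups=1000, inserts=4)))
print(suggest_implementation(UsageProfile(max_size=10000, lookups=50000, inserts=10000)))
```

The point of the sketch is that the same abstract data type may warrant different implementations at different allocation sites, and that the decision depends on observed usage rather than on space overhead alone.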
A need therefore exists for improved profiling tools that automatically select the appropriate collection implementations for a given application. A further need exists for improved profiling tools that use semantic profiling together with a set of collection selection rules to make an informed choice. Yet another need exists for a profiling tool that integrates heap information with information about the usage pattern of collections.