1. Field of the Invention
This invention relates to high performance computing systems, and more particularly, to performing bounded hash table sorting during dynamic program profiling of software applications.
2. Description of the Relevant Art
Software programmers write applications to perform work according to an algorithm or a method. A program's performance may be improved based on an understanding of the dynamic behavior of the entire program, since inefficient portions of the program can be reworked once the inefficiencies are known. Program information that may aid in describing a program's dynamic behavior includes code coverage, call-graph generation, memory-leak detection, instruction profiling, thread profiling, race detection, and the like. In addition, understanding a program's dynamic behavior may be useful in computer architecture research, such as trace generation, branch prediction techniques, cache memory subsystem modeling, fault tolerance studies, emulating speculation, emulating new instructions, and the like. Generally speaking, what is needed is a description of a program's entire control flow, including loop iterations and inter-procedural paths.
Accurate instruction traces are needed to determine a program's dynamic behavior because they capture the program's dynamic control flow, not just its aggregate behavior. Programmers, compiler writers, and computer architects can use these traces to improve performance. One approach to obtaining instruction traces is to build a simulator, execute applications on it, and collect and compress the resulting information. This approach requires both a large amount of memory and a large amount of time to complete. Further, a simulator may not accurately capture the dynamic behavior of the application executing on a particular hardware system (e.g., since the simulator may be operating on statistical data).
In order to reduce both the memory storage and the execution time required to collect data, another approach is to perform profiling on only a small subset of the application. Yet other approaches investigate only memory reference traces. Also, hot path profiling measures the frequency and cost of a program's executed paths, and it is an essential technique for understanding a program's control flow. However, many current path profiling techniques capture only acyclic paths. Acyclic paths end at loop iteration and procedure boundaries, and, therefore, these paths do not describe the program's flow across procedure boundaries and loop iterations. None of these approaches provides a whole-program profile of the application.
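The limitation of acyclic paths can be illustrated with a minimal sketch. The function `acyclic_paths`, the single-letter basic block names, and the set of loop back edges below are hypothetical and chosen only for illustration; they are not taken from any particular profiler. The sketch cuts a trace of executed basic blocks at every loop back edge and counts the resulting acyclic paths.

```python
from collections import Counter

def acyclic_paths(trace, back_edges):
    """Split a basic block trace at loop back edges and count each acyclic path."""
    paths = Counter()
    current = [trace[0]]
    for prev, block in zip(trace, trace[1:]):
        if (prev, block) in back_edges:
            # A back edge ends the current acyclic path and starts a new one.
            paths[tuple(current)] += 1
            current = [block]
        else:
            current.append(block)
    paths[tuple(current)] += 1  # count the final, still-open path
    return paths

# Hypothetical loop: header H, then either block A or block B, then back to H.
BACK_EDGES = {("A", "H"), ("B", "H")}

alternating = list("HAHBHAHB")  # iterations take A, B, A, B
grouped = list("HAHAHBHB")      # iterations take A, A, B, B

profile_alternating = acyclic_paths(alternating, BACK_EDGES)
profile_grouped = acyclic_paths(grouped, BACK_EDGES)
```

In this sketch the two traces differ in the order of their loop iterations, yet both produce the same acyclic path counts, so the ordering across loop iterations cannot be recovered from the acyclic profile alone.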
Further, a popular way to hold data regarding the behavior of a program is to store it in a hash table. Depending upon the size of the hash table and the chosen hash function, the chain of entries in any given slot of the hash table may become appreciably deep. When a particular program region is determined to be hot, and the data corresponding to that region is looked up in the hash table upon every occurrence of the hot region, a large number of accesses to the hash table may be required. This large number of accesses may result in enormous overhead due to the pointer chasing problem, or memory serialization effects associated with indirect memory addressing. In the context of a program profiler, the number of collected paths may be large. Coupled with size constraints on the width of a hash table, this means the hash table will necessarily become deep at some point. Once this happens, the pointer chasing overhead becomes a major factor in the performance of the instrumented application.
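The cost described above can be made concrete with a minimal sketch of a chained hash table of fixed width. The class names `Node` and `ChainedTable`, the modulo hash function, the table width of 4, and the integer path identifiers are all assumptions made for illustration. The sketch shows only the problem, not any particular solution: it counts how many chain nodes are visited when a hot entry sits at the head of a deep chain versus at its tail.

```python
class Node:
    """One chain entry: a path identifier, its hit count, and a link to the next entry."""
    def __init__(self, key):
        self.key = key
        self.count = 1
        self.next = None

class ChainedTable:
    """Hash table with a fixed number of slots; collisions are chained."""
    def __init__(self, width):
        self.width = width
        self.slots = [None] * width
        self.hops = 0  # number of chain nodes visited (pointer dereferences)

    def record(self, key):
        index = key % self.width  # assumed hash function
        node, prev = self.slots[index], None
        while node is not None:
            self.hops += 1
            if node.key == key:
                node.count += 1
                return
            prev, node = node, node.next
        new_node = Node(key)  # not found: append at the tail of the chain
        if prev is None:
            self.slots[index] = new_node
        else:
            prev.next = new_node

def hops_for_hot_key(hot_key, lookups=1000):
    """Build a 100-entry chain in slot 0, then count the hops for repeated lookups."""
    table = ChainedTable(width=4)
    for path_id in range(0, 400, 4):  # 100 keys that all hash to slot 0
        table.record(path_id)
    table.hops = 0  # measure only the hot lookups
    for _ in range(lookups):
        table.record(hot_key)
    return table.hops

head_cost = hops_for_hot_key(0)    # hot entry is first in the chain
tail_cost = hops_for_hot_key(396)  # hot entry is last in the chain
```

Under these assumptions, a hot entry at the tail of a 100-entry chain costs 100 node visits per lookup, against one visit for the same entry at the head. Each visit is a dependent load that cannot begin until the previous one completes, which is the serialization effect the paragraph above refers to.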
In view of the above, efficient methods and mechanisms for maintaining bounded hash table sorting during dynamic whole-program profiling of software applications are desired.