The construction of many large applications that manipulate complex data structures in domains such as computer-aided design (CAD), multimedia, network management, and telecommunications has motivated significant research and development of object-oriented databases (OODBs). However, the development of tools and methods to analyze, compare, and tune the performance of these database systems has lagged behind. Furthermore, different users of OODBs have different needs for analyzing and comparing OODB performance. For example, end users of OODBs are interested in comparing the overall performance of different OODBs across a wide range of workloads, while database application developers want to predict the performance of their particular application on different OODBs. Thus, both end users and application developers are interested in benchmarking OODBs; the former with a very general workload and the latter with a specific workload.
A third category of users is interested in analyzing and tuning the performance of individual OODB components. For example, application architects want to identify the bottlenecks in an OODB by analyzing the overhead of different OODB system components for their application workload; OODB developers perform the same analysis, but for a general workload.
Current OODB benchmark workloads provide a general workload for benchmarking OODB systems, such as the benchmark described by M. J. Carey, D. J. DeWitt, and J. F. Naughton in their paper "The OO7 Benchmark," Proceedings of the ACM SIGMOD Conference, pp. 12-21, Washington, D.C., 1993. However, these benchmark workloads cannot be used for the kinds of studies described above because of three limitations. First, these are synthetic benchmarks, i.e., they are not based on any real application but on the designer's notion of a typical application. Because the correlation between the data structures and operations of a synthetic benchmark and those of real applications is limited, it is difficult to relate the performance of an application to this type of benchmark. This problem also precludes the use of these benchmarks in analyzing and tuning application performance. Second, most of these benchmarks model CAD workloads; therefore, OODB users in other domains, such as multimedia, find it difficult to relate benchmark results to their applications. Finally, these benchmarks were not designed to allow systematic analyses of OODB components. For example, it has been shown that it is difficult to study the impact of different clustering strategies on benchmark performance for a particular OODB without understanding and modifying the benchmark source code. Some of the problems associated with different clustering strategies are described by A. J. Tiwary, V. Narasayya, and H. M. Levy in a paper entitled "Evaluation of OO7 as a System and an Application Benchmark," Proceedings of the OOPSLA95 Workshop on Object Database Behavior, Benchmarks and Performance, Austin, Tex., October 1995.
The limitations of synthetic benchmarks may be overcome by using a collection of real applications. One such collection is described by J. P. Singh, W. Weber, and A. Gupta in the paper "SPLASH: Stanford Parallel Applications for Shared-Memory," Computer Architecture News, vol. 20, no. 1, pp. 5-44. However, such collections are not feasible for OODBs, because most real OODB applications are proprietary. Furthermore, these applications are large and complex, and porting them across multiple OODBs is difficult. The size and complexity of these applications also make it difficult to understand and modify their source code, which in turn makes their systematic analysis and tuning difficult and expensive.
Accesses to persistent objects can be recorded in three principal ways: (1) by instrumenting the OODB (the library and/or the server, depending on the OODB structure); (2) by instrumenting the application executable; or (3) by instrumenting the application source code. Instrumenting the OODB is useful only for studying a specific system, and only by the small set of people with access to that system's source code. For example, as described by C. Lamb, G. Landis, J. Orenstein, and D. Weinreb in the paper entitled "The ObjectStore Database System," Communications of the ACM, 34(10): 50-63, October 1991, the ObjectStore OODB ships with instrumentation for performance counters in the client library and server; however, these counters are difficult to use and to correlate with application characteristics. The ObjectStore Performance Expert, as described in the "ObjectStore Performance Expert Users Manual," provides an easy-to-use interface to these performance counters, but it does not relate database performance information to application characteristics. Such tools and instrumentation are limited to a single database system and cannot be used to compare performance across multiple OODBs.
Object accesses may also be instrumented by modifying the application executable files using tools such as the one described by A. Srivastava and A. Eustace in the paper "ATOM: A System for Building Customizable Program Analysis Tools," Technical Report WRL-TR-94.2, DEC Western Research Laboratory, Palo Alto, Calif., 1994. Although this approach does not require application source code, it makes it more difficult and less efficient to detect persistent objects, to distinguish between application and library functions, and to detect inlined methods. Finally, OODBs that swizzle at an object granularity use a level of indirection in accessing persistent objects to perform residency checks and object loading. This indirection provides a convenient instrumentation point and does not require access to the OODB source code. However, instrumenting the indirection point via language mechanisms (e.g., by overloading the dereference operator) does not capture key information such as the name of the method invoked. Thus, it appears preferable to instrument the application source code, because this approach is most suitable for generating traces efficiently across a variety of OODBs, operating systems, and hardware platforms.
Trace-based analysis and simulation have been used in the study of file systems, virtual memory systems, and memory management. M. G. Baker, J. H. Hartman, M. D. Kupfer, K. W. Shirriff, and J. K. Ousterhout describe trace-based analysis of file systems in "Measurements of a Distributed File System," Operating System Review, vol. 25, no. 5, pp. 198-212, 1991. D. J. Hatfield and J. Gerald discuss trace-based analysis for studying virtual memory systems in "Program Restructuring for Virtual Memory," IBM Systems Journal, vol. 10, no. 3, pp. 168-192, 1971. Also, P. R. Wilson, M. S. Lam, and T. G. Moher describe trace-based analysis for memory management in the paper "Caching Considerations for Generational Garbage Collection," Proceedings of the 1992 ACM Conference on LISP and Functional Programming, pp. 32-42, San Francisco, Calif., June 1992.
Trace-based analysis and simulation have also been used to study specific OODB properties such as buffering, clustering, and pointer swizzling. E. E. Chang and R. H. Katz discuss the study of OODB buffering and clustering in their paper "Exploiting Inheritance and Structure Semantics for Effective Clustering and Buffering in an Object-Oriented DBMS," ACM SIGMOD International Conference on Management of Data, Portland, Oreg., pp. 348-357, 1989. J. Cook, A. Wolf, and B. G. Zorn describe trace-driven simulation in their paper "The Design of A Simulation System for Persistent Object Storage Management," Technical Report CU-CS-64, Department of Computer Science, University of Colorado, Boulder, Colo. Cook et al. designed a trace-driven simulation system for evaluating different algorithms to be used in OODBs, but it has not been reduced to practice; further, it is unclear how they intended to gather the traces and what impact gathering them would have on real applications. Also, M. McAuliff and M. Solomon discuss pointer swizzling in the paper "A Trace-Based Simulation of Pointer Swizzling Techniques," Proceedings of the International Conference on Data Engineering, Taipei, Taiwan, March 1995. McAuliff and Solomon describe a trace-based simulation technique for pointer swizzling that requires manual instrumentation of a specific server to capture accesses to persistent objects; a specialized simulator was then used to study different swizzling techniques. None of these tracing and swizzling techniques was generalized to study OODB performance in general. All of these previous approaches rely on manually instrumenting the database server to create traces. The prior art does not teach a comprehensive framework for analyzing, benchmarking, and tuning OODB systems and applications based on automatically generated traces. Clearly, there is a need for an automated system to facilitate these functions, which does not exist in the prior art.