Database administrators (DBAs) tune an information retrieval system (such as a database management system or simply “DBMS”) based on their knowledge of the DBMS and its workload. The workload type, specifically whether it is Online Transactional Processing (OLTP) or Decision Support System (DSS), is a key criterion for tuning (reference is made to DB2 Universal Database Version 7 Administration Guide: Performance, IBM Corporation: 2000, and reference is also made to Oracle9i Database Performance Guide and Reference, Release 1 (9.0.1), Part# A87503-02, Oracle Corp.: 2001). In addition, the type of workload a DBMS handles changes during its normal processing cycle. For example, a bank may experience an OLTP-like workload, executing traditional daily transactions, for almost the whole month, while in the last few days of the month the workload becomes more DSS-like as financial reports are issued and long executive queries are run to produce summaries. DBAs should therefore also recognize significant shifts in the workload and reconfigure the system to maintain acceptable levels of performance.
There is an earnest interest in building autonomic computing systems. These systems know themselves and their surrounding environment and regulate themselves accordingly; this removes complexity from the lives of administrators and users alike (reference is made to A. Ganek and T. Corbi, “The Dawning of the Autonomic Computing Era,” IBM Systems Journal, 42, 1 Mar. 2003). One of the prerequisites for achieving system autonomicity is identifying the characteristics of the workload put on the system and recognizing its properties by using a process called “Workload Characterization”.
There are numerous studies characterizing database workloads based on different properties that can be exploited for tuning the DBMS (reference is made to S. Elnaffar and P. Martin, “Characterizing Computer Systems' Workloads,” Technical Report 2002-461, Queen's University, December 2002).
Some studies show how to use clustering to obtain classes of transactions grouped according to their consumption of system resources or according to reference patterns to tune the DBMS (reference is made to P. Yu and A. Dan, “Performance Analysis of Affinity Clustering on Transaction Processing Coupling Architecture,” IEEE Transactions on Knowledge and Data Engineering 6, 5, 764-786 (October 1994)).
Other studies show how to use clustering to obtain classes of transactions grouped according to their consumption of system resources or according to reference patterns to balance a workload (reference is made to C. Nikolaou, A. Labrinidis, V. Bohn, D. Ferguson, M. Artavanis, C. Kloukinas, and M. Marazakis, “The Impact of Workload Clustering on Transaction Routing,” Technical Report FORTH-ICS TR-238: December 1998).
Yet other studies focus on how to characterize database access patterns to predict a buffer hit ratio (reference is made to A. Dan, P. Yu, and J. Chung, “Characterization of Database Access Pattern for Analytic Prediction of Buffer Hit Probability,” Very Large Data Bases (VLDB) Journal 4, No. 1, 127-154: 1995).
Still other studies focus on how to characterize database access patterns to predict user access behavior (reference is made to C. Sapia, “PROMISE: Predicting Query Behavior to Enable Predictive Caching Strategies for OLAP Systems,” Proc. of the Second International Conference on Data Warehousing and Knowledge Discovery (DAWAK 2000), 224-233: 2000).
Recent studies characterize DBMS workloads on different computer architectures to diagnose performance degradation problems (reference is made to A. Ailamaki, D. DeWitt, M. Hill, and D. Wood, “DBMSs On A Modern Processor: Where Does Time Go?,” Proc. of Int. Conf. On Very Large Data Bases (VLDB '99), 266-277: September 1999).
Other recent studies examine DBMS workloads on different computer architectures to characterize the memory system behavior of OLTP and DSS workloads (reference is made to L. Barroso, K. Gharachorloo, and E. Bugnion, “Memory System Characterization of Commercial Workloads,” Proc. of the 25th International Symposium on Computer Architecture, 3-14: June 1998).
Another reference systematically analyzes the characteristics of the workloads specified in TPC-C™ (TPC Benchmark C Standard Specification Revision 5.0, Transaction Processing Performance Council: February 2001) and TPC-D™ (TPC Benchmark D Standard Specification Revision 2.1, Transaction Processing Performance Council: 1999), especially in relation to those of real production database workloads. That reference shows that production workloads exhibit a wide range of behavior and that, in general, the two benchmarks complement each other in reflecting the characteristics of the production workloads (reference is made to W. Hsu, A. Smith, and H. Young, “Characteristics of Production Database Workloads and the TPC Benchmarks,” IBM Systems Journal 40, No. 3: 2001).
To progress towards Autonomic Database Management Systems (ADBMSs), workload characterization is imperative in a world that increasingly deploys “universal” database servers that are capable of operating on a variety of structured, semi-structured and unstructured data and across varied workloads ranging from OLTP through DSS. Universal database servers, such as IBM® DB2 Universal Database™ (reference is made to DB2 Universal Database Version 7 Administration Guide: Performance, IBM Corporation: 2000), allow organizations to develop database skills on a single technology base that covers the broad needs of their business. Universal databases are increasingly used for varying workloads whose characteristics change over time in a cyclical way. Most of the leading database servers today fall into this category of universal database, being intended for use across a broad set of data and purposes.
What is therefore needed is a method for identifying types of DBMS workloads. The need for such a method has heretofore remained unsatisfied.