The present invention relates generally to information processing environments and, more particularly, to improved system performance during retrieval of information stored in a data processing system, such as a Relational Database Management System (RDBMS).
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as "records" having "fields" of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added to or removed from data files, retrieved from or updated in such files, and so forth, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of a database management system is known in the art. See, e.g., Date, C., An Introduction to Database Systems, Volumes I and II, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
RDBMS systems have long since moved from a centralized mainframe environment to a decentralized or distributed environment. One or more PC "client" systems, for instance, may be connected via a network to one or more server-based database systems (SQL database server). Commercial examples of these "client/server" systems include Powersoft™ clients connected to one or more Sybase® SQL Server™ database servers. Both Powersoft™ and Sybase® SQL Server™ are available from Sybase, Inc. of Emeryville, Calif. As the migration to client/server systems continues, each day more and more businesses are run from mission-critical systems which store information on server-based SQL database systems, such as Sybase® SQL Server™. As a result, increasingly higher demands are being placed on server-based SQL database systems to provide enterprise-wide decision support, that is, timely on-line access to critical business information (e.g., through "queries").
At its core, every RDBMS system includes certain modules which perform basic tasks, including a parser, an optimizer, an execution engine, and a data manager. A parser reads client statements and transforms them into an internal representation. An optimizer takes the internal representation of the statement and looks at several alternative strategies for obtaining the correct response, an "answer" from the underlying database. The choices made by the optimizer have a profound impact on a system's response time for the client. Improper choice, for instance, can delay response time by seconds, minutes, or hours. The job of the optimizer is, therefore, to make the best choice using estimations based on the "cost" of various strategies. The execution engine employs the execution strategy formulated by the optimizer to obtain the correct response and give the results to the client. During operation, the execution engine submits requests to the data manager to obtain information from tables. This is done in a manner that was determined by the optimizer, for instance, using available indices, performing table scans, or the like.
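By way of illustration only, the cost-based choice described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular RDBMS: the names `TableStats`, `Plan`, `table_scan_cost`, `index_scan_cost`, and `choose_plan` are hypothetical, and the cost formulas (one page read per data page for a table scan; a fixed index height plus one page read per matching row for an index scan) are deliberately simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical statistics an optimizer might keep for one table."""
    rows: int           # total rows in the table
    rows_per_page: int  # rows stored on each data page


@dataclass(frozen=True)
class Plan:
    """A candidate access strategy and its estimated cost in page reads."""
    name: str
    estimated_cost: float


def table_scan_cost(stats: TableStats) -> int:
    # A table scan reads every data page once (ceiling division).
    return -(-stats.rows // stats.rows_per_page)


def index_scan_cost(stats: TableStats, selectivity: float,
                    index_height: int = 3) -> float:
    # Assumed model: descend the index, then read one page per matching row.
    return index_height + stats.rows * selectivity


def choose_plan(stats: TableStats, selectivity: float) -> Plan:
    # The optimizer compares the estimated costs and keeps the cheapest plan.
    candidates = [
        Plan("table_scan", table_scan_cost(stats)),
        Plan("index_scan", index_scan_cost(stats, selectivity)),
    ]
    return min(candidates, key=lambda plan: plan.estimated_cost)


if __name__ == "__main__":
    stats = TableStats(rows=100_000, rows_per_page=100)
    # A highly selective predicate favors the index.
    print(choose_plan(stats, selectivity=0.001).name)
    # A predicate matching half the table favors the table scan.
    print(choose_plan(stats, selectivity=0.5).name)
```

In this simplified model the selected plan depends entirely on the estimated selectivity, so an inaccurate estimate leads directly to a poor plan choice.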
In today's information-based economy, on-line database systems are critical for running the day-to-day operations of a business, whether for decision support or for on-line transaction processing. Accordingly, there has been great interest in the area of improving the speed by which these systems execute database queries. The underlying performance of a database system is closely tied to its optimizer, which, in turn, is closely tied to the cost estimates which the optimizer adopts. Consider, for instance, a cost estimate of an optimizer which inaccurately predicts that a particular operation requires only a few seconds, when in fact the operation takes minutes or hours. This type of mistake is often magnified in the context of a complex query, where the particular operation might occur hundreds or thousands of times. The end result of the mistake is unacceptable system performance. If, on the other hand, the optimizer's cost estimates for a particular strategy are made more accurate, the predicted performance of the final execution plan will be more accurate. In this case, the result is better performance of the RDBMS system. The system exhibits better throughput and response time for queries, including DSS (Decision Support System) queries.
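The magnification effect can be made concrete with simple arithmetic. The following sketch uses purely hypothetical figures (an operation estimated at 2 seconds that actually takes 120 seconds, repeated 1,000 times within one query); the function name `total_misestimate_seconds` is likewise an assumption introduced for illustration.

```python
def total_misestimate_seconds(estimated_s: float, actual_s: float,
                              repetitions: int) -> float:
    """Return how far the whole-query estimate is off, in seconds.

    The per-operation error is multiplied by the number of times the
    operation is repeated inside the query.
    """
    return (actual_s - estimated_s) * repetitions


if __name__ == "__main__":
    # Hypothetical figures: estimated 2 s, actual 120 s, repeated 1,000 times.
    error = total_misestimate_seconds(2.0, 120.0, 1000)
    print(f"estimate is off by {error:.0f} s ({error / 3600:.1f} h)")
```

Under these assumed figures, a per-operation error of under two minutes grows to more than a day across the whole query.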
The cost estimates provided by optimizers in present-day RDBMS systems are not particularly accurate. This results in poor execution plan strategies being selected. Attempts to address the problem have focused on "workarounds" for poor optimizer plan selection. Here, systems allow a DBA (Database Administrator) to explicitly override the optimizer's selection with a "force plan option" or "force index option." Such an approach entails significant disadvantages, however. Overriding an optimizer is a highly-skilled, labor-intensive task and, as a result, a very costly proposition for users of RDBMS systems.
This manual override approach stands in stark contrast to the automation that users expect from modern RDBMS systems. One of the main advantages of RDBMS systems is that this type of work should be done automatically. In the normal mode of operation, the optimizer should automatically adjust execution plans as the data distributions in the RDBMS system change over time. If explicit overrides need to be specified, then this advantage of an RDBMS system is negated and the costly analysis may need to be repeated over and over again. Further, the option of manually overriding a system's optimizer is often not available to users. A growing part of RDBMS business supports "VAR" (Value-added Retailer) applications, including, for instance, those provided by Peoplesoft™, Siebel™, and Baan™. In these cases, the RDBMS users (i.e., end-user customers) may not even have the ability to use "force" options since only the VAR has the ability to change the application. At the same time, the VARs do not want to make RDBMS-vendor-specific changes to their applications to work around problems in a particular RDBMS vendor's optimizer. All told, there exists great interest in improving an optimizer's plan selection without requiring users to provide explicit override options.