A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates generally to information processing environments and, more particularly, to optimizing the process of retrieving information stored in a data processing system, such as a Database Management System (DBMS).
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as xe2x80x9crecordsxe2x80x9d having xe2x80x9cfieldsxe2x80x9d of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added or removed from data files, information retrieved from or updated in such files, and so forth, all without user knowledge of underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of a database management system is known in the art. See, e.g., Date, C., An Introduction to Database Systems, Volumes I and II, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
DBMS systems have long since moved from a centralized mainframe environment to a de-centralized or distributed environment. One or more PC xe2x80x9cclientxe2x80x9d systems, for instance, may be connected via a network to one or more server-based database systems (SQL database server). Commercial examples of these xe2x80x9cclient/serverxe2x80x9d systems include Powersoft(trademark) clients connected to one or more Sybase Adaptive Server(trademark) (formerly, Sybase SQL Server(trademark)) database servers. Both Powersoft(trademark) and Sybase Adaptive Server(trademark) are available from Sybase, Inc. of Emeryville, Calif.
As the migration to client/server continues, each day more and more businesses are run from mission-critical systems which store information on server-based SQL database systems, such as Sybase SQL Server(trademark). As a result, increasingly higher demands are being placed on server-based SQL database systems to provide enterprise-wide decision supportxe2x80x94providing timely on-line access to critical business information (e.g., through xe2x80x9cqueriesxe2x80x9d). Accordingly, there is much interest in improving the performance of such systems, particularly database queries, for enhancing decision support.
The present invention comprises a Client/Server Database System with improved methods for performing database queries. In an exemplary embodiment, the system includes one or more Clients (e.g., Terminals or PCs) connected via a Network to a Server. The Server, operating under a server operating system (e.g., UNIX(copyright), available from Novell, Inc. of Provo, Utah) includes a Database Server System, such as Sybase SQL Server(trademark). In general operation, Clients store data in and retrieve data from one or more database tables resident on the Server by submitting SQL commands, some of which specify xe2x80x9cqueriesxe2x80x9dxe2x80x94criteria for selecting particular records of a one or more tables. For enhancing the speed in which the Database Server performs queries, a new invariant technique is introduced to evaluate correlated queries (i.e., queries containing one or more correlated subqueries) efficiently.
Correlated queries are very common and important in decision support systems. Traditional nested iteration evaluation methods for such queries can be very time consuming. When they apply, query rewriting techniques have been shown to be much more efficient, but query rewriting is not always possible. When query rewriting does not apply, something better can be done than the traditional nested iteration methods.
The basic approach of the present invention, therefore, is to recognize the part(s) of the subquery that is not related to the outer references and cache the result(s) of that part(s) after its first execution. Later, the result can be reused and combined with the result of the rest of the subquery that is changing for each iteration. This technique is applicable to arbitrary correlated subqueries.
Methods are introduced to recognize the invariant part of a data flow tree, and to restructure the evaluation plan to reuse the stored intermediate result. An efficient method is proposed to teach an existing join optimizer to understand the invariant feature and thus allow it to be able to generate better join plans in the new context. Some other related optimization techniques are also introduced. The proposed techniques were implemented using an existing real commercial database system.
An experimental evaluation of the proposed technique indicates that, when query rewriting is not possible, the invariant technique is significantly better than the traditional nested iteration method. Even when query rewriting applies, the invariant technique is sometimes better than the query rewriting technique. Thus, the invariant technique is advantageous as an alternative in evaluating correlated queries since it fills the gap left by rewriting techniques.