A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates generally to information processing environments and, more particularly, to computer-implemented methodology related to reusing cost-based optimization decisions in a data processing system, such as a Database Management System (DBMS).
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as xe2x80x9crecordsxe2x80x9d having xe2x80x9cfieldsxe2x80x9d of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added or removed from data files, information retrieved from or updated in such files, and so forth, all without user knowledge of underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of a database management system is known in the art. See e.g., Date, C., An Introduction to Database Systems, Volume I and II, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
DBMS systems have long since moved from a centralized mainframe environment to a de-centralized or distributed environment. One or more PC xe2x80x9cclientxe2x80x9d systems, for instance, may be connected via a network to one or more server-based database systems (SQL database server). Commercial examples of these xe2x80x9cclient/serverxe2x80x9d systems include Powersoft(trademark) clients connected to one or more Sybase SQL Server(trademark) database servers. Both Powersoft(trademark) and Sybase SQL Server(trademark) are available from Sybase, Inc. of Emeryville, Calif.
As the migration to client/server continues, each day more and more businesses are run from mission-critical systems which store information on server-based SQL database systems, such as Sybase SQL Server(trademark). As a result, increasingly higher demands are being placed on server-based SQL database systems to provide enterprise-wide decision supportxe2x80x94providing timely on-line access to critical business information (e.g., through xe2x80x9cqueriesxe2x80x9d). Accordingly, there is much interest in improving the performance of such systems, particularly database queries, for enhancing decision support.
The present invention comprises a Client/Server Database System with improved methodology for optimizing the performance of database queries. In an exemplary embodiment, the system includes one or more Clients (e.g., Terminals or PCs) connected via a Network to a Server. The Server, operating under a server operating system (e.g., UNIX) includes a Database Server System, such as Sybase Adaptive Server Enterprise (ASE). In general operation, Clients store data in and retrieve data from one or more database tables resident on the Server. Each table itself comprises one or more horizontal rows or records divided into columns or fields of information. For enhancing the speed in which the Database Server stores, retrieves, and processes particular data records, the Server maintains one or more database indexes on each table. A database index, which in an exemplary embodiment is maintained as a B-Tree data structure, allows the records of a table to be organized in many different ways, depending on a particular user""s needs, with the speed of retrieving any particular data record significantly improved.
Methodology of the present invention, implemented as an xe2x80x9cAbstract Plan on Discxe2x80x9d technology (referred to herein as, xe2x80x9cAriadnexe2x80x9d), is provided for turning cost based optimization decisions into stored, reusable items. In particular, the present invention provides a novel language interface to the optimizer, through an Abstract Plan, through which it can be given a description of the desired query execution plan (QEP). The language interface defines a declarative language syntax that allows description of the QEP. It does not specify the sequence of operations the database system""s optimizer and code generator should accomplish to generate the QEP, but rather describes the desired outcome. Such an approach provides an abstraction barrier between an optimizer directives language and some specific optimizer and code generator. In this manner, the present invention allows a database system the ability to generate a better execution plan, and thereby realize better query performance. Such a feature has particular utility in avoiding performance regressions on server release upgrade and in query optimization fine tuning.
APxe2x80x94Abstract Plan, a relational algebra describing the execution plan of a SQL query.
ASExe2x80x94Sybase Adaptive Server Enterprise(trademark), the commercial embodiment, available from Sybase, Inc. of Emeryville Calif.
associationxe2x80x94The means by which a specific query is associated with its corresponding AP, if any. For ad hoc queries, for instance, the DML statement text is used as an association.
base tablexe2x80x94A persistent material table.
cost based decisionxe2x80x94An optimization decision based on a RAFSS search guided by the quantitative estimation of the costs and the selection of the cheapest alternative.
declarative queryxe2x80x94A relational calculus expression that specifies the result set by giving its properties, rather than by providing the way to obtain it. The SQL language is isomorphic with a relational calculus.
derived tablexe2x80x94A table implemented as a stream of rows, representing the result of a relational operator. Only the current row is accessible at some moment in time. Allows only sequential access, in the order of the flow of rows.
emergent behaviorxe2x80x94A complex behavior that is never explicitly given, but comes to life as implied by a set of simple rules that interact.
execution plan of a queryxe2x80x94see QEP
optimization decisionxe2x80x94The act of choosing among the possible next moves for a given point in the RAFSS.
planxe2x80x94The encapsulation of the association, AP and all status information related to a specific query. To avoid confusion, in this document the term plan will not be used for query execution plan.
procedural queryxe2x80x94An relational algebra expression that specifies the result set by giving the composition of relational operators that achieve it. The set of query plans supported by an relational execution engine is isomorphic with a relational algebra
QEPxe2x80x94Query Execution Plan, an server engine specific data structure created by the code generation module based on the outcome of the optimization and that directs the processing of a query in the execution module.
queryxe2x80x94A relational calculus or algebra expression that denotes a derived table as result set.
query optimizationxe2x80x94The process of selecting the best (according to some cost criteria) relational algebra formula (i.e., procedural representation, isomorphic to a query plan) that has the same semantics as the relational calculus formula (i.e., declarative representation, isomorphic to SQL) that specifies a given query. This process can be thought of as an RAFSS search.
query planxe2x80x94see QEP
RAFSSxe2x80x94Relational Algebra Formulae State Space, a search state space over the set of relational algebra formulae relevant to a query. The optimization process can be thought of as a search in this space, either starting with a formula already semantically equivalent to the query and gradually transforming it to another equivalent one that complies with some criteria, or starting with an empty formula and gradually assembling of smaller pieces the desired outcome.
rule based decisionxe2x80x94An optimization decision based on instantiating a set of rules (qualitative heuristics) against the current point in the RAFSS, and selecting the one to apply. The rule""s application materializes the choice.
statusxe2x80x94Not used in the initial embodiment. In subsequent versions, the plan will ideally include several status fields that would store per plan usage enabling/disabling and other flags, the estimated/actual execution time, page counts, or the like.
stored tablexe2x80x94A table that is materialized, i.e., entirely stored at some moment of time. Allows both sequential access in some predefined order and, provided indices are available, direct access and maybe alternate order sequential access. Material tables are either base tables or work tables.
tablexe2x80x94The physical implementation of a relation. A table is either material or derived.
work tablexe2x80x94A table that gets materialized only temporarily, during the execution of a query, to hold intermediate results.