1. Field of the Invention
The present invention relates generally to the field of database systems and, more particularly, to a system and method allowing for the creation and execution of scalar-based queries in database environments.
2. Brief Description of Prior Developments
Database systems store, retrieve, and process information. In order to retrieve information from the database, a user provides a query (written in a query language such as SQL), where the query specifies the information to be retrieved and the manner in which it is to be manipulated or evaluated in order to provide a useful result. To process the query, the database system converts the query into a relational expression that describes algebraically the result specified by the query. The relational expression is then used to produce an execution plan, which describes particular steps to be taken by a computer in order to produce the sought result.
Database environments comprise various configurations having cooperating homogeneous and/or heterogeneous data stores. These data stores generally maintain data in tables (T) that may reside locally on a single data store or may be distributed among several data stores in a distributed database environment. Included in the category of distributed database environments are clustered databases and federated databases. A clustered database configuration contemplates several data stores closely cooperating to produce a singular view of the data stored on the data stores. Comparatively, a database federation is a group of data stores (e.g. computer servers) that are administered independently, but which cooperate to share the processing load of a system. These various database environments support a number of views that may be imposed to better manage the underlying table (T) data. For example, these database environments (e.g. stand-alone, clustered, and federated) may be partitioned such that the table or tables storing data may be more efficiently handled. In the case of distributed database environments, distributed partitioned views may be created.
Generally, a partitioned view joins horizontally partitioned data from a set of member tables across one or more data stores making the data as if it appears from one database table. In a local partitioned view, all participating tables and the view reside on the same instance of the data store (e.g. the same instance of SQL server). In a distributed partitioned view, at least one of the participating tables resides on a different (i.e. remote) data store. For partitioned tables to exist the location of single data row in a partitioned table (T) must be uniquely located. Current implementations satisfy this requirement through the use of ranges (e.g. column and/or row ranges), or, through hashing functions which operate on columns and/or rows. In the ranges-based implementation, the member tables of the partitioned view are designed so that each table stores a horizontal slice of the original table based on a range of key values. The ranges are based on the data values in a partitioning column. The partitioning column serves to contain values that facilitate the creation and maintenance of partitioned views in partitioned database environments.
The member tables of the partitioned view are designed so that each table stores a horizontal slice of the original table based on a range of key values. The ranges are based on the data values in a partitioning column. The range of values in each member table is enforced by a database environment constraint (e.g. CHECK constraint) on the partitioning column, such that ranges do not overlap.
Current practices allow queries to be performed on partitioned views such that desired data is identified, retrieved, and returned to participating users. In the context of executing queries in distributed database environments, such practices lend to inefficient and non-optimal uses of processing resources. Stated differently, tables residing on a number of data stores may require numerous processing steps to satisfy offered requests. For example, if data is to be retrieved from remote participating data stores in a partitioned database environments, a number of unnecessary steps, such as, reading from redundant partitions may be performed.
In view of the foregoing, it is appreciated that there exists a need for an improved query execution strategy that overcomes the drawbacks of existing practices.
Generally the present invention provides a system and methods that allow for a novel optimization of queries in database environments by using scalar-based queries. In an illustrative implementation, queries are executed using startup filters. A startup filter is a building block used to implement dynamic plan executions in database environments. The generation of startup filters can occur in two possible circumstancesxe2x80x94that is, as part of sub-queries or, alternatively, from scalar predicates. When a predicate context has been identified as a suitable point for the startup filter generation, the predicate is analyzed to determine if there are any restrictions being applied to columns of the queried table that may have single-column CHECK constraints defined. Generally, a single column CHECK constraint is a predicate that is applied for every column modification (e.g. INSERT or UPDATE). In operation, An analysis of the scalar predicate to determine if a start-up filter would be appropriate to execute offered queries. In the implementation, the analysis of the scalar predicate expression comprises the acts of determining whether: 1) there are suitable startup expressions for offered scalar predicates; and 2) there are suitable startup expressions the conjunction (i.e. AND of all of the expressions) is returned. As part of the determination of suitable startup expressions a number of functions and operations are performed. These functions serve to pair down suitable startup expressions that will satisfy the inputted scalar predicate of the original query. Included in the execution of these functions is parameterized execution that serves to facilitate the processing of data that may reside on remote data stores.
Alternatively, offered queries contexts (as opposed to a scalar predicate contexts) can serve as a suitable point for startup filter generation. In operation, offered queries are first analyzed to determine where there may be any restrictions that have been applied. Using the restrictions, suitable start-up filters are identified for use in query execution. The determination of appropriate startup expressions for offered queries may be realized through steps comprising the execution of data functions. Such functions serve to find the startup expression most suitable for offered queries.
Further aspects of the invention are described below.