The present invention relates generally to decorrelation techniques used in a relational database management system, and more specifically, to extending a decorrelation technique for a scalar subquery to a join in a relational database management system.
Database management systems (DBMSs) are used to organize and manage large amounts of information. The data stored in databases is normally structured into records with predetermined fields. These fields identify the information in the records, and are normally organized into tables having rows and columns such that a query may be executed by a DBMS and the DBMS may generate a query response having query-satisfying information retrieved from the row(s) and column(s) associated with the tables.
A DBMS is an executable program stored on a data processing system. As is known to those skilled in the art, such a data processing system may include different hardware and software combinations. Users may access tabled information stored within a database which is operatively coupled to the DBMS by way of a user interface using, for example, a structured query language (SQL) or an XQuery and the like.
A given query or statement may be parsed and compiled by a compiler contained within the DBMS, and as a result of compiling the given query the DBMS generates executable code, which may be used for retrieving query-satisfying data, which may satisfy the requirements of the given query or statement. Typically, the DBMS may include a user command processor for processing user commands, such as queries, and executing such user commands against the database. Data processing system usable medium may contain executable code for directing the DBMS to perform algorithms related to operations of the DBMS. The data processing system usable medium may also store the database.
One way to manipulate and access a data collection stored within the database is to use a query, such as an SQL query. SQL queries may be of varying structure and complexity. Many such queries however, are often structured to utilize query predicates, which cannot be evaluated until execution of the SQL query at runtime. A query predicate is generally known as an element of a search value that expresses or implies a comparison operation. In effect, a predicate specifies a condition about a row of data or group of data to be manipulated by a SQL query.
In relational database systems, decorrelation techniques have become commonplace, and are in wide use in the industry. These techniques allow for removal of dependencies in processing order and open up a wider set of access plan and optimization choices. In columnar database systems, such techniques have become even more important due to lack of indexes and requirements faster data access.
A correlated subquery is a subquery for which processing depends on the data values from other parts of a SQL statement outside a scope of the subquery, referred to as an outer query block. The subquery is evaluated for each row of the outer query block.
Decorrelation is a process used to remove a dependency on the outer query, which allows for a large set of data to be read and processed and typically leads to substantially higher performance than without the application of decorrelation.
A scalar subquery is a subquery that is guaranteed to produce a single data value. Subqueries and/or scalar subqueries may be present in either a predicate context or an output expression context. When a result of more than one value is generated then a SQL error must be produced.
A typical approach to optimization of correlated SQL queries in a relational database management system implements decorrelation to achieve faster query processing. However, a requirement of the implementations uses an extra set of joins, which as a result, have performance and optimization limitations because the data typically needs to be read twice or preserved in a temp table.
In another typical approach, involving the transforming of correlated scalar subqueries there is a requirement for the presence of a uniqueness constraint, for example, a primary key, or a unique key, to be able to prove in advance that the output of the subquery will be a singleton. This requirement is a constraint that imposes significant limitations for the applicability of prior work to a relatively small set of SQL queries.
There is a need to overcome the drawbacks of the aforementioned optimization techniques.