Database management systems (DBMSs) are used to organize and manage large amounts of information. The data stored in databases is normally structured into records with predetermined fields. These fields identify the information in the records, and are normally organized into tables having rows and columns such that a query may be executed by a DBMS and the DBMS may generate a query response having query-satisfying information retrieved from the row(s) and column(s) associated with the tables.
A DBMS is an executable program stored on a data processing system. As is known to those skilled in the art, such a data processing system may include different hardware and software combinations. Users may access tabled information stored within a database which is operatively coupled to the DBMS by way of a user interface using, for example, a structured query language (SQL) or an XQuery and the like.
A given query may be parsed and compiled by a compiler contained within the DBMS, and as a result of compiling the given query the DBMS generates executable code which may be used for retrieving query-satisfying data which may satisfy the requirements of the given query. Typically, the DBMS may include a user command processor for processing user commands, such as queries, and executing such user commands against the database. Data processing system usable medium may contain executable code for directing the DBMS to perform algorithms related to operations of the DBMS. The data processing system usable medium may also store the database.
One way to manipulate and access a data collection stored within the database is to use a query, such as an SQL query. SQL queries may be of varying structure and complexity. Many such queries however, are often structured to utilize query predicates which can not be evaluated until execution of the SQL query at runtime. A query predicate is generally known as an element of a search value that expresses or implies a comparison operation. In effect, a predicate specifies a condition about a row of data or group of data to be manipulated by an SQL query.
With SQL queries, a UNION operator and/or a UNION ALL operator specifies which data is to be retrieved from multiple independent sub-queries and presents a consistent set of columns and data-types to a parent operation of the SQL query. One usage of the UNION operator and/or the UNION ALL operator is to combine together a large data set that has been divided into multiple smaller tables for a number of reasons including: limitations in the amount of data that can be stored in a single table; enhancing performance through reduced lock contention, disk performance, index utility and others; combining results from statistical calculations; and others. The UNION operator and/or the UNION ALL operator may be used in the query to allow the parent operation to operate on a singular data object (i.e., a base table) even though data from multiple sub-queries and data objects may be retrieved.
Operators in a query language typically have one or more sources of input. When the UNION operator and/or the UNION ALL operator is present in a query, it is typically one of the inputs to at least one valid operator in the query language. We refer to each of those operators which receive input from the UNION operator and/or the UNION ALL operator as the parent operation. The specifics of the query language will determine what operators are valid as parent operators and what operation the parent operators will perform.
The nature of the UNION operator and/or the UNION ALL operator hides many properties of the underlying sub-queries from the parent operations including order, indexing, functional dependencies and column properties. Since these properties are unavailable to the parent operation, this lack of information may prevent the generation of an optimal access plan and may result in unnecessary work by the DBMS.
A known method of directing the DBMS to process the UNION operator and/or the UNION ALL operator contained in the SQL query is to process each of their sub-queries to produce their individual query results, and then combine those query results in a manner dictated by the UNION operator and/or the UNION ALL operator, and then flow the combined result to the parent operation. This approach may be problematic because of query optimization taking an extended period of time due to the many sub-queries of the UNION operator and/or UNION ALL operator and poor cache performance due to the size of the access plan.
Another known method of processing such operators is implemented outside of the DBMS by a controlling application in which the controlling application determines which data objects need to be operated thereon and instructing the DBMS on which data objects to access and in which manner. This approach requires a great deal of complexity in the controlling application which makes the controlling application more difficult to develop and maintain. It also defeats the purpose of the UNION operator and/or the UNION ALL operator, which hides the underlying structure from the controlling application, since the controlling application must now know the specification of the database design associated with the database.
Another known method to improve performance of queries against the UNION ALL operators is discussed a paper entitled “Partitioning in DB2 using UNION ALL View”, February 2002, DB2 Developer Domain. However, this paper does not address the query optimization time or access plan size problems identified above.
There is a need for a data processing system, a data processing system implemented method and an article of manufacture for executing a query having a UNION operator.