Organizations often must manage a large volume of data. The data may be stored in data warehouses. These data warehouses may store up to terabytes of data. Accordingly, efficient mechanisms for managing the data are needed.
These data warehouses may be implemented as a database system. One type of database system is known as a relational database (RDB) or a relational database management system (RDBMS), which organizes data into tables. The tables may have rows that represent individual entries or records in the database, and the columns of the tables may define what is stored in each entry or record.
A database query is used to retrieve information from the database system. The database query typically includes one or more expressions interconnected with operators. The database query usually defines conditions and/or logical operates to be applied to at least one column of at least one table. The database is searched for records that satisfy the query, and the records are returned as a query result, also referred to simply as a “result” or a “dataset.” The database query is normally written in a query language. The query language defines a syntax for generating and processing queries that is independent of the actual structure and format of the database.
To process a database query, a database system typically parses the database query based on a grammar for the appropriate query language. The parsed database query may be stored as a tree representation called a parse tree. The parse tree may also be transformed into a relational algebra expression. The relational algebra expression may also be stored as a tree. The relational algebra expression may include a plurality of operators on sets of data. Common relational algebra operators include set operators, including set unions and set differences, natural joins, or other types of joins, antijoins, projections, selections, divisions, or the like. The relational algebra expression may be represented as a graph, a tree, or the like. The relational algebra expression (tree) may then be used to produce an execution plan, which describes particular steps to be taken by a computing device to produce the requested result. The execution plan may also be stored and represented as a tree. The nodes of the execution plan tree may include execution parts or operators, including the operators: scan, join, ordering, grouping, aggregation, sub-query, derived table/common table expressions, or the like. Some database systems may produce the execution plan directly from the parse tree. Other database systems may query-optimize the relational algebra tree and/or execution plan tree to produce more efficient usage of resources.
The relational algebra tree and/or execution plan tree may be query-optimized using a query optimizer. The query optimizer is a component of a database management system that determines an efficient way to execute a database query. The query optimizer may consider the costs (e.g., CPU, network, or disk access costs) of performing different execution plans for the same operation defined by the database query, or the like. Different execution plans may include different possible execution parts and/or ordering of the execution parts. The execution parts may include a variety of operates, including different access paths (e.g., index scan, sequential scan), join algorithms (e.g., sort-merge join, hash join, nested loops), or the like. The query optimizer may determine the execution plan that minimizes costs.
While a query optimizer and/or other components of a database system may make an individual database query more efficient, each individual database query may consume significant resources, especially when the database system must manage a large volume of data. Therefore, it is with respect to these considerations and others that the present invention has been made.