A database comprises data and metadata that are stored on one or more storage devices, such as a set of hard disks. The data within a database may be logically organized according to a variety of data models, depending on the implementation. For example, relational database systems typically store data in a set of tables, where each table is organized into a set of rows and columns. In most cases, each row represents a distinct object, and each column represents a distinct attribute. However, other data models may also be used to organize the data.
A database management system (DBMS) is software that controls access to data in a database. The DBMS is configured to receive and process a variety of database commands, often referred to as queries. In many implementations, the DBMS supports queries that conform to a structured query language (SQL). SQL is a standardized query language for managing data in a relational DBMS (RDBMS).
SQL includes, among others, two distinct sets of commands: Data Definition Language (DDL) for managing and indexing data structures in the database; and Data Manipulation Language (DML) for accessing and manipulating data residing within the data structures. DDL is typically used to create, alter, and delete database objects, such as tables, indexes, views, and constraints, whereas DML is typically used to query, add, update, and delete data in existing database objects.
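The distinction between DDL and DML can be illustrated with a minimal sketch. The example below uses Python's standard `sqlite3` module and an in-memory database purely for convenience; the table name `employees`, its columns, and the helper `run_ddl_and_dml` are hypothetical and are not drawn from any particular implementation.

```python
import sqlite3


def run_ddl_and_dml():
    """Create a table and index (DDL), then modify and read rows (DML)."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # DDL: define database objects (a table and an index on it).
        conn.execute(
            "CREATE TABLE employees ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT)"
        )
        conn.execute("CREATE INDEX idx_employees_dept ON employees (dept)")

        # DML: add, update, and delete data in the existing table.
        conn.executemany(
            "INSERT INTO employees (id, name, dept) VALUES (?, ?, ?)",
            [(1, "Ada", "eng"), (2, "Grace", "eng"), (3, "Edgar", "ops")],
        )
        conn.execute("UPDATE employees SET dept = 'research' WHERE id = 2")
        conn.execute("DELETE FROM employees WHERE id = 3")

        # DML: access the data that remains.
        return conn.execute(
            "SELECT id, name, dept FROM employees ORDER BY id"
        ).fetchall()
    finally:
        conn.close()


remaining_rows = run_ddl_and_dml()
```

In this sketch the `CREATE` statements change the structure of the database, while the `INSERT`, `UPDATE`, `DELETE`, and `SELECT` statements operate only on data within that structure.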
When the DBMS receives a query, such as a SQL expression, the DBMS evaluates the query to obtain a query result. Query evaluation includes two stages: query compilation and query execution. During query compilation, the DBMS parses the SQL expression and generates a query execution plan. The query execution plan specifies an ordered set of steps, frequently represented as a tree of query operators, used to execute the query. The query operators at each step are associated with one or more expressions that represent computations or other actions that will be performed upon query execution. Example query operators include, without limitation, table scans, joins, table queues, group-by operations, and Bloom filters. Once the query execution plan is generated, the query is executed according to the plan.
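A query execution plan represented as a tree of operators can be sketched as follows. This is a toy illustration under stated assumptions, not the plan format of any actual DBMS: the classes `TableScan`, `Filter`, and `HashJoin`, the `execute` method, and the sample tables are all hypothetical. The function `build_plan` stands in for compilation, and calling `execute` on the root stands in for execution.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


@dataclass
class TableScan:
    """Leaf operator: produces every row of an in-memory table."""
    rows: List[Row]

    def execute(self) -> Iterator[Row]:
        yield from self.rows


@dataclass
class Filter:
    """Passes through only the child rows that satisfy the predicate."""
    child: object
    predicate: Callable[[Row], bool]

    def execute(self) -> Iterator[Row]:
        return (row for row in self.child.execute() if self.predicate(row))


@dataclass
class HashJoin:
    """Equi-join: builds a hash table on one input, probes with the other."""
    build: object
    probe: object
    key: str

    def execute(self) -> Iterator[Row]:
        table: Dict[object, List[Row]] = {}
        for row in self.build.execute():
            table.setdefault(row[self.key], []).append(row)
        for row in self.probe.execute():
            for match in table.get(row[self.key], []):
                yield {**match, **row}


departments = [{"dept_id": 1, "dept": "eng"}, {"dept_id": 2, "dept": "ops"}]
employees = [
    {"name": "Ada", "dept_id": 1, "salary": 120},
    {"name": "Grace", "dept_id": 1, "salary": 90},
    {"name": "Edgar", "dept_id": 2, "salary": 110},
]


def build_plan() -> HashJoin:
    # Hand-built plan for a query along the lines of:
    #   SELECT * FROM employees e JOIN departments d
    #   ON e.dept_id = d.dept_id WHERE e.salary > 100
    return HashJoin(
        build=TableScan(departments),
        probe=Filter(TableScan(employees), lambda r: r["salary"] > 100),
        key="dept_id",
    )


result = list(build_plan().execute())  # execution walks the operator tree
```

Each operator here carries its own expression (a predicate for `Filter`, a join key for `HashJoin`), which mirrors the association between operators and expressions described above.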
In one approach to query evaluation, the expressions of each query operator are independently and separately evaluated. For example, if a query execution plan includes multiple hash join operators, each hash join operator independently computes a hash value using an internal hash function. This approach provides a simple and straightforward way to execute a query. However, this approach can lead to inefficient use of processing and memory resources when computationally expensive operations are redundantly computed. In the preceding example, some or all of the hash join operators may use the same join key. Evaluating each hash join independently then leads to redundant computation of a hash value from the same join key. That is, the DBMS computes the hash value anew for each hash join that uses that key, even though the input is identical. As hashing is a computationally intensive task, the execution overhead can become significant when the number of hash computations required by an execution plan is large. In a similar manner, redundant evaluation of other expressions may also negatively impact query execution overhead.
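The cost of independent evaluation can be made concrete by counting hash computations. The sketch below is illustrative only: `CountingHasher`, `probe_independently`, and `probe_with_shared_hash` are hypothetical names, SHA-256 is an arbitrary stand-in for an internal hash function, and the second function is included only as a point of comparison, not as a description of any particular DBMS.

```python
import hashlib


class CountingHasher:
    """Wraps a hash function and counts how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def hash_key(self, key) -> int:
        self.calls += 1
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")


def probe_independently(rows, key, num_joins, hasher):
    """Each of num_joins operators hashes the same join key on its own."""
    return [
        [hasher.hash_key(row[key]) for _ in range(num_joins)]
        for row in rows
    ]


def probe_with_shared_hash(rows, key, num_joins, hasher):
    """Comparison case: hash the join key once per row and reuse the value."""
    hashes = []
    for row in rows:
        value = hasher.hash_key(row[key])
        hashes.append([value] * num_joins)
    return hashes


rows = [{"order_id": i, "customer_id": i % 10} for i in range(1000)]
independent, shared = CountingHasher(), CountingHasher()

a = probe_independently(rows, "customer_id", 3, independent)
b = probe_with_shared_hash(rows, "customer_id", 3, shared)

print(independent.calls, shared.calls)  # prints: 3000 1000
```

With three hash joins on the same key, independent evaluation performs three times as many hash computations while producing the same hash values, which is the redundancy described above.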
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.