A database is a collection of information. A relational database is a database that is perceived by its users as a collection of tables. Each table arranges items and attributes of the items in rows and columns, respectively. Each table row corresponds to an item (also referred to as a record or tuple), and each table column corresponds to an attribute of the item (also referred to as a field, an attribute type, or a field type). To retrieve information from a database, the user of a database system constructs a query. A query contains one or more operations that specify the information to be retrieved from, manipulated in, or updated in the database. To execute the query, the system scans tables in the database and processes the information retrieved from those tables.
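As an illustration of the table, row, column, and query terminology above, the following is a minimal sketch using Python's standard `sqlite3` module. The `employees` table, its columns, and the sample rows are hypothetical and chosen only for illustration; they do not come from any particular database system discussed here.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table whose columns are the attributes (fields) of each item.
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)

# Each inserted tuple becomes one row (record) of the table.
conn.executemany(
    "INSERT INTO employees (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Ada", 120.0), (2, "Grace", 95.0), (3, "Edgar", 70.0)],
)

# A query with a selection (WHERE) followed by a sort (ORDER BY).
rows = conn.execute(
    "SELECT name, salary FROM employees WHERE salary > ? ORDER BY salary",
    (80.0,),
).fetchall()

print(rows)  # [('Grace', 95.0), ('Ada', 120.0)]
conn.close()
```

The system scans the table, keeps only the rows that satisfy the condition, and sorts the survivors before returning them.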
A database system can optimize a query by arranging the order of query operations. Various conditions can make it difficult to analyze the progress or estimate the completion time of complex database queries, particularly on a parallel database system. One condition is that different portions of a query plan may be tasked with completing different amounts of work. For example, a query might include a file scan operator that sequentially reads tuples from a table and selects only a relatively small portion of these tuples to be processed by a subsequent sort operator. Also, different portions of a query plan may process tuples and utilize resources at different rates. For example, a sort operator that writes external runs to disk will process fewer tuples per second than an operator that performs a sequential scan of an in-memory index.
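The imbalance between a file scan and a downstream sort can be made concrete with a toy sketch. The operator functions `file_scan` and `sort_op`, the `counters` dictionary, and the synthetic table below are all hypothetical names invented for this illustration; they only count how many tuples each operator touches and do not model real I/O or timing.

```python
def file_scan(table, predicate, counters):
    """Read every tuple sequentially and pass on only those that match."""
    for row in table:
        counters["scan"] += 1
        if predicate(row):
            yield row


def sort_op(rows, key, counters):
    """Buffer all incoming tuples, then sort them (a blocking operator)."""
    buffered = []
    for row in rows:
        counters["sort"] += 1
        buffered.append(row)
    return sorted(buffered, key=key)


# Synthetic table of (id, value) tuples; the values are arbitrary.
table = [(i, (i * 37) % 1000) for i in range(10_000)]
counters = {"scan": 0, "sort": 0}

# The scan keeps only tuples whose value is below 10, about 1% of the table.
result = sort_op(
    file_scan(table, lambda r: r[1] < 10, counters),
    key=lambda r: r[1],
    counters=counters,
)

print(counters)  # {'scan': 10000, 'sort': 100}
```

In this sketch the scan handles one hundred times as many tuples as the sort, so a progress estimate based on either operator alone would misrepresent the query as a whole.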
Even a given operator's resource usage and tuple processing rate may vary depending upon resource availability and on the volume of data to be processed. For example, a sort operator will process fewer tuples per second and incur more I/O operations when it needs to write external runs to disk than it would if the data to be sorted fit into memory. Progress may also be difficult to analyze if the database is a parallel database. In that case, multiple instances of some, but not all, of the system operators may execute simultaneously. For example, a file scan of a table that is partitioned across multiple nodes may execute in parallel across those nodes, but an operator that selects the first ten tuples returned by a sort operation may consist of a single instance that must wait until the sort has completed. Combined, these factors make it difficult to model the interactions of all the operators that make up a complex query.
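The dependence of a sort operator's behavior on available memory can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of any particular database system: the function `sort_with_budget` and its `memory_limit` parameter are hypothetical, and the external runs are simulated as in-memory lists rather than written to real disk files.

```python
import heapq


def sort_with_budget(rows, memory_limit):
    """Sort rows, returning (sorted_rows, runs_written).

    If the input fits within memory_limit tuples, sort it in one pass.
    Otherwise split it into sorted runs of at most memory_limit tuples
    (standing in for external runs on disk) and merge the runs.
    """
    rows = list(rows)
    if len(rows) <= memory_limit:
        return sorted(rows), 0
    runs = [
        sorted(rows[i:i + memory_limit])
        for i in range(0, len(rows), memory_limit)
    ]
    # heapq.merge lazily combines the already-sorted runs.
    return list(heapq.merge(*runs)), len(runs)


data = [5, 3, 9, 1, 7, 2, 8]

in_memory, runs_a = sort_with_budget(data, memory_limit=10)
spilled, runs_b = sort_with_budget(data, memory_limit=3)

print(in_memory, runs_a)  # [1, 2, 3, 5, 7, 8, 9] 0
print(spilled, runs_b)    # [1, 2, 3, 5, 7, 8, 9] 3
```

Both calls return the same sorted output, but the second takes an extra run-creation and merge phase. In a real system that phase would involve disk writes and reads, which is why the same operator can show very different tuple rates from one execution to the next.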