Computer systems typically comprise a combination of computer programs and hardware, such as semiconductors, transistors, chips, circuit boards, storage devices, and processors. The computer programs are stored in the storage devices and are executed by the processors. Fundamentally, computer systems are used for the storage, manipulation, and analysis of data.
One mechanism for managing data is called a database management system (DBMS) or simply a database. Many different types of databases are known, but the most common is usually called a relational database, which organizes data in tables. The rows of a table represent individual entries, tuples, or records in the database, and the columns, fields, or attributes define what is stored in each entry, tuple, or record. Each table has a unique name or identifier within the database, and each column has a unique name within its table. The database may also have one or more indexes, which are data structures that inform the DBMS of the location of a certain row in a table given an indexed column value, analogous to a book index informing the reader of the page on which a given word appears.
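As an illustration of the table, column, and index concepts described above, the following is a minimal sketch that uses Python's standard sqlite3 module with an in-memory database. The table name employee, its columns, the index name employee_dept_idx, and the sample rows are all hypothetical and chosen only for this example.

```python
import sqlite3


def build_employee_db():
    """Create an in-memory database with one table and one index (illustrative names)."""
    conn = sqlite3.connect(":memory:")
    # Each column has a unique name within the table.
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
    )
    # Each row is one entry (tuple, record) in the table.
    conn.executemany(
        "INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
        [(1, "Ada", "eng"), (2, "Bob", "sales"), (3, "Cy", "sales")],
    )
    # The index lets the DBMS locate rows from a given dept value.
    conn.execute("CREATE INDEX employee_dept_idx ON employee (dept)")
    conn.commit()
    return conn


def names_in_dept(conn, dept):
    """Return the names of all employees in the given department, sorted."""
    rows = conn.execute(
        "SELECT name FROM employee WHERE dept = ? ORDER BY name", (dept,)
    ).fetchall()
    return [row[0] for row in rows]
```

Calling names_in_dept(build_employee_db(), "sales") returns the two rows whose dept column holds the value sales. Whether the DBMS actually consults the index for this lookup is its own decision.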
The most common way to retrieve data from a database is through statements called database queries, which may originate from user interfaces, application programs, or remote computer systems, such as clients or peers. A query is an expression that the DBMS evaluates in order to retrieve data from the database that satisfies the criteria or conditions specified in the query. Although the query requires the return of a particular data set in response, the method of query execution is typically not specified by the query. Thus, after the DBMS receives a query, the DBMS interprets the query and determines what internal steps are necessary to satisfy the query. These internal steps may comprise an identification of the table or tables specified in the query, the row or rows selected in the query, and other information such as whether to use an existing index, whether to build a temporary index, whether to use a temporary file to execute a sort, and/or the order in which the tables are to be joined together to satisfy the query. When taken together, these internal steps are referred to as a query plan (QP), a query execution plan (QEP), a query access plan (QAP), or an access plan (AP). Because users and requesting programs commonly repeat queries, the DBMS often saves the query plan and reuses it instead of undergoing the time-consuming process of recreating it.
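The separation between what a query requests and how the DBMS satisfies it can be observed in SQLite, which reports its chosen plan through the EXPLAIN QUERY PLAN statement. The sketch below, written with Python's standard sqlite3 module, runs the same query before and after an index is created and collects the plan descriptions. The table, index, and function names are hypothetical, and the exact wording of the plan text varies between SQLite versions, so the sketch only assumes that the description is the last column of each returned row.

```python
import sqlite3

QUERY = "SELECT name FROM employee WHERE dept = ?"


def plan_details(conn, sql, params=()):
    """Return the human-readable steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The description text is the last column of each plan row.
    return [row[-1] for row in rows]


def compare_plans():
    """Return (plan without index, plan with index) for the same query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee (name, dept) VALUES (?, ?)",
        [("Ada", "eng"), ("Bob", "sales"), ("Cy", "sales")],
    )
    before = plan_details(conn, QUERY, ("sales",))
    # The query text is unchanged; only the available access paths change.
    conn.execute("CREATE INDEX employee_dept_idx ON employee (dept)")
    after = plan_details(conn, QUERY, ("sales",))
    return before, after
```

The query string is identical in both calls, yet the reported steps differ: the first plan scans the table, while the second refers to the newly created index.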
The DBMS may create many different access plans for any one query, each of which returns the required data set, yet the different access plans may provide widely different performance. Thus, especially for large databases, the access plan selected by the DBMS needs to provide the required data at a reasonable cost, in terms of time and hardware resources. Hence, the DBMS often creates multiple prospective access plans and then chooses the best, or least expensive, one to execute.
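The selection among prospective access plans can be pictured with a toy cost model. The sketch below is not the method of any particular DBMS: the AccessPlan class, its two cost components, and the weights are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPlan:
    """A prospective plan with hypothetical cost estimates."""
    name: str
    estimated_io: float   # assumed unit: page reads
    estimated_cpu: float  # assumed unit: row comparisons

    def cost(self, io_weight=1.0, cpu_weight=0.01):
        # Assumed linear combination of the two estimates.
        return io_weight * self.estimated_io + cpu_weight * self.estimated_cpu


def choose_plan(plans):
    """Return the least expensive plan among the candidates."""
    if not plans:
        raise ValueError("at least one prospective plan is required")
    return min(plans, key=lambda plan: plan.cost())
```

For example, given a full table scan estimated at 1000 page reads and 100000 comparisons and an index probe estimated at 30 page reads and 500 comparisons, choose_plan returns the index probe, because its combined cost under the assumed weights is lower.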
One factor that contributes to the cost of a particular access plan is the number of rows that a query using that access plan returns from a database table. A query that returns a large number of rows may run most efficiently with one access plan, while a query that returns only a small number of rows may run most efficiently with a different access plan. Hence, in an attempt to choose the best access plan for a particular query, current query optimizers estimate the number of rows that the query will return when executed, basing the estimate on the number of unique values in a column of the table to which the query is directed. This number of unique values is called the cardinality of the column.
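One simple way to turn column cardinality into a row estimate is to assume that values are uniformly distributed, so that an equality predicate on the column selects roughly the total row count divided by the cardinality. That uniformity assumption is not stated in the text above; it is introduced here only for illustration, and the function names are hypothetical.

```python
def column_cardinality(values):
    """Return the number of unique values in a column."""
    return len(set(values))


def estimate_equality_rows(total_rows, cardinality):
    """Estimate rows returned by `column = constant`.

    Assumes values are spread evenly across the distinct values,
    which real data often violates.
    """
    if cardinality <= 0:
        return 0.0
    return total_rows / cardinality
```

For a column holding the values sales, sales, eng, and ops, column_cardinality returns 3. For a table of 1000 rows whose column has a cardinality of 4, estimate_equality_rows(1000, 4) gives 250.0.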
One type of query is called a recursive query, which returns rows that have relationships to an arbitrary depth in a table. Recursive queries provide an easy way of traversing tables that represent tree or graph data structures. For example, given a table that represents the reporting relationships within a company, a recursive query may return all workers that report, directly or indirectly, to one particular person. Recursive queries typically contain an initial sub-query, called a seed, and a recursive sub-query that, during each iteration, appends additional rows to the result set. An example of a recursive query is the SQL (structured query language) recursive common table expression (RCTE).
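The reporting-relationship example can be sketched as a recursive common table expression run through Python's standard sqlite3 module. The table employee, its manager_id column, the CTE name reports, and the sample data are hypothetical. The first SELECT inside the CTE plays the role of the seed, and the second SELECT is the recursive sub-query that appends rows on each iteration.

```python
import sqlite3

REPORTS_QUERY = """
WITH RECURSIVE reports(id, name) AS (
    -- Seed: workers who report directly to the given person.
    SELECT id, name FROM employee WHERE manager_id = ?
    UNION ALL
    -- Recursive step: workers who report to anyone already found.
    SELECT e.id, e.name
    FROM employee AS e
    JOIN reports AS r ON e.manager_id = r.id
)
SELECT name FROM reports ORDER BY name
"""


def build_org_db():
    """Create an in-memory reporting hierarchy (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employee (id, name, manager_id) VALUES (?, ?, ?)",
        [
            (1, "Ada", None),  # top of the hierarchy
            (2, "Bob", 1),
            (3, "Cy", 1),
            (4, "Dee", 2),
            (5, "Eve", 4),
            (6, "Fay", None),  # unrelated to Ada's organization
        ],
    )
    conn.commit()
    return conn


def all_reports(conn, manager_id):
    """Return names of all direct and indirect reports of a manager."""
    rows = conn.execute(REPORTS_QUERY, (manager_id,)).fetchall()
    return [row[0] for row in rows]
```

With this data, all_reports(build_org_db(), 1) returns Bob, Cy, Dee, and Eve, even though Dee and Eve report to Ada only indirectly. The sketch assumes the hierarchy contains no cycles, because a cycle would make the UNION ALL recursion run indefinitely.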