The term “database” refers to a collection of data that is multidimensional in the sense that internal links between its entries make the information accessible from a variety of perspectives. This is in contrast to a traditional file system, sometimes called a “flat file,” which is a one-dimensional storage system, meaning that it presents its information from a single point of view. Whereas a flat file containing information about composers and their compositions might provide a list of compositions arranged by composer, a database might present all the works by a single composer, all the composers who wrote a particular type of music, and perhaps the composers who wrote variations of another composer's work.
Abstraction can be used to hide internal complexities. Database management systems (DBMS) hide the complexities of a database's internal structure, allowing the user of the database to imagine that the information stored in the database is arranged in a more useful format. In particular, a DBMS contains routines that translate commands stated in terms of a conceptual view of the database into the actions required by the actual storage system. This conceptual view of the database is called a “database model.”
In the case of a relational database model, the conceptual view of the database is that of a collection of tables consisting of rows and columns. For example, information about a company's employees might be viewed as a table containing a row for each employee and columns labeled name, address, employee identification number, and so on. In turn, the DBMS would contain routines that allow the application software to select certain entries from a particular row of the table or perhaps to report the range of values found in the salary column—even though the information is not actually stored in rows and columns.
In a relational database model, data is portrayed as being stored in rectangular tables, called “relations,” which are similar to the format in which information is displayed by spreadsheet programs. A row in a relation is called a “tuple.” Tuples include information about a particular entity. Columns in a relation are referred to as “attributes” because each entry in a column describes some characteristic, or attribute, of the entity represented by the corresponding tuple.
A DBMS based on the relational model includes routines to perform select, project, and join operations, which can then be called from the application software. In this manner, the application software can be written as though the database were actually stored in the simple tabular form of the relational model. Today's relational database management systems do not necessarily provide routines to perform select, project, and join operations in their raw form. Instead, modern systems provide routines that might be combinations of these basic steps. An example is the language SQL (Structured Query Language, both proprietary forms and past and previous ANSI standard forms), which forms the backbone of most relational database query systems.
Queries stated in SQL are essentially declarative statements. SQL statements may be read as descriptions of information desired rather than as sequences of activities to be performed. The significance of this is that SQL relieves application programmers from the burden of developing algorithms for manipulating relations—the programmers may merely describe the information desired. Each SQL query statement can contain roughly three clauses: a “select” clause, a “from” clause, and a “where” clause. Roughly speaking, such a statement is a request for the result of forming the join of all the relations listed in the “from” clause, selecting those tuples that satisfy the conditions in the “where” clause, and then projecting those tuples listed in the “select” clause. In addition to statements for performing queries, SQL also encompasses statements for defining the structure of relations, creating relations, and modifying the contents of relations.
In some database management systems, the processing of a SQL statement is performed by a single server process. However, if a DBMS provides a parallel query feature, then multiple processes can work together simultaneously to process a single SQL statement. This capability is called parallel query processing. By dividing the work necessary to process a statement among multiple server processes, the DBMS can process the statement more quickly than if only a single server process processed that statement.
The parallel query feature can dramatically improve performance for data-intensive operations associated with decision support applications or very large database environments. Symmetric multiprocessing (SMP), clustered, or massively parallel systems gain the largest performance benefits from the parallel query feature because query processing can be effectively split up among many central processing units on a single system.
In some database management systems that provide a parallel query feature, the query is parallelized dynamically at execution time. Thus, if the distribution or location of the data changes, the DBMS automatically adapts to optimize the parallelization for each execution of a SQL statement.
Parallel query processing can be used to execute a variety of statements, including “select” statements; subqueries in “update,” “insert,” and “delete” statements; “create table . . . as select” statements; and “create index” statements.
Without the parallel query feature, a single server process performs all necessary processing for the execution of a SQL statement. For example, without this feature, to perform a full table scan, one process performs the entire operation. The parallel query feature allows certain operations (for example, full table scans or sorts) to be performed in parallel by multiple query server processes. One process, known as the query coordinator, dispatches the execution of a statement to several query servers and coordinates the results from all of the servers to send the results back to the user.
The query coordinator process is very similar to other server processes, but the query coordinator can break down execution functions into parallel pieces and then integrate the partial results produced by the query servers. Query servers get assigned to each operation in a SQL statement (for example, a table scan or a sort operation), and the number of query servers assigned to a single operation is the degree of parallelism for a query.
When a statement is parsed, an optimizer of the DBMS determines the execution plan of that statement and determines the parallelization method of the execution plan. Parallelization is the process by which the query optimizer determines which operations can be performed in parallel and how many query server processes to execute the execution plan.
To decide how to parallelize a statement, the query optimizer process decides how many query server processes to enlist. When making these decisions, the query optimizer uses information specified in hints of a query, the table's definition and statistics, and initialization parameters. Before enlisting query server processes, the query optimizer process examines the operations in the execution plan to determine whether the individual operations can be parallelized.
The query optimizer process also examines the partitioning requirements of each operation. An operation's partitioning requirement is the way in which the rows operated on by the operation must be divided, or partitioned, among the query server processes. The partitioning scheme might adopt range, hash, round robin, or random techniques.
After determining the partitioning requirement for each operation in the execution plan, the query optimizer determines the order in which the operations must be performed. With this information, the query optmizer determines the data flow of the statement. Operations that require the output of other operations are known as parent operations. For example, within a query, a “group by sort” operation will be the parent of a “merge join” operation within that same query if the “group by sort” operation requires the output of the “merge join” operation.
Parent operations can begin processing rows as soon as the child operations have produced rows for the parent operation to consume. While one set of query servers are producing rows in a child operation (e.g., “full scan”), another set of query servers can begin to perform a parent operation (e.g., “merge join”) to consume the rows. Each of the two operations performed concurrently is given its own set of query server processes. Therefore, both query operations and the data flow tree itself have degrees of parallelism. The degree of parallelism of an individual operation is called intra-operation parallelism and the degree of parallelism between operations in a data flow tree is called inter-operation parallelism. Due to the producer/consumer nature of the query operations, only two operations in a given tree need to be performed simultaneously to minimize execution time, so the maximum degree of inter-operation parallelism is 2. When a set of query servers completes its operation, the set moves on to operations higher in the data flow.
Inexpensive database management systems for personal use are relatively simple systems. They tend to have a single objective—to shield the user from the technical details of the database implementation. The databases maintained by these systems are relatively small and generally contain information whose loss or corruption would be inconvenient rather than disastrous. When a problem does arise, the user can usually correct the erroneous items directly or reload the database from a backup copy and manually make the modifications required to bring that copy up to date. This process might be inconvenient, but the cost of avoiding the inconvenience tends to be greater than the inconvenience itself. In any case, the inconvenience is restricted to only a few people, and any financial loss is generally limited.
In the case of large, multiuser, commercial database systems, however, the stakes are much higher. The cost of incorrect or lost data can be enormous and can have devastating consequences. In these environments, a major role of the DBMS is to maintain the database's integrity by guarding against problems such as operations that for some reason are only partially completed or different operations that might interact inadvertently to cause inaccurate information in the database.
A single transaction, such as the transfer of funds from one bank account to another, the cancellation of an airline reservation, or the registration of a student in a university course, might involve multiple steps at the database level. For example, a transfer of funds between bank accounts requires that the balance in one account be decremented and the balance in the other be incremented. Between such steps, the information in the database might be inconsistent. Indeed, funds are missing during the brief period after the first account has been decremented but before the other has been incremented. Likewise, when reassigning a passenger's seat on a flight, there might be an instant when the passenger has no seat or an instant when the passenger list appears to be one passenger greater than it actually is.
In the case of large databases that are subject to heavy transaction loads, it is highly likely that a random snapshot will find the database in the middle of some transaction. A request for the execution of a transaction, or an equipment malfunction, will therefore likely occur at a time when the database is in an inconsistent state.
In the case of a malfunction, the goal of the DBMS is to ensure that the problem will not freeze the database in an inconsistent state. This is often accomplished by maintaining a log containing a record of each transaction's activities in a nonvolatile storage system, such as a magnetic disk. Before a transaction is allowed to alter the database, the alteration to be performed is first recorded in the log. Thus, the log contains a permanent record of each transaction's actions.
The point at which all the steps in a transaction have been recorded in the log is called the “commit point.” It is at this point that the DBMS has the information the DBMS needs to reconstruct the transaction on the DBMS's own if that reconstruction should become necessary. At this point, the DBMS becomes committed to the transaction in the sense that the DBMS accepts the responsibility of guaranteeing that the transaction's activities will be reflected in the database. In the case of an equipment malfunction, the DBMS can use the information in its log to reconstruct the transactions that have been completed (committed) since the last backup was made. The DBMS can rerun, from their beginnings, queries within transactions that have not yet been committed.
Long-running queries are more likely to experience a fault and can be extremely costly to rerun. The DBMS might run such queries for hours or days before reaching a commit point. Users and/or applications that do not have workarounds to avoid rerunning a very long query are exposed to a failure between commit points.
Additionally, in a multi-node cluster DBMS, as the number of nodes executing a parallel statement increases, the probability of a node failure also increases. Restarting a query in response to the failure of a node in the cluster consumes the entire cluster for even longer, potentially delaying other workloads.
Furthermore, when the DBMS's workload includes routine extract, transform, and load (ETL) or maintenance workload that usually is processed with little human supervision, a single failure prevents the workload from being completed, in which case human intervention is needed to restart the workload, causing the workload to finish late and potentially delaying other workloads.
Additionally, some users may need to interrupt routine workloads with high-priority jobs. This may become more prevalent in a cloud-computing environment, including dedicated/internal clouds. Stopping a currently executing query in order to release its system resources for use by another query leads to loss of work.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.