The latter half of the twentieth century saw the beginning of a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent it more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
Modern computer systems may be used to support a variety of applications, but one common use is the maintenance of large relational databases, from which information may be obtained. A large relational database is often accessible to multiple users via a network, any one of whom may query the database for information and/or update data in the database.
Conceptually, a relational database may be viewed as one or more tables of information, each table having a large number of entries or records, also called “tuples” (analogous to rows of a table), each entry having multiple respective data fields (analogous to columns of the table) with a defined meaning. To access information, a query is run against the database to find all rows for which the data in the columns of the row matches some set of parameters defined by the query. A query may be as simple as matching a single column field to a specified value, but is often far more complex, involving multiple field values and logical conditions. A query may also involve multiple tables (referred to as a “join” query), in which the query finds all sets of N rows, one row from each respective one of N tables joined by the query, where the data from the columns of the N rows matches some set of query parameters. Found records may be updated by altering the values of one or more fields, or records may be deleted or added.
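By way of illustration only, the query types described above can be sketched with Python's standard sqlite3 module. The table names (customers, orders), their columns, and the sample rows are hypothetical and chosen purely for this example; they are not drawn from any particular database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ada', 'EU'), (2, 'Bob', 'US'), (3, 'Cy', 'EU');
        INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 250.0), (12, 2, 75.0);
        """
    )
    return conn


def customers_in_region(conn, region):
    """Simple query: match a single column field to a specified value."""
    sql = "SELECT name FROM customers WHERE region = ? ORDER BY id"
    return conn.execute(sql, (region,)).fetchall()


def large_orders_in_region(conn, region, min_total):
    """Join query: one row from each of two tables, with two conditions."""
    sql = (
        "SELECT c.name, o.total "
        "FROM customers AS c JOIN orders AS o ON o.customer_id = c.id "
        "WHERE c.region = ? AND o.total > ? ORDER BY o.id"
    )
    return conn.execute(sql, (region, min_total)).fetchall()


def move_customer(conn, name, new_region):
    """Update found records by altering one field; returns rows changed."""
    cur = conn.execute(
        "UPDATE customers SET region = ? WHERE name = ?", (new_region, name)
    )
    conn.commit()
    return cur.rowcount
```

Here the join in large_orders_in_region pairs each customers row with the orders rows that reference it (N = 2 tables), and the WHERE clause supplies the query parameters that each pair of rows must satisfy.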
To support database queries, large databases typically include a query engine which executes the queries according to some automatically selected search (execution) strategy, and may include one or more metadata structures which characterize the data in the database table(s). Examples of metadata structures are indexes, materialized query tables, and histograms, it being understood that these examples are not necessarily exhaustive. Metadata structures may be used by the database query engine to determine an optimal query strategy for executing a query against the database.
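As a hedged illustration of how a metadata structure can influence the selected execution strategy, the following sketch uses SQLite's EXPLAIN QUERY PLAN statement through Python's sqlite3 module to show the plan for the same query before and after an index is created. The table, column, and index names are hypothetical, and the exact wording of the plan text varies between SQLite versions, so the sketch returns the plan text without assuming a precise format.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan details SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The descriptive text is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


def plans_before_and_after_index():
    """Show how adding an index changes the strategy chosen by the engine."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
    )
    sql = "SELECT name FROM customers WHERE region = ?"

    # Without an index on region, the engine must examine every row.
    before = query_plan(conn, sql, ("EU",))

    # The index is a metadata structure describing the table's data.
    conn.execute("CREATE INDEX idx_customers_region ON customers (region)")
    after = query_plan(conn, sql, ("EU",))
    return before, after
```

In typical SQLite builds, the first plan describes a scan of the whole table and the second names idx_customers_region, which reflects the query engine consulting available metadata when it selects a strategy.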
When a record in a database is updated, deleted or added, the corresponding database table is updated, and it may further be necessary to update one or more metadata structures to reflect the change being made to the underlying data. Large databases may be accessible by many users concurrently, each of whom may be making changes to the data. The burden of processing and recording these changes can be significant.
For performance reasons, large databases typically record changes in a sequential database log, also called a journal. A sequential log of transactions can be written to non-volatile storage, such as a hard disk drive, much more quickly than a corresponding set of transactions can be written to scattered individual non-volatile storage locations of the database table(s) and metadata. The log enables recovery of database data, so that the database can be reconstructed to a consistent state in the event of a system and/or network failure (either temporary or permanent) which causes loss or unavailability of volatile data. That is, in the event of a system/network failure, even if some transactions have not been written to the database tables and other structures in non-volatile storage, it is possible to reconstruct the database state by parsing the transactions in the log and updating the data accordingly.
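The general idea of a sequential log can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular database's journal: the JournaledTable class, its JSON-lines log format, and the "put"/"delete" operations are all hypothetical. Each change is appended to the end of a log file and forced to non-volatile storage before the volatile in-memory state is updated, so that the state can be rebuilt by replaying the log.

```python
import json
import os
import tempfile


class JournaledTable:
    """Hypothetical key/value 'table' whose changes are journaled first."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.rows = {}  # volatile state, lost on a failure

    def _apply(self, entry):
        if entry["op"] == "put":
            self.rows[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            self.rows.pop(entry["key"], None)

    def log_and_apply(self, entry):
        # Append sequentially and force the entry to stable storage
        # before the in-memory state is changed.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply(entry)

    @classmethod
    def recover(cls, log_path):
        """Rebuild the state by parsing the log from start to end."""
        table = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    table._apply(json.loads(line))
        return table


def demo_crash_and_recover():
    """Simulate losing volatile state, then reconstruct it from the log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "journal.log")
        table = JournaledTable(path)
        table.log_and_apply({"op": "put", "key": "a", "value": 1})
        table.log_and_apply({"op": "put", "key": "b", "value": 2})
        table.log_and_apply({"op": "delete", "key": "a"})
        del table  # the 'failure': volatile data is gone
        return JournaledTable.recover(path).rows
```

The sketch omits transactions, checkpoints, and the eventual write of changes to the table's own storage locations; it shows only why an append-only log is sufficient to reconstruct state.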
Although conventional database logs make it possible to reconstruct the database, they do not necessarily make it easy. The log is read back to a checkpoint, and entries in the log are redone (or in some cases, undone), by reading in affected pages of database tables and metadata, modifying the tables/metadata accordingly, and writing them out. Depending on the number of entries in the log and other factors, this can take considerable time, during which the database may be unavailable to users who wish to access it.
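The redo and undo steps described above can be illustrated with a deliberately simplified sketch. It assumes a hypothetical log of dictionaries with "checkpoint", "update", and "commit" entries, and it assumes that the snapshot passed in reflects every change logged before the most recent checkpoint and that no transaction was in progress at that checkpoint. Real recovery procedures operate on pages of tables and metadata and are considerably more involved.

```python
def recover(snapshot, log):
    """Redo logged updates after the last checkpoint, then undo the
    updates of transactions that never committed.

    snapshot: dict of key -> value as of the last checkpoint (assumed).
    log: list of dict entries in the order they were written.
    """
    # Read the log back to the most recent checkpoint.
    start = 0
    for i in range(len(log) - 1, -1, -1):
        if log[i]["type"] == "checkpoint":
            start = i + 1
            break
    tail = log[start:]

    committed = {e["txn"] for e in tail if e["type"] == "commit"}
    state = dict(snapshot)

    # Redo pass: reapply every update in log order.
    for entry in tail:
        if entry["type"] == "update":
            state[entry["key"]] = entry["new"]

    # Undo pass: roll back uncommitted updates in reverse order.
    for entry in reversed(tail):
        if entry["type"] == "update" and entry["txn"] not in committed:
            if entry["old"] is None:  # the key did not exist before
                state.pop(entry["key"], None)
            else:
                state[entry["key"]] = entry["old"]
    return state


def demo_recover():
    """Run recovery over a small hypothetical log."""
    log = [
        {"type": "update", "txn": "t1", "key": "a", "old": None, "new": 1},
        {"type": "commit", "txn": "t1"},
        {"type": "checkpoint"},
        {"type": "update", "txn": "t2", "key": "a", "old": 1, "new": 2},
        {"type": "update", "txn": "t3", "key": "b", "old": None, "new": 9},
        {"type": "commit", "txn": "t2"},
    ]
    return recover({"a": 1}, log)
```

Because the amount of work in both passes grows with the number of log entries after the checkpoint, the sketch also suggests why recovery time depends on log length.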
A need exists for improved techniques for managing relational databases, and in particular, for improved techniques which reduce unavailability of a database and/or resources required to reconstruct the database in the event of a system/network failure.