1. Field of the Invention
This invention relates in general to computer-implemented database systems, and, in particular, to accumulating changes in a database management system.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into tables which consist of rows and columns of data. The rows are formally called tuples. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. The tables are typically stored on direct access storage devices (DASD) such as magnetic or optical disk drives for semi-permanent storage.
A table is assigned to a tablespace. The tablespace contains one or more datasets. In this way, the data from a table is assigned to physical storage on DASD. Each tablespace is physically divided into equal units called pages. The size of the tablespace's pages is based on the page size of the bufferpool specified in the tablespace's creation statement. The bufferpool is an area of virtual storage that is used to store data temporarily. A tablespace can be partitioned, in which case a table may be divided among the tablespace's partitions, with each partition stored as a separate dataset. Partitions are typically used for very large tables.
A table may have an index. An index is an ordered set of pointers to the data in the table. There is one physical order to the rows in a table that is determined by the RDBMS software, and not by a user. Therefore, it may be difficult to locate a particular row in a table by scanning the table. A user creates an index on a table, and the index is based on one or more columns of the table. A partitioned table must have at least one index. The index is called the partition index and is used to define the scope of each partition and thereby assign rows of the table to their respective partitions. The partition indexes are created in addition to, rather than in place of, a table index. An index may be created as UNIQUE so that a row cannot be inserted into a table if doing so would result in two rows having the same index value. Also, an index may be created as a CLUSTERING index, in which case the rows are physically stored in order according to the values in the columns specified for the clustering index (i.e., ascending or descending, as specified by the user).
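By way of illustration only, an index as an ordered set of pointers, together with the UNIQUE constraint, may be sketched as follows. The class name `Index`, its methods, and the use of integer row identifiers as "pointers" are assumptions made for this sketch and do not describe the internals of any particular RDBMS.

```python
import bisect


class Index:
    """Hypothetical index: a sorted list of (key, row_id) entries."""

    def __init__(self, unique=False):
        self.unique = unique
        self.entries = []  # kept sorted by key, then by row_id

    def insert(self, key, row_id):
        # (key,) sorts before every (key, row_id), so this finds the
        # first entry whose key is greater than or equal to `key`.
        pos = bisect.bisect_left(self.entries, (key,))
        if (self.unique and pos < len(self.entries)
                and self.entries[pos][0] == key):
            raise ValueError(f"duplicate key {key!r} in UNIQUE index")
        bisect.insort(self.entries, (key, row_id))

    def lookup(self, key):
        # Binary search to the first matching entry, then collect the
        # row identifiers of all entries that share the key.
        pos = bisect.bisect_left(self.entries, (key,))
        rows = []
        while pos < len(self.entries) and self.entries[pos][0] == key:
            rows.append(self.entries[pos][1])
            pos += 1
        return rows
```

In this sketch a lookup performs a binary search over the ordered entries rather than scanning every row of the table, which is the benefit the index is intended to provide.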
RDBMS software using a Structured Query Language (SQL) interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO). The SQL interface allows users to formulate relational operations on the tables interactively, in batch files, or embedded in host languages, such as C and COBOL. SQL allows the user to manipulate the data. As the data is being modified, all operations on the data are logged in a log file.
One technique for recovering a database involves restoring a prior full image copy of the data and then reapplying subsequent logged changes to make the data current in time. Typically, the database containing partitions and indexes is stored on a data storage device, called a primary data storage device. The partitions are periodically copied to another data storage device, called a secondary data storage device, for recovery purposes. In particular, the partitions stored on the primary data storage device may be corrupted, for example, due to a system failure during a flood, or a user may want to remove modifications to the data (i.e., back out the changes). In either case, for recovery, the partitions are typically copied from the secondary data storage device to the primary data storage device. Next, the copied data is modified based on the operations recorded in the log file. Then, the indexes are rebuilt. In particular, to rebuild the indexes, keys are copied from each row of each partition, sorted, and then used to create a partition index. Additionally, the table index is rebuilt based on the partition indexes.
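The recovery sequence described above (restore the image copy, reapply the logged operations, then rebuild the index from sorted keys) may be sketched as follows. This is a minimal illustration under stated assumptions: the function names `recover` and `rebuild_index`, the representation of a table as a dictionary keyed by row identifier, and the `(operation, row_id, row)` log record layout are all hypothetical and are not taken from any actual RDBMS.

```python
def recover(full_image_copy, log_records):
    """Restore rows from a full image copy, then reapply logged changes."""
    # Step 1: copy the saved image back (secondary -> primary storage).
    table = {row_id: dict(row) for row_id, row in full_image_copy.items()}

    # Step 2: reapply each logged operation in the order it was logged.
    for operation, row_id, row in log_records:
        if operation in ("insert", "update"):
            table[row_id] = dict(row)
        elif operation == "delete":
            table.pop(row_id, None)
        else:
            raise ValueError(f"unknown log operation: {operation!r}")
    return table


def rebuild_index(table, key_column):
    """Copy the key from each row, sort the keys, and form the index."""
    return sorted((row[key_column], row_id) for row_id, row in table.items())
```

A possible usage, again with hypothetical data: `recover(image, log)` returns the table made current in time, and `rebuild_index(table, "id")` returns the sorted `(key, row_id)` pairs from which an index would be created.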
Another technique for recovering a database involves restoring the database using a prior full image copy, restoring one or more partial image copies (sometimes called incremental copies), and then reapplying subsequent logged changes to make the data current in time. The partial copies contain accumulated changes made to the data since the previous full or partial image copy operation. In some systems, the changed data is identified using indicators (usually called "dirty" bits or "status" bits) associated with each record or block of records (sometimes called a "page" of records) to designate that a change has occurred to a record or block. Whenever the record or block is first modified, the indicator is set. Each time the record or block is placed in an image copy, the indicator is reset. However, the overhead of maintaining these indicators is significant. With high transaction loads in a data sharing complex, maintaining coherency of these indicators across the records and blocks of the complex can result in degraded transaction and system performance.
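The indicator-based technique described above may be sketched as follows, for illustration only. The names `Page`, `Tablespace`, `incremental_image_copy`, and `restore` are assumptions made for this sketch. It models a single system and therefore omits the cross-system coherency of the indicators, which is the source of the overhead noted above.

```python
class Page:
    """Hypothetical page of records with a "dirty" indicator."""

    def __init__(self, data):
        self.data = data
        self.dirty = False


class Tablespace:
    def __init__(self, pages):
        self.pages = [Page(data) for data in pages]

    def write(self, page_no, data):
        # Modifying a page sets its indicator.
        self.pages[page_no].data = data
        self.pages[page_no].dirty = True

    def full_image_copy(self):
        # Every page is placed in the copy, so every indicator is reset.
        for page in self.pages:
            page.dirty = False
        return [page.data for page in self.pages]

    def incremental_image_copy(self):
        # Only pages changed since the previous copy are included.
        changed = {}
        for page_no, page in enumerate(self.pages):
            if page.dirty:
                changed[page_no] = page.data
                page.dirty = False
        return changed


def restore(full_copy, incremental_copies):
    """Apply incremental copies, oldest first, over the full copy."""
    pages = list(full_copy)
    for incremental in incremental_copies:
        for page_no, data in incremental.items():
            pages[page_no] = data
    return pages
```

In this sketch, the pages returned by `restore` would still need the subsequent logged changes reapplied to make the data current in time.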
These techniques for recovery of data are very costly in terms of performance. Additionally, users are not able to access data while recovery is taking place. For a user or company requiring the use of computers to do business, much money can be lost during recovery. Therefore, it is important to improve the efficiency of the recovery process, and there is a need in the art for an improved recovery technique without the need for changed data indicators (i.e., "status" bits).