The volume of documents in databases is rapidly expanding. It has been estimated that in excess of 90% of all desired intelligence information is available in the records of accessible databases. In order for the information in databases to be useful, a user must be able to locate and modify records within a database. Operating on the records of a database typically involves a computer system that enables one or more users of the database to add, change, read, delete and/or otherwise manipulate records within the database. In order to allow this manipulation of the database records and maintain the integrity of the records in the database, the computer system must keep precise control over how and when users have access to the database.
The computer system that operates on the database generally comprises a central processing unit (CPU), main memory and disk storage. The CPU interacts directly with main memory and indirectly with disk storage through the main memory. The main memory is much faster at supplying information to the CPU than is the disk storage. However, the disk storage has much more capacity for storing information than main memory. Since the amount of information stored in databases is significantly larger than the capacity of the main memory, and since the data can be permanently stored in disk storage, the database records are maintained in disk storage.
Data is manipulated within the database through transactions. A transaction is a group of modifications to the database such that either all of the modifications occur or none of them occurs. That is, a transaction has the property of atomicity. A transaction basically consists of two phases. In a first phase, the transaction starts and the desired modifications to the database are assembled. During this first phase of the transaction, write operations from any source other than the current transaction are not allowed on the database. In a second phase of the transaction, the modifications to the database are committed, i.e., the group of assembled modifications to the database is actually written to the records in disk storage. During the second or "commit" phase of a transaction, neither read nor write operations are normally allowed. This is because, while changes to the database records are being made, the state of the database is not precisely known, and so read operations might receive inconsistent data.
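The two phases described above can be illustrated with a minimal sketch. It assumes an in-memory dictionary standing in for the records in disk storage; the names TinyDatabase and Transaction are hypothetical and do not come from any particular database system.

```python
import threading


class TinyDatabase:
    """Toy stand-in for database records (assumption: held in a dict)."""

    def __init__(self, records):
        self.records = dict(records)
        self._writer_lock = threading.Lock()  # held for a whole transaction
        self._commit_lock = threading.Lock()  # held only during the commit phase

    def read(self, key):
        # Readers are blocked only while a commit is in progress.
        with self._commit_lock:
            return self.records.get(key)

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, db):
        self.db = db
        self.pending = {}      # modifications assembled during the first phase
        self.finished = False
        db._writer_lock.acquire()  # first phase: other writers are excluded

    def write(self, key, value):
        self.pending[key] = value  # the database itself is not touched yet

    def commit(self):
        # Second phase: readers and writers are both excluded while
        # the assembled modifications are applied as a group.
        with self.db._commit_lock:
            self.db.records.update(self.pending)
        self._finish()

    def abort(self):
        # None of the modifications occur.
        self.pending.clear()
        self._finish()

    def _finish(self):
        if not self.finished:
            self.finished = True
            self.db._writer_lock.release()
```

In this sketch a reader that calls read() during the first phase still sees the original value, while a reader that arrives during commit() waits until the whole group of modifications has been applied.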
Since the size of databases is large, the memory space required to merely assemble the group of modifications for a transaction is typically larger than the main memory of the computer. Also, once a transaction is in the commit phase, it must be completed (even if power to the computer is lost) or the state of the database will not be certain. As a result, when the modifications of a transaction are assembled in the first phase, they are written to a file in disk storage. In this way, if power is lost during the commit phase, no data will be lost and the commit process can be completed when power is returned.
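As a rough illustration of why staging the modifications on disk allows an interrupted commit to be completed, consider the following sketch. It assumes a JSON file as the staged log and a dictionary standing in for the database records; the function names and the file name are hypothetical.

```python
import json
import os
import tempfile


def stage_modifications(log_path, modifications):
    """First phase: persist the assembled modifications before any commit begins."""
    with open(log_path, "w", encoding="utf-8") as log:
        json.dump(modifications, log)
        log.flush()
        os.fsync(log.fileno())  # ask the OS to push the log to stable storage


def commit_from_log(log_path, records):
    """Commit phase, or recovery after a restart: apply every staged modification."""
    if not os.path.exists(log_path):
        return False  # nothing was staged, so there is nothing to finish
    with open(log_path, encoding="utf-8") as log:
        modifications = json.load(log)
    # Re-applying the same values is harmless, so an interrupted commit
    # can simply be run again from the start.
    records.update(modifications)
    os.remove(log_path)  # the commit is complete; the log is no longer needed
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "transaction.log")
        records = {"a": 1}
        stage_modifications(log_path, {"a": 2, "b": 3})
        # Imagine power is lost here: the staged log survives on disk.
        commit_from_log(log_path, records)
        print(records)
```

Because the log is removed only after every modification has been applied, a restart that finds the file still present knows the commit must be run to completion.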
In order to write the transaction modifications to disk storage, there must be space in disk storage for the file in which the changes are written. Generally, there are two options concerning where to write these changes. First, the modifications and their corresponding disk storage addresses could be written to a separate transaction log in disk storage during the first phase of the transaction. In this technique, each of the modifications is written to its corresponding disk storage address (over the old data being modified) during the commit phase. Readers have direct access to the database during the first phase of this type of transaction because the original database records on disk storage remain unchanged until the commit phase. Once the modifications have been written to the database records in disk storage, the transaction log is discarded. In this type of transaction system, the space taken up in disk storage is no more than necessary because only the modifications and their corresponding addresses are stored in the log. However, as noted above, readers do not have access to the database during the commit phase. This is a problem because the number of modifications to a database can be large, and as a result readers are locked out of the database for unacceptably long periods of time during the commit phase.
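The first option can be pictured with a small sketch. It assumes a bytearray standing in for disk storage, fixed 4-byte records, and a Python list as the transaction log; all of these, and the function names, are illustrative assumptions rather than details of any actual system.

```python
def stage(log, address, new_bytes):
    """First phase: record a modification and its disk address; the disk is untouched."""
    log.append((address, bytes(new_bytes)))


def commit(disk, log):
    """Commit phase: overwrite each old value in place, then discard the log."""
    for address, new_bytes in log:
        disk[address:address + len(new_bytes)] = new_bytes
    log.clear()


# Three 4-byte records at addresses 0, 4 and 8 (assumed layout).
disk = bytearray(b"AAAABBBBCCCC")
log = []

stage(log, 4, b"bbbb")
stage(log, 8, b"cccc")

# During the first phase, readers still see the original records.
snapshot_during_phase_one = bytes(disk)

# During this loop the disk holds a mix of old and new records,
# which is why readers are kept out until it finishes.
commit(disk, log)
```

The log holds only the changed bytes and their addresses, which matches the space advantage noted above; the cost is that the commit loop grows with the number of modifications, and readers wait for all of it.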
The second option for storing the transaction modifications on disk storage is to write them in the same files as the original database records during the first phase. As a practical matter, the memory size required for this technique is much larger than the memory size for a transaction log. Also, this technique requires the use of a de-reference table for all database access operations. The de-reference table (which is part of the transaction log) translates the addresses of the original database records to the addresses for the changed database records. Since the de-reference table must be used on all accesses to the database, the time required to access the database increases. As a result, neither of the conventional options for writing modifications to disk storage is satisfactory because one option locks out database users for long periods of time and the other requires reserving too much memory space in disk storage and causes access operations to be slower.
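The role of the de-reference table in the second option can be sketched as follows. The sketch assumes a Python list whose indices play the part of disk addresses and a dictionary as the table; the class name ShadowedFile is hypothetical, and the question of when changed records become visible to readers is deliberately left out.

```python
class ShadowedFile:
    """Toy model of a database file that also holds changed copies of records."""

    def __init__(self, records):
        self.slots = list(records)  # slot index stands in for a disk address
        self.deref = {}             # original address -> address of changed record

    def write(self, address, value):
        # The changed record is stored in the same file, in a new slot,
        # so the original record is left in place.
        self.slots.append(value)
        self.deref[address] = len(self.slots) - 1

    def read(self, address):
        # Every access goes through the table, even for unchanged records.
        return self.slots[self.deref.get(address, address)]
```

Two costs mentioned above are visible here: the file grows by one slot for every changed record, and every read pays for a table lookup whether or not the record was changed.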