In a computerized database system, data is stored electronically in non-volatile storage devices such as direct access storage devices (DASDs) also referred to as disks. The DASD is associated with a data processing system comprising a central processing unit (CPU) and volatile random access memory (RAM). Database management system (DBMS) software programs are executed by the CPU to logically organize and control access to the data stored in the storage device. The data is accessed using application programs or query languages such as the structured query language (SQL).
A unit of recovery refers to an application or transaction that updates data. Updating transactions replace data in the database with a new value, delete data from the database and add new data to the database.
A single transaction typically accesses multiple pieces of data for reading and for updating. Transactions that retrieve data based on the satisfaction of query criteria read multiple pieces of data during the process of identifying data that satisfies the query. Transactions will also frequently update multiple pieces of data during a transaction. For example, a transaction may update information for all employees in a department which effects multiple pieces of data.
The progress of a transaction is tracked until the transaction is "committed". A transaction is committed when all data accessed and modified by the transaction is in a consistent state. All updates made by a transaction are either completed (committed) or aborted.
The data is stored in pages (or other forms of organization) on DASD. When a particular piece or row of data is accessed (for reading or updating), the corresponding page is read into a buffer in the system's faster volatile random access memory. The buffers are stored in a pool in memory. Periodically the changed data pages in the buffer pool are read back into the non-volatile storage.
Application programs and SQL queries typically update many pieces of data on many different pages during a single transaction requiring many different pages to be read into a buffer pool. The pages are read back and forth between the buffer pool and the DASD in order to provide room for all of the pages needed to be updated. Therefore, pages with updated information from uncommitted transactions are written to the DASD while reclaiming buffer pool space.
When a DBMS terminates abnormally prior to completing all work, the transactions that were processing during the failure and only partially executed may have caused the data to be in an inconsistent state. Therefore, during recovery processing, the data in the system may need to be restored to a consistent state.
System failures are caused by hardware problems, software problems and power outages. For enterprises that rely on computerized database systems, a system that has had a software or hardware failure needs to be restored and returned to a consistent state as quickly as possible.
There are two types of data errors that can occur, logical and physical errors. Logical errors concern updates that were not properly recorded and are caused by system failures or errors in the applications programs. Physical errors are caused by hardware or media malfunctions such as disk reading or writing errors.
As is well known by those skilled in the art, a DBMS uses a linear sequential log to record all modifications made to the data. The recovery process involves applying the log records to the version of the database stored on the DASD.
Each log record has associated with it a relative byte address (RBA) which is the offset of the log record from the beginning of the log. The RBA never repeats and represents a point in time when a modification was recorded.
The log records provide delimiters for identifying updates affecting the data object. The log records for a data object do not correspond to a single transaction since multiple transactions can be accessing the same data. The log is always written to DASD prior to the page being written to the DASD.
Each system, when updating data, assigns a log sequence number for the page. The log sequence number is the RBA of the log record in the log file for the update. In the event of system failure, the page is read from the disk. The transactions that have occurred after the page was written to disk are identified based on the log sequence number. The page's state is brought up to date by applying the log records whose RBA is greater than the log sequence number associated with the page.
At periodic intervals, a DBMS performs a "checkpoint" operation. The system status information, including information regarding active transactions, is recorded on the log. Following a system failure, the system is restarted from the last checkpoint. The database checkpoint log records are normally written only for those databases open for update transactions.
The frequency of checkpoints is determined by the amount of information written in the log. The frequency of checkpoints has an effect on the amount of time required to perform restart and recovery operations since checkpoints reduce the quantity of log records that must be processed. However, there are time and system constraints when performing checkpoints. Therefore, there is a need to minimize the overhead associated with performing checkpoints.
Two types of image copies of the database are maintained for recovery purposes, full image copies and incremental image copies. Full image copies are copies of the database objects. Incremental copies are copies of pages that have been updated since the full image copy was made. The RBA of the log at the time the full image copy was made is the starting point for applying log records pertaining to the data object.
In the prior art, as disclosed by Crus, R. A., "Data Recovery in IBM Database 2", IBM System Journal, Vol. 23, No. 2, 1984, in order to minimize the amount of log records that must be saved during the media recovery process, the DBMS tracks the log RBA of the first update to the data object and when the database is closed. The information is stored in a directory of stop and start RBA values. During media recovery, the start/stop RBAs associated with the object being recovered are read from the directory and only the log records within the RBA range defined by the stop/start directory is processed.
In a preferred embodiment, the log range information is stored in a system database (called SYSLGRNG) and used during a database recovery operation to determine areas in the log that need to be processed.
The log range is started for a database when switched from Read Only ("R/O") to Read Write ("R/W") state on the first update operation. Read Only signifies that the transactions accessing the database are only reading the data in the database and are not updating the database. "Read Write" signifies that the transactions accessing the database are both reading the database and writing updates to the database.
More specifically, a begin log range value is set to the current end of the log's RBA before the first update log record is written. At the time that a log range entry is created, an end log range value is set to "0" which indicates that the ending log range is unknown. In the prior art, the end log range value is updated when the database lock is released by all referenced transactions (including R/O transactions) or until the DBMS is terminated.
The DBMS uses locks to control concurrent access to data by different transactions. Usually, the database lock is released at transaction commit. For DBMSs that support hierarchical locking, the database lock is always acquired by a transaction before it can access/update the database. The database lock mode denotes whether or not "page" locks (a narrower scope lock) will be acquired while accessing data within the database.
For performance reasons, a database, once opened (updated), is not closed when it is no longer in use (i.e., no locks held on the database). In D. J. Haderle, J. Z. Teng, and A. Y. Yang, "Apply Slow Close Process to Minimize Physical Open/Close activities for Database Datasets", IBM Technical Disclosure Bulletin, March 1990, pages 405-407, a deferred close mechanism is disclosed that improves DBMS performance by minimizing database open/close activities. In such a system, the database remains open until all transactions that are accessing the database, even to only read the database, are completed. This results in a long log range for recovering the database.
The scheme of maintaining the SYSLGRNG entries as proposed in the prior art has a number of disadvantages. For databases that have frequent read access with occasional update activities, the SYSLGRNG entries will not be closed until the last read only transaction is committed and the end log range value is set to the current log RBA value. There can be a lot of log records in the log range between the last written log for this database and the current end of log RBA value if there were heavy update activities by other concurrent transactions.
For transactions that are performance sensitive, the database locks are normally not released until the transactions are terminated. By associating the close of the SYSLGRNG entries with the status of the database locks, this potentially defeats the purpose of maintaining the SYSLGRNG entries.
For databases that become in use and not in use frequently, the SYSLGRNG database will grow more rapidly if the status of the database locks is used to determine when to close the SYSLGRNG entries. Frequent create/update of the SYSLGRNG entries will have adverse effects on the system performance as well as degrading transaction response time.
In a multi-database system, multiple systems share the same data stored in databases. When multiple transactions access data in the same database there is even more of a danger that the data will become inconsistent. When a transaction is updating data in a database and another transaction tries to read that same data, there is a need to ensure the consistency of data for the transactions. Locking mechanisms are used to limit access to database.
DBMSs that support dynamic recognition of an environment where databases can be updated by more than one system (a multi-system data sharing environment) have a lot of overhead from maintaining consistent data. There are advantages in marking a system as read only (R/O) so that the data can be shared with less overhead.
One or more of the foregoing problems are overcome by the present invention.