The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely sophisticated devices, and computer systems may be found in many different settings. Computer systems typically include a combination of hardware (such as semiconductors, integrated circuits, programmable logic devices, programmable gate arrays, and circuit boards) and software, also known as computer programs. Computer systems also typically include digital storage devices, which store the software, such as an operating system and applications, and data.
One mechanism for managing the data is called a database management system (DBMS), which may also be called a database system or simply a database. Many different types of databases are known, but the most common is usually called a relational database (RDB), which organizes data in tables that have rows, which represent individual entries or records in the database, and columns, which define what is stored in each entry or record. Each table has a unique name within the database and each column has a unique name within the particular table. Specific columns within each table can be defined as key columns. Each unique combination of values in the aggregate of the key columns in a particular data row uniquely identifies that data row in the database. If two or more rows of data are entered such that the key columns contain the same values, then only the last data row entered remains in the database. The database also has an index, which is a data structure that informs the database management system of the location of a certain row in a table given an indexed column value, analogous to a book index informing the reader on which page a given word appears.
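The key-column and index behavior described above can be illustrated with a minimal sketch. It is not any particular DBMS; the `KeyedTable` class, its method names, and the sample `employees` data are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple


class KeyedTable:
    """Toy table: each row is a dict identified by the values of its key columns."""

    def __init__(self, name: str, columns: List[str], key_columns: List[str]) -> None:
        self.name = name
        self.columns = columns
        self.key_columns = key_columns
        self.rows: Dict[Tuple, dict] = {}

    def insert(self, row: dict) -> None:
        # The combined key-column values identify the row; a later row with
        # the same key replaces the earlier one, so only the last remains.
        key = tuple(row[c] for c in self.key_columns)
        self.rows[key] = row

    def build_index(self, column: str) -> Dict[object, List[Tuple]]:
        # Maps an indexed column value to the keys of the rows holding it,
        # much as a book index maps a word to page numbers.
        index: Dict[object, List[Tuple]] = {}
        for key, row in self.rows.items():
            index.setdefault(row[column], []).append(key)
        return index


employees = KeyedTable("employees", ["id", "name", "dept"], ["id"])
employees.insert({"id": 1, "name": "Ada", "dept": "R&D"})
employees.insert({"id": 2, "name": "Grace", "dept": "Ops"})
employees.insert({"id": 1, "name": "Ada L.", "dept": "R&D"})  # same key as the first row
dept_index = employees.build_index("dept")
```

In this sketch the third insert shares its key with the first, so the table ends up with two rows rather than three.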
As an aid to maintaining integrity of databases, a component often known as a journal file system keeps a current record, or journal (also known as a journal receiver or change log), of changes to the data. In general, a journal file system provides three primary areas of support: (1) recording changes to data objects, (2) single system recovery, and (3) recovery of a saved data object to a known state. These areas are discussed below.
When changes to data objects are recorded, each change is written as a journal entry in a journal receiver. The journal receiver is a file object that contains journal entries added by the journal system when data objects are modified. The journal entries may then be used for recovery from an abnormal system termination. Another use for the recorded changes is replicating entries from the journal receiver to a back-up system, where they can be retrieved to create and maintain a replica of the source file system.
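As a rough illustration of recording changes and replicating them, the following sketch appends one journal entry per modification and then rebuilds a replica from the entries alone. The names `JournalEntry`, `JournalReceiver`, `JournaledStore`, and `replicate` are assumptions made for this example and do not correspond to any actual journal file system interface.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class JournalEntry:
    sequence: int      # order in which the change was recorded
    object_name: str   # data object that was modified
    field_name: str    # part of the object that changed
    old_value: Any
    new_value: Any


class JournalReceiver:
    """Holds the journal entries in the order they were added."""

    def __init__(self) -> None:
        self.entries: List[JournalEntry] = []


class JournaledStore:
    """Data objects whose every modification is recorded in a receiver."""

    def __init__(self, receiver: JournalReceiver) -> None:
        self.objects: Dict[str, Dict[str, Any]] = {}
        self.receiver = receiver

    def update(self, object_name: str, field_name: str, new_value: Any) -> None:
        obj = self.objects.setdefault(object_name, {})
        entry = JournalEntry(len(self.receiver.entries) + 1, object_name,
                             field_name, obj.get(field_name), new_value)
        self.receiver.entries.append(entry)  # record the change
        obj[field_name] = new_value          # then apply it


def replicate(receiver: JournalReceiver) -> Dict[str, Dict[str, Any]]:
    # Build a replica on a (hypothetical) back-up system from the entries alone.
    replica: Dict[str, Dict[str, Any]] = {}
    for e in receiver.entries:
        replica.setdefault(e.object_name, {})[e.field_name] = e.new_value
    return replica


receiver = JournalReceiver()
source = JournaledStore(receiver)
source.update("order-1", "status", "open")
source.update("order-1", "status", "shipped")
source.update("order-2", "status", "open")
replica = replicate(receiver)
```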
The journal can be used to recover the database to a known state following an abnormal system termination. The recovery process often occurs during an IPL (Initial Program Load) following the abnormal system termination. The journal receiver serves as a basis for all changes to objects that are implemented by an IPL. The IPL then processes object changes as if the abnormal system termination had not occurred by using the data contained in the journal receiver log that was created before the system termination. Damaged objects, caused by system functions that were interrupted during their critical operations, are discarded.
When recovering a saved object to a known state, the object is first recovered (or rolled back) to the state of its last save operation, which occurred sometime prior to the operation that caused the object to become corrupted or to enter an invalid or incorrect state. The object is then recovered to some later point in time by applying the journaled changes that were recorded in the journal receiver.
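The two-step recovery described above, restoring the saved state and then applying journaled changes up to a chosen point, might be sketched as follows. The `recover` function, the tuple layout of the journal, and the sample account data are hypothetical.

```python
import copy
from typing import Any, Dict, List, Tuple

# Each journaled change: (sequence, object name, field name, new value).
Change = Tuple[int, str, str, Any]


def recover(saved_state: Dict[str, Dict[str, Any]],
            journal: List[Change],
            up_to_sequence: int) -> Dict[str, Dict[str, Any]]:
    # Step 1: start from the last saved copy of the objects.
    state = copy.deepcopy(saved_state)
    # Step 2: reapply journaled changes, in order, up to the chosen point.
    for sequence, object_name, field_name, new_value in sorted(journal, key=lambda c: c[0]):
        if sequence > up_to_sequence:
            break
        state.setdefault(object_name, {})[field_name] = new_value
    return state


saved = {"account": {"balance": 100}}
journal = [
    (1, "account", "balance", 120),
    (2, "account", "balance", -999),  # the change that corrupted the object
    (3, "account", "owner", "Lee"),
]
recovered = recover(saved, journal, up_to_sequence=1)
```

Here the recovery stops just before the corrupting change, so the change recorded afterwards is not applied either.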
Although the aforementioned journal file system techniques have worked well for conventional data, new applications are being developed that embed complex logic within data, which yields a complex data structure. The data and logic are often so complex that a given operation may complete and satisfy the integrity rules of the database, yet some aspect of the complex data structure is still incorrect or in an invalid state. Typically, an invalid state involves multiple operations that may be scattered throughout the database. Individually, each operation might be correct, but taken collectively, the operations produce an incorrect result or an invalid state of the data. When this happens, the invalid group of operations must be rolled back.
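A small, hypothetical example may help show how operations can each satisfy a per-row integrity rule while the structure as a whole becomes invalid. The rules used here (each share between 0 and 100, all shares totalling 100) are assumptions chosen only for illustration.

```python
from typing import Dict, List, Tuple


def row_rule_ok(share: int) -> bool:
    # Integrity rule checked on each individual operation.
    return 0 <= share <= 100


def structure_ok(shares: Dict[str, int]) -> bool:
    # Rule on the structure as a whole: all shares must total 100.
    return sum(shares.values()) == 100


shares = {"a": 50, "b": 50}
operations: List[Tuple[str, int]] = [("a", 60), ("c", 0)]

each_operation_passed = []
for name, share in operations:
    each_operation_passed.append(row_rule_ok(share))
    shares[name] = share

all_operations_passed = all(each_operation_passed)
collectively_valid = structure_ok(shares)  # total is now 110, not 100
```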
Unfortunately, the incorrect result is usually not detected immediately, so additional database operations usually occur after the incorrect data is entered into the database but before the problem is detected. With current time-based journaling and rollback implementations, these newer updates, even though they may be correct, must also be removed in order to restore the database to a correct state prior to the error. The valid input must then be determined and reentered. In addition to the redo effort, the result can be a confusing situation in which completed and validated activities are undone and must be redone. This confusion creates new opportunities for error, may force retesting of previously validated data content, and may require coordination across what are often different data entry and validation teams. With recent advancements in data propagation, these problems are further complicated when the erroneous data has been propagated to multiple downstream databases. By the time the problem is detected, many databases may be affected.
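The cost of a time-based rollback can be seen in a short sketch: everything recorded after the erroneous entry is removed along with it, including updates that were themselves valid. The `time_based_rollback` function and the sample journal are hypothetical.

```python
from typing import List, Tuple

# Each journal entry: (sequence, description).
Entry = Tuple[int, str]


def time_based_rollback(journal: List[Entry],
                        bad_sequence: int) -> Tuple[List[Entry], List[Entry]]:
    # Entries before the error survive the rollback.
    kept = [e for e in journal if e[0] < bad_sequence]
    # Entries after the error are removed too, even if they were valid,
    # and must be determined and reentered afterwards.
    to_reenter = [e for e in journal if e[0] > bad_sequence]
    return kept, to_reenter


journal = [
    (1, "set price A"),
    (2, "set price B (erroneous)"),
    (3, "set price C"),
    (4, "set price D"),
]
kept, to_reenter = time_based_rollback(journal, bad_sequence=2)
```

In this sketch one erroneous entry forces two later, valid entries to be redone.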
Thus, a better technique is needed for handling changes to databases.