1. Field of the Invention
This invention is related to computers and computer systems, and in particular to database management systems (DBMS).
2. Description of the Related Art
A database management system keeps track of changes made to a database during its lifetime. These changes are called "transactions" and each transaction is made up of operations such as insert, update, and delete. The operations are recorded in a "log file" in the order in which the operations are performed in the system. This log file may contain entries of operations performed from the time the database was created or from the time the database was last backed up until the present. The content and format of log entries depend on the design of the database system. Logically, each entry can be thought of as at least having (1) a transaction ID; (2) a unique record ID ("UID");and (3) an operation description (such as "insert," "update," etc.). A UID is any ID the logging subsystem uses to uniquely identify a database record. Many database systems also have an internal record ID that the DBMS uses for efficiency purposes. These record IDs are often the addresses of records in some level of storage abstraction. The UID need not be the same and is often different from the record ID in the system. One example of a UID is to use a combination of the primary key-value and the table ID of the record.
With the increasing emphasis in the database community on historical data, online analysis processing (OLAP), data warehousing, and data marts, it is crucial to propagate transactions performed on or changes made to an operation site database system (called the "operation site" or "local site") to another database system (called the "warehouse site" or "remote site") where analysis and decision support computation are performed. Other purposes, such as archiving or history analysis, may also require the propagation of changes in the operation site to the warehouse site. In these scenarios, the changes made to the database are captured through the log files of the operation site before the change-information is propagated to the warehouse site.
To propagate changes made on an operation site, it is first necessary to reconstruct a coherent view of the changes from the information available in its log files. This reconstruction results in an "equivalent global log file" which can apply the changes to remote copies of the same database. A log file that is "equivalent" to log file L is a sequence of log entries that, after applying the operations of the sequence to a copy of the database in the starting state of L, will bring the copy to the ending state of L.
For serial database systems, the order of operations can easily be inferred by examining the log files for each transaction; creating a global log file in that case is trivial. However, in parallel or distributed database systems, which consist of nodes each of which performs a part of a transaction, creating a global log file is much more complex. In such systems, after all the nodes have finished processing their parts of the transaction, the results are put together to complete the transaction. Each node has its own log file associated with it, often called a "local log file," in which all of the operations performed by the node are recorded in the order completed. On a "share-nothing" model of a parallel or distributed database system, each node operates on a partition of the database; the problem of constructing a global transaction order is especially complex in this case because any data item in the database belongs to exactly one partition and thus to exactly one node. FIG. 1 depicts the architecture of a share-nothing parallel database system in which the network 9 is made up of processing nodes 1,2,3,4 each respectively associated with partitions 5,6,7,8. Parallel and distributed database systems are designed in such a way that application programs and end users need not be aware of the parallel or distributed architectures and data partitioning in the systems.
Reconstructing parallel or distributed database system log files into a global equivalent log file requires merging the entries of the local log files of all nodes in the system. If information for a total order of all log entries is recorded in the log files, the problem is trivial. However, this information is often unavailable. Modifying the logging subsystem to record this information in the log files may not be feasible because of efficiency, cost, backward compatibility, and other practical reasons. Even when this ordering information is available, determining the actual order of log entries from the information can be expensive when the number of logs is large.
The problem can be broken down into two parts--ordering of transactions and ordering of entries within each transaction. If two log entries belong to different transactions, they are arranged in the same order as their enclosing transactions. The order of transactions can be determined by examining a virtual timestamp or some other mechanism. For systems without virtual timestamps or other mechanism to establish order between two transactions, they may be arranged using a transaction-ordering mechanism as described in a co-pending application entitled "Constructing a Transaction Serialization Order Based on Parallel or Distributed Database Log Files," U.S. Application Ser. No. 08/871,001, filed Jun. 6, 1997.
Once the transaction order is known, it is necessary to order entries within each transaction. If two log entries belong to the same transaction and to the same local log, they can be arranged in the same order as they appear in the local log. The solution to this is trivial. However, the situation in which the two log entries belong to the same transaction but to different log files is not trivial and must be treated differently. And the problem of finding an equivalent log for a coherent log set can be considerably complicated when the two log entries have the same UID, i.e. they operate on the same record. This is possible because of "intra-transaction tuple migration" (a "tuple" is another word for a record), a phenomenon that occurs when within a transaction, a UID disappears from one node and reappears at another node, either because of "cross-node intra-transaction UID recycling" or because of "intra-transaction partition redistribution." Generally, "intra-transaction UID recycling" occurs when a transaction deletes a record of some UID and then inserts another record with the same UID. More specifically, "cross-node intra-transaction UID recycling" occurs when the "delete" and the "insert" occur on different nodes. "Intra-transaction partition redistribution" occurs after a number of insert and delete operations have been executed to the table resulting in uneven partition sizes. "Redistribution" rearranges the records to even out the partition sizes. When redistribution is encapsulated as a dedicated transaction, it is clean in concept and easy to handle. However, including redistribution as part of an ordinary transaction may be useful for transactions that insert or delete a very large number of records. It is in these cases that "intra-transaction partition redistribution" occurs.
Up to now, there has been no method for ordering entries belonging to the same transactions but to different nodes. It is the object of the present invention to develop a method that merges the local log files into a coherent, equivalent global log file that overcomes the aforementioned difficulties without imposing any additional requirements on the logging subsystem.