This invention relates to the design and implementation of a database system that includes among its basic functions the ability to efficiently synchronize all data within a specified key range. A transactional database system is used for transaction processing. Such a system generally stores information centrally and allows users to access it. User access typically consists of read, write, delete, and update interactions with the information in the database. These interactions are referred to as transactions.
The information on the database can be vital for many reasons. The database might store banking information and its users may rely on it to accurately reflect information on thousands of accounts. The database might store medical information and its users may rely on it to properly diagnose or monitor the health of a patient.
A transactional system may be accessed thousands of times a day, and its information may be altered each time. It is often imperative to ensure the security of this information. These systems can also fail for a variety of reasons. Therefore, it is common for such databases to be backed up and the information made available for recovery if a failure occurs.
A common method of backup for a transactional database is synchronization. Synchronization typically involves a primary host and a secondary host. Transactions are first processed on the primary database; the secondary database is periodically synchronized, or made consistent, with the primary database and is used as a backup and as part of a recovery system. The roles of primary and secondary may be dynamic, making synchronization a multi-directional path.
Since multiple copies of the information are being kept, the copies must be kept consistent. To synchronize the databases they must both compare and transfer data.
A synchronizable database D is a set containing records of the form (key, value). The key field takes values from a totally ordered set K of keys. Any key in K occurs in D at most once.
The major operations supported by a synchronizable database as an abstract data type are insertion, deletion and retrieval of records, and a range synchronization operation. The first three are standard operations on databases with records. The last one is unique to synchronizable databases.
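As an illustration only, the abstract data type described above might be modeled in memory as follows. The class name SynchronizableDB, the method names, and the choice to overwrite the value when an existing key is inserted again are assumptions made for this sketch, not details taken from the invention.

```python
import bisect


class SynchronizableDB:
    """Toy in-memory model: records are (key, value) pairs with unique keys
    drawn from a totally ordered set (hypothetical names throughout)."""

    def __init__(self):
        self._keys = []    # keys kept in ascending order
        self._values = {}  # key -> value

    def insert(self, key, value):
        # Assumption: inserting an existing key overwrites its value,
        # so each key still occurs at most once.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def retrieve(self, key):
        # Returns None when the key is absent.
        return self._values.get(key)

    def keys_in_range(self, lo, hi):
        """Keys k with lo <= k <= hi, in ascending order."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return self._keys[i:j]
```

Keeping the keys sorted makes the restriction of the database to a key interval a simple slice, which is the view a range synchronization operation works on.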
The input to a range synchronization operation is an interval I of K and two databases D1 and D2. The operation attempts to make the restrictions of D1 and D2 to I identical. In particular, it identifies three sets of keys, which are called the discrepancy sets K1, K2 and K12. These three sets are all subsets of the key interval I. Discrepancy set K1 is the set of keys in D1 which are not in D2, K2 is the set of keys in D2 which are not in D1, and K12 is the set of keys which are in both D1 and D2 but whose corresponding records in the two databases differ in the value field.
The operation calls different handler functions for each of these three sets. Typically, the handler functions for K1 and K2 would copy the missing records from one database to the other. The handler function for K12 would typically, for each key in the discrepancy set, compare the records in D1 and D2 that have the key and replace one of them with the other.
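The range synchronization operation described above can be sketched as follows. This is a minimal illustration that assumes each database is a plain Python dict, that the interval I is the closed range [lo, hi], and that the K12 handler lets D1 win. The function and handler names are hypothetical.

```python
def range_sync(d1, d2, lo, hi, on_k1, on_k2, on_k12):
    """Compute the discrepancy sets for keys in [lo, hi] and pass each
    set to its handler. Returns (K1, K2, K12) as sorted lists."""
    keys1 = {k for k in d1 if lo <= k <= hi}
    keys2 = {k for k in d2 if lo <= k <= hi}
    k1 = sorted(keys1 - keys2)   # in D1 but not in D2
    k2 = sorted(keys2 - keys1)   # in D2 but not in D1
    k12 = sorted(k for k in keys1 & keys2 if d1[k] != d2[k])  # values differ
    on_k1(k1)
    on_k2(k2)
    on_k12(k12)
    return k1, k2, k12


d1 = {1: "a", 2: "b", 5: "e", 9: "z"}
d2 = {2: "B", 3: "c", 5: "e", 8: "y"}


def copy_to_d2(keys):
    # Typical handler: copy missing records from D1 to D2.
    for k in keys:
        d2[k] = d1[k]


def copy_to_d1(keys):
    # Typical handler: copy missing records from D2 to D1.
    for k in keys:
        d1[k] = d2[k]


def prefer_d1(keys):
    # Assumed conflict policy for this sketch: the D1 record replaces D2's.
    for k in keys:
        d2[k] = d1[k]


result = range_sync(d1, d2, 1, 5, copy_to_d2, copy_to_d1, prefer_d1)
```

After the call, the two dicts agree on every key in [1, 5], while keys 8 and 9, which lie outside the interval, are left untouched.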
Since synchronization relies on the comparison and transfer of data, efficiency lies in the actions of comparing and transferring. As used herein, efficiency is an attribute of the cost of comparing and transferring data and the time intervals of comparing and transferring data.
There are many transactional systems currently in use and under development. The Berkeley DB system provides a generalized transactional library, which relies on logging. The library provides transactional functionality plus further database features. SQRL, a free SQL project for FreeBSD, extends the notion of a transactional substrate further to provide a transaction-based file system.
FIG. 1 depicts a synchronizable database. On the left, the figure depicts a car dealer's database, 1, that maintains a schedule of factory repairs. Each time a customer needs a factory repair, the dealer schedules it in his database. The factory, on the right, maintains a master schedule of repairs, 2. Periodically the two must be synchronized; in this case, the new and changed orders must be copied from the dealer's site to the factory site and inserted into the factory's database. In order to rapidly identify the records to be transferred, the databases each support a special facility for efficient computation of a digest (hash) of any specified range of keys, which amounts to a summary of the range. This special facility corresponds to the invention's summarizable database abstraction. An efficient synchronization facility then uses this smart summarization mechanism to minimize the amount of data that must be transferred.
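To illustrate how range digests can narrow down which records need to be transferred, the following sketch compares digests over progressively smaller key intervals. It rests on several assumptions that are not taken from the invention: integer keys, SHA-256 as the digest, databases modeled as plain dicts, and a simple bisection strategy. The naive range_digest below rescans the range on every call; the efficient digest computation described above is not reproduced here.

```python
import hashlib


def range_digest(db, lo, hi):
    """Digest of all (key, value) records with lo <= key <= hi (naive scan)."""
    h = hashlib.sha256()
    for k in sorted(k for k in db if lo <= k <= hi):
        h.update(repr((k, db[k])).encode("utf-8"))
    return h.hexdigest()


def differing_keys(d1, d2, lo, hi):
    """Bisect the integer interval [lo, hi], descending only into
    sub-intervals whose digests differ (hypothetical strategy)."""
    if lo > hi or range_digest(d1, lo, hi) == range_digest(d2, lo, hi):
        return []      # identical summaries: nothing to transfer here
    if lo == hi:
        return [lo]    # a single key whose record differs or is missing
    mid = (lo + hi) // 2
    return differing_keys(d1, d2, lo, mid) + differing_keys(d1, d2, mid + 1, hi)


# Illustrative data only: repair orders keyed by an order number.
dealer = {101: "brakes", 102: "clutch", 103: "paint"}
factory = {101: "brakes", 102: "gearbox"}

to_transfer = differing_keys(dealer, factory, 100, 107)
```

In this example only orders 102 and 103 are identified for transfer; the matching digests for the sub-intervals [100, 101] and [104, 107] let the comparison skip those ranges without exchanging their records.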