1. Technical Field
The present invention relates generally to the field of data synchronization and, in particular, to the update synchronization process between data in one computer processing device and their replicas in another computer processing device, with data transmission enabled by a wired or wireless computer network.
2. Background Description
The data synchronization process, or the sync process, refers to the computation process that brings different database replicas of an application (e.g., a calendar application, an address book application, a relational database table) into a synchronized state. For the purposes of the present invention, a synchronized state can be either a consistent state or an identical state. A consistent state between multiple database replicas of an application means that the final differences between these replicas are caused by the user's choice and, therefore, are anticipated by the user.
For example, a user may synchronize the Address Book application in his Palm Pilot with his Lotus Notes system in his personal computer (PC). Typically, the user may choose the option to synchronize the two Address Book replicas into an identical state. This means that all the information which participates in the sync process from one replica is also properly represented in the other replica. It is to be noted that not every item in an application from one device may participate in the sync process. For example, some items in a Lotus Notes Address Book record may not have corresponding items in a Palm Pilot's Address Book application. Hence they do not participate in the sync process and are irrelevant to the resulting sync state. We also note that the same information may be represented by different storage or syntactical formats in various systems. For example, the dates "May 20, 1999" in one device and "May 20, 1999" in another device represent the same date though their physical appearances are different.
As an alternative to achieving an identical state as a result of synchronization, the user may choose the option to synchronize two items to a consistent state. For example, in synchronizing the E-mail application between the Palm Pilot and the Lotus Notes, the user may specify that only new e-mail items from the Palm Pilot are to be transmitted to the Lotus Notes system whereas new e-mail items from the latter are not to be sent to the former during the current sync process. Such a situation may occur when the user is in a hurry to send an e-mail composed in the Palm Pilot through the Lotus Notes, and the full two-way sync process may be too long to wait for.
With respect to the above example, the database replicas of the two e-mail systems may not be in an identical state after synchronization because new e-mail items may exist in the Lotus Notes but not in the Palm Pilot. But if all new e-mail items from the Palm Pilot are properly represented in the Lotus Notes, the two replicas are in a consistent state as their differences are the results of the user's choice. It is to be noted that a consistent state can be brought to be an identical state later by a sync process with the user choosing a two-way complete sync.
An important issue in data synchronization is the handling of conflicts. A conflict occurs when the same record represented in multiple database replicas has, since the last sync process, been independently modified in such a way that, among all replicas, no single version of this record can be determined to be more current than any others. For example, the phone number of an Address Book record (e.g., in the record for John Doe) has been independently changed in both the Palm Pilot and the Lotus Notes since the user last synchronized the two systems. Suppose at least one of the systems (e.g., the Palm Pilot) does not support a reliable time-stamp that records exactly when a record is modified. When the user synchronizes the two systems again, if the two new phone numbers for John Doe from each of the two systems are different, then the sync process cannot determine if either number is more current than the other number. The two versions of the same record (that of John Doe) therefore create a conflict.
Those skilled in the art will appreciate that conflicts may not be caused by independent updates alone (referred to as the update-update conflict). Deleting a record in a system (e.g., the Palm Pilot) may also cause a conflict if a replica of this record has been updated independently in another system (e.g., Lotus Notes) since the two systems last synchronized (referred to as the delete-update conflict or update-delete conflict).
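By way of illustration only, the conflict kinds described above may be sketched as follows. The sketch assumes that each replica can report, for every record, whether the record was left unchanged, updated, or deleted since the last sync process; the `Change` flags and the `classify_conflict` function are hypothetical names introduced for this example and do not appear in any of the systems discussed.

```python
from enum import Enum
from typing import Optional


class Change(Enum):
    """What happened to a record in one replica since the last sync (hypothetical flags)."""
    NONE = "none"
    UPDATED = "updated"
    DELETED = "deleted"


def classify_conflict(change_a: Change, change_b: Change,
                      value_a: object = None, value_b: object = None) -> Optional[str]:
    """Return the kind of conflict between two versions of a record, or None."""
    # Both sides were updated independently: a conflict only if the results differ.
    if change_a is Change.UPDATED and change_b is Change.UPDATED:
        return "update-update" if value_a != value_b else None
    # One side updated the record while the other side deleted it.
    if change_a is Change.UPDATED and change_b is Change.DELETED:
        return "update-delete"
    if change_a is Change.DELETED and change_b is Change.UPDATED:
        return "delete-update"
    # A change on one side only (or no change at all) is not a conflict.
    return None


# John Doe's phone number was changed to different values on both devices.
kind = classify_conflict(Change.UPDATED, Change.UPDATED, "555-0100", "555-0199")
```

Without a reliable time-stamp on both sides, the sketch cannot tell which of the two differing values is more current, which is why it only reports the conflict and leaves its resolution to a separate procedure.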
Thus, for the sync process to bring multiple replicas of the same information into a synchronized state, the sync process should detect whether there exist conflicts among these replicas, and, if so, then the sync process should resolve these conflicts. Typically, the sync process performs the preceding steps (detecting and resolving conflicts) by invoking specialized conflict resolution procedures to handle conflicts.
Prior art in conflict handling during data synchronization can be generally categorized into three approaches. The first approach allows the user or the administrator of the system to specify some record-based (or coarse-grained) default actions for an application when conflicts occur. For example, the user (or the administrator) can configure the system such that if an update conflict occurs while synchronizing the Address Book application between the Palm Pilot and the Lotus Notes, then the updated record from the former always overrides that from the latter (or vice versa). The set of actions for resolving conflicts typically includes: (1) updates from one system always override updates from the other system, or vice versa (for update-update conflict); (2) updates always override deletes, or vice versa (for update-delete conflict); and (3) rename (change the identity of) the record of one system, or vice versa (for update-update conflict).
In the first approach, the sync process typically treats each record (such as, for example, an Address Book record, a Calendar record, an Expense Application record, and so forth) as an atomic unit and does not have knowledge of the detailed formats inside these records. Therefore, we call this approach coarse-grained because the conflict resolution actions always involve the whole record (e.g., the entire Address Book record for John Doe) and not the more detailed information inside each record (e.g., the last name, phone number, zip code for John Doe). This approach is generally referred to as the default-setting approach.
The second approach allows for fine-grained handling of conflict by way of the sync process. In particular, while synchronizing an application, an executable code (referred to as a plug-in) developed specifically for this application is loaded and executed. All the details in synchronizing data from different replicas such as retrieving data, inserting data, updating data, detecting conflicts, and resolving conflicts are handled in this plug-in. The plug-in for an application may be developed by the developer of the application, and not the developer of the synchronization system. For example, the plug-in (also referred to as the "conduit") for synchronizing the E-mail application between the Palm Pilot and the Lotus Notes is developed by the Lotus Notes organization that supplies the Lotus Notes E-mail application, and not by the Palm Pilot organization that supplies the synchronization system called HotSync. Because the application developers have full knowledge of the data format of their application, the plug-in they develop for their application can manipulate detailed information inside each record for synchronization or conflict resolution. This approach is generally referred to as the plug-in approach.
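The default-setting approach may be illustrated by the following minimal sketch, which assumes that a record is represented as a dictionary and that a deleted record is represented as `None`. The policy names and the `resolve_coarse` function are hypothetical and are introduced here only for illustration; the rename action is omitted for brevity.

```python
from typing import Optional


def resolve_coarse(policy: str,
                   record_a: Optional[dict],
                   record_b: Optional[dict]) -> Optional[dict]:
    """Resolve a conflict by choosing one whole record; None means 'deleted'."""
    if policy == "a_overrides_b":
        return record_a
    if policy == "b_overrides_a":
        return record_b
    if policy == "update_overrides_delete":
        # Keep whichever side still holds the (updated) record.
        return record_a if record_a is not None else record_b
    if policy == "delete_overrides_update":
        return None
    raise ValueError(f"unknown policy: {policy}")


palm = {"last_name": "Doe", "phone": "555-0100", "zip": "10001"}
notes = {"last_name": "Doe", "phone": "555-0199", "zip": "10598"}

# The winner is always an entire record; fields are never mixed.
winner = resolve_coarse("a_overrides_b", palm, notes)
```

Because the function only ever returns one record or the other, no policy can produce a result that takes the phone number from one device and the zip code from the other.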
The third approach is similar to the first approach in that the sync process provides record-based conflict resolution settings to handle conflicts. However, this approach differs in that these settings are not configured beforehand (before the sync process takes place), but can be set by the user after conflicts are detected during synchronization. For example, in a typical synchronization scenario, a user uses a mobile computer processing device (such as a Palm Pilot) to synchronize an application with a stationary computer processing device (such as a database server). Upon detecting a conflict during synchronization, information about this conflict is then sent to the Palm Pilot which displays this information. The user may then specify a record-based (coarse-grained) action to resolve this conflict. The set of actions that the user may specify to resolve a conflict is similar to those specified in the first approach.
One of the problems plaguing the first and third approaches is that they lack the fine-grained control of data synchronization and conflict resolution. For example, consider the following scenario. If a user, while synchronizing an application between a mobile device and a server, chooses to override one field from the mobile device to the server and another field of the same record from the server to the mobile device to resolve a conflict, he cannot achieve this objective using the first and the third approaches. One of the problems plaguing the second approach is that the plug-in is written in a certain programming language. The conflict resolution logic, once written with a programming language, stays the same until it is rewritten and redeployed. For example, the user may choose to handle conflicts based on a configuration written as logic embedded in a plug-in code. If the user later decides to choose a different configuration, such as, for example, reversing the override directions for some fields, then the plug-in for the different configuration must be rewritten.
Writing and rewriting a plug-in involves a set of skills usually not possessed by the typical user. It also requires a software development process that typically involves code changes, code compilation, code testing, and code deployment. It is a time consuming process. Therefore, the user cannot switch from one conflict resolution configuration to another with a high frequency. In other words, conflict resolution and synchronization control cannot be managed by the user in a dynamic fashion. Furthermore, a synchronization system typically involves more than one user. For example, many users may each have a mobile device that can synchronize with one or more servers. In a scenario where each user has a synchronization and conflict resolution configuration that may be different from the other users, each different configuration will need to have a unique plug-in. The arbitrary enumeration of potentially a large number of plug-ins for one application creates such high complexity that it renders the synchronization system difficult to operate and maintain.
Thus, it would be desirable and highly advantageous to have a method and system in which a user may specify a synchronization and conflict resolution configuration in a dynamic manner.
The present invention is directed to a method and system for synchronizing data using fine-grained synchronization plans. The present invention overcomes the above described problems of the prior art and provides a method and system in which a user may specify a synchronization and conflict resolution configuration in a dynamic manner (which is not achievable by the first and second prior art approaches described above) and with a fine-grained control (which is not achievable by the first and third prior art approaches) with or without user interaction during synchronization (which is not achievable by the third prior art approach).
According to a first aspect of the present invention, there is provided a method for performing synchronization between a first replica associated with an application in a first computer processing device and a second replica associated with the application in a second computer processing device. The method includes the step of generating a synchronization plan for the replicas for managing the synchronization therebetween. The sync plan includes data structure information corresponding to data structures of the replicas, storage access information for enabling access to each individual data unit within the data structures of the replicas, and synchronization and conflict resolution actions for specifying actions to be taken for each individual data unit with respect to the synchronization and any conflicts resulting therefrom. At least one individual data unit in the first replica is synchronized with a corresponding individual data unit in the second replica, in accordance with the sync plan.
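One possible in-memory shape for such a sync plan is sketched below, purely as an illustration. The sketch assumes that a record is a dictionary, that the storage access information for a data unit is simply the key under which the unit is stored in each replica, and that any difference between the two values is treated as a conflict. The names `Action`, `UnitPlan`, `SyncPlan`, and `sync_unit`, as well as the field names, are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Action(Enum):
    """Conflict resolution action for one data unit (hypothetical set)."""
    FIRST_OVERRIDES_SECOND = "first_overrides_second"
    SECOND_OVERRIDES_FIRST = "second_overrides_first"
    SKIP = "skip"


@dataclass
class UnitPlan:
    first_key: str       # storage access information for the first replica
    second_key: str      # storage access information for the second replica
    on_conflict: Action  # action to take for this individual data unit


@dataclass
class SyncPlan:
    application: str
    units: Dict[str, UnitPlan] = field(default_factory=dict)


def sync_unit(plan: SyncPlan, unit_name: str, first: dict, second: dict) -> None:
    """Synchronize one data unit of a record pair in place, as the plan directs."""
    unit = plan.units[unit_name]
    a = first.get(unit.first_key)
    b = second.get(unit.second_key)
    if a == b or unit.on_conflict is Action.SKIP:
        return  # already identical, or the plan says to leave the unit alone
    if unit.on_conflict is Action.FIRST_OVERRIDES_SECOND:
        second[unit.second_key] = a
    else:
        first[unit.first_key] = b


plan = SyncPlan("AddressBook", {
    "phone": UnitPlan("phone", "OfficePhone", Action.FIRST_OVERRIDES_SECOND),
    "zip": UnitPlan("zip", "Zip", Action.SECOND_OVERRIDES_FIRST),
})
mobile = {"phone": "555-0100", "zip": "10001"}
server = {"OfficePhone": "555-0199", "Zip": "10598"}
for name in plan.units:
    sync_unit(plan, name, mobile, server)
```

In this sketch the plan is ordinary data rather than compiled code, so reversing the override direction for a single field amounts to changing one `Action` value.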
According to a second aspect of the present invention, the method further includes the step of organizing the data structure information in a hierarchal order.
According to a third aspect of the present invention, the hierarchal order may include a replica level corresponding to the entire replica and an indivisible data unit level corresponding to a smallest possible data unit within the data structures.
According to a fourth aspect of the present invention, the method further includes the step of enabling an individual to create, update, or delete the synchronization plan using an input/output device of the first and/or the second computer processing devices.
According to a fifth aspect of the present invention, the method further comprises the step of generating a mapping of the data structures of the first and the second replica so as to link each data unit participating in the synchronization in the first computer processing device with a corresponding data unit in the second computer processing device.
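As a minimal sketch of such a mapping, and assuming that each replica stores a record as a dictionary keyed by field name, the link between data units may be held as a simple table from field names in the first replica to field names in the second. All field names below, and the helper functions `to_second` and `to_first`, are hypothetical.

```python
from typing import Dict

# Hypothetical mapping: field name in the first replica -> field name in the second.
FIELD_MAP: Dict[str, str] = {
    "last_name": "LastName",
    "phone": "OfficePhone",
}


def to_second(first_record: dict, mapping: Dict[str, str]) -> dict:
    """Express the participating units of a first-replica record in second-replica terms."""
    return {mapping[k]: v for k, v in first_record.items() if k in mapping}


def to_first(second_record: dict, mapping: Dict[str, str]) -> dict:
    """Inverse direction; assumes the mapping is one-to-one."""
    inverse = {v: k for k, v in mapping.items()}
    return {inverse[k]: v for k, v in second_record.items() if k in inverse}


first_record = {"last_name": "Doe", "phone": "555-0100", "category": "Work"}
converted = to_second(first_record, FIELD_MAP)
```

Data units that have no entry in the mapping, such as `category` in this example, are dropped by the conversion and therefore do not participate in the synchronization.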
According to a sixth aspect of the present invention, the data structures of the first and the second replicas are different.
According to a seventh aspect of the present invention, the first and the second computer processing devices are coupled through a network.
According to an eighth aspect of the present invention, the synchronizing step is performed in the first or the second computer processing device.
According to a ninth aspect of the present invention, the method further includes the step of storing the synchronization plan in the first or the second computer processing device.
According to a tenth aspect of the present invention, the synchronizing step is performed by the first or the second computer processing device, and further includes the step of determining and retrieving the portions of the first and the second replicas that are to be replicated, in accordance with the synchronization plan. It is then determined whether any conflicts exist between the portions of the first and the second replicas that are to be replicated. If such conflicts exist, then they are resolved in accordance with the synchronization plan. The portions of the first and the second replicas that are to be replicated are then replicated in accordance with the synchronization plan and the mapping.
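The sequence of steps in this aspect (retrieve, detect, resolve, replicate) may be sketched as follows for a single pair of records. The sketch makes several assumptions that are not part of the description above: changes are detected by comparing each value against a snapshot (`base`) saved at the last sync, the plan and the mapping are folded into one dictionary, and each data unit names a winning side. The `synchronize` function and all field names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical plan: first-replica field -> (second-replica field, winner on conflict).
Plan = Dict[str, Tuple[str, str]]


def synchronize(first: dict, second: dict, base: dict, plan: Plan) -> List[str]:
    """Synchronize one record pair in place; return the fields that were in conflict."""
    conflicts = []
    for f_key, (s_key, winner) in plan.items():
        # Step 1: retrieve only the portions named in the plan.
        a, b, old = first.get(f_key), second.get(s_key), base.get(f_key)
        a_changed, b_changed = a != old, b != old
        if a_changed and b_changed and a != b:
            # Step 2: both sides changed to different values, so a conflict exists.
            conflicts.append(f_key)
            # Step 3: resolve the conflict as the plan directs for this unit.
            value = a if winner == "first" else b
        elif a_changed:
            value = a
        elif b_changed:
            value = b
        else:
            continue  # nothing to replicate for this unit
        # Step 4: replicate the chosen value to both replicas via the mapping.
        first[f_key] = value
        second[s_key] = value
        base[f_key] = value
    return conflicts


mobile = {"phone": "555-0100", "zip": "10001", "note": "local only"}
server = {"Phone": "555-0199", "Zip": "10598"}
last_sync = {"phone": "555-0000", "zip": "10001"}
plan = {"phone": ("Phone", "first"), "zip": ("Zip", "second")}
found = synchronize(mobile, server, last_sync, plan)
```

In this example only `phone` is in conflict, because `zip` was changed on the server alone; the `note` field is absent from the plan and is left untouched.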
According to an eleventh aspect of the present invention, one or more of the two replicas are stored or obtained by another computer processing device having a continuous or an intermittent connection to the network.
According to a twelfth aspect of the present invention, the synchronizing step is performed in the other computer processing device.
According to a thirteenth aspect of the present invention, the method further includes the step of storing the synchronization plan in the other computer processing device.
These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.