The present invention relates generally to synchronization of data, that is, the process of taking two or more separate collections of data ("datasets"), identifying differences among them, and applying changes to one or more of the datasets to make the datasets identical or equivalent.
With each passing day, there is an ever-increasing need for synchronization solutions for connected information devices. Here, information devices include, for example, general- or special-purpose computers of all types and sizes, Internet or intranet access devices, cellular phones, pagers, and other hand-held devices including, for example, REX PRO™, PalmPilot and Microsoft "Windows CE" devices, and the like.
(REX™ and REX PRO™ are trademarks of Franklin Electronic Publishers of Burlington, N.J. REX and REX PRO organizers include licensed technology from Starfish Software, Inc. ("Starfish"), the present assignee. PalmPilot organizers are produced by Palm Computing, Inc., a subsidiary of 3Com Corp. of Santa Clara, Calif. The Windows CE device operating system and other Microsoft software are produced by Microsoft Corporation of Redmond, Wash.)
As the use of information devices is ever growing, users often have their data in more than one device, or in more than one software application. Consider, for instance, a user who has his or her appointments and contacts on a desktop personal computer (PC) at work and also has appointments or contacts on a notebook computer at home and on a battery-powered, hand-held device for use in the field. The user is free to alter the information on any one of these devices independently of one another. What the user really wants is the information in each device to remain synchronized with corresponding information in other devices in a convenient, transparent manner. Still further, some devices are connected at least occasionally to a server computer (for example, an Internet server) which stores information for the user. The user would of course like the information on the server computer to participate in synchronization, so that the server computer also remains synchronized.
An early approach to maintaining consistency between datasets was simply to import or copy one dataset on top of another. This simple approach, one which overwrites a target dataset without any attempt at reconciling any differences, is inadequate for all but the simplest of applications. Expectedly, more sophisticated synchronization techniques were developed. In particular, a synchronization technique was developed, in which exactly two datasets are synchronized by a PC-based synchronization system that is specific to a particular other device (e.g., PalmPilot organizer). The synchronization is conducted in a single, intensive session via a direct local connection (e.g., serial cable or short infrared link) that is maintained during the entire synchronization. Thus, the prior synchronization technique is a session-based, connection-based technique.
The prior, PC-based synchronization system functions as follows. First, it directly requests and receives (i.e., reads) one record at a time from the other device's dataset via the local connection to obtain changes that have been made to that dataset since a previous synchronization. Then, the system similarly obtains changes that have been made to the PC's dataset since the previous synchronization. The system next resolves any identified conflicts involving these changes, for example, by asking the user to choose a winner from two changes that conflict. Finally, the system directly propagates (e.g., writes) the conflict-resolved changes from each of the datasets into the other dataset, to leave the two datasets in identical or equivalent states. During the synchronization, which typically lasts several minutes or less, both datasets are "locked" to prevent the user from modifying the datasets.
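The four steps of the prior, session-based technique can be illustrated with a minimal sketch. It assumes each dataset is an in-memory mapping from a record identifier to a value and that a snapshot of the state left by the previous synchronization is available; the names `changes_since`, `session_sync`, and `choose_winner` are hypothetical and are not taken from any actual product. Locking is not modeled; the sketch simply assumes that neither dataset is modified while it runs.

```python
from typing import Callable, Dict, Optional

Dataset = Dict[str, str]  # hypothetical representation: record id -> value


def changes_since(current: Dataset, snapshot: Dataset) -> Dict[str, Optional[str]]:
    """Return records that differ from the last-sync snapshot (None means deleted)."""
    changed: Dict[str, Optional[str]] = {}
    for key in current.keys() | snapshot.keys():
        if current.get(key) != snapshot.get(key):
            changed[key] = current.get(key)
    return changed


def session_sync(
    device: Dataset,
    pc: Dataset,
    snapshot: Dataset,
    choose_winner: Callable[[str, Optional[str], Optional[str]], Optional[str]],
) -> Dataset:
    """Synchronize two datasets in one session and return the new snapshot."""
    device_changes = changes_since(device, snapshot)  # step 1: read device changes
    pc_changes = changes_since(pc, snapshot)          # step 2: read PC changes
    for key in device_changes.keys() | pc_changes.keys():
        in_device = key in device_changes
        in_pc = key in pc_changes
        if in_device and in_pc and device_changes[key] != pc_changes[key]:
            # step 3: conflict, e.g. ask the user to choose a winner
            value = choose_winner(key, device_changes[key], pc_changes[key])
        elif in_device:
            value = device_changes[key]
        else:
            value = pc_changes[key]
        # step 4: write the resolved value into both datasets
        for dataset in (device, pc):
            if value is None:
                dataset.pop(key, None)
            else:
                dataset[key] = value
    return dict(device)
```

Every read and write in this sketch corresponds to a round trip over the local connection in the technique described above, which is why the approach depends on a fast, reliable link that stays up for the whole session.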
The prior synchronization techniques have their uses. However, as more and more types of devices are introduced that include datasets to be synchronized, a need has arisen for improved synchronization schemes to take advantage of (or compensate for) the particular characteristics of these new devices and datasets. Consider, for example, the fact that many modern devices such as pagers and cellular phones are now capable of distant wireless communication or Internet-based communication. It would be desirable to efficiently synchronize user information in such devices using such distant communication mediums. Further, because many of the modern devices are capable of conducting message-based or connectionless communication (e.g., electronic mail (e-mail) or wireless paging) as opposed to connection-based communication (e.g., direct serial connection), it would be desirable to efficiently synchronize the user information in such devices using message-based communication techniques, especially automated techniques that require little to no user intervention besides initiating synchronization. Unfortunately, the prior synchronization technique, which is designed for use over a direct local serial connection, is not well-adapted to the characteristics commonly associated with distant and/or message-based communication, especially if errors occur during communication, as will be described.
Consider for instance the characteristic of high communication latency, which is unavoidable for certain popular communication technologies such as paging or e-mail. Paging or e-mail messages can take minutes, hours, or sometimes even longer (e.g., days) to be delivered, when the messages are not lost outright. Clearly, the prior synchronization scheme, which requires numerous sequential communication exchanges (e.g., one request-response cycle per data record), will be intolerably slow to finish if directly applied to synchronize datasets across such high-latency communication mediums.
Further, if a synchronization might take a long time to finish (e.g., more than thirty minutes), due for example to high latency, the user would want to use (e.g., modify) his or her data during the synchronization. The prior synchronization system cannot accommodate such a user because the system locks the datasets against modifications during every synchronization. The prior synchronization system cannot be rescued simply by modifying it to leave the datasets unlocked during synchronization because then the modified system would not guarantee data integrity. In particular, suppose the prior system, during a synchronization, reads a handheld device's data record "Bill Smith, version A" at time 1, and updates it with a PC's updated corresponding data record "Bill Smith, version B" at time 3. If the prior system were modified to allow the user to hand-modify the record at time 2 into "Bill Smith, version C" in the middle of the synchronization, then the just-made modification ("version C") would be overwritten at time 3 as a part of the synchronization without any attempt to determine whether the just-made modification ("version C") is in fact the one that should be retained, i.e., without any conflict resolution. In short, the user's desired information (when "version C" is the desired information) may be erroneously overwritten if the prior synchronization system is simply modified not to lock the datasets during synchronization.
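The "Bill Smith" timeline can be replayed in a few lines. This is only an illustration of the hazard, assuming the handheld and PC datasets are plain dictionaries; the function name `unlocked_sync_timeline` is hypothetical.

```python
def unlocked_sync_timeline() -> dict:
    """Replay the three-step timeline with the datasets left unlocked."""
    handheld = {"Bill Smith": "version A"}
    pc = {"Bill Smith": "version B"}

    # Time 1: the synchronization reads the handheld record.
    value_read_by_sync = handheld["Bill Smith"]

    # Time 2: the user edits the same record while the synchronization is pending.
    handheld["Bill Smith"] = "version C"

    # A check like this is what the unmodified prior system never performs.
    changed_since_read = handheld["Bill Smith"] != value_read_by_sync

    # Time 3: the synchronization writes the PC's version without checking.
    handheld["Bill Smith"] = pc["Bill Smith"]

    return {
        "value_read_by_sync": value_read_by_sync,
        "changed_since_read": changed_since_read,
        "final_value": handheld["Bill Smith"],
    }
```

The returned `final_value` is "version B" even though `changed_since_read` is true, so the user's edit to "version C" is silently lost.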
In addition to high latency, communication environments (e.g., wireless, message-based, or Internet-based environments) may also have other characteristics such as low or intermittent reliability and availability, low or expensive bandwidth, inability to guarantee FIFO (first-in-first-out) delivery order, and the like. Each of these other characteristics introduces additional problems to the synchronization task. Furthermore, combinations of such communication characteristics make the problems especially difficult to solve. To take just one example, consider the task of synchronizing over a communication medium that cannot guarantee FIFO delivery order for transmissions and is susceptible to high latency (e.g., latency longer than an hour or longer than a day). Suppose that, in a first synchronization, some synchronization-related messages containing data are sent across this communication medium. Suppose that the user thereafter makes changes to one of the datasets involved, in the ordinary course of using the dataset. Then, perhaps hours or days later, the user attempts a second synchronization over the communication medium. The second synchronization finishes and leaves correctly-synchronized datasets. However, the now-obsolete synchronization-related messages sent during the earlier first synchronization may now arrive at one or more of the datasets (i.e., in non-FIFO order). If these messages are obeyed, the already-correctly-synchronized datasets may become corrupted with obsolete data from the now-obsolete messages. In short, the user's desired information is endangered by a characteristic of the communication medium that affects synchronization.
Additional problems and difficulties caused by communication characteristics that are associated with wireless, Internet-based, or message-based communication mediums, and by the interplay of such characteristics, will be further discussed below. In summary, these problems and difficulties cause an irony that user data stored for example in cellular phones or pagers typically cannot be synchronized with data in other devices via cellular phone calls or pager messages in the field in an efficient, error-free, and cost-effective manner. Instead, users typically must wait until they return home or to the office to synchronize their cellular phone or pager with a PC via a serial cable or short infrared link. Thus, such devices created expressly for use in the field are in practice no more portable than desktop office machines, when synchronization is the desired task.
Clearly, there is a need for improved synchronization systems and techniques that are suitable for synchronization via wireless or message-based communication networks (such as cellular or pager networks) or other networks (such as the Internet) that have similar characteristics. The present invention fulfills this and other needs.
The present invention provides a system and methods for synchronizing information in datasets via a communication medium. The system and methods are suitable for synchronizing even across communication mediums that are susceptible to high latency, non-FIFO delivery order, or other adverse characteristics. According to an aspect of the invention, in an information processing system, a method for synchronizing a first dataset with at least a second dataset via a communication medium includes a step of storing information that is indicative of a first version of user data of the first dataset, wherein the first version was involved in prior use for synchronizing with the second dataset. The method further includes steps of identifying a change in the second dataset that is new relative to the first version of the user data of the first dataset; via the communication medium, communicating the change in the second dataset and indicating the first version based on the stored information; determining whether user data currently in the first dataset has changed relative to the first version that was indicated in the communicating and indicating step; deciding whether to commit the communicated change to the first dataset based at least in part on the determining step; and committing the communicated change to the first dataset if the communicated change is decided to be committed in the deciding step.
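One way to picture this aspect is the following minimal sketch. It assumes that the "information indicative of a first version" is a per-record integer counter, and that a communicated change is simply declined when the record has changed since that version; both are illustrative assumptions, and the names `Record`, `Change`, `FirstDataset`, and `base_version` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Record:
    value: str
    version: int  # assumed: incremented on every write to the record


@dataclass
class Change:
    key: str
    new_value: str
    base_version: int  # version of the first dataset's record last used in a sync


class FirstDataset:
    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def write(self, key: str, value: str) -> None:
        """Apply a local edit or a committed change, bumping the version."""
        record = self.records.get(key)
        if record is None:
            self.records[key] = Record(value, 1)
        else:
            record.value = value
            record.version += 1

    def receive(self, change: Change) -> bool:
        """Commit the change only if the record is unchanged since base_version."""
        record = self.records.get(change.key)
        current_version = record.version if record is not None else 0
        if current_version != change.base_version:
            # The record changed after the indicated version; in this sketch the
            # change is not committed and would go to conflict resolution instead.
            return False
        self.write(change.key, change.new_value)
        return True
```

Because the receiver compares against the version named in the message rather than relying on a lock, the user may keep editing the first dataset while the message is in transit; an intervening edit is detected on arrival instead of being overwritten.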
According to another aspect of the invention, a computer-implemented method is provided for handling dataset changes, in synchronizing user data in a first dataset with user data in at least a second dataset via a communication medium. Here, the user data in the first dataset and the user data in the second dataset are capable of having been independently modified prior to the synchronization. The method includes a step of sending at least a first dataset change from the first dataset, wherein the sending step includes sending an indicator of send order. The method further includes steps of receiving the first dataset change and the indicator of send order, via the communication medium; determining based on the indicator of send order whether the received first dataset change, referred to as the just-received change, was sent earlier than a previously-received dataset change from the first dataset; if the just-received change is determined in the determining step to have been sent earlier than the previously-received dataset change, refraining from propagating the just-received change into the second dataset; and, otherwise, propagating the just-received change into the second dataset, at least to an extent needed for reconciling the just-received change with the second dataset.
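The send-order check can be sketched as follows, assuming the indicator of send order is a per-sender sequence number that increases with each message. That choice, and the names `Message`, `Sender`, and `Receiver`, are hypothetical; a timestamp or another ordered token could serve the same purpose.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Message:
    sender: str
    seq: int  # assumed indicator of send order
    key: str
    value: str


class Sender:
    def __init__(self, name: str) -> None:
        self.name = name
        self._next_seq = 1

    def make_change(self, key: str, value: str) -> Message:
        """Stamp an outgoing dataset change with the next sequence number."""
        message = Message(self.name, self._next_seq, key, value)
        self._next_seq += 1
        return message


class Receiver:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.last_seq: Dict[str, int] = {}  # highest seq accepted from each sender

    def receive(self, message: Message) -> bool:
        """Propagate the change unless it was sent before one already received."""
        if message.seq < self.last_seq.get(message.sender, 0):
            return False  # obsolete message that arrived late; do not propagate
        self.last_seq[message.sender] = message.seq
        self.data[message.key] = message.value
        return True
```

If a sender stamps an "old" change and then a "new" change for the same record and the medium delivers them in reverse order, the receiver propagates the "new" change and refrains from propagating the late "old" one. This sketch discards every late message, including one that concerns a different record, which is a simplification made here for brevity.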
According to still another aspect of the invention, a computer-implemented method is provided for synchronizing user data in a first dataset with user data in at least a second dataset. The first dataset and the second dataset each is capable of including dataset changes that have been made independently of any synchronization with the other of the first and the second datasets. The method includes a step of sending at least some of the dataset changes of the first dataset. The method further includes steps of receiving and propagating the sent dataset changes of the first dataset into the second dataset, at least insofar as the sent dataset changes of the first dataset can be reconciled with the second dataset; sending at least some of the dataset changes of the second dataset, wherein the two sending steps are intentionally not undertaken within any single communication session, and, between the two sending steps, no sending of any dataset changes occurs between the first and the second datasets; and receiving and propagating the dataset changes from the second dataset into the first dataset, at least insofar as the dataset changes from the second dataset can be reconciled with the first dataset.