The present invention relates generally to synchronization of data—that is, the process of taking two or more separate collections of data (“datasets”) and applying changes to one or more of the datasets to make the datasets identical or equivalent. The present invention is particularly relevant to synchronization involving a dataset that may separately synchronize with multiple other datasets at various times, especially if the other datasets may also synchronize with one another.
With each passing day, there is an ever-increasing need for synchronization solutions for connected information devices. Here, information devices include, for example, general- or special-purpose computers of all types and sizes, Internet or intranet access devices, cellular phones, pagers, and other handheld devices such as the StarTAC® clipOn Organizer, REX PRO™ organizer, Palm organizer, Microsoft “Windows CE” devices, and the like.
(StarTAC is a registered trademark of Motorola, Inc. of Schaumburg, Ill. REX and REX PRO are trademarks of Franklin Electronic Publishers of Burlington, N.J. The StarTAC® clipOn Organizer and REX™ and REX PRO™ organizers include licensed technology from Starfish Software, Inc. (“Starfish”), the present assignee. Palm organizers are produced by Palm Computing, Inc., a subsidiary of 3Com Corp. of Santa Clara, Calif. The Windows CE device operating system and other Microsoft software are produced by Microsoft Corporation of Redmond, Wash.)
As the use of information devices grows, users often have their data on more than one device or in more than one software application. Consider, for instance, a user who keeps appointments and contacts on a desktop personal computer (PC) at work, and who also keeps appointments or contacts on a notebook computer at home and on a battery-powered handheld device for use in the field. The user is free to alter the information on any one of these devices independently of the others. What the user wants is for the information on each device to remain synchronized with the corresponding information on the other devices in a convenient, transparent manner. Further, some devices are connected, at least occasionally, to a server computer (for example, an Internet server) that stores information for the user. The user would naturally like the server computer to participate in synchronization as well, so that its information also remains synchronized.
An early approach to maintaining consistency between datasets was simply to import or copy one dataset on top of another. That simple approach, which overwrites a target dataset without any attempt to reconcile differences, is inadequate for all but the simplest applications. Not surprisingly, more sophisticated synchronization techniques were developed. In particular, techniques were developed that attempt to reproduce in each dataset the changes made in the other dataset(s) since a previous synchronization, and to resolve any conflicts involving such changes, either automatically or with user assistance. Some early techniques of this kind were limited to “point-to-point” synchronization, in which exactly two datasets are synchronized. Later, Starfish developed certain “multi-point” synchronization techniques that are capable of synchronizing arbitrarily many datasets using a single synchronization system or in response to a single interaction with a user.
When a typical user first began to accumulate more than two datasets that needed synchronization, he or she usually found a hub-and-spoke configuration of those datasets to be sufficient. FIG. 1 illustrates an example 100 of such a hub-and-spoke synchronization configuration for a hub dataset 105 and satellite datasets 110, 115, 120. According to the configuration 100, the hub dataset 105 participates in every synchronization and serves as a central repository of data from all datasets, as known from all synchronizations to date. Any one of the satellite datasets 110, 115, 120 (e.g., synchronization clients) may or may not participate in a particular synchronization, depending, for example, on availability, user preference, or the capabilities of the synchronization system being used (e.g., point-to-point or multi-point). In the example configuration 100, the hub dataset 105 and the satellite datasets 110, 115, 120 reside, respectively, on a PC 140, a first handheld device 145 (e.g., a Palm organizer), an Internet server 150, and a second handheld device 155 (e.g., a StarTAC® clipOn Organizer). Synchronization paths 125, 130, 135 (e.g., RS-232 serial cables, infrared connections, the Internet, or the like) connect the datasets as indicated.
A configuration such as the configuration 100, which has a single, permanently-designated hub dataset, is useful for synchronizing the user's datasets as long as the hub dataset is always available when synchronization is desired. However, as the user accumulates more datasets and uses them in more contexts, the user increasingly wishes to deviate from such a configuration. In particular, the user wishes to synchronize datasets directly with one another without requiring that a single, permanently-designated hub dataset participate in every synchronization. This departure from a rigid hub-and-spoke configuration adds complexity to the synchronization task. If that complexity is not understood and handled properly, it can cause inefficiencies or even corrupt the user's data.
Consider, for example, a user who formerly used the single permanently-designated hub dataset 105 as a reference dataset in all synchronizations. This user now wishes occasionally to synchronize the former satellite datasets 110, 115 directly with one another, without participation of the former hub dataset 105. In effect, the user wishes to create a circular, or looping, synchronization relationship among the three datasets 105, 110, 115. In this looping relationship, the direct synchronization between the former satellite datasets 110, 115 cannot take advantage of knowledge stored in the former hub dataset 105. Thus, during the direct synchronization, the former satellite datasets 110, 115 may not realize the extent to which they have already been synchronized with each other, indirectly, through the former hub dataset 105. As a result, they may needlessly exchange user data that is already known to the other side and thereby waste processing resources and communication bandwidth.
Additionally, the former satellite datasets may fail to recognize that certain received user data is already known and therefore redundant. As a result, one or both of them may treat the redundant user data as new data to be added locally, thereby corrupting the user data with duplicate records. Such waste of processing and communication resources, and such corruption of user data with erroneously duplicated records, can compound in subsequent synchronizations, for example in a subsequent synchronization involving the former hub dataset 105. In such a synchronization, the knowledge within the former hub dataset 105 may no longer be up to date because of changes made to the former satellite datasets 110, 115 during their direct synchronization. As a result, the synchronization involving the former hub dataset 105 may also waste resources or corrupt user data.
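To make the failure mode of the looping relationship concrete, the following is a minimal Python sketch of a naive point-to-point synchronizer, under stated assumptions. It assumes, hypothetically, that records are identified only by dataset-local IDs, that each dataset remembers only which records it has exchanged with each specific peer, and that a receiver inserts anything it receives as a new record. The names `Dataset`, `naive_sync`, and `known_by_peer`, and the datasets `hub`, `sat1`, and `sat2` (standing in for the datasets 105, 110, 115), are illustrative inventions rather than part of any described system.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical global counter so every dataset mints distinct local record IDs.
_next_id = count(1)


@dataclass
class Dataset:
    """Hypothetical dataset: records keyed by local IDs only (no global identity)."""
    name: str
    records: dict = field(default_factory=dict)        # local_id -> record text
    known_by_peer: dict = field(default_factory=dict)  # peer name -> local IDs already exchanged

    def add(self, text):
        local_id = f"{self.name}-{next(_next_id)}"
        self.records[local_id] = text
        return local_id


def naive_sync(a, b):
    """Hypothetical point-to-point sync that keeps status per peer only.

    Each side sends every record it has not yet exchanged with *this* peer,
    and the receiver inserts whatever arrives as a brand-new record.
    Returns the number of records transmitted.
    """
    sent = 0
    for src, dst in ((a, b), (b, a)):
        exchanged = src.known_by_peer.setdefault(dst.name, set())
        outgoing = [(lid, txt) for lid, txt in src.records.items() if lid not in exchanged]
        for lid, txt in outgoing:
            new_lid = dst.add(txt)  # receiver cannot tell the record is already known
            exchanged.add(lid)
            dst.known_by_peer.setdefault(src.name, set()).add(new_lid)
            sent += 1
    return sent


hub, sat1, sat2 = Dataset("hub"), Dataset("sat1"), Dataset("sat2")
hub.add("Lunch with Ann")

naive_sync(hub, sat1)              # hub-and-spoke: hub -> sat1
naive_sync(hub, sat2)              # hub-and-spoke: hub -> sat2
print(naive_sync(sat1, sat2))      # direct sync sends 2 redundant records
print(list(sat1.records.values())) # sat1 now holds the record twice
print(naive_sync(hub, sat1))       # later hub sync receives the duplicate
print(len(hub.records))            # hub now holds 2 copies as well
```

In the sketch, each dataset keeps status only for the peers it has synchronized with directly, so nothing tells `sat1` and `sat2` that their two copies came from the same hub record. The direct synchronization therefore both wastes transmissions and creates duplicates, and the final hub synchronization shows the duplicate spreading back to the hub. The sketch illustrates the problem only and does not model any particular solution.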
In light of problems such as those described above for circular synchronization relationships, what is needed are synchronization technologies that can synchronize datasets in a configuration lacking a single dedicated hub, especially a circular configuration, without corrupting user data (once or repeatedly) and without needlessly processing and re-transmitting already-known user data (once or repeatedly). More generally, what is needed are synchronization systems and methods that share and preserve synchronization status information intelligently, so that later synchronizations can take fuller advantage of knowledge gained in earlier synchronizations, even if the earlier and later synchronizations are orchestrated by different synchronization systems or use different reference datasets.