The present application is related to the following commonly-owned U.S. patent applications, the disclosures of which are hereby incorporated by reference in their entirety, including any appendices or attachments thereof, for all purposes:
Ser. No. 09/311,781, filed May 13, 1999 and entitled System and Methods For Synchronizing Datasets in a Non-FIFO or Otherwise Difficult Communication Environment; (None-patent)
Ser. No. 09/208,815, filed Dec. 8, 1998 and entitled System and Methods for Robust Synchronization of Datasets; (none-patent)
Ser. No. 09/136,215, filed Aug. 18, 1998 and entitled System and Methods for Synchronizing Two or More Datasets; (U.S. Pat. No. 6,295,541)
Ser. No. 09/136,212, filed Aug. 18, 1998 and entitled Data Processing Environment With Methods Providing Contemporaneous Synchronization of Two or More Clients; (U.S. Pat. No. 6,275,831)
Ser. No. 09/020,047, filed Feb. 6, 1998, and entitled Methods for Mapping Data Fields From One Data Set to Another in a Data Processing Environment; (U.S. Pat. No. 6,215,131)
Ser. No. 08/923,612, filed Sep. 4, 1997 and entitled System and Methods for Synchronizing Information Among Disparate Datasets; and (none-patent)
Ser. No. 08/693,677, filed Aug. 12, 1996 and entitled Scheduling System With Methods for Peer-To-Peer Scheduling of Remote Users. (U.S. Pat. No. 6,016,478)
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates generally to synchronization of data, that is, the process of taking two or more separate collections of data (“datasets”) and applying changes to one or more of the datasets to make the datasets identical or equivalent. The present invention is particularly relevant to synchronization involving a dataset that may separately synchronize with multiple other datasets at various times, especially if the other datasets may also synchronize with one another.
With each passing day, there is an ever-increasing need for synchronization solutions for connected information devices. Here, information devices include, for example, general- or special-purpose computers of all types and sizes, Internet or intranet access devices, cellular phones, pagers, and other handheld devices including, for example, the StarTAC® clipOn Organizer, REX PRO™ organizer, Palm organizer, Microsoft “Windows CE” devices, and the like.
(StarTAC is a registered trademark of Motorola, Inc. of Schaumburg, Ill. REX and REX PRO are trademarks of Franklin Electronic Publishers of Burlington, N.J. The StarTAC® clipOn Organizer and REX™ and REX PRO™ organizers include licensed technology from Starfish Software, Inc. (“Starfish”), the present assignee. Palm organizers are produced by Palm Computing, Inc., a subsidiary of 3Com Corp. of Santa Clara, Calif. The Windows CE device operating system and other Microsoft software are produced by Microsoft Corporation of Redmond, Wash.).
As the use of information devices is ever growing, users often have their data in more than one device, or in more than one software application. Consider, for instance, a user who has his or her appointments and contacts on a desktop personal computer (PC) at work and also has appointments or contacts on a notebook computer at home and on a battery-powered, handheld device for use in the field. The user is free to alter the information on any one of these devices independently of the other devices. What the user wants is the information in each device to remain synchronized with corresponding information in other devices in a convenient, transparent manner. Further, some devices are connected at least occasionally to a server computer (for example, an Internet server) that stores information for the user. The user would of course like the information on the server computer to participate in synchronization, so that the information on the server computer also remains synchronized.
An early approach to maintaining consistency between datasets was simply to import or copy one dataset on top of another. That simple approach, which overwrites a target dataset without any attempt at reconciling differences, is inadequate for all but the simplest of applications. As might be expected, more sophisticated synchronization techniques were developed. In particular, techniques were developed for attempting to reproduce in each dataset the changes made in other dataset(s) since a previous synchronization and for resolving any conflicts involving such changes, automatically or with user assistance. Some earlier examples of such synchronization techniques were limited to “point-to-point” synchronization, in which exactly two datasets are synchronized. Later, certain “multi-point” synchronization techniques were developed by Starfish that are capable of synchronizing arbitrarily many datasets using a single synchronization system or in response to a single interaction with a user.
As a typical user first began to accumulate more than two datasets that needed synchronization, he or she typically found that a hub-and-spoke configuration of his or her datasets was sufficient. FIG. 1 illustrates an example 100 of such a hub-and-spoke synchronization configuration for a hub dataset 105 and satellite datasets 110, 115, 120. According to the configuration 100, the hub dataset 105 participates in every synchronization and serves as a central repository of data from all datasets, as known from all synchronizations to date. Any one of the satellite datasets 110, 115, 120 (e.g., synchronization clients) may or may not participate in any particular synchronization, depending for example on availability, user preference, or the capabilities of the synchronization system being used (e.g., point-to-point or multi-point). In the example configuration 100, the hub dataset 105 and the satellite datasets 110, 115, 120 reside, for example, on a PC 140, a first handheld device 145 (e.g., a Palm organizer), an Internet server 150, and a second handheld device 155 (e.g., a StarTAC® clipOn Organizer), respectively. Synchronization paths 125, 130, 135 (e.g., serial cables (e.g., RS-232), infrared connections, the Internet, or the like) connect the datasets as indicated.
A configuration, such as the configuration 100, that has a single, permanently-designated hub dataset is useful for synchronizing the user's datasets, as long as the hub dataset is always available when synchronization is desired. However, as the user accumulates ever more datasets and uses them in ever more contexts, the user increasingly wishes to deviate from such a configuration. In particular, the user wishes to synchronize datasets with one another without requiring that a single permanently-designated hub dataset be available to participate in every synchronization. By wanting to deviate from a rigid hub-and-spoke configuration of the user's datasets, the user introduces additional complexity to the synchronization task. If not understood or handled properly, the additional complexity can cause inefficiencies or even corruption of the user's data.
Consider, for example, a user who formerly used the single permanently-designated hub dataset 105 in all synchronizations as a reference dataset. This user now wishes to occasionally synchronize the formerly permanently-designated satellite datasets 110, 115 directly with one another without participation of the formerly permanently-designated hub dataset 105. In effect, the user wishes to create a circular, or looping, synchronization relationship among the three datasets 105, 110, 115. In this looping relationship, the direct synchronization between the former satellite datasets 110, 115 cannot take advantage of knowledge stored in the former hub dataset 105. Thus, during the direct synchronization not involving the former hub dataset 105, the former satellite datasets 110, 115 may not realize the extent to which they may have already been synchronized by the former hub dataset 105. As a result, the former satellite datasets 110, 115 may needlessly exchange user data that is actually already known to the other side of the communication and thereby waste processing resources and communication bandwidth.
Additionally, the former satellite datasets may fail to realize that certain received user data is already known and therefore redundant. As a result, one or both former satellite datasets may actually treat the received redundant user data as new data to be added locally and thereby corrupt the user data by creating duplicate records. Such possible waste of processing resources and communication resources, and such possible corruption of user data with erroneously duplicated records, can further compound in subsequent synchronizations. This further compounding may occur, for example, in a subsequent synchronization involving the former hub dataset 105. In such a subsequent synchronization, the knowledge within the former hub dataset 105 may no longer be up-to-date, due to changes made to the former satellite datasets 110, 115 during their direct synchronization. As a result, the synchronization involving the former hub dataset 105 may also waste resources or corrupt user data.
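The duplicate-record problem described above can be illustrated with a minimal sketch. It is only a toy model under stated assumptions: the `Dataset` class, the `naive_sync` function, the record ID scheme, and the sample appointment are all hypothetical and do not appear in the disclosure. Each synchronization consults only the record correspondences that its own synchronizer has recorded, so a direct synchronization between two former satellite datasets starts with no knowledge of what the former hub dataset already propagated.

```python
import itertools


class Dataset:
    """Toy dataset: records keyed by dataset-local IDs (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.records = {}              # local_id -> record content
        self._ids = itertools.count(1)

    def add(self, content):
        local_id = f"{self.name}-{next(self._ids)}"
        self.records[local_id] = content
        return local_id


def naive_sync(src, dst, known_pairs):
    """Copy into dst every src record with no known correspondent.

    known_pairs holds (src_id, dst_id) correspondences known to this one
    synchronizer only; knowledge held elsewhere is not consulted.
    """
    mapped_src = {src_id for src_id, _ in known_pairs}
    added = []
    for src_id, content in list(src.records.items()):
        if src_id not in mapped_src:
            dst_id = dst.add(content)
            known_pairs.add((src_id, dst_id))
            added.append(dst_id)
    return added


hub, a, b = Dataset("hub"), Dataset("a"), Dataset("b")
a.add("Dentist, Tue 10:00")

# Hub-and-spoke synchronizations: the record travels a -> hub -> b.
a_hub_pairs, hub_b_pairs = set(), set()
naive_sync(a, hub, a_hub_pairs)
naive_sync(hub, b, hub_b_pairs)

# Direct a -> b synchronization with no access to the hub's knowledge:
# the record already present in b is treated as new and is duplicated.
a_b_pairs = set()
naive_sync(a, b, a_b_pairs)
duplicate_contents = list(b.records.values())
```

In this sketch, dataset `b` ends up holding two records with identical content, which corresponds to the erroneous duplication described above.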
In light of problems associated with circular synchronization relationships, such as described above, what is needed are synchronization technologies that can synchronize datasets in a non-single-dedicated-hub configuration, especially a circular configuration, without corrupting user data (once or repeatedly) and without needlessly processing and re-transmitting already-known user data (once or repeatedly). More generally, what is needed are synchronization systems and methods that share and preserve synchronization status information in an intelligent manner so that later synchronizations can more fully take advantage of knowledge gained in earlier synchronizations, even if the earlier and later synchronizations are orchestrated by different synchronization systems or use different reference datasets.
Embodiments of the present invention include systems and methods for synchronization that are especially suitable and efficient for a user who has three or more datasets that need to be occasionally synchronized, with no one dataset or synchronization system being guaranteed to participate in every synchronization. According to an aspect of the present invention, synchronization status information, such as correspondences between data records of particular multiple datasets, is stored with more than one of the datasets. When two datasets that contain synchronization status information synchronize with each other, they not only exchange status information involving each other and synchronize user data, but they also synchronize and exchange status information involving “third-party” datasets that may not be participating in the synchronization. In this way, synchronization status information collected in an earlier synchronization is made available in a later synchronization, even if the earlier and later synchronizations are not conducted by the same synchronization system or do not both include a common, permanently-designated “hub” dataset. According to another aspect of the invention, when datasets being synchronized are found to contain mutually-duplicative data records, the data records are intelligently mapped to each other as corresponding, so as to avoid or minimize changes to the data records that would render obsolete the synchronization status information stored in another dataset.
In an embodiment of the invention, a method is provided for synchronizing a first dataset with a second dataset in an information processing system. The first and second datasets each include user data, and the user data of each of the first and second datasets is capable of having been changed independently of the other of the first and second datasets. The method includes a step of maintaining, for the first dataset, information that is descriptive of synchronization between the first dataset and a third dataset. This information may be referred to as the third-party information. The method further includes steps of communicating the third-party information; and synchronizing the first dataset with the second dataset using the communicated third-party information.
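The three steps of the method (maintaining, communicating, and using third-party information) can be sketched as follows. This is a minimal illustration rather than the claimed implementation, and it rests on several assumptions: the third-party information is taken to be a table of record-ID correspondences, the synchronization is one-way from the first dataset to the second, no conflict resolution is attempted, and the names `Dataset`, `mappings`, and `sync_with_third_party_info` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    records: dict = field(default_factory=dict)    # local_id -> content
    # Maintained status info: other dataset's name -> {local_id: its_id}
    mappings: dict = field(default_factory=dict)


def sync_with_third_party_info(first, second, third_name):
    """One-way sync of first into second, aided by third-party info."""
    # Communicate: first's correspondences with the third dataset.
    communicated = dict(first.mappings.get(third_name, {}))
    # Second inverts its own knowledge: third_id -> second's local_id.
    second_by_third = {
        third_id: local_id
        for local_id, third_id in second.mappings.get(third_name, {}).items()
    }
    pairs = first.mappings.setdefault(second.name, {})
    back = second.mappings.setdefault(first.name, {})
    added = []
    for first_id, content in first.records.items():
        if first_id in pairs:
            continue                   # correspondence already known
        third_id = communicated.get(first_id)
        second_id = second_by_third.get(third_id) if third_id is not None else None
        if second_id is None:
            # No correspondent found through the third dataset: new record.
            second_id = f"{second.name}-{len(second.records) + 1}"
            second.records[second_id] = content
            added.append(second_id)
        # Record the direct correspondence on both sides.
        pairs[first_id] = second_id
        back[second_id] = first_id
    return added


a = Dataset("a", {"a-1": "Dentist", "a-2": "Lunch with Kim"},
            {"hub": {"a-1": "hub-1"}})
b = Dataset("b", {"b-1": "Dentist"}, {"hub": {"b-1": "hub-1"}})
added = sync_with_third_party_info(a, b, "hub")
```

In this sketch, record `a-1` is matched to the existing `b-1` because both are recorded as corresponding to `hub-1`, so only the genuinely new record `a-2` is added to `b`, even though the hub dataset does not take part in the synchronization.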