1. The Field of the Invention
The invention generally relates to data processing and handling. More specifically, the invention relates to the field of data synchronization between computer systems.
2. Background and Relevant Art
Within computer systems, data synchronization is an important function. There is often a need to have the same data available in a number of different settings and locations. Among the many examples where it is useful to synchronize data, one illustrative example involves a digital address book. A computer user may have a digital address book stored at their desktop work computer. While at work, this is a convenient and accessible location to store addresses, phone numbers and general contact information. While away from the work location, the computer user may need their address book in a mobile form such as at a personal digital assistant (PDA) or other mobile information storage system. The contact information at the PDA should match the contact information at the desktop work computer.
Further, the same computer user may have a home computer where the contact information is stored. Ideally, the contact information at the home computer, the PDA and the work computer should all be in synchronization. Returning to the computer user's work location, some of the data that is stored in the digital address book may be information that is available to other computer users within the organization where the computer user works. Thus, this data may also be stored in a centralized database accessible by a number of different computer users and yet still be able to be synchronized with the computer user's PDA, work computer and home computer.
Accordingly, in the example illustrated above, there are at least four different types of platforms for storing the digital address book: a work computer, a PDA, a home computer and a centralized database. Each of these platforms may have a replica that stores a copy of the digital address book in a data store.
Data in each data store may be maintained in different physical arrangements, for example, in a physical table and/or group of physical tables. The physical tables are an actual arrangement of the data at a computer storage device such as a database mass storage array, a computer hard drive or flash memory. As can be appreciated, each of the different platforms may store, in a replica, the exact same data that is in the other replicas. However, because of the limitations or features of a particular platform, the data may be stored in a different physical arrangement at the particular platform (i.e., in different physical table arrangements or in files). Physically storing data differently at different replicas within the same topology presents various challenges when synchronizing the different replicas with each other.
Data at a replica is generally divided into discrete groupings of data often referred to as “items.” For example, in a digital address book, an item may be a name, an address, a phone number, an entire contact, or any other discrete grouping. In other examples, an item may be a file, an image, a folder, etc. Items at a replica can be changed by, for example, adding, deleting, or modifying an item. Due to different physical arrangements, it can be difficult to synchronize changes between replicas.
Another challenge that arises in synchronizing data at different replicas relates to the granularity of synchronization data. As previously mentioned, data in a replica can be divided into items. These items represent discrete pieces of information that are synchronized. Commonly, the granularity of an item is defined and unchangeable for a particular replica topology. In the digital address book example, an item has a fine granularity when the item is a single field of a contact, e.g., first name, last name, phone number, street address, state, or ZIP code. In contrast, an item has a medium granularity when the item is a group of related fields, such as a full name (both first and last), contact numbers, or an address. An item with coarse granularity might be an entire contact.
Synchronization of items within the replica often requires metadata to be associated with each item. The metadata may include information such as a time stamp indicating when the item was last changed. If the item granularity is too fine, an excess of metadata can unnecessarily consume resources (e.g., storage and system memory) of the particular replica since each item must have associated metadata. For example, in the digital address book discussed above, if street address, city and ZIP code are each separate items, metadata for each of the three items would need to be maintained. However, it is likely that a change in a street address would also result in a change in the city and ZIP code, meaning that the metadata for city and ZIP code is typically changed whenever the metadata for street address is updated. Maintaining three separate sets of metadata in such a case is largely redundant.
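The relationship between item granularity and metadata overhead can be illustrated with a minimal sketch. The contact fields, the three groupings, and the assumption of exactly one metadata entry per item are hypothetical and chosen only for illustration; they do not describe any particular replica.

```python
# Hypothetical contact record; the field names and values are illustrative only.
contact = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "phone": "555-0100",
    "street": "12 Example St",
    "city": "Springfield",
    "zip": "00000",
}

# Hypothetical groupings of the same fields into items at three granularities.
GRANULARITIES = {
    "fine": [[field] for field in contact],
    "medium": [["first_name", "last_name"], ["phone"], ["street", "city", "zip"]],
    "coarse": [list(contact)],
}


def metadata_entries(grouping):
    """Count metadata entries, assuming one entry (e.g., a time stamp) per item."""
    return len(grouping)


counts = {name: metadata_entries(g) for name, g in GRANULARITIES.items()}
print(counts)
```

Under these assumptions, the fine grouping carries six metadata entries for a single contact, the medium grouping three, and the coarse grouping one.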
On the other hand, if the granularity is too coarse, at least two problems arise, namely: too much synchronization data may need to be transmitted during synchronization and unnecessary conflicts may appear. For example, in the digital address book discussed above, if the item is defined in terms of an entire contact, a change in any part of the contact results in the entire contact being sent during the synchronization. Much of this data may already be synchronized between the replicas in a topology. Therefore, redundant data is sent between two replicas during synchronization. For example, a change to a telephone number in a contact does not require that name and address information be sent to synchronize a corresponding contact. However, when an item is defined as an entire contact, a change to telephone number nonetheless causes name and address to be sent during synchronization. Thus, communication resources are consumed to transfer data that is already synchronized.
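The redundant transfer described above can be illustrated with a short sketch. The groupings and the rule that a whole item is transmitted whenever any field inside it changes are assumptions made for illustration, not a description of any specific synchronization protocol.

```python
# Hypothetical field groupings; names are illustrative, not from any real system.
FINE = [["first_name"], ["last_name"], ["phone"], ["street"], ["city"], ["zip"]]
COARSE = [["first_name", "last_name", "phone", "street", "city", "zip"]]


def fields_sent(grouping, changed_fields):
    """Return every field transmitted when the given fields change.

    Assumes a whole item is sent whenever any field inside it changes.
    """
    sent = []
    for item in grouping:
        if any(field in changed_fields for field in item):
            sent.extend(item)
    return sent


# A single change to the telephone number of a contact.
sent_fine = fields_sent(FINE, {"phone"})
sent_coarse = fields_sent(COARSE, {"phone"})
print(len(sent_fine), len(sent_coarse))
```

With the fine grouping only the telephone number is sent, whereas with the coarse grouping all six fields are sent even though five of them are unchanged.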
Further, when item definitions are too coarse, the replicas may inappropriately detect a conflict between data at the replicas. For example, if the phone number of a contact is changed at a first replica and the address of the contact is changed at a second replica, the first and second replicas may appear to be in conflict if the item granularity is an entire contact. However, no real conflict may exist as the change in phone number may be completely valid and is independent of the change in the address.
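The apparent conflict can be illustrated with a brief sketch. The groupings and the conflict rule (an item is flagged when both replicas changed any field within it) are simplifying assumptions for illustration; actual conflict detection typically relies on per-item metadata such as time stamps or versions.

```python
# Hypothetical field groupings; names are illustrative only.
MEDIUM = [["first_name", "last_name"], ["phone"], ["street", "city", "zip"]]
COARSE = [["first_name", "last_name", "phone", "street", "city", "zip"]]


def conflicting_items(grouping, changed_at_first, changed_at_second):
    """Return the items that both replicas changed.

    Assumes an item is in conflict when each replica changed at least one
    field inside that item, regardless of which field it was.
    """
    conflicts = []
    for item in grouping:
        fields = set(item)
        if fields & changed_at_first and fields & changed_at_second:
            conflicts.append(item)
    return conflicts


# The first replica changes the phone number; the second changes the address.
coarse_conflicts = conflicting_items(COARSE, {"phone"}, {"street"})
medium_conflicts = conflicting_items(MEDIUM, {"phone"}, {"street"})
print(len(coarse_conflicts), len(medium_conflicts))
```

Under these assumptions, the coarse grouping reports a conflict on the entire contact, while the medium grouping reports none because the two changes fall in different items.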
In commercial digital storage applications, optimizations are often not discovered until after a particular application has been on sale and used by a number of users, and such optimizations can result in changes to the physical storage. Thus, a physical table containing data at an earlier version of a digital storage application may not have the same layout as a physical table storing the same data in a later version of the digital storage application. Synchronizing data between different versions of a digital storage application can require that new code be written to utilize the optimizations of the later version and yet still provide synchronization capabilities.
While the above examples have been framed in the context of a digital address book, there are many other environments that utilize data synchronization. Some examples include document versioning, sharing files and information, software updates, etc. Each of these environments, as well as other environments, can suffer from the challenges described above. Accordingly, synchronization mechanisms that more efficiently utilize computer system and communication resources would be advantageous. Synchronization mechanisms that more appropriately detect data conflicts would also be advantageous.