1. The Field of the Invention
The present invention relates to synchronization of multiple copies of data. More specifically, the present invention relates to systems and methods that allow two copies of data to be synchronized so that incremental changes made to one copy of the data can be identified, transferred, and incorporated into the other copy of the data.
2. The Prior State of the Art
Today, business and technology trends are changing the way we use computers and information. The personal computer (PC) has become the standard business information tool as prices have decreased and computing power has increased. In record numbers, businesses are reengineering their organizational structure and processes to become faster and more competitive, in addition to being better able to use the wealth of information resources available today. Never before has there been so much information so readily available, nor such high expectations for how much the individual will be able to accomplish by utilizing this information. The result is that people today need access to information everywhere, anytime. In June 1994, Microsoft announced a new product designed to meet these needs called Microsoft® Exchange.
The main concept behind Microsoft® Exchange is to provide a product that integrates E-mail, scheduling, electronic forms, document sharing, and other applications such as customer tracking to make it altogether easier to turn information into a business advantage. The result is that users can access, organize, and exchange a world of information, wherever they happen to be in the world--whether from the office, the home, or while traveling on the road. In essence, a main barrier to PC-based communication, namely, accessibility and sharing by multiple parties of up-to-the-minute information, has now been significantly reduced.
With the increased accessibility and sharing of information between multiple users, it is now more common than ever for such multiple users to work simultaneously or in tandem on shared data objects, such as word processing documents, spreadsheets, electronic forms, E-mail messages, graphic images, or a host of other such data objects. With such shared use of data objects among multiple users of a computer network, there arises the need for each user to keep all other users of the same data object or the same set of data objects apprised of changes that are made locally by that user. This need gives rise to a process called replication of data, that is, transferring incremental changes (e.g., creation of new data, modification of existing data, or deletion of existing data) made locally at one server to a specified list of other remote or locally connected servers.
Employing such a replication model, Microsoft® Exchange creates a network or enterprise of remote or locally connected servers having copies of data objects or sets of data objects. Users may then access a copy of one or more of the shared data objects or sets of data objects in order to gain access to desired information. Changes made to one of these data objects will be replicated among all other servers having a copy of the data object so that all information remains current throughout the network or enterprise.
Such a model works extremely well when all members of the network or enterprise use a common replication model. However, many situations arise where it would be desirable to transfer information between systems that do not use a common replication model. For example, it may be desirable to extract information from a Microsoft® Exchange or other replication enterprise and store the extracted information in a format that is very different from that used by the replication enterprise. As an example of a specific application, suppose someone wished to provide an indexing and search engine for all publicly shared objects in a replication enterprise. This would require examining all objects replicated throughout the enterprise and indexing each object so that the information in the object can be quickly accessed. As changes are made to the objects replicated throughout the enterprise, these changes must be provided to the search engine so that it can update its information to incorporate the changes. It would be desirable to allow such an indexing system to synchronize with the replication enterprise to remain current with changes made to data objects. Other systems may have similar needs to place information into a particular replication enterprise.
In other situations, it may be desirable to synchronize information from two different replication enterprises that store the information in different underlying structures. For example, it may be desirable to provide a copy of one or more internet news groups in a replication enterprise. This would involve extracting information from one system (the internet) and placing the information into another system (the replication enterprise). There currently does not exist a generalized synchronization model that allows information stored in different underlying structures to be synchronized so that when a change is made to the data stored on one system, the change can be transferred to the other system. Prior systems often require very specific replication information be maintained by each system. Furthermore, it is a common expectation that record identifiers and change identifiers are of a common format. This requirement makes it difficult to synchronize with different systems.
Another situation where it would be desirable to extract information from a replication enterprise is where a user desires to carry a copy of publicly replicated objects on a mobile system, such as a laptop or other system. While it might be possible to make the laptop an integral member of the replication enterprise, such a solution is generally not preferred since it places an administrative burden on the network administrator. If a laptop or other system is made an integral part of the replication enterprise, then the network administrator must typically set up and administer the system as part of the general replication model. If the system is a mobile system, such as a laptop, which can connect to the replication enterprise in a manner that changes from day-to-day or hour-to-hour, it is generally not feasible to require the network administrator to keep modifying the replication enterprise configuration to accommodate the changing replication topology caused by a system connecting into the replication enterprise in an ever changing fashion. It would be highly desirable to allow such a system to keep a local copy of information replicated throughout an enterprise and yet reduce or remove the administrative burden on a network administrator. Currently, there does not exist a generalized synchronization model that allows such a local copy to be kept while simultaneously reducing or eliminating the network administrative burden.
Some attempts have been made to accommodate synchronization for a mobile system. For example, a simple method using peer-to-peer replication would be for the connecting system to maintain a timestamp of the last time it connected to the server. When the system connects again, it can ask for all changes that have occurred since the last time it connected. Unfortunately, this simple model does not work well in a replication enterprise where copies of information move from server to server. For example, suppose the system was last connected to the server at 11:30 and is currently connecting to the server at 12:00. The system can ask for all changes that have occurred since 11:30. Suppose the system disconnects at 12:05 and at 12:06 the server receives from the replication enterprise a change that was made at 11:45. When the system reconnects to the server, it will ask for changes that occurred after 12:05. Because the change is stamped 11:45, it falls before that cutoff and will never be retrieved. Furthermore, many systems using this type of model require connection to the same server each time. It would be highly desirable to allow a mobile system to connect to any server in the replication enterprise and be able to properly receive all required changes to remain current.
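The failure of the timestamp approach can be illustrated with a short simulation. The following is only a sketch under stated assumptions: the names Change, Server, changes_since, and simulate are hypothetical and do not come from any actual product, times are modeled as minutes since midnight, and the object "doc-A" is an extra change added solely to show that an ordinary change is retrieved correctly. The object "doc-B" plays the role of the change made at 11:45 that reaches the server at 12:06.

```python
from dataclasses import dataclass, field


def hhmm(hours, minutes):
    """Convert a wall-clock time to minutes since midnight."""
    return hours * 60 + minutes


@dataclass(frozen=True)
class Change:
    # Hypothetical record of one incremental change.
    object_id: str
    made_at: int     # when the change was originally made
    arrived_at: int  # when this particular server received it


@dataclass
class Server:
    # Hypothetical server holding the changes it has received so far.
    changes: list = field(default_factory=list)

    def receive(self, change):
        self.changes.append(change)

    def changes_since(self, timestamp, now):
        """Naive query: changes made after `timestamp` that are present at `now`."""
        return [
            c for c in self.changes
            if c.made_at > timestamp and c.arrived_at <= now
        ]


def simulate():
    server = Server()
    last_connected = hhmm(11, 30)

    # Assumed extra change, made and received before the 12:00 connection.
    server.receive(Change("doc-A", made_at=hhmm(11, 50), arrived_at=hhmm(11, 51)))

    # The system connects at 12:00 and asks for changes since 11:30.
    first_sync = server.changes_since(last_connected, now=hhmm(12, 0))

    # The system disconnects at 12:05 and records that time.
    last_connected = hhmm(12, 5)

    # At 12:06 the server receives a change that was made at 11:45.
    server.receive(Change("doc-B", made_at=hhmm(11, 45), arrived_at=hhmm(12, 6)))

    # On reconnecting, the system asks for changes since 12:05.
    second_sync = server.changes_since(last_connected, now=hhmm(12, 30))
    return first_sync, second_sync


if __name__ == "__main__":
    first, second = simulate()
    print("first sync: ", [c.object_id for c in first])
    print("second sync:", [c.object_id for c in second])
```

In this sketch the second query returns nothing, so "doc-B" is silently missed even though the server holds it. The query filters on the time a change was made, while the connecting system only knows when it last connected.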
Yet another example where it would be desirable to extract information from a replication enterprise is where the incremental changes that are made to the objects replicated throughout the enterprise are to be backed up and saved in a particular location. Such an incremental backup would be desirable if events caused the loss of critical information and it was necessary to rebuild the state of the replication enterprise from a particular point in time. While it may be possible to make such an incremental backup an integral part of the replication enterprise, such an approach can create several problems. For example, if the replication model used by the replication enterprise is highly sophisticated, then any incremental backup that was an integral part of the replication enterprise must implement most, if not all, of the details of the replication model. This typically creates a complex piece of software in a situation where simplicity is preferred. Furthermore, if the incremental backup is stored in a different underlying format than that used by the general replication enterprise, further problems may be created.
It would be highly desirable to handle all of the above-described situations with a generalized synchronization model that allowed changes to be extracted from a replication enterprise or to be placed into a replication enterprise. It would be desirable to incorporate into the model the ability to synchronize data from different systems in different underlying formats. Furthermore, it would be desirable to allow these systems to synchronize information with little or no change to the underlying storage structure.