1. Technical Field
The present invention relates generally to object stores and, in particular, to a method and apparatus that enables two or more computers to partially replicate object stores by allowing synchronization of only such objects contained in the stores that are intended to be shared among the stores.
2. Background Description
In general, an object is a collection of data and/or functions which, for example, modify the data. An object store is a collection of objects. The objects in the store can either be persistent or transient. The object store provides a set of application programming interfaces for manipulating (e.g., reading, modifying and deleting) objects in the store. An object store controller is an application written 'on top' of an object store that manipulates the contents of the store.
FIG. 1 is a diagram illustrating a system to which the present invention may be applied. The system 100 includes two computers (102a, 102b). Each computer (102a, 102b) respectively includes: an object store (106a, 106b); an object store controller (104a, 104b); at least one application (120a, 120b); and a communication system (108a, 108b). The controllers (104a, 104b) are used by the applications (120a, 120b) to access and modify the object stores (106a, 106b). The communication systems (108a, 108b) enable the computers (102a, 102b) to communicate with other computers. Further, the computers (102a, 102b) may be either intermittently or continuously connected to one another via a communication network 112.
The Mobile Network Computer Reference Specification (MNCRS) provides a method and apparatus for synchronizing the full content of two object stores (www.mncrs.org). Every object in a store is identified by a unique object identifier. Object stores are replicated among two or more computers. Hence, a single object (identified by its unique identifier) is replicated among those computers. Different replicas of an object are potentially updated on different computers. Two replicas of an object store are synchronized periodically. Synchronization is realized using a version data structure associated with each object.
FIG. 2A is a diagram illustrating a conventional version vector data structure associated with each object in an object store. The version vector 202 includes a list of tuples 204. Each tuple 204 contains a node identifier 206 and a clock value 208. The node identifier 206 may be implemented as, for example, a simple integer that is associated with a particular computer (e.g., 102a, 102b) in the system. The clock value may be implemented as an integer reflecting a logical value. Thus, the kth tuple in the version vector 202 represents the node identifier and the local clock value for a particular object at the kth computer.
When an object is updated or inserted in an object store of a computer, the tuple in the version vector corresponding to that computer is updated with the current local clock value at that computer. Then, the local clock value is incremented. The version vectors of an object and its replica are compared as follows: for every index k, the clock value of the kth tuple in the version vector of the object is compared with the clock value of the kth tuple in the version vector of the replica. Thus, when comparing two version vectors (of an object and its replica), the object's version vector is considered to be newer than the replica's version vector when all the clock values of the object's version vector are equal to or greater than all the corresponding clock values of the replica's version vector. Alternatively, the object's version vector is considered to be older than the replica's version vector when all the clock values of the object's version vector are less than or equal to all the corresponding clock values of the replica's version vector. If only some clock values of the object's version vector are greater than or equal to corresponding clock values in the replica's version vector and other clock values in the object's version vector are less than the corresponding clock values in the replica's version vector, then the object and its replica are considered to be 'in conflict'.
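The comparison rule above can be sketched as follows. This is only an illustration under stated assumptions: a version vector is modeled as a mapping from node identifier to clock value, a node missing from a vector is treated as clock value 0, and identical vectors (which satisfy both the 'newer' and the 'older' wording) are reported separately as "equal". The name `compare` is hypothetical and is not taken from MNCRS.

```python
from typing import Dict

# Assumption: a version vector maps node identifier -> logical clock value.
VersionVector = Dict[int, int]


def compare(obj_vv: VersionVector, replica_vv: VersionVector) -> str:
    """Classify obj_vv relative to replica_vv.

    Returns 'equal', 'newer', 'older', or 'conflict'.
    Nodes absent from a vector are assumed to have clock value 0.
    """
    nodes = set(obj_vv) | set(replica_vv)
    some_greater = any(obj_vv.get(n, 0) > replica_vv.get(n, 0) for n in nodes)
    some_less = any(obj_vv.get(n, 0) < replica_vv.get(n, 0) for n in nodes)
    if some_greater and some_less:
        return "conflict"  # neither vector dominates the other
    if some_greater:
        return "newer"     # every clock value >= and at least one >
    if some_less:
        return "older"     # every clock value <= and at least one <
    return "equal"         # assumption: identical vectors get their own label


print(compare({1: 3, 2: 2}, {1: 2, 2: 2}))
print(compare({1: 3, 2: 1}, {1: 2, 2: 2}))
```

The first call compares a vector that is ahead on node 1 only, and the second compares vectors that are each ahead on a different node.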
The version vector of an object is incremented as follows: if the kth computer in the system is incrementing the version vector, the clock value in the kth tuple of the version vector is updated with the local clock value and then the local clock value is incremented. The content of the object is not altered.
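A minimal sketch of this stamping step, assuming a version vector is a mapping from node identifier to clock value and that each computer keeps a single integer logical clock. The class name `LocalClock`, the method name `stamp`, and the initial clock value of 1 are illustrative assumptions.

```python
from typing import Dict


class LocalClock:
    """Hypothetical per-computer logical clock for stamping version vectors."""

    def __init__(self, node_id: int, start: int = 1) -> None:
        self.node_id = node_id
        self.clock = start  # assumption: clocks start at 1, so 0 can mean "never stamped"

    def stamp(self, version_vector: Dict[int, int]) -> None:
        # Write the current local clock value into this node's tuple ...
        version_vector[self.node_id] = self.clock
        # ... and only then advance the local clock.
        self.clock += 1


node = LocalClock(node_id=7)
vv: Dict[int, int] = {}
node.stamp(vv)
print(vv, node.clock)
```

The same routine serves both an ordinary update or insertion and a bare version increment, since in the latter case only the vector changes and the object's contents are left alone.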
Similar to each object in an object store, the store itself is associated with a summary version vector. The summary version vector of the store includes a list of tuples, each tuple containing a node identifier and a clock value. The clock value in the kth tuple in the summary version vector for a store is the maximum of the clock values in the kth tuples in the version vectors for all objects in the store. Hence, all objects in the store are, at most, as recent as the summary version vector of the store. As a result, when the version vector of an object in one store is compared with the summary version vector of a replica store, if the version vector of the object is strictly older than the summary version vector of the replica store, then it can be concluded that the replica store has seen this object already. However, if the version vector of the object is newer than or conflicts with the summary version vector of the replica store, then it can be concluded that the replica store may not have seen this object.
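The summary version vector and the 'may not have seen' test can be sketched as below. The sketch assumes that version vectors are mappings from node identifier to clock value and that a missing node counts as clock value 0. It also assumes that an object whose vector exactly equals the summary is treated as already seen, a case the description above does not address. The function names are hypothetical.

```python
from typing import Dict, Iterable

VersionVector = Dict[int, int]  # assumption: node identifier -> clock value


def summary_vector(object_vectors: Iterable[VersionVector]) -> VersionVector:
    """Element-wise maximum over the version vectors of all objects in a store."""
    summary: VersionVector = {}
    for vv in object_vectors:
        for node, clock in vv.items():
            summary[node] = max(summary.get(node, 0), clock)
    return summary


def may_be_unseen(object_vv: VersionVector, replica_summary: VersionVector) -> bool:
    """True when the object's vector is newer than, or conflicts with, the summary.

    Both cases reduce to: at least one clock value exceeds the summary's value.
    """
    return any(clock > replica_summary.get(node, 0)
               for node, clock in object_vv.items())


summary = summary_vector([{1: 3, 2: 1}, {1: 2, 2: 4}])
print(summary)
print(may_be_unseen({1: 2, 2: 4}, summary), may_be_unseen({1: 3, 2: 5}, summary))
```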
FIG. 2B illustrates how two object stores perform a full synchronization in MNCRS. The controller for store1 ('store1 controller') requests the summary version vector of store2 from the controller of store2 ('store2 controller') (step 210). In response, the store2 controller sends the store2 summary version vector to the store1 controller (step 212). The store1 controller determines which objects in store1 are newer than, or conflict with, the summary version vector sent by store2 (step 214). The store1 controller then sends those objects (updates) to the store2 controller (step 216) using its communication system 108 via the communication network 112. As used herein, an update consists of an object's contents, its identifier, and its version vector. The object content part of an update can optionally be empty for an update that signifies deletion of an object.
The store2 controller then applies those objects (updates) locally to the objects in store2 (step 218). Applying an update may consist of copying the replica object's contents to the local object, merging the replica object's contents with those of the local object, or simply keeping the original contents of the local object. In any case, the version vector of the local object is changed to a newer version vector reflecting that the object in store2 has been synchronized with its replica in store1.
The same exchange is then repeated in the opposite direction. The store2 controller requests the summary version vector from the store1 controller (step 220). In response, the store1 controller sends the store1 summary version vector to the store2 controller (step 222). The store2 controller determines which objects in store2 are newer than, or conflict with, the summary version vector sent by store1 (step 224). The store2 controller then sends those objects (updates) to the store1 controller (step 226) using its communication system 108 via the communication network 112.
The store1 controller then applies those objects (updates) locally to the objects in store1 (step 228). Accordingly, the version vector of the local object is changed to a newer version vector reflecting that the object in store1 has been synchronized with its replica in store2. Thus, the two replicas have completed synchronization.
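The two-pass exchange of FIG. 2B can be sketched in a single process as follows. This is an illustration rather than the MNCRS implementation, and it rests on several assumptions: a store is a dictionary from object identifier to a (contents, version vector) pair, the network exchange is replaced by direct function calls, the 'newer' version vector after applying an update is taken to be the element-wise maximum of the two vectors, and a conflict is resolved by keeping the local contents. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

VV = Dict[int, int]                 # node identifier -> clock value
Entry = Tuple[Optional[str], VV]    # (contents, version vector); None = deleted


@dataclass
class Store:
    objects: Dict[str, Entry] = field(default_factory=dict)

    def summary(self) -> VV:
        """Element-wise maximum of the version vectors of all objects."""
        out: VV = {}
        for _, vv in self.objects.values():
            for node, clock in vv.items():
                out[node] = max(out.get(node, 0), clock)
        return out


def dominates(a: VV, b: VV) -> bool:
    """True when every clock value in a is >= the corresponding value in b."""
    return all(a.get(n, 0) >= b.get(n, 0) for n in set(a) | set(b))


def push(source: Store, target: Store) -> int:
    """One direction of the exchange; returns the number of updates sent."""
    target_summary = target.summary()                       # steps 210-212
    updates = {                                             # step 214
        oid: entry for oid, entry in source.objects.items()
        if any(c > target_summary.get(n, 0) for n, c in entry[1].items())
    }
    for oid, (content, vv) in updates.items():              # steps 216-218
        local = target.objects.get(oid)
        local_vv = local[1] if local else {}
        if local is None or dominates(vv, local_vv):
            new_content = content      # copy the replica's contents
        else:
            new_content = local[0]     # assumption: keep local contents otherwise
        merged = {n: max(vv.get(n, 0), local_vv.get(n, 0))
                  for n in set(vv) | set(local_vv)}
        target.objects[oid] = (new_content, merged)
    return len(updates)


def full_sync(store1: Store, store2: Store) -> None:
    push(store1, store2)   # steps 210-218
    push(store2, store1)   # steps 220-228


s1 = Store({"a": ("A1", {1: 1})})
s2 = Store({"b": ("B1", {2: 1})})
full_sync(s1, s2)
print(s1.objects == s2.objects)
```

A real controller would substitute an application-specific merge for the keep-local policy used here.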
In the above method, each object in one store is synchronized with its counterpart in another store. The effect of such synchronization is that replicas fully share their contents. Such complete sharing may not be desirable in certain circumstances. For example, consider two traveling salesmen employed by a national company. One salesman covers the states of New York, Connecticut and Rhode Island, while the other salesman covers the states of New Jersey, New York and Pennsylvania. Each salesman has his customer information in a single object store in his mobile computer. Both salesmen would like to share information only about their common customer base, that is, only those customers in the state of New York. Given that, for each salesman, all of his customers are in the same object store, synchronization in the above fashion would not allow partial sharing of information.
In the context of the above example, one skilled in the art can easily infer that if the salesmen maintained a separate object store for each state (e.g., for the first salesman, one each for New York, Connecticut and Rhode Island), and then only synchronized the New York object store following the steps shown in FIG. 2B, they could achieve the same effect of partial sharing. Such a solution, however, is only ad hoc and is neither general nor extensible.
For example, consider a third salesman that covers the states of New York, Rhode Island and New Jersey. Further, assume that all the salesmen want to share information about their common customer base and no more. To achieve this using the method shown in FIG. 2B, the first salesman needs to maintain separate object stores for customers in New York, and for customers in New York and Rhode Island; the second salesman needs to maintain separate object stores for customers in New York, and for customers in New York and New Jersey; and the third salesman needs to maintain separate object stores for customers in New York and Rhode Island, and for customers in New York and New Jersey. In addition to the separate stores, there must be a mechanism to keep consistency among overlapping stores in one mobile computer. That is, if the New York store of the first salesman is updated via synchronization with the New York store of the second salesman, there must be a mechanism to propagate the updates to the New York entries in the New York and Rhode Island store of the first salesman. Clearly, this approach is costly with respect to the time and space requirements necessary for its realization. Moreover, the approach is intractable for various sharing patterns that may exist among various salesmen in a national company. Thus, it would be desirable and highly advantageous to have a method and apparatus for synchronizing two replica object stores that enables partial sharing of the contents of object stores.
The present invention is directed to a method that enables two or more computers to partially replicate object stores by allowing synchronization of only such objects contained in the stores that are intended to be shared among the stores. In one aspect of the present invention, a method for synchronizing replica object stores store1 and store2 to enable partial sharing of objects therebetween, wherein store1 and store2 respectively have store1 and store2 controllers associated therewith, comprises the steps of:
(a) applying a filter F to the objects in store1 to generate a subset S comprising the objects in store1 to be shared with store2, wherein the filter F embodies an operation that can be applied to the objects in store1 and store2, the applying by the store1 controller;
(b) incrementing version information of the objects that were not shared between store1 and store2 as of a last synchronization but should now be shared, and of the objects that were shared between store1 and store2 as of the last synchronization but should no longer be shared, by the store1 controller;
(c) determining which objects in either the subset S or those identified by object identifiers in a list L have the version information that is newer than or conflicting with version information of store2, wherein the list L comprises object identifiers of the objects shared between store1 and store2 as of the last synchronization, the determining by the store1 controller;
(d) applying updates associated with the determined objects, by the store2 controller;
(e) updating the lists L of store2 and store1, by the store2 and store1 controllers, respectively; and
(f) repeating steps (a) through (e), with roles of the store1 with respect to the store2 reversed, and roles of the store1 controller with respect to the store2 controller reversed.
An embodiment of the present invention further comprises the step of maintaining respective data structures for store1 and store2 comprising the filter F and the list L, by the store1 and store2 controllers, respectively.
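Steps (a) through (c), as performed on the store1 side, might be sketched as follows. This is only an illustration under stated assumptions and is not the claimed implementation: the filter F is modeled as a predicate over object contents, the 'version information of store2' is assumed to be a summary version vector, version vectors are mappings from node identifier to clock value, and the new list L is assumed to be the set of identifiers in S. Step (d), including how store2 treats an object that is no longer shared, is not specified above and is left out. All names are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

VV = Dict[int, int]             # node identifier -> clock value
Entry = Tuple[dict, VV]         # (contents, version vector)


def select_updates(
    objects: Dict[str, Entry],
    shares: Callable[[dict], bool],   # filter F
    last_shared: Set[str],            # list L from the last synchronization
    node_id: int,
    clock: int,                       # current local clock value
    remote_summary: VV,               # assumed version information of store2
) -> Tuple[List[str], Set[str], int]:
    """Return (identifiers to send, proposed new list L, advanced clock)."""
    # (a) apply F to obtain the subset S of objects to be shared
    subset = {oid for oid, (content, _) in objects.items() if shares(content)}

    # (b) increment version information of objects whose sharing status changed
    newly_shared = subset - last_shared
    no_longer_shared = (last_shared - subset) & set(objects)
    for oid in sorted(newly_shared | no_longer_shared):
        objects[oid][1][node_id] = clock   # contents are left untouched
        clock += 1

    # (c) among S and the objects named in L, keep those whose version vector
    #     is newer than, or conflicts with, the remote version information
    candidates = subset | (last_shared & set(objects))
    to_send = sorted(
        oid for oid in candidates
        if any(c > remote_summary.get(n, 0) for n, c in objects[oid][1].items())
    )
    return to_send, subset, clock


customers = {
    "ny1": ({"state": "NY"}, {1: 1}),
    "ct1": ({"state": "CT"}, {1: 2}),
    "ny2": ({"state": "NY"}, {1: 3}),
}
to_send, new_list, clock = select_updates(
    customers, lambda c: c["state"] == "NY", {"ny1", "ct1"}, 1, 4, {1: 2})
print(to_send, sorted(new_list), clock)
```

In this example the hypothetical customer `ct1` was shared at the last synchronization but no longer passes the filter, and `ny2` passes the filter for the first time, so both receive fresh clock values and are selected, while the unchanged `ny1` is not.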
These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.