Field of the Invention
The invention relates to a method and a system for syncing data structures. It also relates to a non-transitory computer-readable storage device, a non-transitory computer-readable storage medium, a computer program product, a computer and a collaborative system.
Background Art
As the Internet and the market for mobile smartphones and tablets grow, more and more computer applications offer their users the capability to work collaboratively on shared data, such as text documents or any other data that represent the data structure of an application. In addition, users are becoming accustomed to working on this data on different devices such as desktop computers and various mobile devices. To make these applications easy to operate for their users, it is desirable that they offer at least some of the following features:
a) Unblocked Collaboration
A user working with the system should not block others from accessing or changing data. This should ideally include working on the same piece of information.
b) Consistent Merge and Intention Preservation
If multiple users are allowed to concurrently work on the same pieces of information, the system needs to be able to merge these changes. Merging needs to produce the same result for all users (convergence) and should strive to conserve as much of the users' original intentions as possible. The merged data needs to be valid in the context of the application logic.
c) Offline Collaboration
Users want to access applications from their mobile devices, such as laptop computers, smartphones or tablets. These devices often operate in slow or unstable networking environments such as cellular networks, or are completely offline for short periods, as in subway trains, or for longer periods, as in airplanes or holiday homes. Users still expect to be able to work with their applications in these situations; applications therefore need to support offline collaboration.
If an application does not fulfill requirement a (unblocked collaboration), blocking becomes even worse when users go offline for longer periods of time.
d) Undo Changes
In an application, changes performed by one user can immediately affect others. As many applications tend to be complex, it is hard for users to always correctly predict the effect of their changes. It therefore greatly eases the use of an application if changes can always be undone.
Traditional undo mechanisms are linear, meaning that changes have to be undone in the reverse of the order in which they were originally performed. For a user it is, however, desirable to be able to undo arbitrary older changes without also having to undo the actions that followed them. In a collaborative environment this so-called non-linear undo mechanism is mandatory, as changes from one user get interleaved with changes from others, so that undoing only one's own changes requires skipping the changes of others.
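The difference between linear and non-linear undo can be illustrated with a minimal sketch. The `History` class and its method names are hypothetical and are not taken from any of the methods discussed here. The sketch assumes a flat key-value state and only permits undoing a change when no later change touched the same key, which sidesteps the hard cases a real non-linear undo has to handle.

```python
class History:
    """Toy change log over a flat dictionary (illustrative assumption)."""

    def __init__(self, state):
        self.state = dict(state)
        self.log = []  # entries: (user, key, old_value, new_value)

    def set(self, user, key, value):
        # Record the previous value so the change can be inverted later.
        self.log.append((user, key, self.state[key], value))
        self.state[key] = value

    def undo_last_by(self, user):
        """Undo the most recent change by `user`, skipping later changes by others."""
        for i in range(len(self.log) - 1, -1, -1):
            who, key, old, _new = self.log[i]
            if who != user:
                continue
            # Simplification: refuse if a later change touched the same key.
            if any(k == key for _, k, _, _ in self.log[i + 1:]):
                raise ValueError("a later change depends on this one")
            self.state[key] = old
            del self.log[i]
            return True
        return False


h = History({"title": "A", "color": "red"})
h.set("alice", "title", "B")
h.set("bob", "color", "blue")   # interleaved change by another user
h.undo_last_by("alice")         # bob's later change stays in place
```

A linear undo would have had to revert the change to `color` first; here only the change to `title` is reverted.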
e) Auto Save
All the changes a user performs on the data should be saved automatically without any interaction needed from the user. This is especially important for memory constrained mobile devices in which the work is likely to be interrupted. An incoming call for example should not result in data loss when the smartphone needs to terminate the application.
Auto-saving should also enable the user to switch the device he is using while working on a change: A user might want to begin his work in the office on a desktop computer and finish it on his way home on his smartphone.
f) Publishing Independent from Saving
When a user wants to perform a complex change to shared data, he may want to perform multiple editing steps before he feels ready to share his changes with others. Therefore, publishing his changes to others ideally is independent from saving the changes and syncing them with other personal devices.
g) Audit Trail
If data is business critical, it is desirable to have a chronological record of the sequence of activities that changed the data. An application should therefore record all user actions and changes as events in a so-called audit trail. This increases the security of an application and even enables controlled workflows in which changes need to be reviewed.
There are many methods for syncing data between multiple users, which do not, however, fulfill all of the requirements described above. In the following, some of the common methods known from the art are described, and their features and problems are highlighted.
A. File Based Syncing
Popular syncing services like Dropbox or iCloud offer to sync changes that users perform on files. But as the semantics of the file contents are unknown to the syncing machinery, these services are not able to merge changes that are concurrently performed on a single file and leave it up to the application logic to solve the hard problem. These systems are only a direct solution if the data can be separated into independent files, which is rarely the case. The method of file based syncing hence only fulfills requirements a and c.
B. Central Online Server with Pessimistic Locking
Traditional enterprise applications are built around a central database server to which all users (clients) have a persistent online connection. The data stored on the server is divided into independent records, which can be edited independently by different users. While a user edits a record, it is locked, which means that no other user can edit it. Once the user has finished editing, the changes are immediately visible to other users and the record is unlocked. This common approach fulfills none of requirements a to g. It is also unsuitable for designs in which there are strong dependencies between objects, because it needs to lock records independently.
C. Central Online Server with Optimistic Locking and Merging
A variant of the above approach is optimistic locking, in which users are allowed to simultaneously edit the same pieces of data and the system is able to detect this. The system then needs to merge the individual changes; however, it has no information for performing a merge that fulfills requirement b. Most commonly, merges are performed on a field-by-field basis. The optimistic locking approach is common in object-relational mappers that persist data from object graphs into relational databases. It fulfills requirement a but not b, c, d, e, f and g.
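Why a field-by-field merge can miss requirement b is easiest to see in a small sketch. The function `merge_fields`, the record layout and the rule that `start` must not exceed `end` are all assumptions made for illustration; they do not describe any particular object-relational mapper.

```python
def merge_fields(base, mine, theirs):
    """Three-way merge of flat records, one field at a time.

    A field changed only by 'mine' takes that value. In every other
    case 'theirs' is used (an arbitrary tie-break for this sketch).
    """
    merged = {}
    for key in base:
        if mine[key] != base[key] and theirs[key] == base[key]:
            merged[key] = mine[key]
        else:
            merged[key] = theirs[key]
    return merged


# Hypothetical record with a cross-field rule: start must not exceed end.
base = {"start": 1, "end": 5}
mine = {"start": 4, "end": 5}     # user 1 moves the start later (still valid)
theirs = {"start": 1, "end": 3}   # user 2 moves the end earlier (still valid)

merged = merge_fields(base, mine, theirs)
is_valid = merged["start"] <= merged["end"]
```

Each individual edit respects the rule, yet the merged record has `start` 4 and `end` 3: the merge converges but is not valid in the context of the application logic.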
D. Key-Value Databases
During recent years, key-value or so-called NoSQL databases have become very popular and many web-based applications are nowadays built around them. These databases also perform merging on a field-by-field basis so that cross-dependencies between fields cannot be respected during a merge. This allows only for very simple data models. More sophisticated merging is possible with these databases but the problem is then shifted upwards to the application logic. Requirements a and c can be fulfilled but requirements b, d, e, f and g cannot be fulfilled. The popular iCloud syncing service from Apple, for instance, also offers a key-value database for syncing simple application data between iPhones and Macs.
E. System Prevalence
A successful approach for capturing more of the user's intentions is to store the whole history of user actions that led to the state of an object graph, which is employed to represent application data, in addition to the object graph itself. This history is often called history log, log or journal. When merging, action events originating from other users are applied to a user's object graph, which improves the quality of merges. For example: If two users concurrently add an object to a to-many relationship, a simple state-based merge would result in only one of the objects being added. A merge based on replaying of both events preserves the intentions and results in both objects being added.
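The to-many example above can be sketched as follows. The function names, the list representation of the relationship and the event tuples are illustrative assumptions; the state-based variant shown is the simplest possible one, in which one user's state replaces the other's.

```python
def state_merge(state_a, state_b):
    """Naive state-based merge: the later-written state replaces the earlier one."""
    return list(state_b)


def replay_merge(base, events):
    """Event-based merge: replay every recorded action on the common base state."""
    result = list(base)
    for kind, obj in events:
        if kind == "add" and obj not in result:
            result.append(obj)
    return result


base = ["task-1"]
user_a_state = base + ["task-2"]      # user A adds task-2
user_b_state = base + ["task-3"]      # user B concurrently adds task-3
events = [("add", "task-2"), ("add", "task-3")]

state_result = state_merge(user_a_state, user_b_state)
replay_result = replay_merge(base, events)
```

The state-based result contains only one of the two added objects, whereas replaying both events keeps both.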
However, in this scheme actions recorded by one user get transferred and applied to the state of another user which may not be the same as the one in which the action originated. This can lead to lots of consistency and convergence problems.
If prepared well, the log can enable linear undo by replaying the inverses of the recorded actions in reverse order. But it still does not allow for undoing changes in a non-linear fashion.
The log can also serve as the source for audit trail information. The system prevalence pattern fulfills requirements a, c and g but not b, d, e, and f. Popular cloud syncing services like iCloud or Dropbox offer stores that are implemented using the system prevalence pattern.
F. Differential Synchronization
Even without an explicit event log, it is possible to derive the user actions by performing a delta comparison of two object-graph states. Such derived actions may capture the user's original intention well if the differences are small. This in turn requires that clients be permanently connected to a server, because offline work would lead to greater differences from which the original actions cannot be derived reliably. A differential synchronization system also needs to be able to apply derived difference actions to a state that does not equal the original state from which the difference was computed. This is called fuzzy patching. However, fuzzy patching is error prone and may result in different object graph states for different users. Differential synchronization systems can fulfill requirement a and under some circumstances b, but not c, d, e, f and g.
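A delta comparison of two states can be sketched as below. This is a minimal illustration that assumes the object graph is reduced to a flat dictionary; `derive_actions` and the action tuples are hypothetical names, and fuzzy patching is not shown.

```python
def derive_actions(old, new):
    """Derive a list of actions that turns `old` into `new` by comparing keys."""
    actions = []
    for key in sorted(old.keys() - new.keys()):
        actions.append(("remove", key, None))
    for key in sorted(new.keys() - old.keys()):
        actions.append(("add", key, new[key]))
    for key in sorted(old.keys() & new.keys()):
        if old[key] != new[key]:
            actions.append(("set", key, new[key]))
    return actions


before = {"title": "Draft", "pages": 3}
small_change = {"title": "Draft", "pages": 4}   # one field edited
large_change = {"name": "Draft", "pages": 4}    # field renamed and another edited

small_delta = derive_actions(before, small_change)
large_delta = derive_actions(before, large_change)
```

For the small difference the derived action matches what the user did. For the larger one, the rename of `title` to `name` is only visible as an unrelated removal and addition, so the original intention can no longer be recovered from the delta.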
G. Operational Transformation
Operational Transformation (OT) is an extension of the System Prevalence pattern in which, in addition to maintaining the object graph, user actions get recorded into a log of operations. These operations can be transferred to other users, but are not directly applied to their object graph. They are first transformed against other operations to bring them into the correct context, as the object graph of the receiving user might differ from the one in which the operations were originally recorded.
The challenge of implementing a working OT scheme lies in performing the following tasks:
Define a set of operations that can record all possible user actions and their undo inverse.
Define transformations between all possible operation pair combinations, which guarantee convergence and preserve the user intention as much as possible.
Define an integration algorithm that determines which transformations are to be applied to a given operation for bringing it into the desired operation context. The algorithm also needs to guarantee that custom validation requirements of the individual model logic do not get violated.
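The second task, defining transformations between operation pairs, can be illustrated for the simplest case of two concurrent text insertions. The sketch below is an assumption-laden toy: the `Insert` type, the `site` identifier used for tie-breaking and the function names are hypothetical, only one operation type is covered, and no integration algorithm or undo inverse is included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical site id, used only to break position ties


def apply(doc, op):
    """Apply an insert operation to a plain-text document."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, other):
    """Return `op` adjusted so that it can be applied after `other`."""
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        # `other` inserted text at or before our position: shift right.
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


doc = "abc"
a = Insert(1, "X", site=1)   # recorded by user 1
b = Insert(2, "Y", site=2)   # recorded concurrently by user 2

# Each user applies the local operation first, then the transformed remote one.
at_site_1 = apply(apply(doc, a), transform(b, a))
at_site_2 = apply(apply(doc, b), transform(a, b))
```

Both users end up with the same text even though they applied the operations in different orders, which is the convergence property the transformations have to guarantee.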
OT was originally developed for collaborative real-time plain-text editors but has since been applied to various other applications.
OT solutions can fulfill requirements a, b, and c, but the existing schemes make it very hard to implement operations that guarantee a correct undo behavior (requirement d). Additionally, there does not appear to be any existing OT scheme that can successfully be applied to object graphs of applications in the way described in the introduction and which fulfills requirements a, b, c, and d.
Several methods are known for syncing data structures that aim at supporting the collaborative work of several users or participants.
U.S. Pat. No. 8,527,440 B2 discloses a system for performing consistency maintenance of distributed graph structures that compares changes to identify conflicting operations. Changes made by a user to a local model are expressed through primitive operations, wherein operational transformations can be employed specifying how one primitive operation is transformed against another primitive operation.
The article “Undo as Concurrent Inverse in Group Editors” by Chengzheng Sun, published in ACM Transactions on Computer-Human Interaction, Vol. 9, No. 4, December 2002, Pages 309-361, proposes an undo solution for collaborative applications. The solution consists of a generic transformation control algorithm that is capable of generating, transforming, and representing valid inverse operations in any context, and a set of transformation functions that are capable of preserving undo-related transformation conditions and properties. Operational transformations form the foundation of this undo solution. The disclosed algorithm is named “AnyUndo”.