Hierarchical data is data that contains dependency relationships between pieces of information. A dependency may be formed if, for example, one piece of data cannot be modified without also modifying the other piece of data. For example, each piece of data in a hierarchy may have a bit mask field. The business logic may have a rule that a child member cannot have its bit set unless the parent does. Thus, the child and parent have a dependency relationship. A dependency may also be formed if one piece of data cannot be modified without reading the state of another piece of data. In the above example, this would be the case if the logic has a rule that the child member cannot be modified if the state of the parent does not match the state of the child. One of ordinary skill in the art would recognize there may be other relationships and/or rules that would give rise to a dependency.
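As a minimal sketch of the bit mask rule described above, the following assumes a hypothetical `Member` record with an integer `mask` field and a reference to its parent. The names `Member` and `can_set_bits` are illustrative only and do not come from any particular system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    """Hypothetical hierarchy member carrying a bit mask field."""
    name: str
    mask: int = 0
    parent: Optional["Member"] = None


def can_set_bits(member: Member, bits: int) -> bool:
    """Return True if every requested bit is already set in the parent.

    A member without a parent (the root) is assumed to be unconstrained.
    """
    if member.parent is None:
        return True
    # Any requested bit that is missing from the parent violates the rule.
    return (bits & ~member.parent.mask) == 0


root = Member("root", mask=0b0110)
child = Member("child", mask=0b0010, parent=root)

allowed = can_set_bits(child, 0b0100)   # parent has this bit
rejected = can_set_bits(child, 0b1000)  # parent lacks this bit
```

Here the child's permitted bits depend on the parent's current state, which is the kind of dependency relationship that makes the data hierarchical rather than flat.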
Hierarchical data is straightforward to handle when only a single user accesses the data. Problems are encountered, however, when multiple users attempt to access and/or modify hierarchical data simultaneously. This can be quite common in today's database environments, where hierarchical data is managed across multiple horizontally scaled servers and many users attempt to edit the hierarchy at the same time.
One way to handle this problem is to lock the entire hierarchy while editing. However, this only supports one user at a time, which cuts into efficiency. A second user must wait until the first user has completed editing before editing the data. This is especially limiting for web-based applications, where it is quite easy for a user to accidentally leave open an editing window indefinitely, thus locking other users out indefinitely.
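The whole-hierarchy locking approach can be illustrated with a short sketch. It assumes a single in-process lock standing in for whatever locking mechanism a real system would use; the class `LockedHierarchy` and its methods are hypothetical.

```python
import threading


class LockedHierarchy:
    """Hypothetical hierarchy guarded by one lock for the entire structure."""

    def __init__(self):
        self._lock = threading.Lock()
        self.editor = None

    def begin_edit(self, user: str) -> bool:
        # Non-blocking acquire: fails immediately if someone else is editing.
        if self._lock.acquire(blocking=False):
            self.editor = user
            return True
        return False

    def end_edit(self, user: str) -> None:
        # Only the current editor may release the lock.
        if self.editor == user:
            self.editor = None
            self._lock.release()


hierarchy = LockedHierarchy()
first_ok = hierarchy.begin_edit("alice")   # alice obtains the lock
second_ok = hierarchy.begin_edit("bob")    # bob is locked out
hierarchy.end_edit("alice")
third_ok = hierarchy.begin_edit("bob")     # bob can edit only after release
```

If the first user never calls `end_edit`, for instance because an editing window was left open, every later `begin_edit` call fails, which mirrors the indefinite lockout described above.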
Another way to handle this problem is to allow users to modify separate copies of the hierarchy, and then to merge them when they are both done. However, this creates problems in that when there are conflicts between the changes, the second user must manually merge their changes with the changes of the first user. This can create downtime where the second user must contact the first user to determine why the changes should be implemented and resolve the conflict, all the while a third user may go ahead and make another conflicting modification. Additionally, the entire hierarchy needs to be copied from the server, modified, and then saved in full. This can utilize a tremendous amount of bandwidth and memory space, especially when it is common for hierarchies to be on the order of 60,000 members large, and when many users are accessing the hierarchy simultaneously.
Some databases address these issues by allowing data to be modified at the record level. This drastically decreases update time, and also allows a user to lock only the parts of the hierarchy that are changed. However, this solution only works for flat data, not hierarchical data. Assume, for example, that a user moves a member from one parent to another. The status of the new parent may then be checked, and the check may find that each bit set in the moved member is also set in the parent. The system will assume that everything is fine and the transaction will be committed. The problem occurs when, on another server, a user turns off a bit in the bit mask of the new parent. When this other server checks to see if any children have this bit set, it finds no such children, as this server is not yet aware of the move. A few seconds later, a user on a third server sees a member underneath a parent, where the member has a bit set that its parent does not have, which violates the business logic.
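The scenario above can be reproduced in a small simulation. This is a sketch under stated assumptions: each server is modeled as a hypothetical `ServerCache` holding its own copy of the masks and parent links, each server validates changes only against its own copy, and changes are propagated afterward. None of these names come from a real product.

```python
class ServerCache:
    """Hypothetical per-server cache of a hierarchy (masks and parent links)."""

    def __init__(self, masks, parent_of):
        self.masks = dict(masks)
        self.parent_of = dict(parent_of)

    def children(self, parent):
        return [m for m, p in self.parent_of.items() if p == parent]

    def move(self, member, new_parent):
        # Record-level check: every bit in the member must be in the new parent.
        if self.masks[member] & ~self.masks[new_parent]:
            raise ValueError("member has a bit the new parent lacks")
        self.parent_of[member] = new_parent
        return ("move", member, new_parent)

    def clear_bit(self, parent, bit):
        # Record-level check: no locally known child may still have the bit.
        if any(self.masks[c] & bit for c in self.children(parent)):
            raise ValueError("a child still has this bit set")
        self.masks[parent] &= ~bit
        return ("clear", parent, bit)

    def apply(self, change):
        # Replicated changes are applied without re-validation.
        kind, target, value = change
        if kind == "move":
            self.parent_of[target] = value
        else:
            self.masks[target] &= ~value


def violations(cache):
    """Members with a bit set that their parent does not have."""
    return [m for m, p in cache.parent_of.items()
            if p is not None and cache.masks[m] & ~cache.masks[p]]


masks = {"root": 0b11, "A": 0b11, "B": 0b11, "X": 0b01}
parents = {"root": None, "A": "root", "B": "root", "X": "A"}
server1, server2, server3 = (ServerCache(masks, parents) for _ in range(3))

change1 = server1.move("X", "B")         # passes: B has every bit X has
change2 = server2.clear_bit("B", 0b01)   # passes: server2 sees no children of B

server3.apply(change1)
server3.apply(change2)
broken = violations(server3)             # X now has a bit that B lacks
```

Both changes pass their local checks, yet the third server ends up with an inconsistent hierarchy, because each check read state that another server was concurrently changing.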
A final potential solution would be to simply have all reads and writes performed directly to the database, by using the proper isolation level. However, if this is done, a cache cannot be used and the performance benefits of the cache are lost, which can be significant in a multiple-user environment.
What is needed is a solution that allows for editing of hierarchical data in a cache by multiple users without creating conflicts and without sacrificing performance.