Servers are typically used for large applications and workloads, such as those associated with large web services and manufacturing. Often, a single server does not have enough processing power to run the required application. To accommodate these large applications, multiple servers may be used together with multiple shared storage devices in a storage area network (SAN). In addition, it may be valuable to group servers together to achieve better availability or manageability.
As systems grow larger, it becomes more difficult to coordinate updates from multiple components to shared data structures while maintaining high performance and efficiency. It would be beneficial to synthesize atomic updates to data structures that span multiple data blocks when the hardware guarantees atomicity only for single-block writes. The need for atomic updates arises because systems can fail, and finding and repairing inconsistencies introduced by partially completed updates can be costly or impossible. One way to manage recovery is to use a journal that records information about updates.
What is needed is a system and method for journal recovery in a multi-node environment that efficiently restores common data structures to a consistent state, even if some processing nodes fail while the surviving nodes have overlapping updates in progress. The present invention addresses such needs.