1. Field of the Invention
The present invention relates in general to the field of clustered computing nodes, and more particularly to a system and method for hierarchical recovery of a cluster file system.
2. Description of the Related Art
Clusters of computing nodes help to improve system reliability by providing failover recovery in the event of a computing node failure. If a computing node fails, applications executing on the failed computing node are recovered at another computing node of the cluster. To provide failover, the computing nodes of a cluster exchange information that supports recovery of a failed computing node, such as heartbeat packets.
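By way of illustration only, heartbeat-based failure detection followed by failover can be sketched as below. This is a minimal sketch under stated assumptions, not a description of any particular cluster product: the names `HeartbeatMonitor`, `record_heartbeat`, `failed_nodes`, and `fail_over`, the timeout value, and the round-robin reassignment policy are all hypothetical and are not taken from the present disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat time per node (hypothetical helper)."""
    timeout: float  # seconds of silence before a node is presumed failed (assumed policy)
    last_seen: dict = field(default_factory=dict)

    def record_heartbeat(self, node_id: str, now: float) -> None:
        # Remember when this node was last heard from.
        self.last_seen[node_id] = now

    def failed_nodes(self, now: float) -> list:
        # A node is presumed failed once its silence exceeds the timeout.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def fail_over(assignments: dict, failed: str, survivors: list) -> dict:
    """Return a new node-to-applications map with the failed node's
    applications reassigned to surviving nodes in round-robin order
    (an assumed policy chosen for simplicity)."""
    if not survivors:
        raise ValueError("no surviving node available for failover")
    updated = {node: list(apps) for node, apps in assignments.items() if node != failed}
    for i, app in enumerate(assignments.get(failed, [])):
        target = survivors[i % len(survivors)]
        updated.setdefault(target, []).append(app)
    return updated
```

In this sketch, a caller would periodically invoke `failed_nodes` with the current time and pass each reported node to `fail_over` together with the list of nodes still sending heartbeats.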
Traditionally, clusters coordinate recovery of a failed node using a single computing node. Coordinating a failover recovery through a single node reduces complexity during a crash scenario. Traditional clustered file systems do not have a hierarchy of management, so that, in the event of a failure, a replica of the failed node is created and introduced to the cluster, and the replica picks up where the failed node left off at the time of failure. A difficulty with traditional recovery is that use of the cluster is delayed while the failover is performed, so that the recovery time impacts end users.