1. Technical Field
The present invention is directed to an improved computing device. More specifically, the present invention is directed to an apparatus and method for building metadata using a heartbeat of a clustered system.
2. Description of Related Art
Computer clusters are generally used to perform a multitude of different computing functions because of their fault tolerance and load balancing capabilities. With a computer cluster, multiple computer systems are linked together in order to handle variable workloads or to provide continued operation in the event one fails. Each computer may be a multiprocessor system itself. For example, a cluster of four computers, each with four CPUs, would provide a total of 16 CPUs processing simultaneously. If one of the computers fails, one or more additional computers are still available and may actually take over the functions of the failed computer. In addition, load balancing mechanisms in the computer cluster are able to distribute the workload over the multiple computer systems, thereby reducing the burden on each of the computer systems.
Computer systems within the clustered system typically monitor each other's presence using a heartbeat signal or "keep alive" signal. Each computer system in the clustered system periodically sends out a heartbeat signal to the other computer systems in the clustered system, essentially informing them that the sending computer system is still active and does not need its resources taken over by another computer system in the clustered system. If a heartbeat signal is not received from one of the computer systems in the clustered system, the other computer systems will determine that the computer system has failed.
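The heartbeat monitoring described above can be illustrated with a minimal sketch. The class name `HeartbeatMonitor`, the `timeout` parameter, and the use of explicit timestamps are assumptions made for illustration only; the text does not specify how a missed heartbeat is detected or how long a system waits before declaring a failure.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each cluster member (hypothetical helper)."""

    def __init__(self, members, timeout, start_time=0.0):
        # timeout: assumed number of seconds of silence after which a member
        # is treated as failed; the source gives no concrete value.
        self.timeout = timeout
        self.last_seen = {member: start_time for member in members}

    def record_heartbeat(self, member, now):
        """Note that a heartbeat from `member` arrived at time `now`."""
        self.last_seen[member] = now

    def failed_members(self, now):
        """Return the members whose last heartbeat is older than the timeout."""
        return sorted(
            member
            for member, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


# Usage: three members, of which only "A" keeps sending heartbeats.
monitor = HeartbeatMonitor(["A", "B", "C"], timeout=3.0)
monitor.record_heartbeat("A", now=1.0)
monitor.record_heartbeat("B", now=1.0)
monitor.record_heartbeat("A", now=4.0)
print(monitor.failed_members(now=5.0))
```

Timestamps are passed in explicitly rather than read from a clock so that the behavior is deterministic; a real cluster would presumably use a monotonic clock and a network transport, neither of which is shown here.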
When a member of the cluster has failed, i.e. when a heartbeat signal from one of the computer systems in the clustered system is not received by the other computer systems, or when the member is otherwise not available to the clustered system, the other members of the clustered system must take over for the missing member. However, upon taking over the missing system's resources (typically hard disk storage), the remaining members must learn or surmise the configuration of the resources that the missing system was using. This is typically done by having the computer system that is taking over the resources read in the metadata from the resource. The metadata from the resource is data that describes the configuration of the resource, e.g. the file system, data areas, and the like.
Reading in the metadata from the resource upon detection of a failed computing system in the clustered system may take many processor cycles to complete. In some instances, reading in this metadata may take several minutes. In some clustered systems, a delay of multiple seconds or minutes may mean heavy financial losses. For example, in a clustered system that is used to handle financial transactions, stock purchasing and selling, or the like, a delay of several minutes may have a large financial impact. Thus, it would be beneficial to have a method and apparatus for minimizing the amount of time necessary for a computer system to take over the resources of a failed computer system in a clustered system.
The present invention provides an apparatus and method for building metadata using a heartbeat of a clustered system. The present invention sends portions of metadata for a computer system resource to each of the other computer systems in the clustered system as a heartbeat data message. Upon receiving the heartbeat data message having the portion of metadata, the receiving computer systems store the portion of metadata in a temporary storage until all of the metadata is received.
In subsequent heartbeat data messages, the remaining portions of the metadata are transmitted to the computer systems which, upon receiving all portions of the metadata, store the metadata in a secure location. If the sending computer system were to fail, the metadata stored in the secure location is read and used to take over the resources of the failed computer system. In this way, the processing cycles used in the prior art to read in the metadata from the resources of the failed computer system are eliminated.
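The scheme summarized above, in which metadata is sent in portions across successive heartbeat messages and assembled by each receiver, can be sketched as follows. This is an illustrative sketch only: the message fields (`index`, `total`), the names `HeartbeatMessage`, `split_metadata`, and `MetadataReceiver`, the fixed chunk size, and the sample metadata bytes are all assumptions, and the two in-memory dictionaries merely stand in for the temporary storage and the secure location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatMessage:
    """A heartbeat carrying one portion of the sender's metadata (assumed layout)."""
    sender: str
    index: int    # position of this portion within the full metadata
    total: int    # number of portions the full metadata was split into
    chunk: bytes


def split_metadata(sender, metadata, chunk_size):
    """Split metadata into one heartbeat message per fixed-size portion."""
    chunks = [
        metadata[i:i + chunk_size] for i in range(0, len(metadata), chunk_size)
    ] or [b""]  # empty metadata still produces a single heartbeat
    return [
        HeartbeatMessage(sender, i, len(chunks), chunk)
        for i, chunk in enumerate(chunks)
    ]


class MetadataReceiver:
    """Collects portions per sender and promotes them once all have arrived."""

    def __init__(self):
        self._pending = {}   # stands in for the temporary storage
        self._complete = {}  # stands in for the secure location

    def on_heartbeat(self, msg):
        """Store one portion; return True once the sender's metadata is complete."""
        parts = self._pending.setdefault(msg.sender, {})
        parts[msg.index] = msg.chunk
        if len(parts) < msg.total:
            return False
        self._complete[msg.sender] = b"".join(parts[i] for i in range(msg.total))
        del self._pending[msg.sender]
        return True

    def metadata_for(self, sender):
        """Return the fully assembled metadata for a sender, or None."""
        return self._complete.get(sender)


# Usage: made-up metadata bytes, sent eight bytes per heartbeat.
metadata = b"filesystem=fs0;volume=vol0"
receiver = MetadataReceiver()
for message in split_metadata("node1", metadata, chunk_size=8):
    receiver.on_heartbeat(message)
print(receiver.metadata_for("node1") == metadata)
```

If "node1" later stops sending heartbeats, a surviving system in this sketch would call `metadata_for("node1")` instead of reading the metadata from the resource itself, which is the time saving the summary describes.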