A clustered system is a group of independent servers that run together as a single server for improved manageability, availability, and scalability. A clustered system requires two or more servers connected via a network, a method for each server to access the other servers' data, and clustering software such as that utilized by the Microsoft Cluster Server (MSCS).
Clustering software provides the services necessary to manage the servers as a single system. While clustering software is running, events can occur that cause the clustered system to fail unexpectedly. These unexpected failures take one of two forms.
One form of clustering software failure occurs when clustering between the two server nodes is no longer available. In other words, the two server nodes are no longer able to run together as a single server. Because the server nodes can no longer cooperate, the two servers cannot function as a single clustered system.
The second form of clustering software failure occurs when clustering has already been established. In this case, the two server nodes have already been set up as a clustered system. Although the two server nodes are clustered, an error can exist that prevents the clustering software from performing properly.
After a cluster failure, a user of the clustered system does not know why the cluster failed. The user may not even know which of the two forms of clustering software failure occurred. Thus, the need arises to provide the user with information on how to restore clustering after experiencing a cluster service failure.
One prior art method to which the method of the present invention generally relates is described in U.S. Pat. No. 6,088,727 entitled CLUSTER CONTROLLING SYSTEM OPERATING ON A PLURALITY OF COMPUTERS IN A CLUSTER SYSTEM. The prior art method of clustering monitors and controls the packages in the entire system and, when a fault or failure has occurred, transfers packages that have been operating on one computer to another computer. When the respective packages are started up, cluster daemons on the respective computers monitor and control resources on the operating computers. The monitored and controlled data are stored in the respective computers as local data. A manager communicates with the cluster daemons on the respective computers, and stores data in a global data memory to monitor and control the entire system. The manager is actually one of the packages operating in the cluster system. If a fault or failure occurs in the manager or in the computer running the manager, the manager is restarted on another computer by a cluster daemon.
The present invention differs from the prior art in that the prior art method deals with the workings of the cluster software itself. The method of the present invention instead solves problems related to how the underlying system supports the use of such a cluster software package. The method of the present invention diagnoses the conditions required for the cluster software to operate and reports to the user what steps to take to remedy the situation.
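As an illustration only, the idea of diagnosing required conditions and reporting remedial steps can be sketched as follows. This is a minimal sketch under stated assumptions: the `Check` structure, the check names, and the remedy strings are hypothetical and are not taken from the described method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    name: str                   # hypothetical label for a required condition
    probe: Callable[[], bool]   # returns True when the condition holds
    remedy: str                 # step reported to the user when it does not


def diagnose(checks: List[Check]) -> List[str]:
    """Run every check and collect a remedy line for each one that fails."""
    report = []
    for check in checks:
        if not check.probe():
            report.append(f"{check.name}: {check.remedy}")
    return report


# Hypothetical conditions; real probes would query the servers themselves.
checks = [
    Check("network link", lambda: True,
          "Verify the cable and adapter settings."),
    Check("shared storage", lambda: False,
          "Confirm both nodes can see the shared disk."),
]

for line in diagnose(checks):
    print(line)
```

Each failed condition yields one line naming the condition and the suggested step, so the user sees what to fix rather than only that clustering failed.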
Another prior art method to which the method of the present invention generally relates is detailed in U.S. Pat. No. 5,287,453 entitled FAST REMOTE FILE ACCESS FACILITY FOR DISTRIBUTING FILE ACCESS REQUESTS IN A CLOSELY COUPLED COMPUTER SYSTEM. This prior art is a cluster computer system that includes a plurality of independently operated computer systems located in close proximity to each other. Each system includes a system bus, a memory, and a set of local peripheral devices that connect in common to the system bus. The computer systems are interconnected for transferring messages to each other through the channels of a high-speed cluster controller that connect to the system buses. Each system further includes a cluster driver that transfers the messages between the memory of the computer system and the corresponding cluster controller channel when the system is configured to operate in a cluster mode of operation. User application programs issue monitor calls to access files contained on one or more peripheral devices. The fast remote file access (FRFA) facility included in each system, upon detecting that the peripheral device is not locally attached, packages the monitor call and information identifying the user application into a message. The message is transferred through the cluster driver and cluster controller to the FRFA of the computer system to which the peripheral device attaches. The monitor call is executed and the response is sent back through the cluster controller and delivered to the user application in such a manner that the peripheral device of the other computer system appears to be locally attached and the monitor call appears to be locally executed.
The present invention differs from the prior art in that the prior art deals with the fast remote file access facility used to transfer information between computer systems that are clustered. The method of the present invention diagnoses whether such facilities are able to communicate, without specifying a particular underlying facility. The method of the present invention also recommends steps to remedy any problems with the facility.
Yet another prior art method to which the method of the present invention generally relates is detailed in U.S. Pat. No. 5,966,510 entitled SCSI-COUPLED MODULE FOR MONITORING AND CONTROLLING SCSI-COUPLED RAID BANK AND BANK ENVIRONMENT. This prior art is an intelligent status monitoring, reporting and control module that is coupled to a SCSI bus that interconnects a cluster of SCSI-compatible data storage modules (e.g., magnetic disk drives). The status monitoring, reporting and control module is otherwise coupled to the cluster of SCSI-compatible data storage modules and to power maintenance and/or other maintenance subsystems of the cluster for monitoring and controlling states of the data storage modules and power maintenance and/or other maintenance subsystems that are not readily monitored or controlled directly by way of the SCSI bus. The status monitoring, reporting and control module sends status reports to a local or remote system supervisor and executes control commands supplied by the local or remote system supervisor. The status reports include reports about system temperature and power conditions. The executable commands include commands for regulating system temperature and power conditions.
The present invention differs from the prior art in that the prior art deals with the usage of a SCSI disk array to perform operations. The method of the present invention deals with monitoring and reset operations on the SCSI bus itself to determine its operational status with regard to a clustering environment.
It is an object of the present invention to obtain server identification data from a server within a clustered system. Another object of the present invention is to obtain connection identification data from a server within a clustered system. Still another object of the present invention is to match different data fields between a server and a designated server within a clustered system.
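By way of a hedged example, matching data fields between a server and a designated server can be pictured as a field-by-field comparison of two records. The field names (`domain`, `os_version`, `cluster_name`) and the dictionary representation are assumptions made for illustration; the text above does not specify which fields are compared or how they are stored.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, Optional[str], Optional[str]]


def mismatched_fields(local: Dict[str, str],
                      designated: Dict[str, str],
                      fields: List[str]) -> List[Mismatch]:
    """Return (field, local value, designated value) for each field that differs.

    A field missing from either record is reported with None on that side.
    """
    mismatches = []
    for field in fields:
        local_value = local.get(field)
        designated_value = designated.get(field)
        if local_value != designated_value:
            mismatches.append((field, local_value, designated_value))
    return mismatches


# Hypothetical identification data for two nodes.
local_server = {"domain": "CORP", "os_version": "5.0", "cluster_name": "CL1"}
designated_server = {"domain": "CORP", "os_version": "5.1"}

result = mismatched_fields(local_server, designated_server,
                           ["domain", "os_version", "cluster_name"])
for field, ours, theirs in result:
    print(f"{field}: local={ours!r} designated={theirs!r}")
```

Returning the differing values alongside the field name lets a report tell the user which side needs to change.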
Another object of the present invention is to compare storage usage between a server and another designated server within a clustered system. Another object of the present invention is to reset the SCSI bus on a server within a clustered system. Still another object of the present invention is to notify a user of the reasons why a failure occurred from clustering software. Still another object of the present invention is to synchronize tests in a hierarchy in order to give order to compatibility tests and resolve clustering software failures.
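One way to picture a hierarchy that gives order to the tests is a dependency graph in which a test runs only after the tests it depends on have passed. The sketch below is illustrative only: the test names, the dependency map, and the pass/fail/skipped labels are hypothetical, and Python's standard `graphlib.TopologicalSorter` is used here simply as a convenient ordering mechanism, not as the mechanism of the described method.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_in_order(tests: Dict[str, Callable[[], bool]],
                 depends_on: Dict[str, Set[str]]) -> Dict[str, str]:
    """Run tests so prerequisites come first; skip tests whose prerequisites did not pass."""
    results: Dict[str, str] = {}
    for name in TopologicalSorter(depends_on).static_order():
        prerequisites = depends_on.get(name, set())
        if all(results.get(p) == "pass" for p in prerequisites):
            results[name] = "pass" if tests[name]() else "fail"
        else:
            results[name] = "skipped"
    return results


# Hypothetical tests; each callable stands in for a real compatibility check.
tests = {
    "identity": lambda: True,
    "connection": lambda: False,
    "storage": lambda: True,
    "scsi_reset": lambda: True,
}
# Each test maps to the set of tests that must pass before it runs.
depends_on = {
    "identity": set(),
    "connection": {"identity"},
    "storage": {"connection"},
    "scsi_reset": {"identity"},
}

results = run_in_order(tests, depends_on)
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

Skipping dependent tests keeps the report focused on the first failing condition instead of listing follow-on failures that share the same cause.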