The present invention relates generally to cluster computing systems and, more particularly, to providing diagnostic audits for cluster computer systems.
Within the computing industry, there is an ongoing demand for information technology (IT) solutions that provide cost-effective, flexible, and fault-tolerant software applications to multiple computer users within a cluster computer system. A cluster computer system typically refers to a collection of computers, servers, or workstations interconnected via a communications network for the purpose of reliably providing a mission-critical software application to clients supported by the collection of computers, servers, or workstations. In general, the computers that comprise a cluster computer system work collectively as an integrated computing resource to provide the mission-critical software application. Cluster middleware is designed to protect the cluster computer system from a wide variety of hardware and software failures that may affect the provisioning of the mission-critical software application. For example, cluster middleware is responsible for providing what is referred to in the art as a Single System Image (SSI) of the cluster computer system by ensuring that the resources on computer A will be available on computer B in the event of some hardware or software failure related to computer A. In other words, the cluster middleware glues together the operating systems of each computer within the cluster computer system to offer reliable access to the mission-critical software application. Typically, cluster middleware performs a variety of tasks related to the cluster computer system, such as, for example, checkpointing, automatic failover, recovery from failure, and fault-tolerant support among all of the computers in the cluster computer system.
Notwithstanding the existence of robust cluster middleware, there is also a substantial demand in the cluster computer system environment for diagnostic tools and services for monitoring the consistency and operational capability of the cluster computer system. Currently, diagnostic services for cluster computer systems are performed manually by service personnel. For example, service personnel must first run a series of data collection tools to gather data related to the cluster computer system. In situations where different computers within the cluster computer system have different operating systems, the data collection tools typically have to be run for each type of operating system. After the data related to the cluster computer system is collected, the service personnel must manually analyze the data to ensure that there is consistency between the corresponding computers for each type of operating system. This manual analysis may be extremely time-consuming and expensive, and because the analysis is manual, the diagnostic service is susceptible to error and to variation between the personnel performing the analysis. Furthermore, manual analysis becomes increasingly problematic as the number of computers in the cluster computer system increases. As more and more data is gathered by the collection tools, it becomes increasingly difficult for service personnel to perform a meaningful diagnostic audit. For instance, instead of proactively providing meaningful diagnostic information by comparing the relative consistency of each computer within the cluster computer system, service personnel are confined to reactively explaining the differences between various computers within the cluster computer system.
Thus, there is a need in the industry to address these deficiencies and inadequacies.
The present invention provides systems and methods for providing an automated diagnostic audit for cluster computer systems.
Briefly described, in architecture, one of many possible implementations of a system for providing an automated diagnostic audit for a cluster computer system comprises: means for receiving information associated with the cluster computer system, the information comprising a plurality of system configuration parameters for each of a plurality of nodes in the cluster computer system; means for defining a plurality of system configuration categories associated with the plurality of system configuration parameters; means for defining a threshold benchmark for each of the plurality of system configuration categories, each of the plurality of threshold benchmarks based on a predefined set of rules; means for associating each of a portion of the plurality of system configuration parameters for each of the plurality of nodes with one of the plurality of system configuration categories; and means for generating audit information, the audit information based on a comparison of each of the plurality of system configuration parameters for each of the plurality of nodes to the threshold benchmark for the associated system configuration category. The system may further comprise means for providing the audit information to a network management entity associated with the cluster computer system.
Another system for providing an automated diagnostic audit for a cluster computer system comprises: means for collecting information associated with the cluster computer system, the information comprising a plurality of system configuration parameters for each of a plurality of nodes in the cluster computer system; means for providing the information associated with the cluster computer system to an application service provider; and means for receiving diagnostic audit information generated by the application service provider. The diagnostic audit information may correspond to at least a portion of the information associated with the cluster computer system. Furthermore, the diagnostic audit information received by the system may be determined by: defining a plurality of system configuration categories associated with the plurality of system configuration parameters; defining a threshold benchmark for each of the plurality of system configuration categories, each of the plurality of threshold benchmarks based on a predefined set of rules; associating each of a portion of the plurality of system configuration parameters for each of the plurality of nodes with one of the plurality of system configuration categories; and comparing each of the portion of the plurality of system configuration parameters for each of the plurality of nodes to the threshold benchmark for the associated system configuration category.
The present invention may also be viewed as providing one or more methods for providing an automated diagnostic audit for a cluster computer system. Briefly, one such method involves the steps of: receiving information associated with the cluster computer system, the information comprising a plurality of system configuration parameters for each of a plurality of nodes in the cluster computer system; defining a plurality of system configuration categories associated with the plurality of system configuration parameters; defining a threshold benchmark for each of the plurality of system configuration categories, each of the plurality of threshold benchmarks based on a predefined set of rules; associating each of a portion of the plurality of system configuration parameters for each of the plurality of nodes with one of the plurality of system configuration categories; and generating audit information, the audit information based on a comparison of each of the plurality of system configuration parameters for each of the plurality of nodes to the threshold benchmark for the associated system configuration category.
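By way of illustration only, the steps of the foregoing method may be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation of the invention: the names `Finding` and `generate_audit`, the parameter names, the category names, and the threshold values are all hypothetical, and each threshold benchmark is assumed to be expressible as a simple predicate on a numeric parameter value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical shapes: node name -> {parameter name -> numeric value}.
ClusterInfo = Dict[str, Dict[str, float]]


@dataclass
class Finding:
    """One audit result: a node's parameter checked against its category benchmark."""
    node: str
    parameter: str
    category: str
    value: float
    passed: bool


def generate_audit(
    cluster_info: ClusterInfo,
    categories: Dict[str, str],
    benchmarks: Dict[str, Callable[[float], bool]],
) -> List[Finding]:
    """Compare each categorized parameter of each node to its category benchmark."""
    findings: List[Finding] = []
    for node, params in cluster_info.items():
        for name, value in params.items():
            category = categories.get(name)
            if category is None:
                # Only a portion of the parameters is associated with a category.
                continue
            rule = benchmarks[category]
            findings.append(Finding(node, name, category, value, rule(value)))
    return findings


if __name__ == "__main__":
    # Received information: configuration parameters for each node (assumed values).
    cluster_info = {
        "node_a": {"free_disk_gb": 40.0, "swap_gb": 8.0, "uptime_days": 12.0},
        "node_b": {"free_disk_gb": 5.0, "swap_gb": 8.0, "uptime_days": 3.0},
    }
    # Defined categories and the association of parameters with them (assumed).
    categories = {"free_disk_gb": "storage", "swap_gb": "memory"}
    # Threshold benchmarks derived from a predefined rule set (assumed rules).
    benchmarks = {
        "storage": lambda v: v >= 20.0,
        "memory": lambda v: v >= 4.0,
    }
    for f in generate_audit(cluster_info, categories, benchmarks):
        status = "ok" if f.passed else "BELOW BENCHMARK"
        print(f"{f.node} {f.category}/{f.parameter}={f.value}: {status}")
```

In this sketch, a parameter that is not associated with any category (here, `uptime_days`) is simply skipped, which mirrors the statement that only a portion of the parameters is associated with a category.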
Briefly, another such method for providing an automated diagnostic audit for a cluster computer system involves the steps of: collecting information associated with the cluster computer system, the information comprising a plurality of system configuration parameters for each of a plurality of nodes in the cluster computer system; providing the information associated with the cluster computer system to an application service provider; and receiving diagnostic audit information generated by the application service provider, the diagnostic audit information corresponding to at least a portion of the information associated with the cluster computer system. The diagnostic audit information received by the system may be determined by: defining a plurality of system configuration categories associated with the plurality of system configuration parameters; defining a threshold benchmark for each of the plurality of system configuration categories, each of the plurality of threshold benchmarks based on a predefined set of rules; associating each of a portion of the plurality of system configuration parameters for each of the plurality of nodes with one of the plurality of system configuration categories; and comparing each of the portion of the plurality of system configuration parameters for each of the plurality of nodes to the threshold benchmark for the associated system configuration category.
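By way of illustration only, the collecting, providing, and receiving steps of this second method may be sketched in Python as follows. The sketch makes several assumptions that are not part of the description above: the function names `collect_cluster_info`, `request_audit`, and `fake_asp` are hypothetical, JSON is assumed as the exchange format, and the application service provider is simulated by a local function with a single assumed rule rather than contacted over a network.

```python
import json
from typing import Callable, Dict

# Hypothetical shape: node name -> {parameter name -> numeric value}.
ClusterInfo = Dict[str, Dict[str, float]]


def collect_cluster_info(
    probes: Dict[str, Callable[[], Dict[str, float]]]
) -> ClusterInfo:
    """Collect configuration parameters by running one probe per node."""
    return {node: probe() for node, probe in probes.items()}


def request_audit(cluster_info: ClusterInfo, send: Callable[[str], str]) -> dict:
    """Provide the collected information to the provider and return its audit.

    `send` stands in for whatever transport reaches the application service
    provider; it takes a serialized request and returns a serialized response.
    """
    response = send(json.dumps(cluster_info))
    return json.loads(response)


def fake_asp(payload: str) -> str:
    """Simulated provider: applies one assumed benchmark (swap of at least 4 GB)."""
    info = json.loads(payload)
    min_swap_gb = 4.0  # assumed threshold, for illustration only
    report = {
        node: {"swap_gb_ok": params["swap_gb"] >= min_swap_gb}
        for node, params in info.items()
    }
    return json.dumps(report)


if __name__ == "__main__":
    # Probes return fixed values here; a real collector would query each node.
    probes = {
        "node_a": lambda: {"swap_gb": 8.0},
        "node_b": lambda: {"swap_gb": 2.0},
    }
    audit = request_audit(collect_cluster_info(probes), fake_asp)
    print(audit)
```

Passing the transport in as a callable keeps the collecting side independent of how the provider is reached, so the simulated provider could be exchanged for a network client without changing the other two functions.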
Other systems, methods, features, and advantages of the present invention will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.