Pluralities of computers, databases, printers and other computing or computing-related devices are often established in clusters or as members or elements of a network, hereafter collectively “Clusters”. Clusters are often defined as parallel or distributed systems consisting of a collection of interrelated whole computers that are utilized as a single, unified computing resource. As such, Clusters commonly consist of computing devices (i.e., Nodes) and other peripheral devices, access to which may be controlled by particular Nodes. Clusters enable information system managers to share a computing load over several systems, thereby enhancing the capabilities of the system as a whole, minimizing the adverse effects of failures of certain Nodes or devices on a Cluster, and allowing the capacity and capabilities of a given Cluster to be expanded by adding additional or different Nodes and/or devices. Often Clusters are designed to be Highly Available (i.e., the resources associated with the Cluster have a minimum operating availability, often measured in the 99.99% range or better). As is well known in the art, High Availability implies that faults in services provided by a Cluster will be detected, and the services restored, within specified time limits.
As Clusters have increased in use, size and complexity, products have been developed to manage them. One such product is SERVICEGUARD®, developed by HEWLETT PACKARD®. In order to provide the desired High Availability, many of today's Cluster management products utilize a fail-over mechanism when a fault is detected on a Node of a Cluster. The fail-over mechanism provides that when a Node on which an application is running fails, for whatever reason, the application is “failed-over” to another Node; that is, the application is restarted on a new Node within a given amount of time. Cluster managers often utilize “Packages,” which contain all the resources that an application might need in order to run on a Node. These Packages provide the information needed by the fail-over Node in order to restart the application. Examples of information a Package may contain include the type of storage the application utilizes, the IP addresses it uses for clients that connect to the application, and other information that enables the application to run on the system. Cluster management tools, such as SERVICEGUARD®, are often capable of monitoring Packages and starting them as necessary in order to activate an application on a Cluster. Further, when a fail-over occurs, it is usually the Package that the Cluster manager provides to the fail-over Node.
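To make the fail-over mechanism concrete, the following is a minimal Python sketch, assuming a toy model in which a Package is merely a record of the storage and IP addresses an application needs, and "failing over" means reassigning the Package to the first surviving Node. The names `Package`, `ClusterManager`, `start` and `node_failed` are hypothetical and do not reflect the SERVICEGUARD® interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Package:
    """Hypothetical container of the resources an application needs to run."""
    name: str
    storage: list[str]            # storage the application utilizes
    ip_addresses: list[str]       # addresses clients use to reach the application
    node: Optional[str] = None    # Node the Package is currently running on


@dataclass
class ClusterManager:
    """Toy Cluster manager that fails Packages over between Nodes."""
    nodes: list[str]
    packages: dict[str, Package] = field(default_factory=dict)
    failed: set[str] = field(default_factory=set)

    def start(self, pkg: Package, node: str) -> None:
        # Starting a Package activates the application (with its storage
        # and addresses) on the given Node.
        pkg.node = node
        self.packages[pkg.name] = pkg

    def node_failed(self, node: str) -> list[str]:
        """Mark `node` as failed and restart its Packages on a surviving Node.

        Returns the names of the Packages that were failed-over.
        """
        self.failed.add(node)
        survivors = [n for n in self.nodes if n not in self.failed]
        moved = []
        for pkg in self.packages.values():
            if pkg.node != node:
                continue
            if survivors:
                # Hypothetical policy: pick the first surviving Node.
                pkg.node = survivors[0]
                moved.append(pkg.name)
            else:
                pkg.node = None  # no Node left to fail over to
        return moved
```

Choosing the first surviving Node is a simplification for illustration; a Cluster manager could instead weigh factors such as load or configured Node preferences.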
In addition to providing a container of the resources needed to fail-over an application, Packages provide other benefits. Since all the resources are in one container, a Package makes it easy for Cluster managers to monitor and manage the associated application. Via the Package, Cluster managers can see which components are functioning optimally and can also extend the resources available to the application. For example, if storage devices need to be added for use with a given application, the needed devices can merely be added to the Package and are then available to the application regardless of the Node on which the Package is currently running.
However, as Clusters become more common, many new applications are being developed that are “Cluster Aware,” i.e., they recognize that they are not merely running on a single computer but are instead running on a Node of a Cluster. With Cluster Aware applications, it is often undesirable to require the application to fail-over to a new Node whenever a fault is detected, because such fail-over is often inefficient and needed data is often lost. Further, many Cluster Aware applications seek to run an instantiation of themselves on every Node in the Cluster. As such, they often do not behave like non-Cluster Aware applications when a fail-over occurs, because their multiple instantiations are often not amenable to commonly utilized fail-over processes. Examples of Cluster Aware applications include volume management services and ORACLE® database applications, where the database is implemented on multiple Nodes for performance purposes.
In particular, ORACLE® database operations are commonly implemented on a Cluster such that each Node runs a specific instance of the ORACLE® application. Each Node is capable of implementing the application but does not know that the application also exists on other Nodes of the Cluster. As such, the application is commonly implemented simultaneously on multiple Nodes, thereby wasting Cluster resources and utilizing them inefficiently. Thus, a need exists for a process that enables Cluster Aware applications to maintain High Availability without requiring the application to fail-over to other Nodes.
One way to provide for multiple instantiations of an application on multiple Nodes in a Cluster manager system is to hard-code the multiple instantiations into the source code of the Cluster management software. This hard-coding enables the Cluster management software to recognize, at start-up, that multiple copies of an application are to be implemented on the Nodes of the Cluster and to automatically start those instantiations on the multiple Nodes. While this process works for existing applications (whose needs are known), it is inflexible, since any new application seeking to be Cluster Aware requires the source code of the Cluster management software to be modified. Thus, there is a need for a process that provides High Availability to Cluster Aware applications while being agnostic as to the particular application being implemented.
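The limitation of the hard-coding approach can be illustrated with a minimal Python sketch, assuming a hypothetical start-up routine whose list of multi-instance applications is compiled into the Cluster management software. The set `HARD_CODED_MULTI_INSTANCE`, the function `startup_plan`, and the application names are illustrative only.

```python
# Hypothetical: application names compiled into the Cluster manager's source.
# Supporting a new Cluster Aware application means editing this line and
# rebuilding the Cluster management software.
HARD_CODED_MULTI_INSTANCE = frozenset({"volume_manager", "oracle_db"})


def startup_plan(apps: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Return the Nodes on which each application is started at start-up.

    Applications named in the hard-coded set get one instance per Node;
    every other application is treated as a single-instance, fail-over
    application and is started only on the first Node.
    """
    plan: dict[str, list[str]] = {}
    for app in apps:
        if app in HARD_CODED_MULTI_INSTANCE:
            plan[app] = list(nodes)   # one instantiation on every Node
        else:
            plan[app] = nodes[:1]     # single instance, subject to fail-over
    return plan
```

With this sketch, `startup_plan(["oracle_db", "new_cluster_aware_app"], ["n1", "n2", "n3"])` starts `oracle_db` on all three Nodes but starts `new_cluster_aware_app` only on `n1`, until its name is added to the set and the software is rebuilt.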