High-availability (HA) systems are systems implemented primarily to improve the availability of the services they provide. Availability can be expressed as the percentage of time during which a system or service is “up”. For example, a system designed for 99.999% availability (so-called “five nines” availability) is a system or service whose downtime is only about 0.44 minutes/month or 5.26 minutes/year.
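The downtime figures quoted above follow directly from the availability percentage. The short sketch below reproduces the arithmetic; the function name and the 365-day year (with a month taken as one twelfth of it) are assumptions made for illustration only.

```python
def allowed_downtime_minutes(availability_percent: float, period_minutes: float) -> float:
    """Return the downtime budget, in minutes, for a given availability level."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_minutes


# Assumption: a 365-day year; a month is treated as one twelfth of that year.
MINUTES_PER_YEAR = 365 * 24 * 60
MINUTES_PER_MONTH = MINUTES_PER_YEAR / 12

yearly = allowed_downtime_minutes(99.999, MINUTES_PER_YEAR)
monthly = allowed_downtime_minutes(99.999, MINUTES_PER_MONTH)
print(f"{yearly:.2f} minutes/year, {monthly:.2f} minutes/month")
# prints "5.26 minutes/year, 0.44 minutes/month"
```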
High-availability systems provide a designed level of availability by employing redundant nodes, which are used to provide service when system components fail. For example, if a server running a particular application crashes, an HA system will detect the crash and restart the application on another, redundant node. Various redundancy models can be used in HA systems. For example, an N+1 redundancy model provides a single extra node (associated with a number of primary nodes) that is brought online to take over the role of a node which has failed. However, in situations where a single HA system is managing many services, a single dedicated node for handling failures may not provide sufficient redundancy. In such situations, an N+M redundancy model, for example, can be used, wherein more than one standby node (M in total) is included and available.
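The difference between N+1 and N+M redundancy can be illustrated with a minimal sketch. The `Cluster` class, its method, and the node and service names below are hypothetical and are not taken from any HA product or from the AIS specification; the sketch only assumes that each failed node is replaced by one unused standby node.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical N+M cluster: N active nodes plus a pool of M standby nodes."""
    active: dict[str, str]                              # service name -> hosting node
    standbys: list[str] = field(default_factory=list)   # the M spare nodes

    def handle_node_failure(self, failed_node: str) -> dict[str, str]:
        """Move every service hosted on failed_node onto one unused standby node."""
        affected = [svc for svc, node in self.active.items() if node == failed_node]
        if not affected:
            return {}
        if not self.standbys:
            raise RuntimeError(f"no standby node left to replace {failed_node}")
        replacement = self.standbys.pop(0)
        for svc in affected:
            self.active[svc] = replacement
        return {svc: replacement for svc in affected}


# N+1: the single standby covers the first failure but not a second one.
n_plus_1 = Cluster(active={"db": "node1", "web": "node2"}, standbys=["spare1"])
first_move = n_plus_1.handle_node_failure("node1")

# N+M with M = 2: both failures can be absorbed.
n_plus_2 = Cluster(active={"db": "node1", "web": "node2"}, standbys=["spare1", "spare2"])
n_plus_2.handle_node_failure("node1")
second_move = n_plus_2.handle_node_failure("node2")
```

In the N+1 case a second call to `handle_node_failure` raises `RuntimeError`, which is the shortfall that the N+M model is meant to address.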
As HA systems become more commonplace for the support of important services such as file sharing, internet customer portals, databases and the like, it has become desirable to provide standardized models and methodologies for the design of such systems. For example, the Service Availability Forum (SAF) has standardized the Application Interface Specification (AIS) to aid in the development of portable, highly available applications. As shown in the conceptual architecture stack of FIG. 1, the AIS 10 is intended to provide a standardized interface between the HA applications 14 and the HA middleware 16, thereby making them independent of one another. As described below, each set of AIS functionality is associated with an operating system 20 and a hardware platform 22. The reader interested in more information relating to the AIS standard specification is referred to Application Interface Specifications (AIS), Version B.02.01, which is available at www.saforum.org, the disclosure of which is incorporated here by reference.
Of particular interest for the present application is the Availability Management Framework (AMF), which is a software entity defined within the AIS specification. According to the AIS specification, the AMF is a standardized mechanism for providing service availability by coordinating redundant resources within a cluster to deliver a system with no single point of failure. The AMF provides a set of application program interfaces (APIs) which determine, among other things, the states of components within a cluster and the health of those components. The components are also provided with the capability to query the AMF for information about their state. An application which is developed using the AMF APIs and following the AMF system model leaves the burden of managing the availability of its services to the AMF. Thus, such an application does not need to deal with dynamic reconfiguration issues related to component failures, maintenance, etc.
As specified in the foregoing standards, each AMF (software entity) provides availability support for a single logical cluster that consists of a number of cluster nodes and components, an example of which is shown in FIG. 2. Therein, a first cluster A includes its own AMF 24, two AMF nodes 26, 28 and four AMF components 30-36. Similarly, a second cluster B has its own AMF 38, two AMF nodes 40, 42 and four AMF components 44-50. The components 30-36 and 44-50 each represent a set of hardware and software resources that are being managed by the AMFs 24 and 38, respectively. In a physical sense, components are realized as processes of an HA application. The nodes 26, 28, 40, 42 each represent a logical entity which corresponds to a physical node on which respective processes managed as AMF components are being run, as well as the redundancy elements allocated to managing those nodes' availability.
The AIS standard also defines a service unit (SU) as a logical entity that aggregates a set of components, thereby combining their individual functionalities to provide a higher level service. A service unit can contain any number of components, but a particular component can be configured in only one service unit. Since each component is always enclosed in a service unit, from the AMF's perspective, the service unit can be considered the incremental unit of redundancy in the sense that it is the smallest logical entity that can be instantiated in a redundant manner, i.e., more than once. Another example of an AMF model including service units and components is provided below as FIG. 3.
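The containment rule described above (a service unit may hold any number of components, but a component may be configured in only one service unit) can be sketched as a small data structure. The class names, method names, and component names are hypothetical and are not part of the AIS specification; this is an illustration of the constraint only.

```python
class ServiceUnit:
    """Logical entity that aggregates a set of components."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.components: list[str] = []


class AmfModel:
    """Hypothetical registry enforcing 'one service unit per component'."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}   # component name -> owning SU name

    def add_component(self, su: ServiceUnit, component: str) -> None:
        owner = self._owner.get(component)
        if owner is not None:
            raise ValueError(f"{component} is already configured in {owner}")
        self._owner[component] = su.name
        su.components.append(component)


model = AmfModel()
su1, su2 = ServiceUnit("SU1"), ServiceUnit("SU2")
model.add_component(su1, "comp-A")
model.add_component(su1, "comp-B")      # an SU may hold any number of components
try:
    model.add_component(su2, "comp-A")  # rejected: comp-A already belongs to SU1
    rejected = False
except ValueError:
    rejected = True
```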
At the leaves of this model, each component 30-36 and 44-50 has an attribute which specifies where the corresponding software installation is located. More specifically, this attribute specifies a path prefix that is used when a corresponding service unit is instantiated. However, this path prefix assumes that the component is always instantiated on the same node or that the component is instantiated on a node where there is an installation of the software at a location having the same path. In current clusters, this latter characteristic is typically true, i.e., the installation path is always the same on all of the nodes. If, however, this assumption does not hold, e.g., in heterogeneous clusters where some nodes may be diskless (e.g., using a RAM disk), while other nodes may use mounted disks or have local disks (or if the nodes run different operating systems), then the instantiation will fail.
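The failure mode described above can be made concrete with a minimal sketch. The paths, node contents, and function names are hypothetical, and each node's file system is simulated as a set of path strings; the sketch only assumes that instantiation needs the command formed by joining the component's single path prefix with a relative command name.

```python
from pathlib import PurePosixPath


def resolve_command(path_prefix: str, relative_command: str) -> str:
    """Join the component's path prefix with its relative instantiation command."""
    return str(PurePosixPath(path_prefix) / relative_command)


def can_instantiate(node_files: set[str], path_prefix: str, relative_command: str) -> bool:
    """Instantiation can succeed only if the resolved command exists on the node."""
    return resolve_command(path_prefix, relative_command) in node_files


# Hypothetical heterogeneous cluster: same software, different install locations.
node_with_local_disk = {"/opt/app/bin/start"}
diskless_node = {"/ramdisk/app/bin/start"}

component_path_prefix = "/opt/app"      # single prefix stored on the component

ok_on_local_disk = can_instantiate(node_with_local_disk, component_path_prefix, "bin/start")
ok_on_diskless = can_instantiate(diskless_node, component_path_prefix, "bin/start")
print(ok_on_local_disk, ok_on_diskless)  # prints "True False"
```

Because the prefix is a property of the component alone, nothing in this sketch lets the diskless node substitute its own install location, which is why the second check fails.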
Accordingly, it would be desirable to provide platform management systems and methods for HA applications which avoid the afore-described problems and drawbacks by permitting, for example, flexible service unit instantiation.