This invention generally relates to embedded software packages in distributed computer systems and more particularly to an improved system and method for system recovery and migration of services in the event of a failure.
Distributed computer systems store enormous amounts of information that can be accessed by users for identification and retrieval of valuable documents that contain control, data, text, audio and video information. A typical example of a distributed system (100) is shown in FIG. 1. A distributed computer system consists of computer nodes (104a to 104n and 108a to 108z) and a communication network (102) that allows the exchange of messages between computer nodes. The communication network (102) may be any of the following: a local area network (LAN), a wide area network (WAN), a corporate intranet, the Internet, a wireless network, a cabling network or equivalents. Multiple storage devices (106a to 106n and 110a to 110z) store data files for the multiple nodes of the distributed system. Storage devices (106a to 106n) are local storage for the nodes (104a to 104n); storage devices (110a to 110z) are global databases which are accessible by nodes (108a to 108z); these are considered to belong to a storage or disk “farm” (112) of shared non-volatile memory. These nodes work together to achieve a common goal (e.g., a parallel scientific computation, a distributed database, control of multiple robots in a manufacturing plant or a parallel file system). In particular, nodes (108a to 108z) act as control servers in a manufacturing plant; they control and communicate with local nodes (104a to 104n) in order to effect control of an industrial process or device. Such devices, which are not shown in FIG. 1, include robots, printers, video devices, audio devices, tape devices, storage devices or their equivalents.
FIG. 2 (200) illustrates the composition of a processing server node (202) utilized by some distributed processing node implementations. As shown, the node (202) contains a software package (204a) that effects control of and communication with the devices mentioned above. Package (204a) comprises other services (208a) and an embedded monitor software subroutine (206a). Other services (208a) are the part of the package (204a) that performs device communication and control. Monitor subroutine (206a) monitors the functionality of the package (204a). The package (204b), shown in expanded form, includes other services (208b) that typically execute:
Mount /home1
Export (share) /home1
Service File I/O Requests for /home1
Start Monitor
and a monitor subroutine (206b) that typically executes monitoring functions including:
Periodically Verifying I/O Daemon Responsiveness
If Necessary, Restarting Daemons and Re-Exporting /home1
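The package and monitor sequence above can be modeled in a short sketch. It is a hypothetical illustration, not MC/ServiceGuard code: the class `EmbeddedPackage`, its `probe` callback (standing in for whatever responsiveness check a real monitor performs), and the action log are assumptions introduced here, and the daemon names are those of the nfs processes named in this section. What it shows is structural: the package itself launches the monitor, so the monitor lives and dies with the package.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EmbeddedPackage:
    """Hypothetical model of package 204b and its embedded monitor 206b."""
    export_path: str
    daemons: List[str]
    probe: Callable[[str], bool]  # assumed check: True if the daemon responds
    log: List[str] = field(default_factory=list)

    def start(self, monitor_cycles: int) -> None:
        # Other services (208b): mount, export and serve the file system.
        self.log.append(f"mount {self.export_path}")
        self.log.append(f"export {self.export_path}")
        self.log.append(f"serve file I/O for {self.export_path}")
        # The package launches its own monitor, so both share one lifecycle.
        self.run_monitor(monitor_cycles)

    def run_monitor(self, cycles: int) -> None:
        # Monitor subroutine (206b): periodically verify daemon responsiveness.
        for _ in range(cycles):
            unresponsive = [d for d in self.daemons if not self.probe(d)]
            if unresponsive:
                for d in unresponsive:
                    self.log.append(f"restart {d}")
                self.log.append(f"re-export {self.export_path}")
            # A real monitor would sleep for its polling interval here.


if __name__ == "__main__":
    polls = {"rpc.mountd": 0}

    def probe(name: str) -> bool:
        # rpc.mountd fails to respond on its second poll only.
        if name == "rpc.mountd":
            polls[name] += 1
            return polls[name] != 2
        return True

    pkg = EmbeddedPackage("/home1", ["nfsd", "rpc.mountd", "rpc.statd"], probe)
    pkg.start(monitor_cycles=3)
    print(pkg.log)
```

Because `run_monitor` is called from `start`, changing the monitor's behavior in this model means restarting the whole package, which mirrors the coupling the text describes.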
However, a problem arises because the embedded monitor subroutine (206a), being embedded in the package it monitors, has no knowledge of other packages. Consider a multiple-node cluster with a package running on each node. If one node fails, its package must move to another node, which then hosts two packages. If both packages contain embedded monitors and a problem occurs that requires corrective action, the monitors compete with each other by trying to restart the same resources. Whichever monitor starts a recovery first restarts some process. The second monitor, which has no knowledge of the state of the first package and its monitor, detects this restart as a failure and restarts the process itself. That restart is in turn seen as a failure by the first monitor, so the errors accrue successively.
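The competition between embedded monitors can be reproduced with a small simulation. It is a hypothetical sketch under stated assumptions: all monitors watch one node-wide daemon, a restart is modeled as a new process id, and each monitor treats a dead daemon or an unexpected process id as a failure because it has no knowledge of the other monitors. With one monitor, a single failure costs one restart; with two, every poll triggers another restart and the loop never settles.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Daemon:
    """A node-wide daemon, such as nfsd, shared by every package on the node."""
    alive: bool = True
    pid: int = 0  # assumption: a restart creates a new process, modeled as a new pid

    def restart(self) -> None:
        self.alive = True
        self.pid += 1


@dataclass
class EmbeddedMonitor:
    """A monitor that knows only its own package, not the other monitors."""
    name: str
    daemon: Daemon
    last_pid: int = 0

    def poll(self, log: List[str]) -> None:
        # A dead daemon and an unexpected pid both look like a failure, because
        # this monitor cannot tell that another monitor just restarted it.
        if not self.daemon.alive or self.daemon.pid != self.last_pid:
            self.daemon.restart()
            log.append(f"{self.name}: restart -> pid {self.daemon.pid}")
        self.last_pid = self.daemon.pid


def simulate(num_monitors: int, rounds: int) -> List[str]:
    """Inject one genuine failure and return the log of restarts that follow."""
    daemon = Daemon()
    monitors = [EmbeddedMonitor(f"monitor {i}", daemon) for i in range(num_monitors)]
    daemon.alive = False  # a single genuine failure
    log: List[str] = []
    for _ in range(rounds):
        for m in monitors:
            m.poll(log)
    return log


if __name__ == "__main__":
    print(len(simulate(num_monitors=1, rounds=5)))  # 1: one restart, then stable
    print(len(simulate(num_monitors=2, rounds=5)))  # 10: a restart on every poll
```

The number of restarts with two monitors grows with the number of polling rounds rather than with the number of real failures, which is the successive accrual of errors described above.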
A typical processing-node cluster software product that uses this implementation is Hewlett-Packard's (Palo Alto, Calif.) MC/ServiceGuard. In this software, the purpose of the packages is to form collections of services that can move from one host or machine to another. Such a migration of services can be precipitated by a total nodal failure (i.e., an equipment failure such as a node catching fire), by planned maintenance on one of the nodes, or by the need for load balancing. The services contained within the nodes are grouped into packages as previously described; a given package is any combination of programs or data. Although service migration occurs for some failures, not all failures necessitate it; for example, a program that has died may simply be restarted by an automated watchdog process. In the Hewlett-Packard implementation, a package monitoring program performs this watchdog function automatically, and the monitor is launched by the very package it monitors.
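The distinction between migrating a package and restarting a program in place can be sketched as a simple dispatch. The `Event` names, the `handle` function, and the policy of moving a package to the least-loaded remaining node are assumptions made for illustration; they are not taken from MC/ServiceGuard.

```python
from enum import Enum, auto
from typing import Dict, List


class Event(Enum):
    NODE_FAILURE = auto()
    PLANNED_MAINTENANCE = auto()
    LOAD_BALANCING = auto()
    PROCESS_DIED = auto()


def handle(cluster: Dict[str, List[str]], event: Event, node: str, package: str) -> str:
    """Respond to an event; `cluster` maps node name -> hosted packages."""
    if event is Event.PROCESS_DIED:
        # No migration needed: a watchdog restarts the dead program in place.
        return f"restart program in {package} on {node}"
    # Node failure, maintenance and load balancing all move the package.
    # Picking the least-loaded other node is an assumption of this sketch.
    targets = [n for n in cluster if n != node]
    if not targets:
        raise RuntimeError("no other node can host the package")
    target = min(targets, key=lambda n: len(cluster[n]))
    cluster[node].remove(package)
    cluster[target].append(package)
    return f"migrate {package} from {node} to {target}"


if __name__ == "__main__":
    cluster = {"node1": ["pkgA"], "node2": ["pkgB", "pkgC"], "node3": []}
    print(handle(cluster, Event.NODE_FAILURE, "node1", "pkgA"))  # migrate pkgA from node1 to node3
    print(handle(cluster, Event.PROCESS_DIED, "node2", "pkgB"))  # restart program in pkgB on node2
```

For a total nodal failure, a caller would invoke `handle` once for each package hosted on the failed node.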
In addition, the cluster software runs under the control of an operating system that includes a network file system (nfs, SUN Microsystems, Palo Alto, Calif.). This network file system comprises a plurality of processes including, but not limited to: a) nfsd, the “nfs daemon”; b) rpc.mountd, the “remote procedure call mount daemon”; and c) rpc.statd, the “remote procedure call status daemon”. These processes are all part of the operating system and allow “nfs” (SUN's network file system) to work. They are the processes that are monitored for a file-sharing package. More generally, however, the monitored processes can be any processes that a given package requires in order to perform its function.
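Because the monitored processes are whatever a package needs, a package's monitored set can be expressed as data. In this hypothetical sketch, the `REQUIRED_PROCESSES` table and the `missing_processes` helper are illustrative names; only the file-sharing entry comes from the text above. The list of running process names is supplied by the caller; on a POSIX system it might be gathered with `ps -e -o comm=`, but that collection step is left out here.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical table: package type -> processes it needs in order to function.
REQUIRED_PROCESSES: Dict[str, FrozenSet[str]] = {
    "file-sharing": frozenset({"nfsd", "rpc.mountd", "rpc.statd"}),
}


def missing_processes(package_type: str, running: Iterable[str]) -> List[str]:
    """Return the required processes that are absent from `running`, sorted."""
    required = REQUIRED_PROCESSES[package_type]
    return sorted(required - set(running))


if __name__ == "__main__":
    # Several nfsd instances may run at once; duplicates are harmless here.
    running = ["init", "nfsd", "nfsd", "rpc.statd"]
    print(missing_processes("file-sharing", running))  # ['rpc.mountd']
```

Other package types would register their own entries in the table, leaving the checking logic unchanged.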
The above architecture presents two major problems: a) the monitor cannot be terminated without stopping the package it is monitoring, and b) only one package with similar attributes can run on a node at any given time. First, if any adjustments are required in the monitor itself (timing, retries or the equivalent), the client services provided by the package running the monitor must be interrupted. Because the goal of MC/ServiceGuard is to provide high availability of server resources, stopping a package even for a short period of time is undesirable. Second, if two or more packages with similar attributes were running, each could affect processes that the other is watching. The result is an endless loop of erroneous corrective actions that prevents one server from taking over the resources of another server. Prior attempts to resolve this problem include maintaining a normally idle standby server or dedicating a functionality to a specific server. However, neither choice is cost-effective, and neither permits distributed dissemination of packages. What is needed is a hardware or software implementation that solves both problems: a) the monitor cannot be updated, because it cannot be terminated without stopping the package it is monitoring, and b) only one package with similar attributes can run on a node at any given time.