Distributed systems are known that have a control workstation and a plurality of nodes, all joined by a switch. One of the nodes is designated as a primary node for initialization and recovery of the switch network. A backup node is provided to take over the responsibilities of the primary node if the primary node fails, in a process known as "failover". If an operator commands that the primary node designation be assigned to a new, operator-selected node at or about the same time as the primary node fails, the backup node may first take over the duties of the primary node, only to have the operator command immediately reassign the primary node designation to the operator-selected node. The present invention synchronizes operator-initiated commands with the failover process so that the backup daemon operates more efficiently.
A system is also known in which a command is enqueued on a work queue of a daemon if the daemon is asleep, and is failed if the daemon is busy, which prevents the daemon from repeating work it is already in the process of performing.
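The enqueue-or-fail rule described above can be sketched as follows. This is a minimal illustration, not the disclosed system: the `Daemon` class, the two-valued `DaemonState`, and the command string are all hypothetical, and the daemon's state is set by hand rather than by any real scheduler.

```python
from collections import deque
from enum import Enum


class DaemonState(Enum):
    ASLEEP = "asleep"
    BUSY = "busy"


class Daemon:
    """Hypothetical daemon illustrating the enqueue-if-asleep, fail-if-busy rule."""

    def __init__(self) -> None:
        self.state = DaemonState.ASLEEP
        self.work_queue: deque = deque()

    def submit(self, command: str) -> bool:
        """Return True if the command was enqueued, False if it was failed."""
        if self.state is DaemonState.BUSY:
            # Fail the command so the daemon does not repeat work
            # it is already in the process of performing.
            return False
        # The daemon is asleep: queue the command for when it wakes.
        self.work_queue.append(command)
        return True


daemon = Daemon()
print(daemon.submit("set_primary node_7"))   # True: enqueued
daemon.state = DaemonState.BUSY
print(daemon.submit("set_primary node_7"))   # False: failed
print(list(daemon.work_queue))               # ['set_primary node_7']
```

Note that failing a command while the daemon is busy is exactly what allows an operator command to collide with a failover already in progress, which is the situation the present invention addresses.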
U.S. Pat. No. 5,408,646 issued Apr. 18, 1995 to Ikeda et al. for CIRCUIT AND METHOD FOR DETECTING A FAILURE IN A MICROCOMPUTER discloses a circuit in which a watch-dog timer monitors an internal state of a microcomputer to detect a failure of the microcomputer.
U.S. Pat. No. 5,463,763 issued Oct. 31, 1995 to Kubo for APPARATUS AND METHOD FOR SUPERVISING MULTIPROCESSOR COMMUNICATIONS USING MESSAGES TRANSMITTED BETWEEN PROCESSORS IN A CIRCULAR FASHION discloses a multiprocessor system that includes a plurality of processors and detects a failure in a node according to whether a normal-operation message, indicating that the node's processor is operating normally, is transmitted within a preset monitoring time.
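The timeout test described above can be sketched as a simple monitor. This is a simplified illustration under stated assumptions: a single hypothetical `HeartbeatMonitor` tracks every node, timestamps are passed in explicitly instead of read from a clock, and the circular message passing of the Kubo patent is not modeled.

```python
class HeartbeatMonitor:
    """Hypothetical monitor: flags nodes whose last normal-operation
    message is older than the preset monitoring time."""

    def __init__(self, nodes, monitoring_time, start_time=0.0):
        self.monitoring_time = monitoring_time
        # Assumption: every node is treated as last heard from at start_time.
        self.last_seen = {node: start_time for node in nodes}

    def record_normal_message(self, node, timestamp):
        # A normal-operation message resets that node's timer.
        self.last_seen[node] = timestamp

    def failed_nodes(self, now):
        # A node is considered failed if no message arrived within the monitoring time.
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.monitoring_time)


monitor = HeartbeatMonitor(["A", "B", "C"], monitoring_time=5.0)
monitor.record_normal_message("A", 4.0)
monitor.record_normal_message("B", 9.0)
print(monitor.failed_nodes(now=10.0))  # ['A', 'C']
```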
U.S. Pat. No. 5,473,599 issued Dec. 5, 1995 to Li et al. for STANDBY ROUTER PROTOCOL discloses a system and protocol for routing data packets from a host on a LAN through a virtual address belonging to a group of routers. A standby router backs up an active router so that if the active router becomes inoperative, the standby router automatically takes over for the active router in emulating a virtual router.
U.S. Pat. No. 5,485,465 issued Jan. 16, 1996 to Liu et al. for REDUNDANCY CONTROL FOR A BROADCAST DATA TRANSMISSION SYSTEM discloses an apparatus for a broadcast communication network. If a packet of information does not arrive on the primary link within a predetermined period of time of receipt of the corresponding packet on a secondary link, an error signal is generated that changes the count in a counter in a predetermined direction. When the count reaches a predetermined number, the secondary link is switched to become the primary link.
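The counter-driven switchover described above can be sketched as follows. This is a hedged illustration, not the Liu et al. apparatus: the `RedundantLinkPair` class and link names are hypothetical, the "predetermined direction" is assumed to be upward, and resetting the counter after a switch is an assumption the source does not state.

```python
class RedundantLinkPair:
    """Hypothetical primary/secondary link pair with counter-based switchover."""

    def __init__(self, threshold, window, primary="link_a", secondary="link_b"):
        self.threshold = threshold  # predetermined count that triggers a switch
        self.window = window        # predetermined period for the matching packet
        self.primary = primary
        self.secondary = secondary
        self.count = 0

    def check_packet(self, secondary_rx_time, primary_rx_time):
        """Check one packet received on the secondary link.

        primary_rx_time is None if the corresponding packet never arrived
        on the primary link. Returns True if the links were switched."""
        missing = (primary_rx_time is None
                   or abs(primary_rx_time - secondary_rx_time) > self.window)
        if missing:
            self.count += 1  # error signal steps the counter (assumed upward)
            if self.count >= self.threshold:
                # Promote the secondary link to primary.
                self.primary, self.secondary = self.secondary, self.primary
                self.count = 0  # assumption: counter restarts after a switch
                return True
        return False


pair = RedundantLinkPair(threshold=3, window=0.5)
print(pair.check_packet(1.0, 1.2))   # False: primary packet arrived in time
print(pair.check_packet(2.0, None))  # False: error, count is now 1
print(pair.check_packet(3.0, 4.0))   # False: too late, count is now 2
print(pair.check_packet(4.0, None))  # True: count reached 3, links switched
print(pair.primary)                  # link_b
```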
U.S. Pat. No. 5,544,077 issued Aug. 6, 1996 to Hershey for HIGH AVAILABILITY DATA PROCESSING SYSTEM AND METHOD USING FINITE STATE MACHINES discloses a high availability data processing system including a primary processor at a first node and a first standby processor at a second node of the communications network. The second node includes a first event-driven interface, coupled to the network, for detecting an alarm signal. When the event-driven interface detects a characteristic pattern, switchover logic in the first standby processor invokes primary status in the first standby processor.
IBM Technical Disclosure Bulletin, Vol. 27, No. 8, January 1985, by Goyal et al., for SUPERVISOR RECOVERY IN RING NETWORKS discloses the use of an I-AM-ALIVE message, expected within a specified time period, to detect whether the primary supervisor has failed and, if so, to begin an election process that selects one of the other processors to undertake the supervisory role.