1. Field of the Invention
This invention relates to devices and methods for shared process control of computer systems which are configured using multiple nodes.
This application is based on Patent Application No. Hei 10-17936 filed in Japan, the content of which is incorporated herein by reference.
2. Description of the Related Art
Conventionally, computer systems are configured using multiple nodes. To make the same service available at each node, a resident process (e.g., one of various kinds of server processes) is set to operate fixedly at a specific node. When that specific node receives a service request from another node through inter-node communication, it executes the corresponding service. In some cases, the node returns the result of execution to the requesting node as needed.
For example, the above is described on pages 3-16 and pages 39-41 of the paper entitled "ACOS Software, ACOS-4/XVP PX, JES-PX Explanation and User Manual", sixth edition, published in September 1996 by NEC Corporation.
The aforementioned computer system suffers from the problem that if a failure occurs on the node at which the resident process is presently operating, the service must be stopped, because the resident process itself is forced to stop.
Another type of computer system is designed such that the same program is started at all the nodes simultaneously and each node executes the service while performing exclusion control on the resource. However, this type of computer system suffers from the problem that redundant resources are required to start the same program at each of the nodes.
It is an object of the invention to provide a device for shared process control of a computer system that is capable of providing the service at each of the nodes without consuming much of the resource, unless a failure occurs on the presently active node.
It is another object of the invention to provide a method for the shared process control of the computer system.
It is a further object of the invention to provide machine-readable media storing programs and data that cause the computer system to perform the shared process control.
A computer system incorporating the device and method for shared process control in accordance with this invention is configured with multiple nodes, each coupled with a recording medium and with a shared memory that contains a shared process control table and multiple shared processes. Each of the nodes contains a shared process dispatcher and a node failure recovery unit. The shared process control table stores multiple shared process control entries, each of which contains a process control block and a node flag representing the node that presently occupies the corresponding shared process.
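The layout of the shared process control table described above can be sketched as follows. This is only an illustrative model under stated assumptions: the class names, the `state` field of the process control block, the use of `None` for an unoccupied node flag, and the process name `print_server` are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProcessControlBlock:
    # Hypothetical field: the text does not enumerate the block's contents.
    state: str = "waiting"  # "waiting" for a request, or "running"


@dataclass
class SharedProcessControlEntry:
    pcb: ProcessControlBlock = field(default_factory=ProcessControlBlock)
    # Node flag: id of the node that presently occupies the shared process.
    # None (an assumed encoding) means that no node occupies it.
    node_flag: Optional[int] = None


@dataclass
class SharedProcessControlTable:
    # One entry per shared process, keyed by a hypothetical process name.
    entries: Dict[str, SharedProcessControlEntry] = field(default_factory=dict)

    def add_process(self, name: str) -> SharedProcessControlEntry:
        entry = SharedProcessControlEntry()
        self.entries[name] = entry
        return entry

    def occupied_by(self, name: str) -> Optional[int]:
        return self.entries[name].node_flag


# The table would live in the shared memory reachable from every node.
table = SharedProcessControlTable()
table.add_process("print_server")
```

In this sketch a newly registered shared process starts unoccupied, with its process control block waiting for a service request.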
When a service request is issued at a present node among the multiple nodes, a decision is made as to whether the shared process is occupied by another node. If it is, the shared process dispatcher of the present node makes the service request wait. If not, the dispatcher sets the node flag to indicate that the shared process is occupied by the present node and has the shared process handle the service request. After the shared process completes processing of the service request, the shared process dispatcher of the present node clears the node flag and places the process control block in a state of waiting for the next service request. It also communicates with the shared process dispatchers of the other nodes to inform them that the shared process has been released.
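The dispatch sequence just described (check the node flag, occupy, process, release, notify) might be modelled roughly as below. The sketch makes several assumptions that are not in the specification: a `threading.Lock` stands in for whatever mechanism guards the shared memory, a return value of `None` stands for "the request has to wait", and the names `SharedProcessEntry`, `SharedProcessDispatcher`, `on_released` and `release_notices` are hypothetical.

```python
import threading
from typing import Callable, List, Optional


class SharedProcessEntry:
    """One shared process control entry (simplified, hypothetical)."""

    def __init__(self) -> None:
        self.node_flag: Optional[int] = None  # occupying node, or None
        self.pcb_state: str = "waiting"
        self.lock = threading.Lock()  # assumed guard for the shared memory


class SharedProcessDispatcher:
    def __init__(self, node_id: int, entry: SharedProcessEntry) -> None:
        self.node_id = node_id
        self.entry = entry
        self.peers: List["SharedProcessDispatcher"] = []
        self.release_notices = 0

    def on_released(self) -> None:
        # Called by a peer dispatcher when it releases the shared process.
        self.release_notices += 1

    def request(self, service: Callable[[], str]) -> Optional[str]:
        with self.entry.lock:
            if self.entry.node_flag is not None:
                return None  # already occupied: the request has to wait
            self.entry.node_flag = self.node_id  # occupy for this node
            self.entry.pcb_state = "running"
        result = service()  # the shared process handles the request
        with self.entry.lock:
            self.entry.node_flag = None  # clear the node flag
            self.entry.pcb_state = "waiting"  # wait for the next request
        for peer in self.peers:
            peer.on_released()  # inform the other dispatchers
        return result


entry = SharedProcessEntry()
node1 = SharedProcessDispatcher(1, entry)
node2 = SharedProcessDispatcher(2, entry)
node1.peers, node2.peers = [node2], [node1]
reply = node1.request(lambda: "done")
```

Here the request from node 1 is served immediately because the flag is clear, and node 2 receives one release notice afterwards. How a waiting request is resumed on such a notice is left out of the sketch.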
When a failure occurs on a node (the "failure" node) among the multiple nodes, the node failure recovery units of the other, normal nodes mediate among themselves to designate one of them as the unit that actually copes with the failure. The designated node failure recovery unit then clears the node flag and places the process control block in a state of waiting for a service request. It also communicates with the node failure recovery units of the other normal nodes to inform them that the shared process is able to receive a service request.
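The recovery sequence can be illustrated with the following sketch. The specification says only that the normal nodes mediate; the rule used here (the lowest surviving node id is designated) is an assumption chosen for simplicity, as are the names `Entry`, `mediate` and `recover` and the example process names.

```python
from typing import Dict, List, Optional, Tuple


class Entry:
    """Simplified shared process control entry (hypothetical)."""

    def __init__(self, node_flag: Optional[int] = None) -> None:
        self.node_flag = node_flag
        self.pcb_state = "running" if node_flag is not None else "waiting"


def mediate(normal_nodes: List[int]) -> int:
    # Assumed mediation rule: the lowest node id is designated.
    return min(normal_nodes)


def recover(
    table: Dict[str, Entry], failed_node: int, all_nodes: List[int]
) -> Tuple[int, List[str], List[int]]:
    normal_nodes = [n for n in all_nodes if n != failed_node]
    designated = mediate(normal_nodes)
    released = []
    for name, entry in table.items():
        if entry.node_flag == failed_node:
            entry.node_flag = None  # clear the node flag
            entry.pcb_state = "waiting"  # ready for a service request
            released.append(name)
    # The remaining normal nodes are the ones to be informed.
    to_inform = [n for n in normal_nodes if n != designated]
    return designated, released, to_inform


table = {"spool": Entry(node_flag=2), "mail": Entry(node_flag=3)}
designated, released, to_inform = recover(table, failed_node=2, all_nodes=[1, 2, 3])
```

With node 2 failed, node 1 is designated under the assumed rule, the `spool` entry held by node 2 is released, and node 3 is the node to be informed. The `mail` entry held by node 3 is left untouched.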