1. Field of the Invention
The present invention relates to a failure-tolerant transaction processing system. More specifically, it relates to a transaction processing method, a processing system, and a control program for constructing a failure-tolerant transaction system.
2. Background Art
Modern business application systems have the problem that transaction processing temporarily stops when a system failure occurs in a server or the like. To address this, Patent Document 1, for example, discloses a method and a system for diagnosing and self-repairing server failures in a server farm (a group of servers).
[Patent Document 1] Published Unexamined Patent Application No. 2004-110790
From the user's viewpoint, a system failure does not matter as long as the transaction still completes within roughly its normal response time (two or three seconds in many systems). In a typical system, however, it is difficult to keep downtime within that period.
The failure-tolerance mechanisms of modern business application systems are based on failure detection followed by take-over processing; that is, they switch over immediately once a failure is detected. Failure detection, however, generally takes from ten seconds to a few minutes, because it works by checking whether message exchange over the network between an external computer and the objective server has stalled.
However, even when the objective server is operating normally, its load may temporarily rise and message exchange may stall. Therefore, a failure is declared only if message exchange is attempted several times within a certain time period and never succeeds. If the reply timeout or the number of attempts is set too small, the failure-detection mechanism declares a failure even though the objective server is operating normally, and starts take-over processing at a backup server. As a result, at least about ten seconds must be reserved for checking whether the server is alive.
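The trade-off described above can be illustrated with a minimal Python sketch of a timeout-and-retry liveness check. The class `HeartbeatDetector`, the helper `make_probe`, and all timeout values are hypothetical and chosen only for illustration; probes are simulated rather than sent over a real network.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HeartbeatDetector:
    """Hypothetical liveness checker: declares a failure only after
    max_attempts probes in a row go unanswered within reply_timeout_s."""
    reply_timeout_s: float
    max_attempts: int

    def worst_case_detection_time(self) -> float:
        # Each unanswered attempt waits out the full timeout before the next one.
        return self.reply_timeout_s * self.max_attempts

    def is_failed(self, probe: Callable[[float], bool]) -> bool:
        # probe(timeout) returns True if the server replied within `timeout` seconds.
        for _ in range(self.max_attempts):
            if probe(self.reply_timeout_s):
                return False  # a single timely reply means the server is alive
        return True


def make_probe(latency_s: Optional[float]) -> Callable[[float], bool]:
    # Simulated server: replies after latency_s seconds; None models a crash.
    return lambda timeout_s: latency_s is not None and latency_s <= timeout_s


aggressive = HeartbeatDetector(reply_timeout_s=1.0, max_attempts=2)
conservative = HeartbeatDetector(reply_timeout_s=5.0, max_attempts=3)
overloaded = make_probe(4.0)  # alive, but replies only after 4 s under load
crashed = make_probe(None)    # never replies

print(aggressive.is_failed(overloaded))          # True: a false positive
print(conservative.is_failed(overloaded))        # False
print(conservative.is_failed(crashed))           # True
print(conservative.worst_case_detection_time())  # 15.0
```

With the aggressive setting, a server that is merely overloaded is wrongly declared failed; with the conservative setting it is correctly judged alive, but confirming a real crash then takes 15 seconds, well beyond the two or three seconds a user would tolerate.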
Consequently, failure detection and take-over processing cannot be completed within a period short enough that users do not notice the system failure. This limitation is inherent in failure detection. Shortening the liveness check would require a network dedicated to liveness-check messages, a processor on the objective server dedicated to answering them, and a mechanism for checking whether the operating system and the processes running on it are operating normally. These requirements entail changes to the hardware and the operating system, which the current open-platform environment cannot accommodate.
The present invention is intended to solve the above problems by providing a new transaction processing method that, if no reply is received within a certain time period, resends the process to a backup system without waiting for the system's failure detection.
The present invention mainly assumes the following conditions.
There are (2F+1) sets, each comprising a data management mechanism and one or more servers that update the data in that mechanism (each such set is hereinafter referred to as a server farm), where F is a natural number. There is a client that issues transaction requests to the server farms. The server farms are connected to one another, and the client is connected to each server farm, via a network with a plurality of redundant (multiplexed) data channels. A single client may submit the same transaction request multiple times. If the client receives no reply from a server farm within a certain time period, it sends the same transaction request to another server farm. Networks and server farms may malfunction at any time, but at most F server farms malfunction because of a failure.
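The client behavior assumed above can be sketched as follows. This is a minimal simulation under stated assumptions, not the claimed method: each server farm is modeled by a hypothetical function that returns a reply latency and a reply (or `None` if it never replies), time is simulated rather than measured, and `ResendingClient`, `Farm`, and `reply_timeout_s` are illustrative names. The point it shows is that the client never asks whether a farm has failed; it simply resends the same request, under the same transaction ID, once the timeout expires.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical server-farm model: given a transaction ID, returns
# (reply_latency_s, reply), or None if the farm never replies.
Farm = Callable[[str], Optional[Tuple[float, str]]]


@dataclass
class ResendingClient:
    reply_timeout_s: float  # assumed to be close to the normal response time

    def submit(self, tx_id: str, farms: List[Farm]) -> Tuple[Optional[str], float]:
        """Send the same request to each farm in turn, moving on as soon as a
        farm misses the timeout. Returns (reply or None, simulated elapsed s)."""
        elapsed = 0.0
        for farm in farms:
            outcome = farm(tx_id)
            if outcome is not None and outcome[0] <= self.reply_timeout_s:
                latency, reply = outcome
                return reply, elapsed + latency
            # No timely reply: no failure detection, just resend elsewhere.
            elapsed += self.reply_timeout_s
        return None, elapsed


crashed: Farm = lambda tx_id: None
healthy: Farm = lambda tx_id: (0.5, f"committed {tx_id}")
client = ResendingClient(reply_timeout_s=2.0)
print(client.submit("tx-1", [crashed, healthy]))  # ('committed tx-1', 2.5)
```

Because a farm that merely replied late may still execute the request, resending can cause the same transaction to run on several server farms, which is why only one of those executions may be allowed to commit.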
FIG. 1 shows the basic form of the system assumed by the present invention, corresponding to the case F=1.
As shown, each of the server farms 1 to 3 has a plurality of application servers (1c to 3c), a database (DB 1a, 2a, 3a), and a DB server (1b, 2b, 3b) that manages the database. Clients 4 and 5, which request transactions, access the database needed by a transaction via the application servers and the DB server of each server farm.
FIG. 2 shows the minimum configuration assumed by the present invention. The arbiter 8 in the figure is a special form of server farm that exists only to keep the other server farms operating normally; it neither performs transaction processing nor has a DB. For example, when F=1 the present invention needs three server farms, but in many cases the DBMS (Data Base Management System) need not be triplicated. Because the present invention is based on a majority-rule distributed agreement protocol, even a dual-redundant configuration needs an additional server that casts a vote in the majority decision. The arbiter 8 plays this role.
Although the description below assumes (2F+1) server farms (an odd number), the invention can also be applied to a system of 2F server farms (an even number) by adding the arbiter 8 to the configuration in the same way.
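The majority-rule arithmetic behind the arbiter can be checked with a few lines of Python. The function names are hypothetical; the sketch only counts votes and assumes that each voter (a server farm or the arbiter 8) casts exactly one vote.

```python
def majority(n_voters: int) -> int:
    """Smallest number of votes that forms a strict majority of n_voters."""
    return n_voters // 2 + 1


def tolerated_failures(n_voters: int) -> int:
    """How many voters may stop while a majority can still be reached."""
    return n_voters - majority(n_voters)


def majorities_intersect(n_voters: int) -> bool:
    """Any two majorities share at least one voter, so if each voter votes
    only once, two conflicting decisions cannot both win."""
    return 2 * majority(n_voters) > n_voters


for f in (1, 2, 3):
    odd = 2 * f              # placeholder, replaced below for clarity
    odd = 2 * f + 1          # (2F+1) server farms
    even = 2 * f             # 2F server farms without an arbiter
    with_arbiter = even + 1  # 2F server farms plus the arbiter 8
    print(f, tolerated_failures(odd), tolerated_failures(even),
          tolerated_failures(with_arbiter))
```

(2F+1) voters tolerate F failures, whereas 2F voters tolerate only F-1; adding the arbiter restores tolerance of F failures. With F=1, two data-holding server farms alone cannot survive any failure, which is why the arbiter 8 is needed.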
Under these conditions, the problem addressed by the present invention is to guarantee the following three properties.
<Problem 1> Even if the same transaction is executed redundantly multiple times, only one execution succeeds in committing (completes its commit).
<Problem 2> A transaction that has completed its commit was executed on the latest data.
<Problem 3> If a server farm stops due to a failure, transaction processing continues without a long interruption.
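To make Problem 1 concrete, here is a minimal, hypothetical Python sketch of how a majority rule can prevent two executions of the same transaction from both committing. It is not the protocol of the present invention: `Voter`, `try_commit`, and the transaction and execution IDs are illustrative assumptions. Each voter grants its single vote per transaction ID to the first execution that asks, and an execution commits only with votes from a majority of all voters; because any two majorities overlap, at most one execution can collect one. The sketch addresses Problem 1 only. It does nothing about executing on the latest data (Problem 2), and a split vote (for example, two executions each obtaining one vote while the third voter is down) leaves both uncommitted, a case a real protocol must resolve.

```python
from typing import Dict, List


class Voter:
    """Hypothetical per-farm voter holding one vote per transaction ID."""

    def __init__(self) -> None:
        self.granted: Dict[str, str] = {}  # tx_id -> execution that got the vote

    def request_vote(self, tx_id: str, execution_id: str) -> bool:
        # The first requester wins; repeated requests from the winner still succeed.
        return self.granted.setdefault(tx_id, execution_id) == execution_id


def try_commit(tx_id: str, execution_id: str,
               reachable: List[Voter], total_voters: int) -> bool:
    """Commit only if a majority of all voters, not just reachable ones, agree."""
    votes = sum(v.request_vote(tx_id, execution_id) for v in reachable)
    return votes >= total_voters // 2 + 1


# F = 1: three voters. Execution A reaches voters 0 and 1; a resent duplicate B
# reaches voters 1 and 2. Voter 1 already voted for A, so B gets only one vote.
voters = [Voter() for _ in range(3)]
print(try_commit("tx-1", "A", voters[0:2], total_voters=3))  # True
print(try_commit("tx-1", "B", voters[1:3], total_voters=3))  # False
```

Because the vote only needs a majority, a single stopped voter does not block the commit, which is also in the spirit of Problem 3.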