1. Field of the Invention
The present invention relates to a state transition control technique for a distributed system wherein a distributed transaction is executed among a plurality of sites on the Internet, for example.
2. Description of the Related Art
Recent development of the Internet has begun to enable electronic commerce between a company and a customer, or between companies, via the Internet. Such commerce can theoretically be regarded as a “transaction”. A transaction that is carried out among a plurality of independent resources is called a “distributed transaction”.
In general, a transaction needs to be controlled so as to maintain the so-called ACID characteristics. Here, ACID is an acronym for four characteristics: atomicity, consistency, isolation and durability. In many cases, the distributed transaction is controlled by a method called “2 Phase Commit” in order to maintain the ACID characteristics. For example, the 2 Phase Commit is adopted in the WS-Transaction specification for Web services, which enables transactions over the Internet.
The 2 Phase Commit ensures the ACID characteristics of the distributed transaction. However, there is a problem that when a failure occurs in a resource relating to a transaction, the transaction may be blocked. In this context, “blocking of a transaction” means the continuance of a situation in which the updating processes can be neither committed nor aborted for all resources relating to the transaction. If a certain transaction is blocked, the transaction keeps holding its locks on the associated resources. Consequently, it is quite possible that other transactions wait forever for the blocked transaction to release the locks, and fall into a deadlock condition.
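The blocking behavior described above can be illustrated with a minimal sketch. The class `Resource` and its methods `prepare`, `on_decision` and `try_lock` are hypothetical names introduced only for illustration; they are not taken from any 2 Phase Commit implementation. The sketch assumes a single lock per resource and models a lost decision as `None`.

```python
from typing import Optional


class Resource:
    """Hypothetical resource taking part in 2 Phase Commit (illustration only)."""

    def __init__(self) -> None:
        self.lock_holder: Optional[str] = None  # transaction holding the lock
        self.prepared = False

    def prepare(self, txn: str) -> None:
        # Phase 1: the resource votes "yes" and keeps its lock.
        self.lock_holder = txn
        self.prepared = True

    def on_decision(self, decision: Optional[str]) -> None:
        # Phase 2: only an explicit decision releases the lock.
        # None models a decision that never arrives because of a failure.
        if decision in ("commit", "abort"):
            self.lock_holder = None
            self.prepared = False

    def try_lock(self, txn: str) -> bool:
        # Another transaction can proceed only if the lock is free.
        if self.lock_holder is None:
            self.lock_holder = txn
            return True
        return False


r = Resource()
r.prepare("T1")
r.on_decision(None)             # the decision is lost: T1 stays prepared
blocked = not r.try_lock("T2")  # T2 cannot obtain the lock held by T1
```

In this sketch, `T2` remains unable to acquire the lock until a “commit” or “abort” decision for `T1` finally reaches the resource.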
This nature of the 2 Phase Commit, that is, the possibility that the transaction is blocked in case of resource failure, is an inherent problem of the 2 Phase Commit. Conventionally, however, this has not been considered a serious problem in the case where the 2 Phase Commit is used within one system. In case of failure, the primary task of the administrator of the system is to eliminate the cause of the failure. After recovery from the failure is achieved, the blocking of the transaction is easily released. In this manner, as long as the 2 Phase Commit is used within a single system that is operated and managed in a centralized manner, the blocking of the transaction can be treated together with the failure of the system, and thus the blocking nature of the 2 Phase Commit is not so serious.
However, in the case where the transaction is performed among a plurality of companies, the blocking of the transaction becomes a serious problem. For example, in the case where three companies, company A, company B and company C, are associated with a certain transaction, suppose that a failure occurs on the site of company A, while no failure occurs on the site of company B or company C. In this case, blocking of the transaction can be triggered by the failure on the site of company A, and a chain of deadlocks may occur. As a result, other transactions, even those independent of company A, may also be blocked on the site of company B or company C. In this situation, it is not tolerable for company B and company C to wait for recovery on the site of company A while being unable to do anything about it. As shown above, in the case where the 2 Phase Commit is used between independently operated and managed systems, a failure on one site can block a transaction, and transactions on other sites can be blocked in turn. A solution to this problem is already known, namely the 3 Phase Commit algorithm.
The point of change from the 2 Phase Commit to the 3 Phase Commit is that the “prepared” state in the 2 Phase Commit is divided into two states (which are referred to as W-state and P-state in this specification). Accordingly, the notification for “commit” from a coordinator to a resource is issued in two stages, in the order of W-state→P-state→Commit.
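The two-stage commit notification can be sketched as follows. The names `State`, `COMMIT_PATH` and `next_on_commit_path` are hypothetical and are introduced only for illustration; the set of states is assumed from the state names used in this specification (updating-state, W-state, P-state, commit-state and abort-state).

```python
from enum import Enum


class State(Enum):
    UPDATING = "updating-state"
    W = "W-state"
    P = "P-state"
    COMMIT = "commit-state"
    ABORT = "abort-state"


# Order in which the coordinator's "commit" notifications move a resource.
COMMIT_PATH = [State.W, State.P, State.COMMIT]


def next_on_commit_path(state: State) -> State:
    """Return the state reached by the next stage of the "commit" notification."""
    index = COMMIT_PATH.index(state)  # raises ValueError if not on the path
    if index + 1 == len(COMMIT_PATH):
        raise ValueError("commit-state is the end of the path")
    return COMMIT_PATH[index + 1]
```

For example, `next_on_commit_path(State.W)` yields `State.P`, and a second call with `State.P` yields `State.COMMIT`; a resource never moves from W-state to commit-state in a single step.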
In 3 Phase Commit, even if failure occurs in the coordinator or resources during operation, it is possible to determine whether to commit or abort by collecting the states of the updating processes of nonfaulty resources. This procedure is referred to as “termination protocol”.
According to the termination protocol of the 3 Phase Commit, in case of failure of the coordinator, a new coordinator is started up, and the states of the updating processes of the nonfaulty resources are collected. If these states are all non-committing-states, the transaction is aborted. If these states are all committing-states, the transaction is committed. Here, the term “non-committing-state” refers to one of the updating-state, abort-state and W-state, and the term “committing-state” refers to one of the P-state and commit-state. Otherwise, that is, when the W-state and the P-state exist in a mixed fashion, the new coordinator executes, once again, the notifications for “commit” for the nonfaulty resources in two stages in the order of W-state→P-state→Commit.
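The decision rule of the termination protocol described above can be sketched as follows. The names `State`, `NON_COMMITTING`, `COMMITTING` and `terminate`, as well as the returned strings, are hypothetical and serve only as an illustration. The sketch assumes that the states of the nonfaulty resources have already been collected, and it rejects any combination that the above description does not cover.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    UPDATING = "updating-state"
    ABORT = "abort-state"
    W = "W-state"
    P = "P-state"
    COMMIT = "commit-state"


NON_COMMITTING = {State.UPDATING, State.ABORT, State.W}
COMMITTING = {State.P, State.COMMIT}


def terminate(states: Iterable[State]) -> str:
    """Decide the outcome from the states of the nonfaulty resources."""
    collected = list(states)
    if not collected:
        raise ValueError("no nonfaulty resource reported a state")
    if all(s in NON_COMMITTING for s in collected):
        return "abort"
    if all(s in COMMITTING for s in collected):
        return "commit"
    if set(collected) == {State.W, State.P}:
        # Mixed case: repeat the notifications W-state -> P-state -> Commit.
        return "commit-in-two-stages"
    # Assumption: other mixtures are not covered by the description.
    raise ValueError("state combination not covered by the description")
```

For example, `terminate([State.W, State.UPDATING])` returns `"abort"`, `terminate([State.P, State.COMMIT])` returns `"commit"`, and `terminate([State.W, State.P])` returns `"commit-in-two-stages"`.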
When the faulty resource is recovered after the termination protocol has been executed, the updating process of the resource should be committed or aborted in accordance with the determination made in the termination protocol. It should be noted that if the updating process of the recovered resource had already been committed or aborted prior to the occurrence of the failure, the algorithm of the 3 Phase Commit ensures that this outcome is consistent with the determination of the termination protocol.
As has been described above, the 3 Phase Commit is an algorithm in which the blocking of the transaction can be prevented by the termination protocol. However, for the termination protocol to operate correctly, the following conditions need to be satisfied:
(1) Failure of a resource can be detected.
(2) A resource that has once been determined to have failed cannot participate in the subsequent termination protocol.
(3) When a new coordinator is started up, it must be unique.
However, in the case of an ordinary server that communicates via an unreliable network such as the Internet, none of the above conditions (1) to (3) is satisfied. Specifically, the condition (1) is not satisfied because inaccessibility due to network failure cannot be distinguished from failure of a server. The condition (2) is not satisfied because slowdown due to a high load on a server cannot be distinguished from failure of the server. The condition (3) is not satisfied because, if network partitioning occurs, the termination protocol may be executed independently in each of the partitions. Thus, there is a problem that the 3 Phase Commit cannot be applied in the Internet environment.
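The consequence of violating the condition (3) can be illustrated with a minimal sketch. The resource names `R1` to `R3`, the function `decide` and the partition layout are hypothetical and are chosen only for illustration. The sketch assumes that the original coordinator failed after moving R2 and R3 to the P-state but before notifying R1, and that the network is then partitioned so that each partition starts up its own new coordinator, which applies the all-non-committing and all-committing rule of the termination protocol to the states it can see.

```python
def decide(states):
    """Apply the termination rule to the states one new coordinator can see."""
    non_committing = {"updating", "abort", "W"}
    committing = {"P", "commit"}
    if all(s in non_committing for s in states):
        return "abort"
    if all(s in committing for s in states):
        return "commit"
    return "commit-in-two-stages"


# Hypothetical states at the moment the original coordinator fails.
states = {"R1": "W", "R2": "P", "R3": "P"}

# Hypothetical network partitioning: each side starts its own coordinator.
partition_a = ["R1"]
partition_b = ["R2", "R3"]

decision_a = decide([states[r] for r in partition_a])
decision_b = decide([states[r] for r in partition_b])
```

In this sketch, the coordinator of the first partition sees only a W-state and decides to abort, while the coordinator of the second partition sees only P-states and decides to commit, so that the two partitions reach inconsistent outcomes for the same transaction.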
More generally speaking, an unreliable network environment such as the Internet falls into the category of the asynchronous network model, and there is a problem that the termination protocol of the 3 Phase Commit does not operate correctly in the asynchronous network model.