Distributing a computation among computing nodes may improve computational capacity or performance. Many distributed computing mechanisms are known, e.g., systems implemented in and operated under the X10 programming language.
The distributed computing mechanism operated under the X10 programming language includes a home node and at least one remote node. Hereafter, the terms “home node” and “remote node” refer to the respective computing nodes, including the computers playing those roles. The home node directs the distributed computation by creating activities in the home node and in the remote nodes, and any activity can in turn create further activities in these nodes. The home node waits for the termination of all activities. The home node manages the overall computation distributed to the nodes and generates the final result of the computation. Each node performs the part of the computation allocated to it by the program by generating one or more activities, and returns its result to the home node after its computation has finished successfully.
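The termination detection described above (the home node waits until every activity, including activities created by other activities, has ended) can be illustrated with a minimal single-process sketch. It assumes that nodes are mere labels and that activities are Python threads; the class `Finish` and the functions `spawn`, `wait`, and `partial_sum` are hypothetical names chosen for illustration and are not the X10 runtime or its API.

```python
import threading


class Finish:
    """Hypothetical sketch of finish-style termination detection.

    Nodes are only labels here; every activity runs as a local thread.
    """

    def __init__(self):
        self._live = 0                      # activities not yet terminated
        self._cond = threading.Condition()
        self.results = []                   # (node, result) pairs for the home node

    def spawn(self, node, fn, *args):
        # Count the activity before it starts, so the counter cannot reach
        # zero while a parent activity is still creating children.
        with self._cond:
            self._live += 1
        threading.Thread(target=self._run, args=(node, fn, args)).start()

    def _run(self, node, fn, args):
        try:
            result = fn(self, node, *args)
            if result is not None:
                with self._cond:
                    self.results.append((node, result))
        finally:
            with self._cond:
                self._live -= 1
                if self._live == 0:
                    self._cond.notify_all()

    def wait(self):
        # Home node: block until all activities, including nested ones, end.
        with self._cond:
            while self._live > 0:
                self._cond.wait()
        return self.results


def partial_sum(finish, node, lo, hi):
    # An activity may create another activity (here it splits its range once).
    if hi - lo > 50:
        mid = (lo + hi) // 2
        finish.spawn(node, partial_sum, mid, hi)
        hi = mid
    return sum(range(lo, hi))


finish = Finish()
for i, node in enumerate(["remote-1", "remote-2"]):
    finish.spawn(node, partial_sum, i * 100, (i + 1) * 100)
results = finish.wait()
total = sum(r for _, r in results)
print(total)  # 19900
```

Incrementing the counter inside `spawn`, before the child starts and before the parent's own decrement, is what keeps the home node from observing a spurious zero while nested activities are still being created.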
In the above distributed computing mechanism operated under the X10 programming language, a fault in any single node leads directly to failure of the entire computation. Recently, the Resilient X10 programming language has been proposed. The Resilient X10 programming language can tolerate the fault of a particular node and complete the target computation using the nodes still alive at that time, by storing the data needed to detect the termination of all activities in a so-called “resilient store.” The computing mechanism operated under the Resilient X10 programming language is reviewed, for example, in D. Cunningham et al., “Resilient X10: Efficient Failure-Aware Programming,” Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '14), pages 67-80, August 2014 (hereinafter “Cunningham”); S. Crafa et al., “Semantics of (Resilient) X10,” Proceedings of the 28th European Conference on Object-Oriented Programming (ECOOP '14), July/August 2014; and K. Kawachiya et al., “Writing Fault-Tolerant Applications Using Resilient X10,” X10 Workshop, June 2014.
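Why a resilient store lets the surviving nodes finish can be sketched as follows, under the assumption that the store keeps a per-node count of live activities outside the nodes themselves. When a node fails, its activities will never report termination, so they are discounted and the fault is recorded instead of blocking forever. The class `ResilientStore` and its methods are hypothetical and only approximate the idea; they are not the data layout or interface used by Resilient X10, which reports such faults to the program as exceptions.

```python
from collections import Counter


class ResilientStore:
    """Hypothetical stand-in for a store that survives node faults.

    It records, per node, how many activities of one finish scope are live.
    """

    def __init__(self):
        self.live = Counter()   # node -> number of live activities
        self.dead = set()       # nodes known to have failed
        self.errors = []        # (node, lost_count) faults for the home node

    def activity_created(self, node):
        if node in self.dead:
            raise RuntimeError(f"cannot create an activity on dead node {node}")
        self.live[node] += 1

    def activity_terminated(self, node):
        if node in self.dead:
            return              # already discounted when the node failed
        self.live[node] -= 1
        if self.live[node] == 0:
            del self.live[node]

    def node_failed(self, node):
        # Activities on the failed node can never terminate normally, so
        # discount them and remember the fault instead of waiting forever.
        self.dead.add(node)
        lost = self.live.pop(node, 0)
        if lost:
            self.errors.append((node, lost))

    def terminated(self):
        # The scope is complete once no live activity remains on any
        # surviving node.
        return sum(self.live.values()) == 0


store = ResilientStore()
for node in ["home", "remote-1", "remote-2"]:
    store.activity_created(node)
store.activity_terminated("home")
store.activity_terminated("remote-1")
store.node_failed("remote-2")             # remote-2 dies before finishing
print(store.terminated(), store.errors)   # True [('remote-2', 1)]
```

Because every creation and termination is routed through the store, this bookkeeping is extra work on each activity; that cost is the kind of overhead a design keeping Resilient X10's fault tolerance would aim to reduce.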
Among known distributed computing mechanisms, Japanese Patent JPH08314875 A, entitled “Cooperative Distributed Processing Method, Distributed Shared Memory Monitoring Device, Distributed Shared Memory Network Tracking Device and Distributed Shared Memory Network Setting Supporting Device” (hereinafter “Japanese Patent JPH08314875 A”), discloses a distributed computing mechanism that backs up the functions lost from a faulty distributed node. The system in Japanese Patent JPH08314875 A uses a distributed shared memory on which a status monitor table and the data shared among the distributed nodes are placed. When a particular node suffers a fault, its lost functions are reallocated among the normal nodes.
Japanese Patent JPH0612352 A, entitled “Method and Device for Processing Information and Information Processing System” (hereinafter “Japanese Patent JPH0612352 A”), discloses an information processing method for maintaining data consistency. In the system of Japanese Patent JPH0612352 A, a change of data on a host is acknowledged by that host to the sender of the data. When the sender host suffers a fault, the host sends a fault acknowledgement to the other hosts in the network.
However, there is still a need in the art to improve computational performance while retaining the excellent fault tolerance of the Resilient X10 programming language.