Recently, attempts have been made to analyze large amounts of data (referred to as big data), such as logs and sensor information output from various apparatuses and sensors, to obtain information helpful to businesses. For such analysis, a technique is known in which a large amount of data stored in a database or the like at a specific point in time is analyzed by batch processing.
Moreover, a complex event processing (CEP) technique is also known, in which algorithmic transactions in banks and the like are analyzed in realtime and the analysis results are likewise acquired in realtime. Batch processing handles data covering a wide time range (in units of several days to several years). The CEP technique, on the other hand, handles only the latest data (in units of several minutes). Because of this difference in nature, batch processing is used for analyzing trends or the like, whereas the CEP technique is used for invalid value detection, realtime charging control, and the like based on access logs and the like.
When an analysis server in a system that uses the CEP technique analyzes a large amount of data, it may be difficult to process all of the data in realtime because the processing performance of the analysis server is limited. In particular, when the amount of data processed by the CEP technique is large, the data analysis process may be delayed and performance may deteriorate.
To solve such a problem, a technique of providing a plurality of analysis servers and distributing event processing according to a relevance between events executed as CEP in the analysis servers to prevent performance deterioration is known (for example, see PTL 1, paragraphs 0039 and 0041).
The scheme disclosed in PTL 1 is a system in which an analysis server having a low processing load is selected and caused to execute processing in order to prevent performance deterioration.
Similarly, a technique of distributing load among a plurality of computers has also been considered (PTL 2). In this technique, each of the computers expresses its load as a queue usage rate, and a client request is transmitted to the computer having the lowest usage rate to prevent concentration of load.
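The selection rule described for PTL 2 can be illustrated with a minimal sketch. The names `Computer`, `queue_capacity`, `queued`, and `dispatch` are hypothetical and do not appear in PTL 2, and defining the usage rate as queued requests divided by queue capacity is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Computer:
    # Hypothetical model of one computer; not taken from PTL 2.
    name: str
    queue_capacity: int  # maximum number of requests the queue can hold
    queued: int = 0      # requests currently waiting in the queue

    def usage_rate(self) -> float:
        # Assumed definition: fraction of the queue that is occupied.
        return self.queued / self.queue_capacity


def dispatch(computers: List[Computer]) -> Computer:
    """Send one request to the computer with the lowest queue usage rate."""
    target = min(computers, key=lambda c: c.usage_rate())
    target.queued += 1
    return target


computers = [
    Computer("a", queue_capacity=10, queued=5),  # usage rate 0.5
    Computer("b", queue_capacity=20, queued=4),  # usage rate 0.2
    Computer("c", queue_capacity=10, queued=3),  # usage rate 0.3
]
chosen = dispatch(computers)  # computer "b" has the lowest usage rate
```

Under these assumptions, computer "c" holds the fewest queued requests, yet computer "b" is chosen because the rule compares usage rates rather than raw queue lengths.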
In general, to prevent the processing performance of a server from limiting system performance, either a method of improving hardware performance or a method of distributing processes, as in the Patent Literature mentioned above, is used. The former method has a problem in that services must be suspended while hardware is updated. Thus, data center operators and the like often use the latter method (scale-out) to improve performance, because it can cope with an increase in communication traffic without suspending services. When performance is improved by scale-out, how processes are to be distributed to a plurality of servers becomes an issue.
Scale-out is generally triggered by a shortage in the overall processing performance of a system. The hardware processing performance of servers and computers improves day by day. It is therefore practically difficult, at the time of scale-out, to introduce a server having the same hardware processing performance as the servers introduced when the system was constructed. In a system constructed from a plurality of servers having different processing performance, when, for example, a round-robin method of allocating process requests sequentially is used, the differing specifications of the servers cause the loads of the servers to become unbalanced, and the time required to process a request differs from one server to another.
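The imbalance can be made concrete with a small numerical sketch. The function name `round_robin_busy_time` and the processing rates are hypothetical, and modeling each server by a fixed number of requests per second is a simplifying assumption.

```python
from typing import List


def round_robin_busy_time(rates: List[float], num_requests: int) -> List[float]:
    """Allocate requests in turn and return the time each server needs.

    rates: assumed processing rate of each server, in requests per second.
    """
    assigned = [0] * len(rates)
    for i in range(num_requests):
        # Round-robin: the i-th request goes to server i mod N,
        # regardless of how fast that server is.
        assigned[i % len(rates)] += 1
    # Time needed by each server to finish its share of the requests.
    return [count / rate for count, rate in zip(assigned, rates)]


# An older server (100 requests/s) and a newer one (200 requests/s)
# each receive 500 of the 1000 requests.
busy = round_robin_busy_time([100.0, 200.0], 1000)
```

In this example both servers receive the same number of requests, so the slower server needs twice as long as the faster one to finish its share.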
On the other hand, in a system including various constituent elements such as one or more servers and one or more networks connecting the servers, when a fault, a failure, or the like occurs in these constituent elements, the system may be suspended or may provide incorrect results. Thus, the reliability of the system may decrease.
To solve this problem, a redundant scheme is known as one of the schemes for improving reliability (that is, providing fault tolerance). In a redundant scheme, a plurality of identical systems is provided and all systems execute the same process in parallel so that, even when a fault occurs in a certain system, the other systems continue operating. In this way, reliability is secured. A scheme is also known in which, when a fault (for example, a computation error due to an internal fault of a CPU) prevents the plurality of systems from attaining the same result even though they perform the same processing, the result considered to be correct is output by majority decision.
For example, PTL 3 discloses a system which receives a processing request and transmits the result obtained by a computer that processes the request. In this system, the same processing request is transmitted to all of the plurality of computers, the processing results are received from the plurality of computers, the plurality of received processing results are compared, and a correct processing result is output.
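The flow of sending the same request to every computer, collecting the results, and comparing them can be sketched as follows. This is only an illustration: the names `majority_result` and `process_redundantly` are hypothetical, the comparison is assumed to be a strict majority decision, and the replicas are called one after another here rather than in parallel.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_result(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the result reported by more than half of the computers.

    Returns None when there are no results or no strict majority exists.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None


def process_redundantly(
    request: int, replicas: Sequence[Callable[[int], Hashable]]
) -> Optional[Hashable]:
    """Send the same request to every replica and compare the results."""
    results = [replica(request) for replica in replicas]
    return majority_result(results)


def healthy(x: int) -> int:
    return x * 2


def faulty(x: int) -> int:
    # Simulates a computation error in one computer.
    return x * 2 + 1


answer = process_redundantly(21, [healthy, healthy, faulty])
```

With two healthy replicas and one faulty replica, the result produced by the healthy pair is selected. When no result is shared by more than half of the replicas, this sketch returns `None` instead of guessing.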
By using such a scheme, it is possible to prevent different results from being output due to a fault or a failure in a computer or due to a difference in software version. Thus, the system can operate with improved reliability.