Stream data processing, which processes high-rate data in real time, has attracted attention as technology advances for analyzing, in real time, information that is continuously generated at a high rate and for acting on it quickly. Examples include automated stock trading, advanced traffic information processing, and clickstream analysis. Stream data processing is a general-purpose middleware technique that can be applied to a wide variety of data processing. It therefore makes it possible to use real-world data for business in real time while responding to rapid changes in a business environment that leaves no time to build a dedicated system for each project. The principle and implementation method of stream data processing are disclosed in Non-Patent Literature 1.
As described above, stream data processing is real-time processing of high-rate data, so output data representing the processing result is continuously generated at a high rate. The period from when a failure occurs to when results can be output again is therefore required to be shorter than one second. An effective method for achieving such a recovery time is the use of a duplex configuration, in which two servers that perform the same processing are prepared and, when a failure occurs in the server that outputs results to an application, the other server takes over the role of outputting the results.
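The takeover behavior of a duplex configuration can be illustrated with a minimal sketch. The names `Server`, `DuplexPair`, and `feed`, and the running-sum operator, are hypothetical and chosen only for illustration; failure detection is reduced to an `alive` flag, which a real system would replace with heartbeat monitoring or a similar mechanism.

```python
class Server:
    """Hypothetical stream server that keeps a running sum as its execution state."""

    def __init__(self, name):
        self.name = name
        self.alive = True   # assumption: failure is modeled as a simple flag
        self.total = 0

    def process(self, value):
        self.total += value
        return self.total


class DuplexPair:
    """Two servers process the same input; only the active one outputs results."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def feed(self, value):
        # Every live server processes every input, so their states stay identical.
        results = {
            s.name: s.process(value)
            for s in (self.active, self.standby)
            if s.alive
        }
        if not self.active.alive and self.standby.alive:
            # Failover: the other server takes over the role of outputting results.
            self.active, self.standby = self.standby, self.active
        if not self.active.alive:
            raise RuntimeError("both servers have failed; the system halts")
        return self.active.name, results[self.active.name]
```

Because the standby server has already processed the same inputs, it can output the next result immediately after the switch, with no state to rebuild.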
When a failure occurs in a duplex configuration, the system enters single-system operation, in which only one server operates, so a further failure halts the system. To avoid a system halt, it is necessary to add a standby system server to the in-use system server that is operating alone and thereby restore the duplex configuration. At this time, the added standby system server is in its initial execution state, so the execution state of the in-use system server needs to be reproduced in the standby system server.
As a first method for reproducing the execution state in the standby system server, Non-Patent Literature 2 discloses an Upstream Backup method in which input streams are backed up during normal operation and, when the standby system is added, the standby system server processes the backed-up data to catch up with the execution state of the in-use system server. The longer the processing time is, the larger the disk or memory capacity needed for the backup. However, the capacity can be assumed to stay within a certain range, for the reason described below.
In stream data processing, it is possible to use a window operation that extracts the most recent portion of a data series. The definition of the window operation is disclosed in Non-Patent Literature 3. For example, applying an aggregation operation that calculates an average to the data extracted by a window operation with a time width of one minute yields a one-minute moving average. In this example, once data has flowed continuously for one minute, all of the data in the window has been replaced. Therefore, if the standby system server, starting from the initial state, processes the most recent one minute of data, it reaches the same execution state as the in-use system server.
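The argument above can be made concrete with a minimal sketch. It assumes a time-based window in which a tuple leaves the window once it is at least one window width older than the newest tuple. The names `MovingAverage`, `UpstreamBackup`, and `catch_up` are hypothetical and are not taken from the cited literature. The backup buffer retains only the most recent window width of input, which is why its capacity stays bounded.

```python
from collections import deque


class MovingAverage:
    """Hypothetical time-based sliding-window average (window width in seconds)."""

    def __init__(self, width_s=60.0):
        self.width_s = width_s
        self.window = deque()  # (timestamp, value) pairs currently in the window

    def push(self, ts, value):
        self.window.append((ts, value))
        # Evict tuples that have fallen out of the window.
        while self.window[0][0] <= ts - self.width_s:
            self.window.popleft()
        return sum(v for _, v in self.window) / len(self.window)

    def state(self):
        return list(self.window)


class UpstreamBackup:
    """Hypothetical input backup that keeps only the most recent window width."""

    def __init__(self, width_s=60.0):
        self.width_s = width_s
        self.buffer = deque()

    def record(self, ts, value):
        self.buffer.append((ts, value))
        # Older input can no longer affect the window, so it is discarded.
        while self.buffer[0][0] <= ts - self.width_s:
            self.buffer.popleft()

    def replay(self):
        return list(self.buffer)


def catch_up(backup, width_s=60.0):
    """Start a standby operator from the initial state and replay the backup."""
    standby = MovingAverage(width_s)
    for ts, value in backup.replay():
        standby.push(ts, value)
    return standby


# Usage: the in-use operator runs for 300 seconds, then a standby is added.
in_use = MovingAverage(60.0)
backup = UpstreamBackup(60.0)
for t in range(300):
    in_use.push(float(t), t % 7)
    backup.record(float(t), t % 7)

standby = catch_up(backup, 60.0)
print(standby.state() == in_use.state())
```

In this sketch the standby operator ends with exactly the same window contents as the in-use operator, even though it processed only the last minute of input rather than all 300 seconds.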
As a second method for reproducing the execution state in the standby system server, there is a method in which, when the standby system is added, the in-use system is temporarily stopped so that its execution state becomes static, and the execution state is transmitted to the standby system server as a snapshot. The method of making the execution state static and transmitting a snapshot is widely used in databases and transaction systems. Patent Literature 1 discloses a standby system addition method that uses the static execution state in an in-memory database.
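A minimal sketch of the snapshot approach follows, under stated assumptions: the operator state is a simple count and sum, a lock stands in for temporarily stopping the in-use system, and the snapshot is serialized with `pickle` in place of a real network transfer. The class `Operator` and its methods are hypothetical and do not reproduce the method of Patent Literature 1.

```python
import pickle
import threading


class Operator:
    """Hypothetical stateful operator whose state can be captured as a snapshot."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0
        self.total = 0

    def process(self, value):
        with self._lock:
            self.count += 1
            self.total += value

    def snapshot(self):
        # Holding the lock blocks process(), so the state is static while copied.
        with self._lock:
            return pickle.dumps({"count": self.count, "total": self.total})

    def restore(self, blob):
        # The standby system loads the transmitted snapshot as its own state.
        state = pickle.loads(blob)
        with self._lock:
            self.count = state["count"]
            self.total = state["total"]


# Usage: the in-use operator processes some input, then a standby is added.
in_use = Operator()
for v in (3, 5, 7):
    in_use.process(v)

standby = Operator()
standby.restore(in_use.snapshot())
```

While `snapshot()` holds the lock, no input is processed, which corresponds to the temporary stop of the in-use system described above.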