This invention relates to a stream data processing method performed in a case where a portion of stream data arrives later than its time of generation.
There has been an increasing demand for a data processing system which carries out real-time processing for continuously arriving data, in contrast to a conventional database management system (hereafter referred to as “DBMS”), which processes data already stored in a storage system. For example, in a system for trading stocks, how fast the system can react to changes in stock prices is one of the most important objectives. A method such as the one carried out by a conventional DBMS, in which stock data is first stored in a storage system and the stored data is then searched, cannot keep pace with the changes in stock prices and may result in lost business opportunities. For example, although U.S. Pat. No. 5,495,600 discloses a mechanism which issues stored queries periodically, it is difficult to apply this mechanism to real-time data processing, in which it is important to execute a query immediately after data such as stock prices is input.
Data which continuously arrives is defined as stream data, and there has been proposed a stream data processing system as a data processing system suitable for the real-time processing for the stream data. For example, R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma: “Query Processing, Resource Management, and Approximation in a Data Stream Management System”, In Proc. of the 2003 Conf. on Innovative Data Systems Research (CIDR), (online), January 2003, (retrieved on Oct. 12, 2006), Internet URL <http://infolab.usc.edu/csci599/Fall2002/paper/DS1_datastream managementsystem.pdf> discloses a stream data processing system “STREAM”.
In the stream data processing system, unlike in the conventional DBMS, queries are first registered with the system and are then executed continuously each time data arrives. The above-mentioned STREAM employs a concept referred to as a sliding window, which cuts out a portion of the stream data so that the stream data can be processed efficiently, thereby imparting a lifetime to the data. As a preferred example of a query description language including a sliding window specification, there is the continuous query language (CQL) disclosed in the above-cited reference by R. Motwani et al. The CQL includes an extension for specifying the sliding window by using parentheses following a stream name in a FROM clause of a structured query language (SQL), which is widely used for the DBMS. As for SQL, there is known one disclosed in C. J. Date, Hugh Darwen: “A Guide to SQL Standard (4th Edition)”, the United States, Addison-Wesley Professional, Nov. 8, 1996, ISBN: 0201964260.
There are two typical methods for specifying the sliding window: (1) a method of specifying the number of data rows to be cut, and (2) a method of specifying a time interval containing data rows to be cut. For example, “Rows 50 Preceding”, described in a second paragraph of the above-cited reference by R. Motwani et al., is a preferred example of the item (1), in which data corresponding to 50 rows is cut to be processed, and “Range 15 Minutes Preceding” is a preferred example of the item (2), in which data for 15 minutes is cut to be processed. In the case of the item (1), the lifetime of a piece of data lasts until 50 further pieces of data have arrived. In the case of the item (2), the lifetime of a piece of data is 15 minutes. The stream data cut by the sliding window is retained in memory and is used for the query processing.
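The two window types described above can be illustrated by a minimal sketch in Python. It is not taken from STREAM or CQL; the class names RowWindow and RangeWindow, the use of integer minutes as timestamps, and the choice to expire a tuple exactly when its age reaches the window span are assumptions made only for illustration.

```python
from collections import deque


class RowWindow:
    """Row-based window (hypothetical): keeps the most recent `size` tuples."""

    def __init__(self, size):
        # A bounded deque drops the oldest tuple once `size` is exceeded.
        self.rows = deque(maxlen=size)

    def insert(self, timestamp, value):
        self.rows.append((timestamp, value))

    def contents(self):
        return list(self.rows)


class RangeWindow:
    """Time-based window (hypothetical): keeps tuples younger than `span`."""

    def __init__(self, span):
        self.span = span
        self.rows = deque()

    def insert(self, timestamp, value):
        self.rows.append((timestamp, value))
        # Assumption: a tuple's lifetime ends once its age reaches `span`.
        while self.rows and self.rows[0][0] <= timestamp - self.span:
            self.rows.popleft()

    def contents(self):
        return list(self.rows)


# Illustrative stock prices as (minute, price); the values are made up.
rows = RowWindow(size=3)      # analogous to "Rows 3 Preceding"
rng = RangeWindow(span=15)    # analogous to "Range 15 Minutes Preceding"
for minute, price in [(0, 100), (5, 101), (12, 99), (20, 102)]:
    rows.insert(minute, price)
    rng.insert(minute, price)
```

After the four insertions, the row-based window holds the last three tuples regardless of their age, while the time-based window holds only the tuples generated within the 15 minutes preceding the newest one.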
However, in the stream data, data sometimes arrives with a delay depending on a state of a network, a device, or the like. For example, a sensor node does not transmit data if the network is disconnected, and transmits the data collectively when a connection is again established with a base station.
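The delayed arrival described above can be sketched as follows. The SensorNode class, its method names, and the record fields generated_at and arrived_at are hypothetical and serve only to show how a reconnecting node delivers several tuples at once, each later than its time of generation.

```python
class SensorNode:
    """Hypothetical sensor node that buffers readings while disconnected."""

    def __init__(self):
        self.connected = True
        self.pending = []  # readings generated but not yet transmitted

    def measure(self, now, value, base_station):
        self.pending.append((now, value))
        if self.connected:
            self.transmit(now, base_station)

    def transmit(self, now, base_station):
        # Send every buffered reading at once; each keeps its generation time.
        for generated_at, value in self.pending:
            base_station.append(
                {"generated_at": generated_at, "arrived_at": now, "value": value}
            )
        self.pending.clear()


received = []                    # stands in for the base station
node = SensorNode()
node.measure(0, 20.5, received)  # connected: arrives immediately
node.connected = False           # network is disconnected
node.measure(1, 20.7, received)  # buffered
node.measure(2, 21.0, received)  # buffered
node.connected = True            # connection is established again
node.transmit(3, received)       # buffered readings arrive collectively
delays = [r["arrived_at"] - r["generated_at"] for r in received]
```

In this run the first reading arrives without delay, whereas the two readings taken during the disconnection arrive together at time 3, with delays of 2 and 1 time units respectively.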
As an example of known methods, D. J. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, M. Cherniack, J. H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik, “The design of the Borealis stream processing engine”, In Proc. of CIDR 2005, pp. 277-289 discloses a method of modifying or canceling data by retaining a history of input stream data for a predetermined period and performing execution again when a delay tuple arrives. During the predetermined period, which is set to be longer than the lifetime defined by a window, all of the input stream data is stored. When the delay tuple arrives, the stored data and the delay tuple are executed again to thereby obtain a correct processing result. However, retaining all of the stream data for a predetermined period may increase both the required memory size and the time taken to perform execution again by using the input data.
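The general idea of retaining a history and performing execution again can be sketched as below. This is not the Borealis implementation; the class ReplayingSum, the use of a windowed sum as the query, and the half-open window interval are assumptions chosen to keep the example small.

```python
import bisect


class ReplayingSum:
    """Windowed sum that re-executes over retained history (hypothetical)."""

    def __init__(self, window, retention):
        # The retention period must exceed the window-defined lifetime.
        assert retention > window
        self.window = window
        self.retention = retention
        self.history = []  # all retained (timestamp, value), sorted by time
        self.results = {}  # timestamp -> sum over (timestamp - window, timestamp]

    def _window_sum(self, ts):
        return sum(v for t, v in self.history if ts - self.window < t <= ts)

    def on_tuple(self, ts, value):
        latest = self.history[-1][0] if self.history else ts
        bisect.insort(self.history, (ts, value))
        # Discard history and results older than the retention period.
        horizon = max(latest, ts) - self.retention
        self.history = [(t, v) for t, v in self.history if t > horizon]
        self.results = {t: r for t, r in self.results.items() if t > horizon}
        # Execute again for every retained timestamp whose window contains ts.
        changed = {}
        for t, _ in self.history:
            if ts <= t < ts + self.window:
                new = self._window_sum(t)
                if self.results.get(t) != new:
                    self.results[t] = new
                    changed[t] = new
        return changed  # newly produced or corrected results


query = ReplayingSum(window=10, retention=30)
first = query.on_tuple(0, 5)      # in-order tuple
second = query.on_tuple(8, 7)     # in-order tuple
corrected = query.on_tuple(4, 1)  # delay tuple generated before time 8
```

In this sketch the delay tuple with timestamp 4 produces a result for time 4 and corrects the result already output for time 8 from 12 to 13. Every arrival scans the whole retained history, which reflects the memory and re-execution cost noted above.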
In addition, US 2006/0282695 discloses a method of implementing transaction processing that records latencies in order to solve a problem caused by a phenomenon in which the arrival order changes due to a communication delay. However, without a delay tuple, this method can neither output a processing result in real time nor recalculate a correct processing result.
The stream data processing system is expected to be applied in fields in which real-time processing is required, typified by financial applications, traffic information systems, distribution systems, traceability systems, sensor monitoring systems, and computer system management.