1. Technical Field
The present invention relates to a stream data processing method for processing, in real time, stream data that arrives continuously. More particularly, it relates to a stream data processing method, and a system therefor, that define an upper limit on the memory usage of a query that uses a time-based window.
2. Description of the Related Art
Conventionally, data management in business information systems has been dominated by database management systems (hereinafter referred to as “DBMS”). A DBMS stores the data to be processed in a storage device and performs highly reliable processing on the stored data, typified by transaction processing. At the same time, there is increasing demand for data processing systems that process large volumes of continuously arriving data in real time. For example, in a financial application that supports stock trading, one of the most important issues is how quickly the system can respond to fluctuations in stock prices. A conventional DBMS first stores the stock data in a storage device and then retrieves it. In such a system, storing the data and then retrieving it cannot keep up with the speed of stock price fluctuations, and business opportunities may be missed.
A stream data processing system has been proposed as a data processing system suited to such real-time processing. For example, the stream data processing system “STREAM” is disclosed in “Query Processing, Resource Management, and Approximation in a Data Stream Management System”, In Proc. of the 2003 Conference on Innovative Data Systems Research (CIDR), January 2003, written by R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma (hereinafter referred to as “reference 1”).
In a stream data processing system, unlike a conventional DBMS, queries are registered in the system in advance and are executed continuously as data arrives. Here, stream data is not a single large, logically continuous piece of data such as a video stream. Instead, it is a large volume of time-series data items, each relatively small and logically independent of the others. Examples include:
- stock price distribution data in financial applications;
- point-of-sale (hereinafter referred to as “POS”) data in retail;
- probe car data in traffic information systems;
- error logs in computer system management;
- sensing data generated by ubiquitous devices such as sensors and RFID (radio frequency identification) tags.
Because stream data keeps arriving at the system, real-time processing cannot be achieved if processing waits for the end of the stream. In addition, data that has arrived must be processed in order of arrival, unaffected by the processing load.

To process continuously arriving stream data in real time, STREAM introduces a concept called a sliding window (hereinafter referred to simply as a “window”). A window cuts out part of the stream data either by a specified time duration, such as the latest 10 minutes, or by a specified number of data items, such as the latest 1000 items.

A suitable query description language that supports window designation is CQL (continuous query language). It is disclosed in “The CQL continuous query language: semantic foundations and query execution”, The VLDB Journal, Volume 15, Issue 2, pp. 121 to 142, June 2006, written by A. Arasu, S. Babu and J. Widom (hereinafter referred to as “reference 4”). CQL extends the “FROM” clause of SQL (structured query language), which is widely used in DBMSs, so that a window can be designated in brackets following a stream name. The details of SQL are disclosed in “A Guide to the SQL Standard (4th Edition)”, Addison-Wesley Professional, 4th Edition (Nov. 8, 1996), ISBN: 0201964260, written by C. J. Date and Hugh Darwen (hereinafter referred to as “reference 2”).
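The bracketed window designation can be illustrated with a small parser that turns a FROM clause into a window specification. This is a minimal sketch, not the CQL grammar of reference 4. It assumes only the forms named in this section, [Range n unit] with an optional Preceding keyword and [Rows n]. The function parse_window, the WindowSpec type and the unit table are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import Union

# Hypothetical unit table covering the time units used in the examples of this section.
_UNITS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}

# Matches "<stream> [Range <n> <unit> (Preceding)]" or "<stream> [Rows <n>]".
_WINDOW = re.compile(
    r"(?P<stream>\w+)\s*\[\s*(?P<kind>Range|Rows)\s+(?P<n>\d+)"
    r"(?:\s+(?!Preceding\b)(?P<unit>[A-Za-z]+))?(?:\s+Preceding)?\s*\]",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class WindowSpec:
    stream: str
    kind: str                    # "range" (time-based) or "rows" (row-based)
    size: Union[timedelta, int]  # a duration for "range", a row count for "rows"


def parse_window(from_clause: str) -> WindowSpec:
    """Extract the first bracketed window designation from a FROM clause."""
    m = _WINDOW.search(from_clause)
    if m is None:
        raise ValueError(f"no window designation in {from_clause!r}")
    stream, kind, n = m.group("stream"), m.group("kind").lower(), int(m.group("n"))
    if kind == "rows":
        return WindowSpec(stream, "rows", n)
    # Normalize "Day", "minutes", etc. to the singular lowercase keys of _UNITS.
    unit = (m.group("unit") or "").lower().rstrip("s")
    if unit not in _UNITS:
        raise ValueError(f"unknown or missing time unit in {from_clause!r}")
    return WindowSpec(stream, "range", n * _UNITS[unit])


print(parse_window("FROM Requests [Range 1 Day Preceding]"))
print(parse_window("FROM Quotes [Rows 10]"))  # "Quotes" is a hypothetical stream name
```

Mapping both window kinds onto one specification type keeps the distinction explicit. A time-based window is sized by a duration, while a row-based window is sized by a count.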
Query 1101 in FIG. 14 is an example of a CQL query, disclosed in Section 2.1 of reference 1. This query calculates, at a certain web proxy server, the total number of accesses from the domain “stanford.edu” over the past day up to the present. “Requests” is web access data that arrives continuously at the web proxy server. It is not static data such as a table handled by a conventional DBMS, but unbounded stream data. For this reason, the total number of accesses cannot be calculated unless the part of the stream data to be processed is specified. Here it is specified by the window designation “[Range 1 Day Preceding]”. The stream data cut out by the window is retained in memory and used in query processing.
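The behavior of such a windowed count can be sketched as an operator that keeps the matching tuples of the last day in memory and reports the count on each arrival. This is a minimal sketch, not the query plan of reference 1. It assumes that tuples arrive in timestamp order and that the window holds tuples with timestamps in [now - width, now]. The Request schema and the RangeWindowCount class are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Request:
    """Hypothetical schema for one web access record in the Requests stream."""
    timestamp: datetime
    domain: str


class RangeWindowCount:
    """Continuously maintained COUNT(*) over a time-based window,
    restricted to requests from one domain (the filter condition)."""

    def __init__(self, width: timedelta, domain: str) -> None:
        self.width = width
        self.domain = domain
        self.window: deque = deque()  # matching tuples currently retained in memory

    def on_arrival(self, req: Request) -> int:
        # Apply the filter first so that only matching tuples occupy memory.
        if req.domain == self.domain:
            self.window.append(req)
        # Keep tuples with timestamps in [now - width, now]; drop anything older.
        cutoff = req.timestamp - self.width
        while self.window and self.window[0].timestamp < cutoff:
            self.window.popleft()
        return len(self.window)


t0 = datetime(2024, 1, 1)
counter = RangeWindowCount(timedelta(days=1), "stanford.edu")
print(counter.on_arrival(Request(t0, "stanford.edu")))                       # 1
print(counter.on_arrival(Request(t0 + timedelta(hours=2), "stanford.edu")))  # 2
```

This sketch re-evaluates the window only when a tuple arrives. A full engine would also expire tuples as time advances with no new arrivals.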
Representative window designation methods include the Range window and the Row window. The Range window (hereinafter referred to as a “time-based window”) designates the window width as a time period. The Row window (hereinafter referred to as a “row-based window”) designates the window width as a number of data items. For example, when [Range 10 minutes] is specified with a time-based window, the query processes the stream data of the latest 10 minutes. When [Rows 10] is specified with a row-based window, the query processes the latest 10 rows of stream data.
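The practical difference between the two window kinds shows up in memory. The sketch below measures the largest number of tuples each kind of window holds while streams with different arrival rates pass through. It assumes integer millisecond timestamps and, for the time-based window, that tuples with timestamps in [now - width, now] are retained. The function names and stream rates are hypothetical.

```python
from collections import deque
from typing import Iterable


def peak_time_window(arrivals_ms: Iterable[int], width_ms: int) -> int:
    """Largest number of tuples held at once by a time-based window.

    Tuples with timestamps in [now - width_ms, now] are retained."""
    window: deque = deque()
    peak = 0
    for t in arrivals_ms:
        window.append(t)
        while window[0] < t - width_ms:  # never empties: t itself always qualifies
            window.popleft()
        peak = max(peak, len(window))
    return peak


def peak_row_window(arrivals_ms: Iterable[int], rows: int) -> int:
    """Largest number of tuples held at once by a row-based window."""
    window: deque = deque(maxlen=rows)  # the oldest tuple is evicted automatically
    peak = 0
    for t in arrivals_ms:
        window.append(t)
        peak = max(peak, len(window))
    return peak


slow = range(0, 1_200_000, 1_000)  # 1 tuple per second for 20 minutes
fast = range(0, 1_200_000, 100)    # 10 tuples per second for 20 minutes
ten_minutes = 600_000

print(peak_time_window(slow, ten_minutes), peak_time_window(fast, ten_minutes))  # 601 6001
print(peak_row_window(slow, 1000), peak_row_window(fast, 1000))                  # 1000 1000
```

With [Range 10 minutes], the footprint grows tenfold when the arrival rate grows tenfold. With [Rows 1000], it stays capped at 1000 tuples regardless of rate.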
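Because the stream data cut out by a window is retained in memory, the memory usage of a query grows with the amount of data in its window. To address this, Motwani et al., “Caching Queues in Memory Buffers”, In Proc. of SODA 2004 (hereinafter referred to as “reference 3”), discloses a method of storing stream data that cannot be held in memory on a magnetic disk.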
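The general idea of moving data that does not fit in memory to disk can be sketched as a FIFO queue. The queue keeps at most `mem_limit` tuples in memory and appends the rest to a temporary file. This is a generic spill-to-disk illustration, not the caching algorithm of reference 3. The class and parameter names are hypothetical, a temporary file stands in for the magnetic disk, and tuples are serialized with Python's pickle module.

```python
import os
import pickle
import tempfile
from collections import deque
from typing import Any


class SpillingQueue:
    """FIFO queue that keeps at most `mem_limit` items in memory and spills the rest to disk."""

    def __init__(self, mem_limit: int) -> None:
        self.mem_limit = mem_limit
        self.mem: deque = deque()             # oldest items, held in memory
        self.disk = tempfile.TemporaryFile()  # newer items, appended as pickles
        self.read_pos = 0                     # file offset of the oldest spilled item
        self.on_disk = 0                      # number of items currently spilled

    def push(self, item: Any) -> None:
        # Once anything is on disk, new items must queue behind it to keep FIFO order.
        if self.on_disk == 0 and len(self.mem) < self.mem_limit:
            self.mem.append(item)
        else:
            self.disk.seek(0, os.SEEK_END)
            pickle.dump(item, self.disk)
            self.on_disk += 1

    def pop(self) -> Any:
        if not self.mem and self.on_disk:
            self._refill()
        return self.mem.popleft()  # raises IndexError if the queue is empty

    def _refill(self) -> None:
        # Reload up to mem_limit of the oldest spilled items back into memory.
        self.disk.seek(self.read_pos)
        while self.on_disk and len(self.mem) < self.mem_limit:
            self.mem.append(pickle.load(self.disk))
            self.on_disk -= 1
        self.read_pos = self.disk.tell()

    def __len__(self) -> int:
        return len(self.mem) + self.on_disk


q = SpillingQueue(mem_limit=3)
for i in range(10):
    q.push(i)  # items 0-2 stay in memory; items 3-9 are spilled
print([q.pop() for _ in range(len(q))])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Memory use stays bounded by `mem_limit`, at the cost of disk I/O whenever spilled tuples are read back. For simplicity, this sketch never compacts the file.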
In contrast, US Patent Publication US 2007/0226239 discloses a method of discarding part of the data by sampling when the volume of data to be processed increases, thereby reducing memory usage.
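Discarding data by sampling under load can be sketched with a time-based window that admits every arrival while lightly loaded. Once the window holds at least `threshold` tuples, it admits only every k-th arrival. This is a generic systematic-sampling illustration, not the method of US 2007/0226239. The class name, the `threshold` and `k` parameters, and the integer timestamps are hypothetical.

```python
from collections import deque


class SamplingTimeWindow:
    """Time-based window that switches to systematic sampling when overloaded.

    While the window holds at least `threshold` tuples, only every k-th arrival is admitted."""

    def __init__(self, width: int, threshold: int, k: int) -> None:
        self.width = width
        self.threshold = threshold
        self.k = k
        self.window: deque = deque()
        self._overloaded_arrivals = 0  # arrivals seen since sampling switched on

    def on_arrival(self, t: int) -> bool:
        # Expire first so that the load check reflects the current window contents.
        while self.window and self.window[0] < t - self.width:
            self.window.popleft()
        if len(self.window) >= self.threshold:
            admit = self._overloaded_arrivals % self.k == 0
            self._overloaded_arrivals += 1
        else:
            admit = True
            self._overloaded_arrivals = 0
        if admit:
            self.window.append(t)
        return admit


w = SamplingTimeWindow(width=1_000, threshold=3, k=2)
admitted = [t for t in range(10) if w.on_arrival(t)]
print(admitted)  # [0, 1, 2, 3, 5, 7, 9]
```

Sampling slows memory growth at the cost of exactness. Any aggregate computed over the admitted tuples approximates the result over the full stream.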