1. Field of the Invention
The present invention relates to a stream data processing method, a stream data processing program, and a stream data processing apparatus.
2. Description of the Background Art
In the background art, a database management system (hereinafter referred to as “DBMS”) is positioned at the heart of data management of a corporate information system. In the DBMS, data to be processed are stored in storage, and highly reliable processing as typified by transaction processing is carried out on the stored data. On the other hand, there has been a growing demand for a data processing system capable of real-time processing of a large volume of data arriving from moment to moment. In financial applications for aiding stock trading, for example, how quickly the system can react to stock price fluctuation is one of the most important issues for the system.
In a system like the background-art DBMS, stock data are first stored in storage and retrieval is then performed on the stored data. Such a system may lose business opportunities because the processing for storing and retrieving data cannot keep up with the speed of stock price fluctuation. Creating individual real-time applications in a programming language as typified by Java (registered trademark) has problems such as a longer development period, a considerable rise in development cost, and difficulty in quickly adapting the applications to changes in the business that uses them. Thus, a general-purpose real-time data processing mechanism has been in demand. Stream data processing systems disclosed in JP-A-2003-298661, JP-A-2006-338432 etc. have been proposed as data processing systems suitable for such real-time data processing.
R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma in “Query Processing, Resource Management, and Approximation in a Data Stream Management System”, section 2 (Query Language), In Proc. of the 2003 Conf. on Innovative Data Systems Research (CIDR), January 2003, have disclosed a stream data processing system STREAM. In the stream data processing system, unlike the background-art DBMS, queries are first registered in the system, and the queries are executed continuously as data arrives. Here, stream data does not mean a logically continuous large stream of data such as a video stream, but means a large volume of time-series data consisting of comparatively small and logically independent pieces of data, such as stock price distribution data in a financial application, POS data in retail business, probe-car data in a traffic information system, error log data in computer system management, and sensing data generated from a ubiquitous device such as a sensor or an RFID.
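The difference from the background-art DBMS, namely that queries are registered first and then evaluated each time data arrives, can be illustrated by the following minimal sketch. It is not the implementation of the system STREAM; the names StreamEngine, register, and on_arrival are hypothetical, and a query is assumed, for simplicity, to be a plain filter predicate.

```python
class StreamEngine:
    """Hypothetical engine: queries are registered once, then run on every arrival."""

    def __init__(self):
        self.queries = {}   # query name -> predicate over one piece of data
        self.results = {}   # query name -> pieces of data that matched so far

    def register(self, name, predicate):
        # Registration happens before any data arrives (assumption of this sketch).
        self.queries[name] = predicate
        self.results[name] = []

    def on_arrival(self, item):
        # Every registered query is evaluated against the newly arrived data.
        for name, predicate in self.queries.items():
            if predicate(item):
                self.results[name].append(item)


engine = StreamEngine()
engine.register("high_price", lambda item: item["price"] > 100)

# Stock price data arrive one piece at a time; nothing is stored beforehand.
for item in [{"stock": "A", "price": 90},
             {"stock": "B", "price": 120},
             {"stock": "A", "price": 105}]:
    engine.on_arrival(item)
```

In this sketch the result of the query grows as data arrive, whereas a DBMS would answer a query only over the data already stored at the time the query is issued.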
Stream data keeps on arriving at the system. The stream data cannot be processed in real time if processing is started only after the data has finished arriving. In addition, the data arriving at the system must be processed in order of arrival without being affected by the load of data processing. In the system STREAM, a concept called a sliding window (hereinafter referred to as “window”) is introduced in order to cut the stream data continuously arriving at the system into parts, each specified either by a time width, such as the latest 10 minutes, or by a number of pieces of data, such as the latest 1,000 pieces, and to process each of the cut parts in real time.
The document “Query Processing, Resource Management, and Approximation in a Data Stream Management System” discloses CQL (Continuous Query Language) as a preferred example of a description language for queries including window specifications. CQL extends SQL (Structured Query Language), which is widely used in the DBMS, so that a window can be specified in square brackets following a stream name in a FROM clause.
Unlike static data such as a table handled in the background-art DBMS, seamless stream data cannot be processed unless a window is specified to designate which part of the stream data is to be processed. The part of the stream data cut out by the window is held in memory and used for query processing.
Typical window specifying methods include a Range window, which specifies the width of a window by time, and a Row window, which specifies the width of a window by the number of pieces of data. For example, when [Range 10 minutes] is set using the Range window, the latest 10 minutes of data are set as a target of query processing. When [Rows 10] is set using the Row window, the latest 10 pieces of data are set as a target of query processing.
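The two window types can be sketched as follows. This is an illustrative sketch only, not code from the system STREAM; the class names RowWindow and RangeWindow are hypothetical, timestamps are assumed to be in seconds and to arrive in non-decreasing order, and a piece of data is assumed to leave the Range window once its timestamp is at least the window width older than the latest arrival.

```python
from collections import deque


class RowWindow:
    """Holds the latest `size` pieces of data, in the manner of [Rows size]."""

    def __init__(self, size):
        # A bounded deque drops the oldest element automatically.
        self.items = deque(maxlen=size)

    def insert(self, timestamp, value):
        self.items.append((timestamp, value))

    def contents(self):
        return [value for _, value in self.items]


class RangeWindow:
    """Holds the data of the latest `width` seconds, in the manner of [Range width]."""

    def __init__(self, width):
        self.width = width
        self.items = deque()

    def insert(self, timestamp, value):
        self.items.append((timestamp, value))
        # Evict data that have fallen out of the time range (assumed boundary rule).
        while self.items and self.items[0][0] <= timestamp - self.width:
            self.items.popleft()

    def contents(self):
        return [value for _, value in self.items]


rows = RowWindow(2)          # corresponds to [Rows 2]
rng = RangeWindow(10 * 60)   # corresponds to [Range 10 minutes]
for ts, price in [(0, 100), (100, 101), (500, 102), (650, 103), (700, 104)]:
    rows.insert(ts, price)
    rng.insert(ts, price)

print(rows.contents())  # [103, 104]
print(rng.contents())   # [102, 103, 104]
```

With the same input, the Row window retains a fixed number of pieces of data regardless of their age, while the Range window retains however many pieces arrived within the time width.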
Stream data to be dealt with in the stream data processing system includes a group of data arriving seamlessly from moment to moment. If the rate at which the stream data processing system can carry out query processing on the data is lower than the rate at which the data arrive per unit time, the stream data processing system cannot process all of the arriving data. Query processing with a high load becomes a bottleneck, causing stagnation of data around the query processing. Once such data stagnation occurs even at one place, the throughput of the system as a whole will deteriorate.
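The stagnation described above can be made concrete with a small numerical sketch. The functions backlog_after and pipeline_throughput are hypothetical names, and the model is a simplifying assumption: time advances in one-second steps, rates are constant, and queries are chained so that the slowest one limits the whole system.

```python
def backlog_after(arrival_rate, service_rate, seconds):
    """Pieces of data left unprocessed after `seconds` one-second steps."""
    backlog = 0
    for _ in range(seconds):
        backlog += arrival_rate                 # data arriving in this second
        backlog -= min(backlog, service_rate)   # data processed in this second
    return backlog


def pipeline_throughput(stage_rates):
    """Sustained throughput of chained queries is bounded by the slowest one."""
    return min(stage_rates)


print(backlog_after(1000, 800, 60))            # 12000
print(backlog_after(1000, 1200, 60))           # 0
print(pipeline_throughput([5000, 800, 3000]))  # 800
```

Under these assumptions, a query able to process 800 pieces per second against 1,000 arriving per second accumulates 200 stagnant pieces every second, and that single query caps the throughput of the whole chain at 800 pieces per second.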