1. Field of the Invention
The present invention relates to a data processing system, and more particularly to a data processing technology for processing stream data in real time.
2. Description of the Related Art
A database management system (hereinafter referred to as the DBMS) has been mainly used for data management in corporate information systems. The DBMS stores processing target data on a disk drive and processes the stored data. Meanwhile, there is an increasing demand for a data processing system that processes continuously arriving data (tuples) in real time. For example, in a finance application that provides support for stock trading, one of the crucial tasks of the system is to respond quickly to fluctuations in stock prices. If the system first stores stock data on a disk drive and then searches through the stored data, as a conventional DBMS does, the data storage and subsequent searches may fail to keep up with the speed of stock price fluctuations, thereby causing a user of the system to miss important business opportunities.
A stream data processing system has been proposed as a data processing system suitable for the above-described real-time data processing. For example, a stream data processing system named “STREAM” is disclosed in Reference 1.
The stream data processing system differs from the conventional DBMS in that it preregisters a query that defines a data processing method, stores incoming data in a volatile memory of a server, and performs data processing on the data held there. Stream data processed by the stream data processing system is time-series data such as momentarily changing stock price data, retail POS data, error logs obtained during computer system management, and sensing data generated from sensors, RFID (radio-frequency identification) tags, and the like.
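The preregistered-query model described above can be illustrated with a minimal sketch. It is not the STREAM implementation; the class name StreamEngine, its methods register and push, and the sample stock tuples are hypothetical names introduced only for illustration. The point it shows is the order of events: the query is registered first, and each tuple is evaluated in memory when it arrives instead of being stored and searched later.

```python
class StreamEngine:
    """Hypothetical in-memory engine: queries are registered before data arrives."""

    def __init__(self):
        # Each registered query is a (predicate, callback) pair held in memory.
        self.queries = []

    def register(self, predicate, on_match):
        # Preregister a query that defines how incoming tuples are processed.
        self.queries.append((predicate, on_match))

    def push(self, tup):
        # Evaluate every registered query against the tuple as soon as it arrives.
        for predicate, on_match in self.queries:
            if predicate(tup):
                on_match(tup)


alerts = []
engine = StreamEngine()
# Assumed example query: report any stock tuple priced above 100.
engine.register(lambda t: t["price"] > 100, alerts.append)
engine.push({"symbol": "ABC", "price": 95})
engine.push({"symbol": "ABC", "price": 105})
```

In this sketch only the second tuple satisfies the registered query, so only that tuple is appended to alerts.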
STREAM processes stream data, which arrives at the system continuously, by extracting a finite portion of it, for instance the last 10 minutes of data or the last 1000 pieces of data. A concept called a window is adopted in order to extract this portion of the stream data. A preferred example of a language for describing queries, including window definitions, is Continuous Query Language (hereinafter referred to as CQL), which is disclosed in “Query Processing, Resource Management, and Approximation in a Data Stream Management System” (CIDR 2003) (hereinafter referred to as Reference 1), written by R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma.
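The two kinds of window mentioned above, one bounded by a number of tuples and one bounded by a time span, can be sketched as follows. This is an illustrative sketch only and does not reproduce CQL or the STREAM internals. The class names RowWindow and RangeWindow are hypothetical, timestamps are assumed to be seconds and to arrive in non-decreasing order, and a tuple exactly as old as the span is assumed to fall outside the window.

```python
from collections import deque


class RowWindow:
    """Count-based window: keeps only the most recent `size` tuples."""

    def __init__(self, size):
        # A deque with maxlen silently discards the oldest entry when full.
        self.buf = deque(maxlen=size)

    def insert(self, tup):
        self.buf.append(tup)

    def contents(self):
        return list(self.buf)


class RangeWindow:
    """Time-based window: keeps tuples newer than (latest timestamp - span)."""

    def __init__(self, span_seconds):
        self.span = span_seconds
        self.buf = deque()

    def insert(self, timestamp, tup):
        self.buf.append((timestamp, tup))
        # Evict tuples that have aged out relative to the newest timestamp.
        while self.buf and self.buf[0][0] <= timestamp - self.span:
            self.buf.popleft()

    def contents(self):
        return [tup for _, tup in self.buf]


rows = RowWindow(3)
for value in [1, 2, 3, 4, 5]:
    rows.insert(value)

last_ten_minutes = RangeWindow(600)
last_ten_minutes.insert(0, "a")
last_ten_minutes.insert(300, "b")
last_ten_minutes.insert(700, "c")
```

After these insertions the count-based window holds the last three values, and the time-based window has dropped the tuple stamped 0 because it is more than 600 seconds older than the newest tuple.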
If the stream data processing system becomes faulty, data stored in the volatile memory of the server may be lost. Therefore, when the stream data processing system recovers from a fault, it is necessary to recover the data stored in the volatile memory as well.
Various fault recovery methods have been proposed for use with the stream data processing system. One of them is to reenter an input stream and resume data processing. When this method is used, the input stream is backed up in preparation for a system fault. If the system becomes faulty, it is restored to normal by reentering the backed-up input stream. Various fault recovery methods for the stream data processing system are disclosed in “High-Availability Algorithms for Distributed Stream Processing” (ICDE 2005) (hereinafter referred to as Reference 2), written by Jeong-Hyon Hwang, Magdalena Balazinska, Alexander Rasin, Ugur Cetintemel, Michael Stonebraker, and Stan Zdonik.
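The recovery method of backing up the input stream and reentering it after a fault can be sketched as follows. This is a simplified illustration and is not one of the algorithms of Reference 2. The names ReplayableInput and RunningSum are hypothetical, and a Python list stands in for the backup, which in practice would have to survive the fault.

```python
class RunningSum:
    """Hypothetical operator whose state lives only in volatile memory."""

    def __init__(self):
        self.total = 0

    def process(self, tup):
        self.total += tup


class ReplayableInput:
    """Backs up every input tuple so that it can be reentered after a fault."""

    def __init__(self):
        # Assumption: this list stands in for a backup that survives the fault.
        self.backup = []

    def feed(self, operator, tup):
        # Back up the tuple before handing it to the operator.
        self.backup.append(tup)
        operator.process(tup)

    def recover(self, operator):
        # Reenter the backed-up input stream into a freshly started operator.
        for tup in self.backup:
            operator.process(tup)


source = ReplayableInput()
before_fault = RunningSum()
for value in [10, 20, 30]:
    source.feed(before_fault, value)

# Simulated fault: the operator state is lost and a new operator starts empty.
after_fault = RunningSum()
source.recover(after_fault)
```

Reentering the backed-up tuples brings the new operator to the same state the lost operator had reached. The cost of this approach is that the whole backed-up stream must be reprocessed, so recovery time grows with the length of the backup.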
Further, a method of enhancing the reliability of stream data processing by archiving stream data in a nonvolatile memory is disclosed in Japanese Patent Application Laid-Open Publication No. 2006-338432.