1. Field of the Invention
The present invention relates to a method of configuring a stream data processing system capable of seamlessly accessing real time stream data and archive data, a method of seamlessly accessing real time data and archive data in the stream data processing system, and a stream data processing method using a plurality of stream data processing systems cooperatively.
2. Description of the Related Art
While a database management system (hereinafter described as DBMS) processes data stored in storage, there is a high demand for a data processing system capable of processing, in real time, data that arrives continuously. For example, in a stock trading system, how quickly the system responds to a change in stock prices is one of the most important issues. An approach in which stock data is first stored in storage and then retrieved, as in a conventional DBMS, cannot keep up with the speed of stock price changes and may result in a lost business opportunity. U.S. Pat. No. 5,495,600 discloses a mechanism for periodically executing stored queries. It is difficult, however, to apply this mechanism to real time data processing, in which it is important to execute a query immediately upon arrival of data such as stock prices.
As a data processing system suitable for such real time data processing, a stream data processing system has been proposed. For example, a stream data processing system STREAM is disclosed in “Query Processing, Resource Management, and Approximation in a Data Stream Management System” written by R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma, in Proc. of the 2003 Conf. on Innovative Data Systems Research (CIDR), January 2003.
Unlike a conventional DBMS, the stream data processing system registers a query in advance and executes the query continuously as data arrives. In order to process stream data efficiently, STREAM introduces a concept called a sliding window, which cuts out a portion of the stream data. A typical example of a descriptive language for a query including designation of the sliding window is Continuous Query Language (CQL), disclosed in the above-described document written by R. Motwani et al. CQL extends Structured Query Language (SQL), which is widely used by DBMS, so that the sliding window is designated in square brackets following a stream name in the FROM clause. The details of SQL are described in “A Guide to SQL Standard (4th Edition)” written by C. J. Date and Hugh Darwen, Addison-Wesley Professional, 4th Edition, ISBN 0201964260. FIG. 14 shows an example of a CQL query shown in Chapter 2.1 of the above-described document written by R. Motwani et al. This query calculates the total number of accesses from the domain stanford.edu over the past day. “Request” is not a table as used by a conventional DBMS but an endless stream. Therefore, the total number cannot be calculated without the sliding window designation [Range 1 Day Preceding]. Stream data that remains in the sliding window is stored in a memory and used for query processing.
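The sliding-window behavior described above can be sketched as follows. This is a minimal illustration in Python, not the STREAM implementation and not CQL itself: the class name SlidingWindowCount, the method on_arrival, and the record layout (a dictionary with a "domain" key) are hypothetical names introduced here. The sketch assumes that tuples arrive in non-decreasing timestamp order and that a window of length w at time t covers the interval (t - w, t].

```python
from collections import deque
from datetime import datetime, timedelta


class SlidingWindowCount:
    """Continuous count over a time-based sliding window (illustrative only)."""

    def __init__(self, window, predicate):
        self.window = window        # window length, e.g. timedelta(days=1)
        self.predicate = predicate  # filter applied to each arriving tuple
        self.buffer = deque()       # timestamps of matching tuples still in the window

    def on_arrival(self, timestamp, record):
        """Process one tuple as it arrives and return the current count."""
        if self.predicate(record):
            self.buffer.append(timestamp)
        # Evict tuples that have slid out of the window (assumed interval: (t - w, t]).
        cutoff = timestamp - self.window
        while self.buffer and self.buffer[0] <= cutoff:
            self.buffer.popleft()
        return len(self.buffer)


if __name__ == "__main__":
    # Registered once, then evaluated on every arrival.
    query = SlidingWindowCount(
        timedelta(days=1), lambda r: r["domain"] == "stanford.edu"
    )
    t0 = datetime(2003, 1, 1)
    arrivals = [
        (t0, {"domain": "stanford.edu"}),
        (t0 + timedelta(hours=12), {"domain": "stanford.edu"}),
        (t0 + timedelta(hours=30), {"domain": "stanford.edu"}),
        (t0 + timedelta(hours=37), {"domain": "example.org"}),
    ]
    for ts, rec in arrivals:
        print(ts, query.on_arrival(ts, rec))
```

Only the tuples still inside the window are held in memory, so the memory required grows with the window size and the arrival rate; a tuple that arrived before the query was registered is never seen by the query at all.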
The stream data processing system is expected to be applied to applications requiring real time processing, typically financial applications, traffic information systems, and computer system management. However, a conventional stream data processing system is difficult to apply to real business because it has the following problems: (1) stream data is lost if the system load increases or if the query processing of the stream data processing system is delayed or stopped by a system fault; (2) stream data processing cannot be executed if the available memory is insufficient for the size of the sliding window designated by the query; (3) the retrieval range cannot be extended to a time earlier than the time at which the query was registered; and (4) if stream data arrives, or must be processed, in a volume exceeding the capacity of the stream data processing system, the performance necessary for business cannot be maintained.
In order to provide a stream data processing system that can be applied to real business, the following issues must be addressed: (1) protecting stream data even if the system load increases or the query processing of the stream data processing system is delayed or stopped by a system fault; (2) executing stream data processing even if the available memory is insufficient for the size of the window designated by the query; (3) extending the retrieval range to a time earlier than the time at which the query was registered; and (4) providing a mechanism for maintaining the performance necessary for real business even if stream data arrives, or must be processed, in a volume exceeding the capacity of the stream data processing system.