The present invention relates generally to computer systems and, more particularly, to a stream data processing method for recursive queries on graph data.
Stream data processing is in wide use. There has been an increasing demand for a data processing system which carries out real-time processing of data continuously arriving at a database management system (hereafter referred to as “DBMS”), which processes data stored in a storage system. For example, in a system for trading stocks, how fast the system can react to changes in stock prices is one of the most important objectives. A method such as the one carried out by a conventional DBMS, in which stock data is first stored in a storage system and the stored data is then searched, cannot keep pace with the speed of the changes in stock prices and may result in lost business opportunities. Although U.S. Pat. No. 5,495,600 discloses a mechanism which issues stored queries periodically, it is difficult to apply this mechanism to real-time data processing, for which it is important to execute a query immediately after data such as stock prices is input.
Data which continuously arrives is defined as stream data, and there has been proposed a stream data processing system as a data processing system suitable for the real-time processing for the stream data. For example, R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma: “Query Processing, Resource Management, and Approximation in a Data Stream Management System,” Proc. of the 2003 Conf. on Innovative Data Systems Research (CIDR), (online), January 2003, (retrieved on Jan. 19, 2012), Internet URL <https://database.cs.wisc.edu/cidr/cidr2003/program/p22.pdf> discloses a stream data processing system “STREAM.”
In a stream data processing system, unlike in a conventional DBMS, queries are first registered with the system and are then executed continuously each time data arrives. The above-mentioned STREAM employs a concept referred to as a sliding window, which cuts out a portion of the stream data so that the stream data can be processed efficiently, thereby imparting a lifetime to the data. As a preferred example of a query description language including a sliding window specification, there is the continuous query language (CQL) disclosed in the R. Motwani et al. reference cited above. The CQL includes an extension for specifying the sliding window by using parentheses following a stream name in a FROM clause of the structured query language (SQL), which is widely used for DBMSs. There are two typical methods for specifying the sliding window: (1) a method of specifying the number of data rows to be cut, and (2) a method of specifying a time interval containing the data rows to be cut. For example, “Rows 50 Preceding” described in the second paragraph of the R. Motwani et al. reference is a preferred example of item (1), in which data corresponding to 50 rows is cut to be processed, and “Range 15 Minutes Preceding” is a preferred example of item (2), in which data for 15 minutes is cut to be processed. In the case of item (1), the data lifetime is defined to last until 50 pieces of data arrive. In the case of item (2), the data lifetime is defined to be 15 minutes. The stream data cut by the sliding window is retained in memory and is used for the query processing.
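The two window types described above can be illustrated with a minimal Python sketch. It is not taken from STREAM or CQL; the class names `RowWindow` and `RangeWindow`, the use of seconds as the time unit, and the rule that a row expires once its age reaches the window span are all assumptions made for illustration.

```python
from collections import deque


class RowWindow:
    """Row-based window (cf. "Rows 50 Preceding"): keeps the newest `size` rows."""

    def __init__(self, size):
        # A bounded deque drops the oldest row automatically when full.
        self.rows = deque(maxlen=size)

    def insert(self, row):
        self.rows.append(row)

    def contents(self):
        return list(self.rows)


class RangeWindow:
    """Time-based window (cf. "Range 15 Minutes Preceding").

    Assumption: a row expires once it is `span_seconds` or more older
    than the newest row that has arrived.
    """

    def __init__(self, span_seconds):
        self.span = span_seconds
        self.rows = deque()  # (timestamp, row) pairs in arrival order

    def insert(self, timestamp, row):
        self.rows.append((timestamp, row))
        # Evict rows whose lifetime has ended.
        while self.rows and self.rows[0][0] <= timestamp - self.span:
            self.rows.popleft()

    def contents(self):
        return [row for _, row in self.rows]


# Usage: a 3-row window and a 15-minute (900 s) window.
rw = RowWindow(3)
for price in [101, 102, 103, 104, 105]:
    rw.insert(price)
# rw.contents() is now [103, 104, 105]

tw = RangeWindow(900)
tw.insert(0, "a")
tw.insert(600, "b")
tw.insert(1000, "c")
# tw.contents() is now ["b", "c"]; "a" is 1000 s old and has expired
```

In both cases the window contents are held in memory, and a row's lifetime ends either when enough newer rows arrive or when enough time passes.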
One problem is that graph path search uses a recursive approach. A stream data processing engine has a temporal store which holds the tentative status of an aggregation. However, when the stream data processing engine uses a recursive approach, the processing cost to maintain the tentative status is very high and the required memory space is huge. FIG. 1 shows a sample graph. For example, the path “A→P” is calculated as “A→D→L→M→O→P.” The stream data processing engine holds the graph nodes and graph edges for a certain period or up to a certain amount (10 minutes, 50 nodes, etc.). The graph changes not only through insertion and deletion of nodes and edges but also when data is cut out of the sliding window. For example, when graph node 103 is newly added, the stream data processing engine can calculate the path from the temporal store data “A→D→L→M→O.” The same applies when new edge 104 is added, node 105 is deleted, or edge 106 is deleted. However, the engine has to maintain all the temporal stores every time the graph changes.
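The recursive path search and its sensitivity to graph changes can be sketched as follows. This is a minimal illustration, not the engine's actual method: the function name `find_path` and the adjacency-map representation are assumptions, and the edge set is hypothetical apart from the chain A→D→L→M→O→P quoted in the text (the full graph of FIG. 1 is not reproduced here).

```python
def find_path(edges, source, target, visited=None):
    """Recursive depth-first search.

    Returns one path from `source` to `target` as a list of node
    names, or None if `target` is not reachable.
    """
    if visited is None:
        visited = set()
    if source == target:
        return [source]
    visited.add(source)
    for nxt in sorted(edges.get(source, ())):
        if nxt not in visited:
            rest = find_path(edges, nxt, target, visited)
            if rest is not None:
                return [source] + rest
    return None


# Hypothetical graph state currently held in the window: A→D→L→M→O.
edges = {"A": {"D"}, "D": {"L"}, "L": {"M"}, "M": {"O"}}

before = find_path(edges, "A", "P")    # None: P is not in the graph yet

# A new edge O→P arrives, so the path query has to be evaluated again.
edges.setdefault("O", set()).add("P")
after = find_path(edges, "A", "P")     # ['A', 'D', 'L', 'M', 'O', 'P']

# Edge L→M is deleted (or expires from the sliding window).
edges["L"].discard("M")
expired = find_path(edges, "A", "P")   # None again
```

Each change here triggers a full recursive search. Retaining intermediate results such as “A→D→L→M→O” avoids some of that work, but those retained results must themselves be updated on every change, which is the maintenance cost described above.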
Miyuru Dayarathna and Toyotaro Suzumura, “Hirundo: A Mechanism for Automated Production of Optimized Data Stream Graphs,” ICPE 2012, ACM/SPEC 3rd International Conference on Performance Engineering, 2012/4, Boston, US, to appear, describes a data stream processing method for graph data. However, this method processes graph data by a function embedded in the query. US2010/0106946 provides a recursive query method. However, this method does not provide a way to set the number of recursions. Some traditional RDBMSs (Relational DataBase Management Systems) have a function to set the number of recursions. However, such traditional RDBMSs handle only sets of data, not data streams, and do not have a temporal store. As a result, it is difficult to provide a recursive query method for which the maintenance cost is low.