This invention relates to a data processing apparatus and a data processing method for processing data.
In time-series data monitoring processing, rules are defined in a program in advance for time-series data such as sensor values and logs, and processing such as filtering, summing, abnormality detection, and future estimation is carried out on the time-series data in accordance with the rules. Examples of the time-series data monitoring processing include the monitoring of plants in a factory and the monitoring of servers. The monitoring processing for the factory plant acquires values of sensors for temperature, voltage, and the like mounted to a machine, and extracts an abnormal point based on a change in a time series of the sensor values over several hours to one day. On the other hand, the monitoring processing for the server acquires usage amounts of a central processing unit (CPU) and a hard disk or a packet amount of a network from a log of the server, and monitors changes in time series thereof over several seconds to several hours, to thereby detect an abnormality. Examples of execution methods of programs for the time-series data monitoring processing include batch processing and stream processing.
A program that carries out the batch processing (hereinafter referred to as “batch program”) collectively inputs time-series data accumulated in a file or a database as vector data, and collectively outputs a processing result as vector data. As middleware for supporting the execution of the batch program, there is exemplified a batch processing platform disclosed in Japanese Patent Application Laid-open No. 2011-221799. The batch processing platform is middleware that carries out, for example, scheduling, start, and stop of the batch program. The batch processing is used for a case such as the monitoring of the factory plant, which has a low requirement for the response time but high requirements for the processing throughput and a low cost.
On the other hand, a program that carries out the stream processing (hereinafter referred to as “stream program”) sequentially processes stream data delivered every moment, and also sequentially outputs processing results thereof as stream data. As middleware for supporting the execution of the stream program, there is exemplified a stream processing platform disclosed in a technical document (L. Girod, Y. Mei, S. Rost, A. Thiagarajan, H. Balakrishnan, S. Madden, “XStream: a Signal-Oriented Data Stream Management System”, International Conference on Data Engineering (ICDE), April 2008). The stream processing platform is middleware that carries out, for example, scheduling, start, and stop of the stream program. The stream processing is used for a case such as the monitoring of the server, which has a high requirement for the response time but low requirements for the processing throughput and the cost.
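The difference between the two execution methods can be illustrated with a minimal sketch. It assumes a simple threshold rule for abnormality detection; the names `THRESHOLD`, `batch_program`, and `stream_program` are hypothetical and do not come from either cited platform.

```python
from typing import Iterable, Iterator, List

THRESHOLD = 80.0  # hypothetical abnormality threshold (assumption)


def batch_program(values: List[float]) -> List[bool]:
    """Batch style: input all accumulated values as one vector, output one vector."""
    return [v > THRESHOLD for v in values]


def stream_program(values: Iterable[float]) -> Iterator[bool]:
    """Stream style: consume values one at a time and emit each result immediately."""
    for v in values:
        yield v > THRESHOLD


readings = [71.5, 79.9, 83.2, 78.0]

# The batch program sees the whole vector at once.
batch_result = batch_program(readings)

# The stream program produces a result as each value arrives.
stream_result = list(stream_program(iter(readings)))
```

Both forms apply the same rule. The batch form cannot return anything until the entire input has been collected, whereas the stream form emits each result as soon as the corresponding value is delivered.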
As the range of the cases subject to the time-series data monitoring processing expands, development of a program tailored to each case becomes difficult, and existing stream programs and batch programs need to be reused for various cases. However, the requirements such as the response time and the processing throughput differ from case to case, for example, between the monitoring of the factory plant and the monitoring of the server. Thus, in order to reduce the response time, execution of an existing batch program on a stream processing platform is required, and in order to increase the processing throughput, execution of an existing stream program on a batch processing platform is required.
The batch processing platform disclosed in JP 2011-221799 A supports the operation of a stream program on the batch processing platform. For that purpose, the batch processing platform disclosed in JP 2011-221799 A specifies a time range of input data within the data accumulated thereon, and converts the data in the range into stream data to be processed by the stream program. On the other hand, the stream processing platform disclosed in the technical document supports the operation of a batch program thereon by collecting a plurality of pieces of stream data into data blocks referred to as SigSegs and using the data blocks as the input and output of the batch program.
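The two adaptations described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of either cited platform: the `(timestamp, value)` record layout, the half-open time range, and the function names `replay_range` and `blocks` are all hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[int, float]  # (timestamp, value); hypothetical record layout


def replay_range(stored: List[Sample], start: int, end: int) -> Iterator[Sample]:
    """Stream program on a batch platform: select accumulated samples with
    start <= timestamp < end (assumed half-open range) and emit them one by one."""
    for ts, value in stored:
        if start <= ts < end:
            yield ts, value


def blocks(stream: Iterable[Sample], size: int) -> Iterator[List[Sample]]:
    """Batch program on a stream platform: collect consecutive samples into
    data blocks of `size` samples. No sample appears in more than one block."""
    block: List[Sample] = []
    for sample in stream:
        block.append(sample)
        if len(block) == size:
            yield block
            block = []
    if block:  # emit the final, possibly shorter, block
        yield block


stored = [(t, float(t)) for t in range(6)]
replayed = list(replay_range(stored, 1, 5))
grouped = list(blocks(iter(stored), 2))
```

In this sketch each sample belongs to exactly one block, so consecutive inputs to the batch program share no data.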
However, when the batch program is executed on the stream processing platform according to the technical document, no consideration is given to a configuration in which a plurality of data blocks include the same stream data, that is, a configuration in which an overlap is provided between the pieces of data input to the batch program. The stream processing platform according to the technical document therefore cannot execute a batch program which holds a certain number of pieces of time-series data in a window and slides the window for processing.
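The kind of overlap at issue can be illustrated with a minimal sliding-window sketch. The function name `sliding_windows` and the parameters `size` and `slide` are hypothetical, and the moving average merely stands in for an arbitrary batch program.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


def sliding_windows(stream: Iterable[float], size: int, slide: int) -> Iterator[List[float]]:
    """Hold the latest `size` values and emit the window every `slide` values.
    When slide < size, consecutive windows share (size - slide) values."""
    window: Deque[float] = deque(maxlen=size)
    since_last = 0  # values received since the last emitted window
    for value in stream:
        window.append(value)
        since_last += 1
        if len(window) == size and since_last >= slide:
            yield list(window)
            since_last = 0


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Overlapping windows: each window reuses the last two values of the previous one.
overlapping = list(sliding_windows(data, size=4, slide=2))

# Setting slide equal to size yields disjoint blocks with no shared data.
disjoint = list(sliding_windows(data, size=4, slide=4))

# A batch-style computation applied to each overlapping window.
moving_average = [sum(w) / len(w) for w in overlapping]
```

With `size=4` and `slide=2`, the values 3.0 and 4.0 appear in both the first and the second window. This shared data is what a platform that forms only disjoint data blocks cannot supply to the program.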
Moreover, the batch processing platform according to JP 2011-221799 A does not consider an overlap between pieces of stream data. The batch processing platform according to JP 2011-221799 A therefore cannot execute a stream program that slides a window thereon.
In this way, there is a problem in that, for the batch processing and the stream processing, when the processing platform and the program executed on the processing platform are different from each other in the processing content, an overlap cannot be provided between the pieces of time-series data input to the program.