1. Technical Field
The present invention relates generally to a method for parallel mining of temporal relations in a large event file and, more particularly, to temporal data mining technology for mining useful temporal relations in parallel, using a MapReduce model, from a large event file in which each event has a temporal interval. The temporal relations in the present invention are represented using the temporal interval algebra published by Allen, and include relations such as before, equal, meets, overlaps, during, starts, and finishes, wherein, for example, ‘X during Y’ means that event X occurs within the period of event Y.
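As an illustration of the relations named above, the following is a minimal sketch, not the claimed method, of how the relation between two intervals might be determined. The `Interval` type, the `allen_relation` function, and the `inverse-` prefix used for the converse relations are assumptions introduced here for illustration only; each interval is assumed to satisfy start < end.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical event interval with start < end."""
    start: int
    end: int


def allen_relation(x: Interval, y: Interval) -> str:
    """Return the Allen relation of x with respect to y.

    The seven relations listed in the text are tested directly; if none
    holds, the converse relation of y with respect to x is reported with
    an 'inverse-' prefix (a naming choice made for this sketch).
    """
    for iv in (x, y):
        if iv.start >= iv.end:
            raise ValueError("each interval must satisfy start < end")

    if x.end < y.start:
        return "before"
    if x.end == y.start:
        return "meets"
    if x.start == y.start and x.end == y.end:
        return "equal"
    if x.start == y.start and x.end < y.end:
        return "starts"
    if x.end == y.end and x.start > y.start:
        return "finishes"
    if x.start > y.start and x.end < y.end:
        return "during"
    if x.start < y.start < x.end < y.end:
        return "overlaps"
    # None of the seven holds, so one of them holds with the roles swapped.
    return "inverse-" + allen_relation(y, x)
```

For example, under these assumptions `allen_relation(Interval(3, 5), Interval(1, 10))` yields `"during"`, matching the reading of ‘X during Y’ given above.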
2. Description of the Related Art
Temporal data mining is technology for mining useful patterns from temporal data (event data), and includes sequential pattern mining, similar time series analysis, etc.
Sequential pattern mining refers to a method of mining patterns in which specific item sets occur sequentially across transactions composed of item sets; for example, it searches for a pattern such as a case where 50% of customers who borrowed product A subsequently borrow products B and C.
Similar time series analysis refers to a method of searching time-series data, such as stock market behavior, for a pattern similar to that of a specific stock.
However, conventional methods are problematic in two respects. First, they handle only temporal data having a time point of occurrence, so that it is difficult to mine useful patterns from data having a time interval.
For example, it is difficult to search for a useful temporal relation, such as a case where 50% of customers who borrowed product A borrow product B during the lending period of product A, and then borrow product C immediately after the lending period of product B ends. Second, temporal data is big data having an enormous capacity, and thus immense expenses (data storage space and processing time) are required to store and process the big data and search for useful patterns. Consequently, the conventional methods are not suitable for application to big data.
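The interval-based lending example above (product B during the lending period of product A, then product C immediately after the lending period of product B) can be made concrete with a small single-machine sketch. The records, the field layout, and the function names `during`, `meets`, and `pattern_support` are all hypothetical and serve only to illustrate what such a temporal relation means; the sketch does not address the scale problem described in the same passage.

```python
from collections import defaultdict

# Hypothetical lending records: (customer, product, start_day, end_day).
RECORDS = [
    ("c1", "A", 1, 10), ("c1", "B", 3, 6), ("c1", "C", 6, 9),
    ("c2", "A", 2, 8), ("c2", "B", 4, 7), ("c2", "C", 9, 12),
    ("c3", "A", 1, 5), ("c3", "B", 6, 9),
    ("c4", "A", 5, 20), ("c4", "B", 8, 12), ("c4", "C", 12, 15),
]


def during(x, y):
    """True if interval x lies strictly inside interval y."""
    return y[0] < x[0] and x[1] < y[1]


def meets(x, y):
    """True if interval x ends exactly when interval y starts."""
    return x[1] == y[0]


def pattern_support(records):
    """Fraction of product-A customers for whom 'B during A' and 'B meets C' hold."""
    # Group the lending intervals by customer, then by product.
    intervals = defaultdict(lambda: defaultdict(list))
    for customer, product, start, end in records:
        intervals[customer][product].append((start, end))

    base = 0      # customers with at least one product-A interval
    matched = 0   # of those, customers whose intervals satisfy the pattern
    for products in intervals.values():
        a_list = products.get("A", [])
        if not a_list:
            continue
        base += 1
        if any(
            during(b, a) and meets(b, c)
            for a in a_list
            for b in products.get("B", [])
            for c in products.get("C", [])
        ):
            matched += 1
    return matched / base if base else 0.0
```

With the hypothetical records shown, customers c1 and c4 satisfy the pattern out of four customers with a product-A interval, so `pattern_support(RECORDS)` evaluates to 0.5.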
For example, pieces of big data, such as web logs collected from websites visited by a large number of users, life logs generated by collecting personal activities and status information in real time through sensors contained in smart phones, and health records generated by recording personal lifelong health conditions, are not suitable for analysis by existing methods.
U.S. Pat. No. 6,826,569 discloses technology for extracting patterns for sequential events that repeatedly occur in pieces of sequential data. However, the technology disclosed in this U.S. patent is limited in that various temporal relation rules cannot be mined in parallel from time interval data.