1. Field of the Invention
The present invention relates to technology for analyzing time series data of free correspondence.
2. Description of the Related Art
With recent remarkable developments in networking (represented by the Internet), in high-density storage, and in higher-performance, lower-priced computers, an enormous amount of information can now be accumulated. For example, in a POS (point of sale) system in the distribution industry, the sales of all shops in a nation can be centrally managed on a computer at the head office or elsewhere, and the information is accumulated continuously as the correspondence between each sold article and the time it was sold. Other data are also accumulated, such as the conditions of various production devices and the yield data of products in the manufacturing industry, the usage of personal credit cards in the financial industry, the personal data of insurance subscribers and their usage information in the insurance industry, medical information, and network log data. Especially recently, data can be accumulated easily and automatically, and huge amounts of time series data are accumulated in various fields. In this situation, there is an increasing demand for making the most of the accumulated time series data in business.
Conventionally, in analyzing time series data, the data are expressed as time series cases, each consisting of a plurality of data points that each have a piece of time data and a plurality of attribute values (category values or numerics), and time series cases are compared with one another. A time series case is processed in one of the following two ways. For simplicity, it is assumed that the time of the leading data point of each time series case has already been adjusted to 0 so that time series cases can be compared easily.
Fixed Correspondence: The time interval between data points is predetermined and common to all time series cases, and no shift in the time direction (correspondence between data points at different time points) is allowed. Since the data points at the same time point in each time series case clearly correspond to each other, the correspondence between data points is unambiguous and data analysis is easy. However, the restrictions on the data are strict, so the applicable range is narrow.
Free Correspondence: The time interval between data points is variable, and a shift in the time direction (correspondence between data points at different time points) is permitted. Since the correspondence between the data points of the respective time series cases is not defined, appropriate analysis is difficult, but the restrictions on the data are loose, so the applicable range is wide. Examples of time series cases of free correspondence include a medical check and medication history, an article purchase history, a credit card use history, an ATM use history, and a network log.
Even for time series cases whose logs are acquired at predetermined time intervals, such as the sensor log of a manufacturing device in the manufacturing industry, processing them as time series cases of free correspondence allows shifts in the time direction and may enable a more appropriate data analysis.
In the explanation below, time series cases of free correspondence, which have the wider application range, are the target of data analysis (hereinafter a “time series case of free correspondence” is also referred to simply as a “case”).
Data analysis of cases includes tasks such as rule extraction (extracting and presenting characteristics of cases), estimating and determining the label of an unlabeled case from a set of past labeled cases, and clustering similar cases into groups.
The calculation of the distance between cases is described below as one such data analysis. It is one of the most fundamental data analyses and outputs quantitative information about how similar two given cases are. By calculating the distance between an unlabeled case and every past case, the label determination described above can be made by returning the label of the past case closest to the unlabeled case. Furthermore, clustering can be performed by treating cases separated by short distances as “similar” cases.
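The label determination described above amounts to a 1-nearest-neighbor rule. The following is a minimal Python sketch, assuming the distance between cases is supplied as a function; the name `nearest_neighbor_label` and the toy scalar "cases" are hypothetical illustrations, not part of the source.

```python
from collections.abc import Callable, Hashable, Sequence
from typing import TypeVar

Case = TypeVar("Case")  # whatever representation a case has


def nearest_neighbor_label(
    query: Case,
    past_cases: Sequence[Case],
    labels: Sequence[Hashable],
    distance: Callable[[Case, Case], float],
) -> Hashable:
    """Return the label of the past case closest to `query` (1-nearest-neighbor rule)."""
    if not past_cases or len(past_cases) != len(labels):
        raise ValueError("need one label per past case, and at least one past case")
    # Distance from the unlabeled case to every past labeled case
    dists = [distance(query, case) for case in past_cases]
    closest = min(range(len(dists)), key=dists.__getitem__)
    return labels[closest]


# Toy usage with scalar "cases" and absolute difference as the distance
print(nearest_neighbor_label(3.0, [1.0, 2.5, 9.0], ["a", "b", "c"], lambda x, y: abs(x - y)))  # b
```

Any case representation works as long as the supplied distance function accepts it, which is why the distance between cases is treated here as a pluggable parameter.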
For calculating the distance between time series cases of free correspondence, methods that optimize the correspondence between data points (hereinafter referred to as a “correspondence optimization method”), represented by the DTW (dynamic time warping) method described in, for example, non-patent document 1 (Kazuki Nakamoto and two others, “Fast Clustering for Time-series Data with Average-time-sequence-vector Generation Based on Dynamic Time Warping,” Transactions of the Japanese Society for Artificial Intelligence, Vol. 18 (2003), No. 3, Technical Papers, pp. 146-147), have been used. In this method, for example, as shown in FIG. 1, a single data point is allowed to have a plurality of correspondences (one-to-many correspondence), and the correspondence set giving the shortest distance between cases (the sum of the distances between data points over all correspondences) is obtained. In FIG. 1, the dotted lines connecting the data points of a case A to those of a case B indicate the optimized correspondences between data points.
As an example of the conventional correspondence optimization method, the DTW method is explained below in detail, using the calculation of the distance between the following cases A and B.
Problem: Assume that the case A includes n(A) data points and the case B includes n(B) data points; the i-th data point of the case A is x(i); the j-th data point of the case B is y(j); and K attributes are defined. In addition, the values of the k-th attribute of x(i) and y(j) are a(i,k) and b(j,k), respectively.
The DTW method then obtains the optimum correspondence set between the data points of the cases A and B, together with the distance between the cases, as follows.
1. First, the distance between attributes d(i,j,k) between a(i,k) and b(j,k) is calculated by the following equation (1), where σ(k) is a parameter for normalizing the k-th attribute; for example, the difference between the maximum and minimum values of the attribute is used.
$$
d(i,j,k) =
\begin{cases}
\dfrac{\lvert a(i,k) - b(j,k) \rvert}{\sigma(k)} & \text{(numeric attribute)} \\[8pt]
\begin{cases} 0 & \bigl(a(i,k) = b(j,k)\bigr) \\ 1 & \bigl(a(i,k) \neq b(j,k)\bigr) \end{cases} & \text{(category value attribute)}
\end{cases}
\tag{1}
$$
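Equation (1) can be sketched in Python as follows. This is a minimal sketch under assumptions: the function names are hypothetical, and the convention that passing σ(k) marks a numeric attribute (while `sigma=None` marks a category value attribute) is an illustrative choice, not something the source specifies. The numeric branch uses the absolute difference so that the distance is non-negative.

```python
from collections.abc import Sequence
from typing import Optional, Union

Value = Union[float, str]


def range_normalizer(values: Sequence[float]) -> float:
    """sigma(k) as max minus min of the k-th attribute, the example choice given in the text."""
    return max(values) - min(values)


def attribute_distance(a: Value, b: Value, sigma: Optional[float] = None) -> float:
    """Distance d(i,j,k) between two values of one attribute, following equation (1).

    Hypothetical convention: passing sigma(k) marks a numeric attribute;
    sigma=None marks a category value attribute.
    """
    if sigma is not None:
        if sigma <= 0:
            raise ValueError("sigma(k) must be positive")
        # Numeric attribute: normalized absolute difference
        return abs(float(a) - float(b)) / sigma
    # Category value attribute: 0 if the values match, 1 otherwise
    return 0.0 if a == b else 1.0


sigma = range_normalizer([1.0, 9.0, 4.0])  # 8.0
print(attribute_distance(3.0, 7.0, sigma))  # 0.5
print(attribute_distance("drugA", "drugB"))  # 1.0
```

Note that `range_normalizer` returns 0 when all observed values of an attribute are equal; such an attribute contributes no information and would need separate handling.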
2. Next, the distances between attributes are added up as shown in the following equation (2), and the distance between data points is obtained.
$$
D(i,j) = \sum_{k=1}^{K} d(i,j,k)
\tag{2}
$$
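Equation (2) simply sums the attribute distances. The sketch below assumes a data point is a tuple of its K attribute values and that a parallel sequence of normalizers holds σ(k) for numeric attributes and `None` for category value attributes; these representations and the name `point_distance` are hypothetical.

```python
from collections.abc import Sequence
from typing import Optional, Union

Value = Union[float, str]


def attribute_distance(a: Value, b: Value, sigma: Optional[float]) -> float:
    """Equation (1): normalized |a - b| if sigma is given (numeric), else 0/1 (category)."""
    if sigma is not None:
        return abs(float(a) - float(b)) / sigma
    return 0.0 if a == b else 1.0


def point_distance(
    x: Sequence[Value], y: Sequence[Value], sigmas: Sequence[Optional[float]]
) -> float:
    """Equation (2): D(i,j) is the sum of d(i,j,k) over the K attributes."""
    if not len(x) == len(y) == len(sigmas):
        raise ValueError("x, y and sigmas must all have K entries")
    return sum(attribute_distance(a, b, s) for a, b, s in zip(x, y, sigmas))


# One numeric attribute (sigma = 8.0) and one category attribute (sigma = None)
print(point_distance((3.0, "drugA"), (7.0, "drugB"), (8.0, None)))  # 1.5
```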
3. Then, the optimum correspondence set between data points is obtained.
In the DTW method, the correspondence set between data points has to satisfy the following two restriction conditions:
Every data point has at least one correspondence.
Correspondences do not cross.
The two restriction conditions are expressed by the following equation (3), where the m-th correspondence connects x(c(m)) to y(d(m)) and the correspondence set consists of M correspondences.

$$
\begin{aligned}
&c(1) = 1,\quad c(M) = n(A),\\
&d(1) = 1,\quad d(M) = n(B),\\
&c(m) - c(m-1) \in \{0, 1\},\quad d(m) - d(m-1) \in \{0, 1\},\\
&c(m) - c(m-1) + d(m) - d(m-1) \in \{1, 2\}
\end{aligned}
\tag{3}
$$
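The restriction conditions of equation (3) can be checked mechanically. The sketch below assumes a correspondence set is given as a list of (c(m), d(m)) pairs with 1-based indices, as in the text; the function name is hypothetical.

```python
from collections.abc import Sequence


def satisfies_dtw_constraints(path: Sequence[tuple[int, int]], n_a: int, n_b: int) -> bool:
    """Check equation (3) for a correspondence set of (c(m), d(m)) pairs (1-based)."""
    if not path:
        return False
    # Boundary conditions: start at (1, 1) and end at (n(A), n(B))
    if path[0] != (1, 1) or path[-1] != (n_a, n_b):
        return False
    for (c_prev, d_prev), (c, d) in zip(path, path[1:]):
        dc, dd = c - c_prev, d - d_prev
        # Each index advances by 0 or 1, and at least one of them advances
        if dc not in (0, 1) or dd not in (0, 1) or dc + dd not in (1, 2):
            return False
    return True


print(satisfies_dtw_constraints([(1, 1), (2, 1), (3, 2)], 3, 2))  # True
print(satisfies_dtw_constraints([(1, 1), (3, 2)], 3, 2))  # False: x(2) is skipped
```

Because both indices start at 1, end at n(A) and n(B), and never decrease or jump, every data point is visited (at least one correspondence each) and no two correspondences can cross.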
Here, the distance D between cases is expressed as the sum of the distances between data points over all correspondences, as in the following equation (4). w(m) is a weight that keeps the effective number of added terms D(c(m), d(m)) constant even though M varies, and equation (5) is frequently used.
$$
D = \sum_{m=1}^{M} w(m)\, D\bigl(c(m), d(m)\bigr)
\tag{4}
$$

$$
w(m) =
\begin{cases}
1 & \bigl(m > 1 \text{ and } c(m) - c(m-1) + d(m) - d(m-1) = 1\bigr) \\
2 & \bigl(m = 1 \text{ or } c(m) - c(m-1) + d(m) - d(m-1) = 2\bigr)
\end{cases}
\tag{5}
$$
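Equations (4) and (5) evaluate the distance of one given correspondence set. The following sketch assumes the same list-of-pairs representation as above and a caller-supplied function returning D(i,j); the names `weight` and `path_distance` are hypothetical.

```python
from collections.abc import Callable, Sequence

Path = Sequence[tuple[int, int]]  # (c(m), d(m)) pairs, 1-based as in the text


def weight(path: Path, m: int) -> int:
    """w(m) of equation (5); m is 1-based."""
    if m == 1:
        return 2
    (c_prev, d_prev), (c, d) = path[m - 2], path[m - 1]
    # Diagonal step (both indices advance) weighs 2; horizontal or vertical step weighs 1
    return 2 if (c - c_prev) + (d - d_prev) == 2 else 1


def path_distance(path: Path, point_dist: Callable[[int, int], float]) -> float:
    """Equation (4): weighted sum of D(c(m), d(m)) over all M correspondences."""
    return sum(weight(path, m) * point_dist(c, d) for m, (c, d) in enumerate(path, start=1))


path = [(1, 1), (2, 1), (3, 2)]  # a valid correspondence set for n(A) = 3, n(B) = 2
print([weight(path, m) for m in range(1, len(path) + 1)])  # [2, 1, 2]
```

This also shows why the weights keep the total constant: for any set satisfying equation (3), w(1) = 2 and each later w(m) equals c(m) − c(m−1) + d(m) − d(m−1), so the weights always add up to n(A) + n(B) regardless of M (here 2 + 1 + 2 = 5 = 3 + 2).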
The optimum correspondence set is the one that minimizes D in equation (4), and the resulting D is the distance between cases. It is known that the optimum solution can be obtained quickly by dynamic programming.
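The text states only that dynamic programming finds the optimum quickly. The following is a sketch of the usual recurrence that matches the weights of equation (5), not necessarily the exact procedure of non-patent document 1; `dtw_distance` is a hypothetical name, and D(i,j) is supplied by the caller with 1-based indices.

```python
import math
from collections.abc import Callable


def dtw_distance(n_a: int, n_b: int, point_dist: Callable[[int, int], float]) -> float:
    """Minimum of equation (4) over all correspondence sets satisfying equation (3).

    Recurrence (1-based): g(1,1) = 2 D(1,1);
    g(i,j) = min(g(i-1,j) + D(i,j), g(i-1,j-1) + 2 D(i,j), g(i,j-1) + D(i,j)).
    """
    if n_a < 1 or n_b < 1:
        raise ValueError("each case needs at least one data point")
    # Row 0 and column 0 act as infinite borders so paths must start at (1, 1)
    g = [[math.inf] * (n_b + 1) for _ in range(n_a + 1)]
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            dij = point_dist(i, j)
            if i == 1 and j == 1:
                g[i][j] = 2 * dij  # w(1) = 2
            else:
                g[i][j] = min(
                    g[i - 1][j] + dij,          # vertical step, weight 1
                    g[i - 1][j - 1] + 2 * dij,  # diagonal step, weight 2
                    g[i][j - 1] + dij,          # horizontal step, weight 1
                )
    return g[n_a][n_b]


# Single numeric attribute, sigma(k) = 1 for simplicity
A = [1.0, 2.0, 3.0]
B = [1.0, 3.0]
print(dtw_distance(len(A), len(B), lambda i, j: abs(A[i - 1] - B[j - 1])))  # 1.0
```

Here the optimum correspondence set is (1,1), (2,1), (3,2), giving 2·0 + 1·1 + 2·0 = 1. The table has n(A)·n(B) cells, each filled in constant time.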
However, the conventional correspondence optimization method has a problem with some kinds of time series data because it processes all attributes equally. This problem is explained below using time series data consisting of the check result history and the medication history of a patient.
In practice, the true state of a patient changes from moment to moment, and a check result is data obtained by checking and observing the state at a certain time point. Medication, on the other hand, is not directly related to the state of the patient at that time point, but rather affects the patient's later state. The conventional correspondence optimization method, however, processes both equally.
The conventional correspondence optimization method allows one-to-many correspondence. Therefore, a case in which the same value occurs once and a case in which the same value occurs two or more times are processed identically. However, practical data, such as the time series data of a patient, contains both attributes indicating the observation of a state, such as check results (observation attributes), and attributes acting on a state, such as the medication history (operation attributes). For an observation attribute, allowing one-to-many correspondence is correct. For an operation attribute, however, each occurrence influences the state, so the influence on the state depends on the number of occurrences, that is, on whether the value occurs once or two or more times, and the two situations should be processed differently. The conventional correspondence optimization method has no concept of an operation attribute and processes all attributes as observation attributes, and therefore fails to calculate the distance between cases appropriately.
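The limitation can be reproduced with a small example. The sketch below applies the conventional DTW recurrence (equations (1) to (5)) to cases with a single category value attribute; the function name `category_dtw` and the medication label `"drugX"` are hypothetical illustrations.

```python
import math
from collections.abc import Sequence


def category_dtw(a: Sequence[str], b: Sequence[str]) -> float:
    """Conventional DTW distance for cases with one category value attribute."""
    if not a or not b:
        raise ValueError("each case needs at least one data point")
    n_a, n_b = len(a), len(b)
    g = [[math.inf] * (n_b + 1) for _ in range(n_a + 1)]
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            dij = 0.0 if a[i - 1] == b[j - 1] else 1.0  # equation (1), category value
            if i == 1 and j == 1:
                g[i][j] = 2 * dij
            else:
                g[i][j] = min(g[i - 1][j] + dij, g[i - 1][j - 1] + 2 * dij, g[i][j - 1] + dij)
    return g[n_a][n_b]


once = ["drugX"]            # medication given once
twice = ["drugX", "drugX"]  # same medication given twice
print(category_dtw(once, twice))  # 0.0
```

The single data point of `once` corresponds to both data points of `twice`, so the distance is 0 even though, for an operation attribute such as medication, giving the drug twice can affect the patient's state differently from giving it once.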
As described above, when time series data contains observation attributes and operation attributes in a mixed manner, the conventional method processes them equally and therefore cannot perform an appropriate data analysis.