Conventional dimensionality reduction techniques for time series data include Piecewise Aggregate Approximation (PAA), which is described, for example, in “Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases” by E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra in Journal of Knowledge and Information Systems, 2000.
With PAA, time series data is divided into segments, and the mean value of each segment is used as the representative value of that segment to compress the time series data.
Calculating a mean value is simpler than a Fourier transform or singular value decomposition, so PAA can generate dimensionally compressed time series data at higher speed.
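The segment-mean compression described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation from the cited paper: the function name `paa`, the sample values, and the requirement that the series length be an exact multiple of the number of segments are all assumptions made for the example.

```python
def paa(series, n_segments):
    """Reduce `series` to `n_segments` values, one mean per segment.

    Assumption for this sketch: len(series) is an exact multiple of
    n_segments, so every segment has the same width.
    """
    n = len(series)
    if n_segments <= 0 or n % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    width = n // n_segments
    # Each segment is represented by the mean of its values.
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


# Hypothetical sample data: 8 points compressed to 4 representative values.
series = [1.0, 3.0, 2.0, 4.0, 10.0, 12.0, 7.0, 7.0]
reduced = paa(series, 4)
print(reduced)  # [2.0, 3.0, 11.0, 7.0]
```

Only additions and one division per segment are needed, which is why this kind of compression is fast compared with transform-based methods.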
Another conventional technique for dimensionality reduction of time series data is a method using singular value decomposition, which is described, for example, in “Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences” by F. Korn, H. V. Jagadish, and C. Faloutsos in Proceedings of SIGMOD '97, pp. 289-300. This method does not use all of the components produced by singular value decomposition. Only the components corresponding to the leading (largest) singular values are used to compress the time series data.
Dimensional compression by singular value decomposition has the advantage of high search efficiency, because it extracts the shape of the data better than any other method.
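As a rough illustration of keeping only the leading singular values, the following sketch uses NumPy's `numpy.linalg.svd`. It is an assumption-laden example, not the procedure of the cited paper: the function name `svd_reduce`, the choice of `k`, and the small data matrix (one time series per row) are hypothetical.

```python
import numpy as np


def svd_reduce(data, k):
    """Keep only the k leading singular components of `data`.

    `data` holds one time series per row. Returns the k coefficients
    for each series and the k basis vectors they refer to.
    """
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    # NumPy returns singular values in descending order, so the first
    # k columns/rows correspond to the largest singular values.
    return u[:, :k] * s[:k], vt[:k]


# Hypothetical data: three series of length 4. The second row is a
# multiple of the first, so the matrix has rank 2.
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0],
                 [1.0, 1.0, 1.0, 1.0]])
coeffs, basis = svd_reduce(data, 2)

# Each length-4 series is now described by 2 coefficients.
approx = coeffs @ basis
print(coeffs.shape)               # (3, 2)
print(np.allclose(approx, data))  # True, because the data has rank 2
```

In this toy case two components reproduce the data exactly. On real data the discarded components carry some information, and the decomposition itself must be computed over the whole dataset.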
For dimensionality reduction of image data, a “transform coding system” is disclosed in JP61-285870 as a conventional technology, for example. Image data is divided into blocks and compressed block by block. Each block is compressed using a combination of the Discrete Cosine Transform (DCT) and a transform representing the horizontal and vertical angles of gradient of a matrix.
Combining the two transforms in this way can achieve a higher compression rate, because the features of each block are extracted and the optimal transform is selected block by block.
PAA achieves faster dimensional compression by using the mean value of each segment as the representative value of the segment. However, PAA has the following problem in searching time series data, including similarity search. In the search procedure for time series data, solution candidates are first found in a compression space, and the final solution is then searched for among those candidates in the real space. Therefore, if many of the solution candidates found in the compression space are not real solutions in the real space, the search becomes inefficient. The search inefficiency of PAA results from insufficient information after compression: using the mean value as the representative value of each segment deforms the time series. For example, a flat time series, an upward-sloping time series, and a downward-sloping time series all have the same values after compression if their mean values are the same.
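The two-stage search and the loss of slope information can be illustrated with a small sketch. Everything named here (`paa`, `euclid`, `range_search`, the sample series, the radius) is hypothetical. The compressed-space distance is scaled by the square root of the segment width, which is the usual lower-bounding distance from the PAA literature rather than something stated in the text above.

```python
import math


def paa(series, n_segments):
    """Mean of each equal-width segment (length assumed divisible)."""
    width = len(series) // n_segments
    return [sum(series[i:i + width]) / width
            for i in range(0, width * n_segments, width)]


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_search(query, database, radius, n_segments):
    """Return (candidates, answers) for a range query."""
    width = len(query) // n_segments
    q_red = paa(query, n_segments)
    # Stage 1: filter in the compression space. The scaled distance
    # never exceeds the real distance, so no true answer is dropped.
    candidates = [s for s in database
                  if math.sqrt(width) * euclid(q_red, paa(s, n_segments)) <= radius]
    # Stage 2: verify each candidate in the real space.
    answers = [s for s in candidates if euclid(query, s) <= radius]
    return candidates, answers


flat = [2.0, 2.0, 2.0, 2.0]
rising = [0.5, 1.5, 2.5, 3.5]
falling = [3.5, 2.5, 1.5, 0.5]
far = [9.0, 9.0, 9.0, 9.0]

# With one segment, flat, rising and falling all compress to [2.0].
candidates, answers = range_search(flat, [flat, rising, falling, far], 1.0, 1)
print(len(candidates), len(answers))  # 3 1
```

Three candidates survive the compressed-space filter, but only one is a real solution, so two full-length distance computations are wasted. This is the inefficiency described above.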
Singular value decomposition, which extracts the shape of the data efficiently, is efficient in the sense of the search efficiency described above. The problem, however, is that singular value decomposition takes a considerable amount of time on a large volume of data and cannot process such data within a realistic time frame.
The “transform coding system” of JP61-285870, which is directed to improving the compression rate, has the following problem when used to search time series data. To search for solution candidates in a compression space, all segments (blocks) must first be compressed at the same compression rate. With the above-mentioned system, however, the compression rate differs from block to block.