1. Field of the Invention
The present invention relates to a method of compressing and reconstructing a large amount of data, and more particularly, to a method of compressing and reconstructing data using statistical analysis.
2. Description of the Related Art
When transmitting or storing a large amount of data, the data is generally compressed according to a predetermined rule to reduce transmission time or storage space, and the compressed data is decompressed to reconstruct the original data. Compression and reconstruction are performed such that the reconstructed data is entirely identical to the original data. Accordingly, although compression efficiency varies with the compression/decompression algorithm and the data format, there is a limit to how far the compression efficiency can be improved.
However, in some cases it is not necessary for the original data to be reconstructed identically; it is sufficient that the reconstructed data exhibit the same tendencies as the original data. For example, for electrical test data of a semiconductor device, which is measured in every process during manufacture of the device, the number of entries is relatively small (e.g., several entries to several hundred entries). However, it is sometimes necessary to measure several thousand to several tens of thousands of lines of data to detect tendencies in, or the distribution of, the characteristics of devices, which vary slightly even when identical devices are manufactured under the same conditions. In this case, the overall tendency or distribution is more important than the specific value of data at a certain line in a certain entry. Moreover, for such a large amount of data, it is difficult to understand the correlation between entries and the distribution of the data in the initial data format. In addition, it is not easy to store, accumulate and manage such a large amount of data for an extended period of time.
Typically, the average value and standard deviation are calculated to understand the tendency and distribution of a large amount of data, or information such as correlation is obtained using statistical analysis. However, a need exists for a method of compressing a large amount of data and reconstructing original data from the compressed data.
An object of the present invention is to provide a method of compressing a large amount of data and reconstructing the compressed data in an original data format using statistical analysis.
In an aspect of the present invention, a method is provided for compressing and reconstructing original data having n rows × m entries, which are correlated to one another, using statistical analysis, wherein the n rows are correlated to the m entries and m < n. The method includes the steps of (a) obtaining a correlation matrix C having m rows × m columns, wherein the correlation matrix C includes correlation coefficients between the m entries, (b) obtaining eigenvectors and eigenvalues of the correlation coefficients in the correlation matrix C, (c) obtaining a factor loading matrix F having m rows × p columns from the eigenvectors and the eigenvalues using a multivariate analysis, wherein p is a natural number less than or equal to the m entries, (d) generating random numbers to form a random-number matrix having l rows × p columns, wherein l is the number of rows to be reconstructed with respect to the m entries, (e) obtaining an intermediate data matrix having l rows × m columns including correlation information by multiplying the random-number matrix by a transposed matrix of the factor loading matrix F, and (f) scaling the intermediate data matrix according to a scale of the original data to obtain a reconstructed data matrix comprised of elements in l rows × m columns, whereby the reconstructed data matrix has a format of the original data.
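By way of illustration only, steps (a) through (f) may be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `compress` and `reconstruct` are hypothetical, the factor loadings are assumed to be principal-component loadings (each eigenvector scaled by the square root of its eigenvalue), the random numbers are assumed to be standard normal, and the scaling of step (f) is assumed to use the per-entry mean and standard deviation of the original data, which must then be kept alongside F.

```python
import numpy as np

def compress(X, p):
    """Steps (a)-(c): reduce original data X (n rows x m entries) to an
    m x p factor loading matrix F. The per-entry mean and standard
    deviation are also returned (assumption) for use in step (f)."""
    C = np.corrcoef(X, rowvar=False)            # (a) m x m correlation matrix
    eigvals, eigvecs = np.linalg.eigh(C)        # (b) eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:p]       # indices of the p largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)    # guard against tiny negative round-off
    F = eigvecs[:, order] * np.sqrt(lam)        # (c) assumed principal-component loadings
    return F, X.mean(axis=0), X.std(axis=0, ddof=1)

def reconstruct(F, mean, std, l, rng):
    """Steps (d)-(f): generate l rows that follow the correlation
    structure carried by F, in the scale of the original data."""
    p = F.shape[1]
    R = rng.standard_normal((l, p))             # (d) l x p random-number matrix
    Z = R @ F.T                                 # (e) l x m intermediate data matrix
    return Z * std + mean                       # (f) assumed mean/std scaling
```

Under these assumptions, when p equals m the product of F and its transpose reproduces C, so the correlation matrix of the reconstructed rows approaches C as l grows. When p is less than m, only the correlation captured by the p retained factors is reproduced, and the variance of each reconstructed entry is correspondingly smaller unless the intermediate data matrix is standardized before scaling.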
In another aspect of the present invention, the factor loading matrix F is stored as compressed data, and reconstruction is performed by steps (d) through (f). That is, steps (a) through (c) are for data compression and steps (d) through (f) are for data reconstruction.
In yet another aspect of the present invention, the correlation matrix C and the factor loading matrix F are stored as compressed data, and reconstruction is performed by steps (d) through (f).
In yet another aspect, the correlation matrix C obtained in step (a) is stored as compressed data, and data having the original data format can be reconstructed by steps (b) through (f). That is, step (a) is for data compression, and steps (b) through (f) are for data reconstruction.
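The aspect in which only the correlation matrix C is stored may be sketched as follows, again as an illustration under assumptions rather than a definitive implementation. The function name `reconstruct_from_corr` is hypothetical, the per-entry `mean` and `std` used for scaling are assumed to be stored together with C, and the loadings are assumed to be principal-component loadings.

```python
import numpy as np

def reconstruct_from_corr(C, mean, std, l, p, rng):
    """Steps (b)-(f), starting from a stored m x m correlation matrix C."""
    eigvals, eigvecs = np.linalg.eigh(C)             # (b) eigen-decomposition of C
    order = np.argsort(eigvals)[::-1][:p]            # keep the p largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)         # guard against negative round-off
    F = eigvecs[:, order] * np.sqrt(lam)             # (c) m x p loadings (assumed form)
    R = rng.standard_normal((l, p))                  # (d) l x p random-number matrix
    Z = R @ F.T                                      # (e) l x m intermediate data matrix
    return Z * std + mean                            # (f) assumed mean/std scaling
```

Storing C rather than F defers the choice of p to reconstruction time, at the cost of storing m × m rather than m × p values and of repeating steps (b) and (c) at each reconstruction.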
According to an aspect of the present invention, a large amount of original data having n rows and m columns (n >> m) can be compressed to a correlation matrix having m rows and m columns or to a factor loading matrix having m rows and p columns, and data having the tendency and format of the original data can be reconstructed using a random-number matrix and the factor loading matrix, in a simple way.
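The reduction in stored values implied by these matrix dimensions can be checked with a short calculation. The helper `storage_counts` below is hypothetical and counts matrix elements only; any auxiliary values needed for scaling are ignored, and the sizes in the usage note are illustrative, not taken from measured data.

```python
def storage_counts(n, m, p):
    """Number of stored elements for each representation."""
    return {
        "original": n * m,            # n rows x m entries
        "correlation_matrix": m * m,  # m x m matrix C
        "factor_loading": m * p,      # m x p matrix F
    }
```

For example, with n = 10000, m = 100 and p = 10, the original data holds 1,000,000 elements, the correlation matrix 10,000 elements, and the factor loading matrix 1,000 elements.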
These and other aspects, features and advantages of the present invention will be described or become apparent from the following detailed description of the preferred embodiments, which is to be read in connection with the accompanying drawings.