In the present era of rapidly growing information, many computer analysis systems for data processing are developed and used together with corresponding devices to analyze and compute large volumes of data effectively, and numeric methods are the core of the data processing in these systems. However, ever-increasing data volumes slow down the overall computing speed of such systems. For instance, a substantial increase in the transmission speed of wireless communication systems greatly increases the volume of transmitted data, a substantial increase in the number of pixels of a charge coupled device greatly increases the volume of video data, and increasingly popular networks produce a huge volume of data browsed and recorded by users. Therefore, a numeric analysis method capable of quickly processing and analyzing such large volumes of data is needed.
Among the numeric analysis methods, the traditional singular value decomposition (SVD) is a reliable matrix decomposition generally used for analyzing complicated data, particularly in analyses with many variables. The SVD decomposes a matrix into two orthogonal matrices, which describe its column space and row space, and one diagonal matrix. Assume that X is an m*n real matrix of rank r. X is decomposed into X=SVD^T, where S and D are orthogonal matrices; in other words, each column vector of S and of D has a length of 1, and the column vectors of each matrix are perpendicular to one another. V is a diagonal matrix, so the off-diagonal values of V are zero. Regardless of whether X is a symmetric matrix, XX^T must be a symmetric matrix. A traditional way of solving the SVD is to multiply X by its transpose to obtain XX^T, and then find the eigenvalues and eigenvectors of XX^T. The matrix formed by the eigenvectors of XX^T is the matrix S, and each corresponding eigenvalue is equal to the square of the corresponding diagonal value of V. Similarly, the transpose of X is multiplied by X to obtain X^TX, the eigenvectors of X^TX are computed, and the matrix formed by these eigenvectors is the matrix D.
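The traditional procedure described above can be illustrated with a minimal sketch. It assumes NumPy and a small random test matrix; the variable names and the sign-alignment step are illustrative choices and are not taken from the specification. Because eigenvectors are determined only up to sign, the columns of S are aligned with those of D before X is rebuilt.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))            # illustrative m*n real matrix (m=5, n=3)
r = np.linalg.matrix_rank(X)

# Eigen-decompose the symmetric matrices X X^T and X^T X.
evals_left, S_full = np.linalg.eigh(X @ X.T)     # eigenvectors of X X^T form S
evals_right, D_full = np.linalg.eigh(X.T @ X)    # eigenvectors of X^T X form D

# eigh returns eigenvalues in ascending order; keep the r largest, descending.
order_left = np.argsort(evals_left)[::-1][:r]
order_right = np.argsort(evals_right)[::-1][:r]
S = S_full[:, order_left]
D = D_full[:, order_right]

# Each eigenvalue is the square of a diagonal value of V.
singular_values = np.sqrt(evals_right[order_right])

# Eigenvectors are defined only up to sign, so flip columns of S to agree with X D = S V.
signs = np.sign(np.sum(S * (X @ D), axis=0))
S = S * signs

X_rebuilt = S @ np.diag(singular_values) @ D.T   # X = S V D^T
```

In this sketch the two eigen-decompositions are computed separately, as in the description; the cost of forming and decomposing XX^T and X^TX is what becomes prohibitive for large matrices.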
In recent years, the SVD has been used extensively in natural language processing, and the best-known method is Latent Semantic Indexing (LSI). With the LSI technology, scholars correlate texts with keywords and project the data of the texts and the keywords into a lower-dimensional space, so that a text can be compared and classified with a keyword, a text with another text, and a keyword with another keyword. In the LSI analysis process, a matrix A records the relation between texts and words. For example, for the correlation between 1,000,000 articles and 500,000 words, each row of the matrix corresponds to an article and each column corresponds to a word, as shown in Equation (1) below:
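$$
A = \begin{pmatrix}
a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\
\vdots &        & \vdots &        & \vdots \\
a_{i1} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots &        & \vdots &        & \vdots \\
a_{m1} & \cdots & a_{mj} & \cdots & a_{mn}
\end{pmatrix} \tag{1}
$$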
In Equation (1), m=1,000,000 and n=500,000. The element (i, j) represents the weighted frequency with which the jth word of a dictionary appears in the ith article. This matrix is very large, having 1,000,000*500,000=500,000,000,000 elements, yet its significant rank is only 100. The key technique of the LSI is to use the SVD to decompose the large matrix into a product of three small matrices, as shown in FIG. 1. The aforementioned matrix is decomposed into a 1,000,000*100 matrix X, a 100*100 matrix B, and a 100*500,000 matrix Y. The total number of elements of these three matrices adds up to about 150,000,000, which is only about 1/3000 of the original matrix, so the corresponding storage volume and computation volume can be reduced by three or more orders of magnitude.
In FIG. 1, the first matrix X represents the 1,000,000 articles in a 100-D LSI space, the third matrix Y represents the 500,000 words in the same LSI space, and the diagonal values of the middle matrix B represent the significance of each axial direction of the LSI space. If an article and the words are projected into the LSI space and certain words fall in the neighborhood of the article (or lie in the same direction), those words can be used as keywords of the article. The distance between two articles in the LSI space can also be compared: if the distance is small, the contents of the two articles are very similar. Similarly, the distance between one word and another can be compared to find out which words are synonyms. In other words, the LSI provides a basic application of semantics. However, if n is substantially equal to m, the computing volume of the traditional SVD is O(n^3); if A is a large matrix, the computing time of the computer analysis system is extended, and thus the practical application of the computer analysis system is limited.
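The storage figures above can be checked with a short sketch, using the dimensions m=1,000,000, n=500,000 and rank 100 stated for Equation (1). The second half applies the same three-matrix factorization to a toy matrix; NumPy, the toy sizes (20*12, rank 3) and all variable names are assumptions made only for illustration.

```python
import numpy as np

# Element counts for the example dimensions stated in the text.
m, n, k = 1_000_000, 500_000, 100
full_elements = m * n                          # original matrix A
factored_elements = m * k + k * k + k * n      # matrices X, B and Y together
ratio = full_elements / factored_elements      # roughly 3,000

# Toy illustration: a 20*12 matrix of rank 3 factored into three small matrices.
rng = np.random.default_rng(1)
A = rng.random((20, 3)) @ rng.random((3, 12))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
rank = 3
X_small = U[:, :rank]                          # articles in the low-dimensional space
B_small = np.diag(s[:rank])                    # significance of each axis
Y_small = Vt[:rank, :]                         # words in the low-dimensional space
A_rebuilt = X_small @ B_small @ Y_small
```

Because the toy matrix has rank exactly 3, the three small factors reproduce it to within rounding error; for real data the truncation is an approximation.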
Principal component analysis (PCA) is also a common method for analyzing multivariate data, based on the principle of expressing the data, through a linear transformation, as linear combinations of a set of perpendicular bases. The order of the perpendicular bases corresponds to the variance of the raw data expanded in the direction of each base. According to information theory, the larger the variance in a direction, the more significant the information in that direction. Therefore, the PCA method naturally provides a data representation ordered by the significance of the information. In many applications, the major directions (or components) already provide the required information, and thus the PCA has become an important tool for reducing data and eliminating noise in data.
The principles of the SVD and the PCA are very similar. Since the PCA starts from decomposing the covariance matrix, the PCA can be regarded as shifting the center of mass of the row vectors to zero and then performing the SVD on the resulting product matrix. If the center of mass of the raw data is already equal to zero, the bases of the row vectors obtained by the SVD are equal to the bases obtained by the PCA.
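The relation between the two methods can be illustrated with a minimal sketch, assuming NumPy and synthetic data in which each row is one sample; the data, the per-column scaling factors and the variable names are hypothetical. The sketch compares the axes obtained from the covariance matrix with those obtained from the SVD of the centered data.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data: 200 samples (rows) of 4 variables with clearly different spreads.
data = rng.standard_normal((200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

# Shift the center of mass of the row vectors to zero.
centered = data - data.mean(axis=0)

# PCA: eigen-decompose the covariance matrix and sort by decreasing variance.
cov = centered.T @ centered / (len(centered) - 1)
evals, evecs = np.linalg.eigh(cov)
order = np.argsort(evals)[::-1]
pca_variances = evals[order]
pca_axes = evecs[:, order]

# SVD of the centered data: the right singular vectors are the same axes.
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
svd_variances = s**2 / (len(centered) - 1)

# Axes are defined only up to sign, so compare the absolute cosine per axis.
axis_alignment = np.abs(np.sum(pca_axes * Vt.T, axis=0))
```

In this sketch every entry of `axis_alignment` should be close to 1, meaning the two procedures give the same bases once the data is centered.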
Multidimensional scaling (MDS), another important application of the SVD, was proposed in 1952. MDS is a method of deriving the relative coordinates of objects from the relative distances between the objects. The main application of this method is to express the relation between objects visually and effectively, based on their known similarities or differences. Therefore, MDS is usually used in complicated data analysis, particularly multivariate analysis. High-dimensional data is mapped into a low-dimensional 2D or 3D space to facilitate interpretation by the human visual system. According to the curse of dimensionality, searching data in a high-dimensional space requires a larger data volume than searching in a low-dimensional space, and the accuracy is lower than that of a search in a low-dimensional space. Therefore, reduction to a low dimension is a necessary process, and the MDS plays a decisive role in this process. The following description of the MDS technology shows the close relationship among the MDS, the PCA and the SVD.
In the foregoing MDS, assume that X is a p*N matrix; in other words, there are N objects, and each object is described by the same p variables. D=X^TX denotes the product matrix of X, and i is an N*1 vector in which every element is 1. The matrix B is defined in Equation (2):
$$
B = \left( X - \frac{1}{N} X i i^{T} \right)^{T} \left( X - \frac{1}{N} X i i^{T} \right) \tag{2}
$$
Equation (2) is the product matrix of X after X is shifted to its center point. In other words, B is the variance matrix of X, and B can be regarded as the result of double centering the product matrix D.
$$
H = I - \frac{1}{N} i i^{T} \tag{3}
$$
Here I denotes the N*N identity matrix. With H defined by Equation (3), B can be simplified to B=HDH. Since B is a symmetric matrix, the SVD decomposes B into B=UVU^T.
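The simplification B=HDH follows in a few steps from Equations (2) and (3). The derivation below is added only as a worked illustration; it uses the symbols already defined and relies on H being symmetric.

```latex
\begin{aligned}
X - \frac{1}{N} X i i^{T} &= X \left( I - \frac{1}{N} i i^{T} \right) = X H, \\
H^{T} &= \left( I - \frac{1}{N} i i^{T} \right)^{T} = I - \frac{1}{N} i i^{T} = H, \\
B &= (X H)^{T} (X H) = H^{T} X^{T} X H = H D H .
\end{aligned}
```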
$$
\sqrt{B} = X - \frac{1}{N} X i i^{T} = U V^{\frac{1}{2}} \tag{4}
$$
The row vectors of √B are the coordinates of X after X is shifted to its center. Therefore, the double centering operation on the product matrix D of the matrix X precisely provides the variance matrix of X, and the matrix obtained by taking the square root of the variance matrix is the matrix X with the center of mass of its row vectors shifted to zero. The core technology of the MDS is thus the double centering operation of the matrix and the square-root decomposition, wherein the square-root decomposition uses the SVD to obtain a result that loses the information about the center of mass of the raw data but maintains the relative positions of the objects. Since the MDS procedure uses the SVD, the computing complexity of the MDS is substantially the same as that of the SVD. The traditional MDS is therefore also limited by the computing volume: as the number of objects N increases, the computing volume grows at a rate of O(N^3). If the number of objects is huge, the traditional MDS can no longer be used for the foregoing purpose.
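The double centering and square-root decomposition described above can be sketched as follows. The sketch assumes NumPy and a small synthetic p*N matrix whose columns are the objects; the sizes, the offset of 10 and the helper `pairwise_distances` are hypothetical and serve only to show that the recovered coordinates lose the center of mass while keeping the relative positions.

```python
import numpy as np

rng = np.random.default_rng(3)
p, N = 3, 8
X = rng.standard_normal((p, N)) + 10.0     # N objects as columns, far from the origin

D = X.T @ X                                # product matrix of X
i = np.ones((N, 1))                        # N*1 vector of ones
H = np.eye(N) - (i @ i.T) / N              # centering matrix of Equation (3)
B = H @ D @ H                              # double centering, B = HDH

# B is symmetric, so it can be decomposed as B = U V U^T.
evals, U = np.linalg.eigh(B)
order = np.argsort(evals)[::-1][:p]        # keep the p largest eigenvalues
V_half = np.sqrt(np.clip(evals[order], 0.0, None))
coords = U[:, order] * V_half              # rows of U V^(1/2), one row per object


def pairwise_distances(points):
    """Euclidean distance between every pair of rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


original_distances = pairwise_distances(X.T)
recovered_distances = pairwise_distances(coords)
```

The eigen-decomposition of the N*N matrix B is the step whose cost grows as O(N^3), which is the limitation the description identifies.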
Therefore, it is a main subject of the invention to provide a numeric method that can overcome the bottleneck of the long computing time required by a computer analysis system when a large number of objects or a large volume of data is processed.