1. Field of the Invention
The present invention relates to information representation, information cartography and data mining. The present invention also relates to pattern analysis and representation, and, in particular, representation of object relationships in a multidimensional space.
2. Related Art
Similarity is one of the most ubiquitous concepts in science. It is used to analyze and categorize phenomena, rationalize behavior and function, and design new entities with desired or improved properties. It is employed in virtually all scientific and technical fields, and particularly in data mining and information retrieval. Similarity (or dissimilarity) is typically quantified in the form of a numerical index derived either through direct observation, or through the measurement of a set of characteristic attributes, which are subsequently combined in some form of similarity or distance measure. For large collections of objects, similarities are usually described in the form of a matrix that contains some or all of the pairwise relationships between the objects in the collection. Unfortunately, pairwise similarity matrices do not lend themselves to numerical processing and visual inspection. A common solution to this problem is to embed the objects into a low-dimensional Euclidean space in a way that preserves the original pairwise relationships as faithfully as possible. This approach, known as multidimensional scaling (MDS) (Torgerson, W. S., Psychometrika 17:401–419 (1952); Kruskal, J. B., Psychometrika 29:115–129 (1964)) or nonlinear mapping (NLM) (Sammon, J. W., IEEE Trans. Comp. C18:401–409 (1969)), converts the data points into a set of real-valued vectors that can subsequently be used for a variety of pattern recognition and classification tasks.
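As a small illustration of the pairwise matrix described above, the following Python sketch builds a symmetric Euclidean dissimilarity matrix from attribute vectors. The function name `distance_matrix` and the sample data are illustrative assumptions only; any other similarity or distance measure could be substituted.

```python
import math

def distance_matrix(vectors):
    """Return the symmetric k x k matrix of Euclidean distances r[i][j]."""
    k = len(vectors)
    r = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d = math.dist(vectors[i], vectors[j])
            r[i][j] = r[j][i] = d  # the relationship is symmetric
    return r

# Hypothetical objects, each described by three attributes.
objects = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
r = distance_matrix(objects)
```

The matrix holds k(k-1)/2 distinct entries, so its size grows quadratically with the number of objects.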
Multidimensional scaling originated in the field of mathematical psychology and has two primary applications: 1) reducing the dimensionality of high-dimensional data in a way that preserves the original relationships of the data objects, and 2) producing Cartesian coordinate vectors from data supplied directly in the form of similarities or proximities, so that they can be analyzed with conventional statistical and data mining techniques.
Given a set of k objects, a symmetric matrix, rij, of relationships between these objects, and a set of images on an m-dimensional display plane {yi, i = 1, 2, ..., k; yi ∈ ℝ^m}, the problem is to place yi onto the plane in such a way that their Euclidean distances dij = ∥yi − yj∥ approximate as closely as possible the corresponding values rij. The quality of the projection is determined using a sum-of-squares error function such as Kruskal's stress:
$$S = \sqrt{\frac{\sum_{i<j} (d_{ij} - r_{ij})^2}{\sum_{i<j} r_{ij}^2}} \qquad (1)$$
which is numerically minimized in order to find the optimal configuration. The actual embedding is carried out in an iterative fashion by: 1) generating an initial set of coordinates yi, 2) computing the distances dij, 3) finding a new set of coordinates yi using a steepest descent algorithm such as Kruskal's linear regression or Guttman's rank-image permutation, and 4) repeating steps 2 and 3 until the change in the stress function falls below some predefined threshold.
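The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of Kruskal or Guttman: it substitutes plain gradient descent on the numerator of Eq. 1, with step halving whenever a trial step increases the stress. The function names, the step size, the tolerance, and the sample matrix are all hypothetical, and the matrix r is assumed to contain at least one nonzero entry.

```python
import math
import random

def kruskal_stress(y, r):
    """Eq. 1: sqrt(sum_{i<j} (d_ij - r_ij)^2 / sum_{i<j} r_ij^2)."""
    num = den = 0.0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            d = math.dist(y[i], y[j])
            num += (d - r[i][j]) ** 2
            den += r[i][j] ** 2
    return math.sqrt(num / den)

def gradient(y, r):
    """Gradient of the raw sum of squares sum_{i<j} (d_ij - r_ij)^2."""
    k, m = len(y), len(y[0])
    g = [[0.0] * m for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            d = math.dist(y[i], y[j])
            if d == 0.0:
                continue  # coincident points: direction undefined, skip
            coef = 2.0 * (d - r[i][j]) / d
            for q in range(m):
                g[i][q] += coef * (y[i][q] - y[j][q])
    return g

def embed(r, m=2, step=0.05, tol=1e-6, max_iter=1000, seed=0):
    rng = random.Random(seed)
    k = len(r)
    # Step 1: random initial coordinates.
    y = [[rng.random() for _ in range(m)] for _ in range(k)]
    # Step 2: distances are computed inside kruskal_stress and gradient.
    s = kruskal_stress(y, r)
    for _ in range(max_iter):
        # Step 3: trial coordinates from one descent step.
        g = gradient(y, r)
        trial = [[y[i][q] - step * g[i][q] for q in range(m)] for i in range(k)]
        s_new = kruskal_stress(trial, r)
        if s_new > s:
            step *= 0.5  # overshoot: shrink the step and retry
            continue
        improvement = s - s_new
        y, s = trial, s_new
        # Step 4: stop once the change in stress is below the threshold.
        if improvement < tol:
            break
    return y, s

# Hypothetical input: distances of a 3-4-5 right triangle.
r = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
y, s = embed(r)
```

Because only non-increasing trial steps are accepted, the returned stress never exceeds that of the initial random configuration. Each iteration visits every pair of points, which is the quadratic cost per iteration that limits such methods on large data sets.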
A particularly popular implementation is Sammon's nonlinear mapping algorithm (Sammon, J. W. IEEE Trans. Comp., 1969). This method uses a modified stress function:
$$E = \frac{\sum_{i<j}^{k} \frac{[r_{ij} - d_{ij}]^2}{r_{ij}}}{\sum_{i<j}^{k} r_{ij}} \qquad (2)$$
which is minimized using steepest descent. The initial coordinates, yi, are determined at random or by some other projection technique such as principal component analysis, and are updated using Eq. 3:
$$y_{ij}(t+1) = y_{ij}(t) - \lambda \Delta_{ij}(t) \qquad (3)$$
where t is the iteration number and λ is the learning rate parameter, and
$$\Delta_{ij}(t) = \frac{\partial E(t) / \partial y_{ij}(t)}{\left| \partial^2 E(t) / \partial y_{ij}(t)^2 \right|} \qquad (4)$$
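One pass of the update in Eqs. 3 and 4 can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the function names and sample values are hypothetical, `y[p][q]` denotes coordinate q of point p (yij in Eq. 3), all rij are assumed positive, and no two points are assumed to coincide. The derivative expressions are obtained by differentiating Eq. 2 directly, and the default λ = 0.3 is only an assumed starting value.

```python
import math

def sammon_stress(y, r):
    """Eq. 2: (sum_{i<j} (r_ij - d_ij)^2 / r_ij) / sum_{i<j} r_ij."""
    num = c = 0.0
    k = len(y)
    for i in range(k):
        for j in range(i + 1, k):
            num += (r[i][j] - math.dist(y[i], y[j])) ** 2 / r[i][j]
            c += r[i][j]
    return num / c

def sammon_derivatives(y, r, p, q):
    """First and second partial derivatives of E with respect to y[p][q]."""
    k = len(y)
    c = sum(r[i][j] for i in range(k) for j in range(i + 1, k))
    first = second = 0.0
    for j in range(k):
        if j == p:
            continue
        d = math.dist(y[p], y[j])
        u = y[p][q] - y[j][q]
        w = (r[p][j] - d) / (r[p][j] * d)
        first += w * u
        second += w - u * u / d ** 3
    return -2.0 * first / c, -2.0 * second / c

def sammon_step(y, r, lam=0.3, eps=1e-12):
    """Apply Eq. 3 to every coordinate, using derivatives at iteration t."""
    new_y = [row[:] for row in y]
    for p in range(len(y)):
        for q in range(len(y[0])):
            first, second = sammon_derivatives(y, r, p, q)
            if abs(second) > eps:  # skip coordinates with vanishing curvature
                new_y[p][q] = y[p][q] - lam * first / abs(second)
    return new_y

# Hypothetical input: 3-4-5 triangle distances and a perturbed starting map.
r = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
y = [[0.1, 0.2], [2.5, 0.3], [0.4, 3.1]]
y_next = sammon_step(y, r)
```

All coordinates are updated from the derivatives of the same iteration t, so the result does not depend on the order in which points are visited.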
There is a wide variety of MDS algorithms involving different error functions and optimization heuristics, which are reviewed in Schiffman, Reynolds and Young, Introduction to Multidimensional Scaling, Academic Press, New York (1981); Young and Hamer, Multidimensional Scaling: History, Theory and Applications, Erlbaum Associates, Inc., Hillsdale, N.J. (1987); Cox and Cox, Multidimensional Scaling, Number 59 in Monographs in Statistics and Applied Probability, Chapman-Hall (1994), and Borg, I., Groenen, P., Modern Multidimensional Scaling, Springer-Verlag, New York, (1997). The contents of these publications are incorporated herein by reference in their entireties.
Unfortunately, the quadratic nature of the stress function (Eqs. 1 and 2, and their variants) makes these algorithms impractical for large data sets containing more than a few hundred to a few thousand items. Several approaches have been devised to reduce the complexity of the task. Chang and Lee (Chang, C. L., and Lee, R. C. T., IEEE Trans. Syst., Man, Cybern., 1973, SMC-3, 197–200) proposed a heuristic relaxation approach in which a subset of the original objects (the frame) is scaled using a Sammon-like methodology, and the remaining objects are then added to the map by adjusting their distances to the objects in the frame. An alternative approach proposed by Pykett (Pykett, C. E., Electron. Lett., 1978, 14, 799–800) is to partition the data into a set of disjoint clusters, and map only the cluster prototypes, i.e. the centroids of the pattern vectors in each class. In the resulting two-dimensional plots, the cluster prototypes are represented as circles whose radii are proportional to the spread in their respective classes. Lee, Slagle and Blum (Lee, R. C. Y., Slagle, J. R., and Blum, H., IEEE Trans. Comput., 1977, C-27, 288–292) proposed a triangulation method which restricts attention to only a subset of the distances between the data samples. This method positions each pattern on the plane in a way that preserves its distances from the two nearest neighbors already mapped. An arbitrarily selected reference pattern may also be used to ensure that the resulting map is globally ordered. Biswas, Jain and Dubes (Biswas, G., Jain, A. K., and Dubes, R. C., IEEE Trans. Pattern Anal. Machine Intell., 1981, PAMI-3(6), 701–708) later proposed a hybrid approach which combined the ability of Sammon's algorithm to preserve global information with the efficiency of Lee's triangulation method.
While the triangulation can be computed quickly compared to conventional MDS methods, it tries to preserve only a small fraction of relationships, and the projection may be difficult to interpret for large data sets.
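The geometric core of placing a pattern so that its distances to two already-mapped neighbors are preserved can be sketched in Python as a circle intersection. This is only an illustration of that single step under stated assumptions, not the published triangulation procedure: the function name `triangulate` and the sample values are hypothetical, and the choice between the two mirror-image candidates (for example, by means of a reference pattern) is left to the caller.

```python
import math

def triangulate(p1, p2, r1, r2):
    """Place a point at distance r1 from p1 and r2 from p2 on the plane.

    Returns both candidate positions, which are mirror images across the
    line through p1 and p2. If the distances violate the triangle
    inequality, the point is placed on that line and the two distances
    are only approximated.
    """
    d = math.dist(p1, p2)
    if d == 0.0:
        raise ValueError("reference points must be distinct")
    # Signed distance from p1 to the foot of the perpendicular.
    a = (r1 * r1 - r2 * r2 + d * d) / (2.0 * d)
    # Height above the line, clamped at zero when no intersection exists.
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))
    ex = ((p2[0] - p1[0]) / d, (p2[1] - p1[1]) / d)  # unit vector p1 -> p2
    foot = (p1[0] + a * ex[0], p1[1] + a * ex[1])
    return ((foot[0] - h * ex[1], foot[1] + h * ex[0]),
            (foot[0] + h * ex[1], foot[1] - h * ex[0]))

# Hypothetical mapped neighbors and target distances.
first, second = triangulate((0.0, 0.0), (6.0, 0.0), 5.0, 5.0)
```

Each new pattern requires only two distances, which is why the method is fast, and also why most of the remaining pairwise relationships are not taken into account.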
The methods described above are iterative in nature, and do not provide an explicit mapping function that can be used to project new, unseen patterns in an efficient manner. The first attempt to encode a nonlinear mapping as an explicit function is due to Mao and Jain (Mao, J., and Jain, A. K., IEEE Trans. Neural Networks 6(2):296–317 (1995)). They proposed a 3-layer feed-forward neural network with n input and m output units, where n and m are the number of input and output dimensions, respectively. The system is trained using a special back-propagation rule that relies on errors that are functions of the inter-pattern distances. However, because only a single distance is examined during each iteration, these networks require a very large number of iterations and converge extremely slowly.
An alternative methodology is to employ Sammon's nonlinear mapping algorithm to project a small random sample of objects from a given population, and then “learn” the underlying nonlinear transform using a multilayer neural network trained with the standard error back-propagation algorithm or some other equivalent technique (see for example, Haykin, S. Neural Networks: A Comprehensive Foundation. Prentice-Hall, 1998). Once trained, the neural network can be used in a feed-forward manner to project the remaining objects in the plurality of objects, as well as new, unseen objects. Thus, for a nonlinear projection from n to m dimensions, a standard 3-layer neural network with n input and m output units is used. Each n-dimensional object is presented to the input layer, and its coordinates on the m-dimensional nonlinear map are obtained by the respective units in the output layer (Pal, N. R. Eluri, V. K., IEEE Trans. Neural Net., 1142–1154 (1998)).
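The sample-then-learn scheme can be sketched in Python as follows. This is a minimal illustration, not the implementation of the cited works: the class `TinyMLP`, the helper names, the hidden-layer size, the learning rate, and the epoch count are assumptions, and the training targets are synthetic values standing in for the map coordinates that Sammon's algorithm would produce for the sample.

```python
import math
import random

class TinyMLP:
    """3-layer feed-forward net: n inputs, h tanh hidden units, m linear outputs."""

    def __init__(self, n, h, m, seed=0):
        rng = random.Random(seed)
        # Each weight row carries one extra entry for the bias term.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n + 1)] for _ in range(h)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(h + 1)] for _ in range(m)]

    def forward(self, x):
        xb = list(x) + [1.0]  # append bias input
        hid = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = hid + [1.0]      # append bias for the output layer
        out = [sum(w * v for w, v in zip(row, hb)) for row in self.w2]
        return xb, hb, out

    def project(self, x):
        """Feed-forward projection of one n-dimensional object."""
        return self.forward(x)[2]

    def train_step(self, x, target, lr):
        """One standard back-propagation update on squared error."""
        xb, hb, out = self.forward(x)
        err = [o - t for o, t in zip(out, target)]
        # Hidden-layer deltas are computed before the output weights change.
        delta_h = [(1.0 - hb[j] ** 2) *
                   sum(err[i] * self.w2[i][j] for i in range(len(err)))
                   for j in range(len(self.w1))]
        for i, e in enumerate(err):
            for j in range(len(hb)):
                self.w2[i][j] -= lr * e * hb[j]
        for j, dh in enumerate(delta_h):
            for q in range(len(xb)):
                self.w1[j][q] -= lr * dh * xb[q]

def mean_sq_error(net, xs, ys):
    return sum(sum((o - t) ** 2 for o, t in zip(net.project(x), y))
               for x, y in zip(xs, ys)) / len(xs)

def fit(net, xs, ys, epochs=200, lr=0.05):
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            net.train_step(x, y, lr)

# Hypothetical random sample of 3-dimensional objects.
rng = random.Random(1)
sample = [[rng.random() for _ in range(3)] for _ in range(20)]
# Stand-in for the 2-dimensional coordinates a Sammon map of the sample would give.
coords = [[x[0] + 0.5 * x[1], x[2] - 0.5 * x[1]] for x in sample]

net = TinyMLP(n=3, h=4, m=2)
before = mean_sq_error(net, sample, coords)
fit(net, sample, coords)
after = mean_sq_error(net, sample, coords)

# Once trained, a new, unseen object is projected in a single forward pass.
new_point = net.project([0.2, 0.4, 0.6])
```

Projecting an additional object costs one forward pass and does not require recomputing the map for the sample.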
The distinct advantage of this approach is that it captures the nonlinear mapping relationship in an explicit function, and allows the scaling of additional patterns as they become available, without the need to reconstruct the entire map. However, as it was originally proposed, the method can only be used for dimension reduction, and requires that the input patterns be supplied as real vectors.
Hence there is a need for a method that can efficiently process large data sets, e.g., data sets containing hundreds of thousands to millions of items, and can be used with a wide variety of pattern representations and/or similarity distance functions. Moreover, there is a need for a method that is incremental in nature, and allows the mapping of new samples as they become available, without the need to reconstruct an entire map.