Generating meaningful joint sensor/signature manifolds includes generating a connected graph for subsequent eigenprocessing and embedding. One technique for generating such a graph uses k-nearest-neighbor selection to define adjacency. However, this technique is known to be sensitive to cyclic behavior, and for a small number of neighbors (k) in the graph, portions of the data often degenerate into null space. While this technique has been successful in an academic setting, it fails when applied to large real-world data sets whose individual signatures vary widely.
Nonlinear dimensionality reduction techniques operate under the assumption that Euclidean measures of similarity are meaningful locally, but not globally. Graphs provide a natural mathematical framework for nonlinear dimensionality reduction. Formally, a graph G consists of a pair of sets (V, E), where V is a set of vertices and E is a set of edges. Each edge denotes a pair of elements of V. A path P is an ordered sequence of vertices v1, v2, . . . , vn with an edge between every consecutive pair of vertices in the sequence, ek=(vk, vk+1) ∈ E ∀ vk, vk+1 ∈ P. A graph G is connected if a path exists between every pair of vertices, ∃P(vk, vj) ∀ vk, vj ∈ V. Two vertices are adjacent, vj ∼ vk, if an edge exists between them, ejk=(vj, vk) ∈ E.
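The definition of connectivity given above can be checked directly with a breadth-first search. The following is a minimal sketch only, assuming an undirected graph supplied as a vertex list and a list of vertex pairs; the function name `is_connected` and its argument layout are illustrative and do not come from the cited techniques.

```python
from collections import deque


def is_connected(vertices, edges):
    """Return True if a path exists between every pair of vertices.

    `vertices` is an iterable of hashable vertex labels (the set V) and
    `edges` is an iterable of (vj, vk) pairs (the set E), treated as
    undirected. Both names are assumptions made for this sketch.
    """
    vertices = list(vertices)
    if not vertices:
        return True  # an empty graph is treated as trivially connected

    # Build an adjacency map: vj ~ vk whenever (vj, vk) is in E.
    adjacency = {v: set() for v in vertices}
    for vj, vk in edges:
        adjacency[vj].add(vk)
        adjacency[vk].add(vj)

    # Breadth-first search from an arbitrary starting vertex.
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        v = queue.popleft()
        for u in adjacency[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)

    # The graph is connected only if the search reached every vertex.
    return len(seen) == len(vertices)
```

A check of this kind runs in time linear in the number of vertices and edges, so a disconnected graph can in principle be found without computing an eigen-decomposition.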
An edge weight function w: V×V→R assigns a real-valued label to each edge, often representing the edge length or the distance between the adjacent vertices. Two common weight functions used on graphs are the simple nearest-neighbor weight, Equation (1), and the Gaussian, or heat-kernel, weight, Equation (2):
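\[
w_{ij} =
\begin{cases}
1, & v_i \sim v_j \\
0, & \text{otherwise;}
\end{cases}
\tag{1}
\]

\[
w_{ij} =
\begin{cases}
e^{-\lVert v_i - v_j \rVert^{2} / a}, & v_i \sim v_j \\
0, & \text{otherwise.}
\end{cases}
\tag{2}
\]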
When applied to data, both of these functions can generate non-connected graphs for small k (where k is the limit on the number of nearest neighbors to which these functions are applied; the k neighbors are selected in order of increasing distance) or small a, where a sets a physical scale for the heat-kernel approach to defining edge weights (Equation (2)). However, this problem is not detectable until the eigen-decomposition is computed, at which point a multiplicity of zero-valued eigenvalues indicates that the graph is not fully connected. The typical solution is to increase either k or a, which often results in suboptimal manifolds because local information is lost, destroying the ability of the dimensionality-reduction technique to retain the nonlinear characteristics of the original data.
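The effect of a small k on connectivity can be illustrated with a short numerical sketch. It is not the method of any cited reference. It assumes NumPy, an unnormalized graph Laplacian L = D − W, symmetric adjacency (two vertices are adjacent if either selects the other as a neighbor), and a hypothetical tolerance `tol` for deciding that an eigenvalue is zero. The function names are likewise illustrative.

```python
import numpy as np


def knn_heat_kernel_weights(points, k, a):
    """Weight matrix following Equation (2) on a k-nearest-neighbor graph.

    w_ij = exp(-||v_i - v_j||^2 / a) if v_i ~ v_j, and 0 otherwise.
    Adjacency is symmetrized (an assumption of this sketch).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise squared Euclidean distances.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    weights = np.zeros((n, n))
    for i in range(n):
        # Neighbors in order of increasing distance, excluding the point itself.
        order = np.argsort(sq_dist[i])
        neighbors = [j for j in order if j != i][:k]
        for j in neighbors:
            w = np.exp(-sq_dist[i, j] / a)
            weights[i, j] = w
            weights[j, i] = w
    return weights


def count_zero_eigenvalues(weights, tol=1e-9):
    """Count near-zero eigenvalues of the unnormalized Laplacian L = D - W.

    A count greater than one indicates that the graph is not fully connected.
    """
    degree = np.diag(weights.sum(axis=1))
    laplacian = degree - weights
    eigenvalues = np.linalg.eigvalsh(laplacian)  # Laplacian is symmetric
    return int(np.sum(np.abs(eigenvalues) < tol))


# Two well-separated clusters of three points each (made-up illustrative data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
zeros_small_k = count_zero_eigenvalues(knn_heat_kernel_weights(points, k=2, a=100.0))
zeros_large_k = count_zero_eigenvalues(knn_heat_kernel_weights(points, k=3, a=100.0))
```

For this toy data, k=2 keeps every neighbor inside its own cluster, so the Laplacian has two zero eigenvalues. With k=3 each point is forced to reach across to the other cluster, leaving a single zero eigenvalue at the cost of edges that no longer reflect local structure.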
The manifold alignment technique published by Ham (J. Ham et al., “Semisupervised Manifold Alignment,” in R. Cowell and Z. Ghahramani (eds.), Proc. of the Tenth International Workshop on Artificial Intelligence and Statistics, pp. 120-127, 2005) depends on the graph being completely connected. If the graph is not connected, its connected subgraphs are defined in separate eigensystems, and the process maps each subgraph into the null space of the others. The resultant embedding is then meaningful only for a subset of the data. To avoid this problem and ensure that all points are connected, k is increased until the graph is connected. This leads to problems because the goal of the process is to preserve local neighborhoods, which can be destroyed when k is large.
The idea of taking two disparate sensors and projecting them into a common space has been tried before (D. Marchette et al., “Comparing Apples & Oranges: Methods for Comparing the Incomparable,” Hawaii International Conference on Statistics and Related Fields (2004)). This approach does not find the underlying manifold of the space first, leading to projections from high dimensions. Another way to execute feature-level fusion is to use joint probabilities and Bayesian networks (S. Ferrari et al., “Demining Sensor Modeling and Feature-Level Fusion by Bayesian Networks,” IEEE Sensors Journal, Vol. 6 (2006)). This approach is problematic in high dimensions because the probabilities are not known and can only be roughly estimated. Feature fusion can also be done by combining features from different sensors into a single feature vector (U.S. Pat. No. 6,594,382, entitled “Neural Sensors” and issued on Jul. 15, 2003, to Roger Woodall). However, this approach suffers from high dimensionality, as well as from the problem of estimating meaningful scaling factors between the sensor-specific feature sets.