Many applications, such as oilfield logging, require analysis of numerous independent data parameters. The measurements can be treated as points in a multi-dimensional data space, an approach that is often convenient mathematically but extremely difficult for humans to visualize or analyze effectively. Nevertheless, such visualization usually offers insight into the nature of the data, thereby facilitating subsequent use of the data set for interpretation and modeling.
Techniques exist for translating a set of data points having many dimensions (i.e., a “high-dimensionality data set”) into a set of data points having a smaller number of dimensions (i.e., a “low-dimensionality data set”). The number of dimensions for the low-dimensionality data set is often chosen in the range of two to four to enable straightforward visualization of the data. A review of high-dimensional data visualization and dimensionality reduction can be found in the paper “DD-HDS: A method for visualization and exploration of high-dimensional data”, by Lespinats et al., IEEE Transactions on Neural Networks, vol. 18, no. 5, pp. 1265-1279, September 2007, which is hereby incorporated herein by reference.
Generally speaking, it is desirable to preserve, as much as possible, the difference, or “distance”, between pairs of data points. Thus, for example, data points that are closely spaced in the high-dimensionality data set should be closely spaced in the low-dimensionality data set, and data points that are widely spaced in the high-dimensionality data set should be widely spaced in the low-dimensionality data set. Such preservation of the sample-pair distances is believed to preserve the “essential” information contained in the data set.
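As an illustration only, the following Python sketch compares sample-pair distances before and after a reduction. The sample coordinates, the function names, and the use of the Euclidean distance are assumptions made for this example and are not taken from any of the cited references.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Return the Euclidean distance for every unordered pair (i, j), i < j."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def max_distance_mismatch(high, low):
    """Largest absolute difference between corresponding pair distances."""
    d_high = pairwise_distances(high)
    d_low = pairwise_distances(low)
    return max(abs(d_high[pair] - d_low[pair]) for pair in d_high)


# Hypothetical 3-D samples that all lie in the plane z = 0.
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 4.0, 0.0)]

# Dropping the constant z coordinate leaves every pair distance unchanged.
low_good = [(x, y) for x, y, _ in high]

# Dropping y as well collapses samples that differ mainly in y.
low_bad = [(x,) for x, _, _ in high]

print(max_distance_mismatch(high, low_good))
print(max_distance_mismatch(high, low_bad))
```

In this sketch the first reduction preserves the pair distances, while the second brings together samples that were widely spaced in the original data, which is the kind of information loss the distance-preservation goal is meant to avoid.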
Since conventional linear mapping methods such as principal component analysis (PCA) do not preserve such distance-based essential information in a satisfactory way, dimensionality reduction is often treated as a non-linear optimization problem. J. W. Sammon, in “A Nonlinear Mapping for Data Structure Analysis”, IEEE Trans. Comput. C-18 (5): 401-409, 1969, introduces the use of an objective function (termed a “stress function” by Sammon) to minimize the mismatch of sample-pair distances between the original and transformed data. P. Demartines and J. Herault, in “Curvilinear Component Analysis: A Self-Organizing Neural Network for Nonlinear Mapping of Data Sets”, IEEE Trans. Neural Networks 8 (1): 148-154, 1997, implicitly use a gradient-based approach to implement their neural-network-based dimensionality reduction. In “Graph Drawing by Force-Directed Placement”, Software: Practice and Experience 21 (11): 1129-1164, 1991, T. Fruchterman and E. Reingold adopt the concept of the spring-mass system to adjust and stabilize the low-dimensionality data positions.
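For concreteness, one commonly cited form of Sammon's objective divides the squared distance mismatch of each sample pair by the original distance and normalizes by the sum of the original distances. The Python sketch below evaluates that quantity; the function name, the sample coordinates, and the assumption that no two original samples coincide (so that no original distance is zero) are illustrative choices, and the sketch omits the optimization step that searches for the low-dimensionality positions.

```python
import math
from itertools import combinations


def sammon_stress(high, low):
    """Evaluate a Sammon-style stress between two point sets.

    high, low: equal-length sequences of coordinate tuples.
    Assumes all points in `high` are distinct, so no original
    distance is zero.
    """
    pairs = list(combinations(range(len(high)), 2))
    d_star = [math.dist(high[i], high[j]) for i, j in pairs]  # original
    d_map = [math.dist(low[i], low[j]) for i, j in pairs]     # transformed
    weighted = sum((ds - dm) ** 2 / ds for ds, dm in zip(d_star, d_map))
    return weighted / sum(d_star)


# Hypothetical samples: corners of a unit square embedded in 3-D.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]

flat = [(x, y) for x, y, _ in square]   # keeps the square intact
line = [(x,) for x, _, _ in square]     # collapses the square onto a line

print(sammon_stress(square, flat))
print(sammon_stress(square, line))
```

A lower value indicates that the transformed data better matches the original sample-pair distances. Dividing by the original distance gives mismatches between closely spaced samples more weight than equal mismatches between widely spaced samples.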
M. Raymer et al., in “Dimensionality Reduction Using Genetic Algorithms”, IEEE Transactions on Evolutionary Computation 4 (2): 164-171, 2000, focus on feature selection, feature extraction, and classifier training to construct a linear transformation matrix that can then be tuned using evolutionary computation. C. Yang et al., in “Dimensionality Reduction Using GA-PSO”, Proc. 9th Joint Conference on Information Sciences, Taiwan, 2006, focus on the feature selection aspect of Raymer with a combined GA-PSO (Genetic Algorithm–Particle Swarm Optimization) approach. It should be noted that Yang et al. integrate PSO into their genetic algorithm using an N-nearest neighbor distance match, and apply it to each generation.
The foregoing techniques fail to effectively minimize the information loss associated with dimensionality reduction.
The drawings show illustrative invention embodiments that will be described in detail. However, the description and accompanying drawings are not intended to limit the invention to the illustrative embodiments, but to the contrary, the intention is to disclose and protect all modifications, equivalents, and alternatives falling within the spirit and scope of the appended claims.