This invention relates generally to pattern recognition and multivariate analysis, and in particular to a method and a template used for data clustering which simulates the human visual system.
Clustering analysis is a technique used to explore the relationships between data and assess the interaction among data by organizing the data into groups or clusters. Data within a cluster are more similar to each other than to data belonging to other clusters. Jain, A. K., Chapter 2, Cluster Analysis from Handbook of Pattern Recognition and Image Processing, Academic Press, Inc., pp. 33-57, (1986). Clustering analysis can be used when no prior knowledge regarding the presence or number of clusters is available. Cluster analysis has applications in machine vision; pattern recognition; unsupervised and supervised machine learning and classification; medical and biological image and data analysis; crop identification from satellite photos; and identification of hazardous chemicals in complex environments.
Pattern recognition uses an existing set of training data which have identified groupings or classes and assigns a grouping to newly measured test data. Specific examples of pattern recognition applications include: recognition of a large set of chemicals using a small set of chemical sensor signals; distinguishing vegetation, water, buildings, mineral deposits, etc. in multispectral satellite images; distinguishing different tissue types in medical magnetic resonance imaging data; target recognition in radar images; threat detection and false alarm avoidance in automated intrusion detection in video surveillance images.
Clustering approaches have been broadly classified as hierarchical or partitional. Hierarchical clustering in turn has two models. The agglomerative model begins with n clusters, one per pattern, and merges clusters step by step, producing a sequence of clusterings until all n patterns are in a single cluster. The divisive model begins with one cluster containing all n patterns and successively divides clusters until n clusters are obtained. Partitional clustering techniques organize patterns into a small number of clusters by assigning a label to each pattern, making use of criterion functions, density estimators, graph structures, and nearest neighbors. Fuzzy partitional clustering handles the overlapping case, in which each pattern may belong to several classes with a measure of "belongingness," or weighting factor, for each class.
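As an illustration of the agglomerative model, the following Python sketch starts with one cluster per pattern and repeatedly merges the two closest clusters until a single cluster remains. The passage does not fix a merge criterion; the single-linkage distance (the closest pair of members) and the function name `single_linkage_merges` are assumptions chosen for illustration, and the naive search makes no attempt at efficiency.

```python
import math
from itertools import combinations


def single_linkage_merges(points):
    """Hypothetical helper: agglomerative clustering with single linkage.

    Starts with one cluster per point and merges the two closest clusters
    until one cluster remains. Returns (cluster_a, cluster_b, distance)
    for each merge, in the order the merges happen.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the closest members.
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


for a, b, d in single_linkage_merges([(0, 0), (0, 1), (5, 0), (5, 1.5)]):
    print(sorted(a), "+", sorted(b), "at distance", d)
```

Each printed line is one level of the hierarchy; stopping the loop early at some level yields a flat partition.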
Some present clustering analysis techniques require prior knowledge of, or assumptions about, for example, the mean and location of the cluster pattern and the clustering result, especially the number of clusters present in the data set. Available clustering systems may also require pattern-dependent adjustments. Examples of hierarchical clustering analysis techniques that require additional input are U.S. Pat. No. 4,937,747, entitled "Iterative Disjoint Cluster and Discriminant Function Processing of Formation Log Responses and Other Data," to Koller, and U.S. Pat. No. 5,012,675, entitled "Integrating Multiple Mappable Variables for Oil and Gas Exploration," to Koller et al. These Koller patents require additional information about subsurface structures and use disjoint clustering, which requires specifying the number of clusters into which the data set is to be classified, followed by discriminant function analysis in which data of one cluster may be assigned to another cluster. U.S. Pat. No. 5,060,277, entitled "Pattern Classification Means using Feature Vector Regions Preconstructed from Reference Data," to Bokser, is a partitional clustering technique that requires a priori knowledge about the classification; it generates feature vectors from word processing font data and compares them with references to classify them into ringed clusters. U.S. Pat. No. 4,991,092, entitled "Image Processor for Enhancing Contrast between Subregions of a Region of Interest," to Greensite, generates a distribution density function of a signal assigned to a region of interest; after convolving the distribution density function, it compares the variance and assigns a particular value to the pixel of the image corresponding to the region of interest.
Clustering has been recognized as an important component of the human visual system since the Gestalt principles were first discussed in Wertheimer, M., "Laws of organization in perceptual form," translated into English in A Source Book of Gestalt Psychology (W. Ellis, ed.), pp. 71-88 (1938). Despite many studies in subsequent decades, a quantitative understanding of perceptual grouping/clustering in the human visual system has not been achieved. Achieving human-like performance has been a long-standing goal of clustering research, but the results of existing clustering analysis techniques often disagree with human visual judgments, even for remarkably simple data patterns. Jain, A. K. and Dubes, R. C., Algorithms for Clustering Data, Prentice Hall (1988); Duda, R. O. and Hart, P. E., Pattern Classification and Scene Analysis, John Wiley (1973); Hand, D. J., Discrimination and Classification, John Wiley (1981); and Jain, supra, Cluster Analysis, all describe many of these approaches. Prior art pattern recognition techniques also do not provide a well-defined warning when the class assignment of a point is uncertain. This is unlike human perception, which readily notices points that cannot be unambiguously assigned to one class of the training data. U.S. Pat. No. 5,040,133, entitled "Adaptive Clusterer," to Feintuch et al., attempts to approximate the human visual perception of gestalts by taking any two points, generating a parameter proportional to the distance between them, and then treating all points within that distance parameter as belonging to the cluster. Although Feintuch's system does not require a priori knowledge about the clusters or the number of clusters, it does not adequately approximate human vision. See also Jain, A. K., "Cluster Analysis" in Handbook of Pattern Recognition and Image Processing, Academic Press, 1986; and Zahn, C. T., "Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters," C-20 IEEE TRANS. COMPUTERS 68 (1971).
Another drawback of some existing techniques is the computational time required to compute clusters or class assignments. Computational time becomes important for applications such as image analysis, where large numbers of data points, typically more than 10^5, are examined.
Some clustering techniques use a region of influence, or template, to determine whether each pair of data points should be grouped. A template area of fixed shape is examined around each pair of points in the data, and the pair is grouped together if no other data points occur within the template area. Data points that do occur within the template area are referred to below as inhibitor points. A successful region of influence must cover an appropriately large and appropriately located area to avoid extra grouping errors, while being small enough to avoid the onset of missed grouping errors. A number of template shapes have been examined and are shown in FIG. 21, but none produces clusters in agreement with human perception. See Toussaint, G. T., The Relative Neighborhood Graph of a Finite Planar Set, 12 PATTERN RECOGNITION 261 (1980); Urquhart, R., Graph Theoretical Clustering Based on Limited Neighborhood Sets, 15 PATT. RECOGNITION 173 (1982); Krivanek, M., The Use of Graphs of Elliptic Influence in Visual Hierarchical Clustering, 452 LECTURE NOTES IN COMPUTER SCIENCE 392 (1990). Moreover, cluster analysis methods using these templates require computations that scale as N^3, where N is the number of data points.
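The pairwise region-of-influence test can be sketched as follows. The template here is the relative-neighborhood-graph lune associated with the Toussaint reference (a third point inhibits a pair if it is closer to both members than they are to each other); it stands in for any fixed template and is not the template of the invention. The function names are hypothetical. Because every pair is checked against every other point, the brute-force loop performs on the order of N^3 template tests, which is the scaling noted above.

```python
import math
from itertools import combinations


def in_rng_lune(p, q, r):
    """Relative-neighbourhood-graph lune: r lies inside if it is strictly
    closer to both p and q than p and q are to each other."""
    return max(math.dist(p, r), math.dist(q, r)) < math.dist(p, q)


def uninhibited_pairs(points, template=in_rng_lune):
    """Hypothetical helper: keep pair (i, j) only if no third point is an
    inhibitor for it. About N^2/2 pairs times N-2 candidate inhibitors,
    so O(N^3) template tests overall."""
    kept = []
    for i, j in combinations(range(len(points)), 2):
        inhibitors = [
            k for k in range(len(points))
            if k not in (i, j) and template(points[i], points[j], points[k])
        ]
        if not inhibitors:
            kept.append((i, j))
    return kept


print(uninhibited_pairs([(0, 0), (1, 0), (2, 0)]))
```

On three collinear, equally spaced points, the middle point inhibits the outer pair, so only the two adjacent pairs survive.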
It is thus an object of the invention to provide a clustering template and method that achieve human-like judgment of class membership for n-dimensional test points. The feature of the invention that enables this human-like clustering performance is the psychophysically derived inhibitory template applied to the data set.
It is yet another object of the invention to provide a clustering/pattern recognition method to separate data into clusters completely automatically. The clustering template and method described herein require no operator-adjustable parameters or extensive neural net training runs. An advantage of this feature is that presuppositions about the data or the sensing environment are not required and the data can be analyzed from a more objective and neutral perspective.
It is yet another object of the invention, when used for pattern recognition, to automatically identify test points that might belong to more than one class and to identify outliers. The feature of the invention that achieves this object is the use of clustering to assign class identities to test points.
It is yet another object to provide a clustering/pattern recognition approach that does not require any assumptions concerning the properties of either the training or test data; i.e., non-Gaussian distributions, linearly inseparable distributions, multimodal distributions, and highly nonlinear problems are all acceptable.
It is still another object of the pattern recognition method of the invention to evaluate a training set itself for reliable classification and discrimination into classes. The feature which achieves this object is the use of the template in the clustering to confirm class identities of the training data.
It is an object of the invention to cluster large numbers of data points, N, by the method described herein, using nearest neighbors to determine approximate groupings and then screening the approximate groupings for exactness in O[N^2] computation time.
These and other objects, as well as the features and advantages of the invention, are realized by a new approach to clustering and pattern recognition that matches the performance of human perception and judgment. The method is based on an inhibitory template that is applied to each pair of data points in a data set. The pair is clustered directly if no other data point lies within the area of the template, but clustering of the two points is inhibited if another data point lies within that area. The clustering performance of the method is thus entirely determined by the template shape, which was determined by psychophysical experiments. The empirical clustering technique incorporates data on human judgments of clusters formed by unlabeled data points and so directly mimics human performance. The resulting method uses no prior knowledge of the data, e.g., the number of clusters present, and no data-dependent parameter adjustments. The novel concept of a psychophysically defined inhibitory template and the absence of adjustable parameters are features that set this approach apart from the prior art. In fact, the invention herein successfully clusters complex data sets from the literature that have collectively thwarted all prior techniques.
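A minimal sketch of the clustering step, under two stated assumptions. First, clusters are taken to be the groups formed by chaining directly clustered pairs, i.e., the connected components of the uninhibited pairs, tracked here with a union-find structure. Second, because the psychophysically derived template shape is not given in this passage, a hypothetical placeholder is used: a disk centered on the pair's midpoint whose radius is `scale` times half the pair distance. The fixed value `scale=1.2` and all function names are illustrative only, not parameters of the invention.

```python
import math
from itertools import combinations


def in_scaled_disk(p, q, r, scale=1.2):
    """Hypothetical placeholder template (NOT the psychophysical one):
    r inhibits pair (p, q) if it lies strictly inside a disk centred on the
    pair's midpoint with radius scale * |pq| / 2."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    return math.dist(mid, r) < scale * math.dist(p, q) / 2


def template_clusters(points, template=in_scaled_disk):
    """Hypothetical helper: link every uninhibited pair, then return the
    connected groups of point indices as clusters."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        inhibited = any(
            template(points[i], points[j], points[k])
            for k in range(len(points)) if k not in (i, j)
        )
        if not inhibited:
            parent[find(i)] = find(j)  # direct clustering of the pair

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
print(template_clusters(pts))
```

With two well-separated 2x2 grids of points, every pair spanning the gap has a neighboring grid point inside its disk, so the sketch returns the two grids as separate clusters. Note that the number of clusters is never supplied; it falls out of the template test.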
A pattern recognition technique is also developed using the same template. The test data points are individually clustered to the training data using the empirical template method. Pattern recognition is applied to a wide variety of sensing problems in which signals from sensors must be interpreted. The measurements from each different sensor are combined as coordinates of a single vector, and these vectors are the data points of interest. A test point is assigned to the class or classes of the training data points with which it clusters. Test points that group with points from a single class are unambiguously assigned that class. This method readily identifies test points with uncertain class identity as those that cluster with training points from more than one class, i.e., points for which no single class seems appropriate.
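The class-assignment rule can be sketched as follows, assuming for illustration that a test point clusters with a training point when no other training point falls inside the template placed on that pair, and using the circle-with-the-pair-as-diameter (Gabriel) template as a stand-in for the psychophysical one. The function names, the two-class toy data, and the treatment of an empty result as a possible outlier are assumptions, not details from the text.

```python
import math


def gabriel_inhibits(p, q, r):
    """Stand-in template: r inhibits pair (p, q) if it lies strictly inside
    the circle that has segment pq as its diameter."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    return math.dist(mid, r) < math.dist(p, q) / 2


def classify(test_point, train_points, train_labels):
    """Hypothetical helper: return the set of class labels of the training
    points the test point clusters with. One label means unambiguous;
    several mean uncertain; none would be reported as a possible outlier."""
    classes = set()
    for i, p in enumerate(train_points):
        inhibited = any(
            gabriel_inhibits(test_point, p, r)
            for k, r in enumerate(train_points) if k != i
        )
        if not inhibited:
            classes.add(train_labels[i])
    return classes


train = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
labels = ["A"] * 4 + ["B"] * 4
for t in [(0.5, 0.5), (5.5, 0.5)]:
    found = classify(t, train, labels)
    status = "unambiguous" if len(found) == 1 else ("uncertain" if found else "no class")
    print(t, sorted(found), status)
```

A test point inside one class's region receives a single label, while a point midway between the two classes groups with training points from both and is flagged as uncertain rather than forced into one class.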
Although straightforward application of the template method appears to require O[N^3] computation time, the method is computationally practical and can be implemented to run in O[N^2] time for data of arbitrary dimension. The invention also handles large data sets through approximations whose computational cost scales as O[N log N] for two-dimensional data. The approximations limit the examination of potential pairs to group together, and of potential inhibitor points, to those likely to be the most important.
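One possible reading of the approximation is sketched below: candidate pairs are restricted to each point and its k nearest neighbors, and candidate inhibitors for a pair are restricted to the nearest neighbors of its two endpoints. The Gabriel-disk template, the parameter `k`, and the function names are hypothetical. For brevity the neighbor search here is brute force and therefore still quadratic; reaching the sub-quadratic scaling described above would additionally require a spatial index, such as a k-d tree, for the neighbor queries. Once the neighbor lists exist, the screening itself costs on the order of N*k^2 template tests.

```python
import math


def knn(points, k):
    """Brute-force k nearest neighbours of every point (indices, nearest
    first). A spatial index would replace this in a scalable version."""
    result = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in ranked[:k]])
    return result


def gabriel_inhibits(p, q, r):
    """Stand-in template: r strictly inside the circle with diameter pq."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    return math.dist(mid, r) < math.dist(p, q) / 2


def approximate_pairs(points, k=4):
    """Hypothetical helper: screen only nearest-neighbour candidate pairs,
    testing each against only the neighbours of its two endpoints."""
    nbrs = knn(points, k)
    candidates = {(min(i, j), max(i, j)) for i in range(len(points)) for j in nbrs[i]}
    kept = []
    for a, b in sorted(candidates):
        nearby = (set(nbrs[a]) | set(nbrs[b])) - {a, b}
        if not any(gabriel_inhibits(points[a], points[b], points[r]) for r in nearby):
            kept.append((a, b))
    return kept


print(approximate_pairs([(0, 0), (1, 0), (2, 0), (10, 0)], k=2))
```

With k equal to N-1 every pair and every inhibitor is examined, so the result matches the exhaustive test; smaller k trades exactness for speed, which is why a subsequent screening step can be useful.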