1. Technical Field
This invention relates in general to feature classification techniques for a static or moving image, and more particularly, to a feature classification system and method which employ supervised statistical pattern recognition using a novel vector classification model of spatially decomposed multi-dimensional feature space.
2. Background Art
As the use of optical scanning and electronic imperfection detection have continued to increase, automatic differentiation of defect classes has come within reach of available technology. Pattern recognition is applicable to feature classification problems because pattern recognition automatically assigns a physical object or event to one of several pre-specified categories. Each defect (or feature) shows up as a connected region in the image and each defect can be assigned to a category.
There are two types of pattern recognition (PR), structural and statistical. Structural methods use a representation of a feature's shape known as a boundary representation (BREP), while statistical methods use an array of numbers or measurements containing properties of each feature; this numerical information is called a feature vector. In structural pattern recognition the picture of the feature can be recreated from the reduced data since the BREP has complete boundary information in polygonal form. In statistical pattern recognition, however, the picture cannot be recreated from its representation; but, a feature vector is a more compact representation of the object than a BREP. With either method, the goal is to construct a classifier, i.e., a machine to automatically process the image to generate a classification for each feature.
In structural pattern recognition, the classifier is based on formal language theory. The BREP is processed into a series of symbols representing the length and direction of the vectors in the boundary. A set of strings consisting of concatenations of these symbols is the language. The grammar, which is a mathematical system of describing the language, describes the structure or the boundary of the features as ordered combinations of symbols. A recognizer, which is constructed from this grammar, works like a computer language compiler used to recognize and distinguish computer language statements. For example, the box of FIG. 1 can be thought of as a language. As shown, L={a.sup.n, b.sup.n, c.sup.n, d.sup.n .vertline. n.gtoreq.1)} is a language describing the box with each side of length "1" or greater.
In real-world problems a feature or defect does not have an exact description so the problem is more difficult than parsing a computer language. A structural recognizer that can handle realistic problems has to be based on complicated context sensitive or stochastic grammars to deal with high data complexity and variation. Because of this, structural methods are not readily implemented in typical engineering situations.
In contrast, statistical pattern recognition uses a recognizer based on statistical decision theory. Several different types of statistical pattern recognition exist but in general there are two main approaches, called supervised and unsupervised classification.
Supervised classification uses a labelled training sample formed when an expert identifies the category of each member of the sample. Probability distributions are estimated or recognizers are constructed directly from the training sample. An expert must examine the data and label each of the features. (Again, the novel pattern recognition approach described herein uses supervised classification.) The effort of labelling a training sample can be made easier by providing a graphical interface to facilitate an expert's interaction with the data.
Unsupervised classification doesn't use a labelled training sample. This approach requires the recognizer to learn the underlying probability distribution of the data as it goes, which is often a difficult problem. However, unsupervised classification also does not require the sometimes lengthy process of accumulating a sufficiently large training sample and it does not necessarily require the effort of identifying the members of this sample. The method can be useful in augmenting supervised classification by allowing the system to adapt to changes in the data.
In addition to the above-noted approaches, statistical pattern recognition employs two main methods, parametric and non-parametric. Parametric methods assume an underlying probability distribution of the real world data. Non-parametric methods make no such assumptions.
Parametric methods are generally used when the distribution is known to be in one of the familiar forms, such as Normal or Gaussian. Classifiers can be generated based on Bayes rule with the a priori distributions known and joint probability distributions determined from the sample data. Specific features can then be compared to the statistics of the known distribution function thereby classifying them.
In real-world situations, the data often does not conveniently fall into a Normal or other known distribution. The distribution of a class of features might be multi-modal, i.e., with two or more peaks in the distribution of a defect category (e.g., see FIG. 2). As in the example of FIG. 2, spot like defects are roundish and easily recognized by an approximately equal length and width, but there might be relatively small and large defects all of which are known as spots. In this example, the spots might be better recognized as the ratio of area to perimeter, which will remain approximately constant as the size of the roundish spots vary. Although a specialized recognizer can be constructed to deal with this example, a method is needed which is generally applicable.
Non-parametric methods provide a more general solution to the above problem in that they generate decision functions or a classifier directly from a training sample. By doing this such approaches bypass the parameter estimation problem and ignore any presumed form for the density function. (As described below, the present invention comprises a non-parametric technique.)
One of the main difficulties of non-parametric methods is the exponential increase in storage and computational requirements as the dimensionality increases. If the feature is described by a large number of measurements (or elements), the problem can grow to be very large when some of the known methods are used. (Since the technique of the present invention divides the feature space recursively by powers of two, the problem is made more manageable because the storage space only increases as the log base 2.)