1. Field of the Invention
The present invention relates to a pattern classification method and apparatus applied to, e.g., extraction of defects and an optimization processing of a pattern dictionary used for pattern classification of types of defects and the like, and to a storage medium readable by a computer.
2. Description of the Related Art
For example, in the inspection of a glass substrate of an LCD or of a semiconductor wafer, an image of the glass substrate is picked up to obtain image data, a quantity of pattern characteristics such as an area, a shape or a depth value of a defect is extracted from the image data, and the extracted quantity of pattern characteristics is inspected by using the following pattern classification method.
A pattern dictionary created based on a known quantity of pattern characteristics and its classification category information is registered in advance. Here, the quantity of pattern characteristics is, for example, an area, a shape or a depth value if it is applied to the inspection of the glass substrate. The classification category information is, for example, a type of a pattern defect, a foreign particle, indefiniteness and defocusing.
When the quantity of pattern characteristics of a pattern classification target (input pattern) is inputted, pattern matching processing is carried out between the quantity of pattern characteristics and the pattern dictionary. As a classification result, numeric information such as a classification category, a similarity or a distance is outputted.
The input pattern is either classified into the classification category having the most appropriate value in the numeric information (a value close to 1 in the case of the similarity, or a value close to 0 in the case of the distance), or the numeric information is compared with a given threshold value. In the latter case, the classification is determined if the numeric information is not more than (or not less than) the threshold value.
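The decision rule described above can be sketched as follows. This is only an illustrative sketch: the function name `classify`, the `kind` argument, and the choice to return `None` when the threshold test fails are assumptions made for the example and are not taken from the related art. The sketch also combines the two alternatives (best value and threshold comparison) in one function for brevity.

```python
from typing import Dict, Optional


def classify(scores: Dict[str, float],
             kind: str = "distance",
             threshold: Optional[float] = None) -> Optional[str]:
    """Pick the category with the most appropriate value.

    scores    : numeric information per classification category
    kind      : "similarity" (best is largest) or "distance" (best is smallest)
    threshold : optional acceptance limit; returning None on failure is an
                assumption of this sketch
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if kind == "similarity":
        best = max(scores, key=lambda c: scores[c])   # value closest to 1
        accepted = threshold is None or scores[best] >= threshold
    elif kind == "distance":
        best = min(scores, key=lambda c: scores[c])   # value closest to 0
        accepted = threshold is None or scores[best] <= threshold
    else:
        raise ValueError("kind must be 'similarity' or 'distance'")
    return best if accepted else None
```

For instance, with hypothetical distances `{"pattern defect": 0.4, "foreign particle": 1.7}` the sketch selects "pattern defect", and rejects the input if a distance threshold of 0.3 is imposed.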
In regard to the similarity or the distance used as the numeric information, a simple similarity or a composite similarity is utilized as a similarity scale. As a distance scale, the Euclidean distance, the urban distance, the Mahalanobis distance or the like is utilized.
The simple similarity as the similarity scale is defined by taking, as an evaluation scale s, the cosine of the angle formed by a characteristic quantity vector g which is registered in advance and a characteristic quantity vector g′ which is newly extracted:
$$s = \frac{(g, g')}{\|g\| \cdot \|g'\|} \qquad (1)$$
Here, (,) indicates an inner product. The similarity s depends only on the direction (that is, the angle) of the characteristic quantity vector, and does not depend on the magnitude of the vector. The similarity scale is often converted into a distance scale s′ as follows, where s is the simple similarity:
$$s' = 1.0 - s \qquad (2)$$
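Expressions (1) and (2) can be illustrated with a short sketch. The function names `simple_similarity` and `similarity_to_distance` are hypothetical, and the handling of a zero-length vector (raising an error) is an assumption, since the related art does not discuss that case.

```python
import math
from typing import Sequence


def simple_similarity(g: Sequence[float], g_new: Sequence[float]) -> float:
    """Expression (1): cosine of the angle between g and g_new."""
    if len(g) != len(g_new):
        raise ValueError("vectors must have the same number of dimensions")
    inner = sum(a * b for a, b in zip(g, g_new))
    norms = math.sqrt(sum(a * a for a in g)) * math.sqrt(sum(b * b for b in g_new))
    if norms == 0.0:
        # Assumption of this sketch: the angle is undefined for a zero vector.
        raise ValueError("similarity is undefined for a zero vector")
    return inner / norms


def similarity_to_distance(s: float) -> float:
    """Expression (2): convert a similarity into a distance scale."""
    return 1.0 - s
```

Because only the direction matters, `simple_similarity([1, 2], [2, 4])` evaluates to approximately 1 even though the two vectors differ in magnitude, while orthogonal vectors give 0.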
The composite similarity is a similarity value that takes into consideration the distribution states of a plurality of learning characteristic quantity vectors. By using an eigenvalue λj and an eigenvector uj of the registered characteristic quantity vectors, the evaluation scale s with respect to the newly extracted characteristic quantity vector g is defined as follows:
$$s = \frac{\sum_{j=1}^{r} \lambda_j \,(g^{t} u_j)^2}{\lambda_1 \,\|g\|^2} \qquad (3)$$
Thus, improvement in the classification accuracy can be expected as compared with the simple similarity method, but many sets of learning data are required.
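The composite similarity of expression (3) can be sketched as below, taking the eigenvalues and eigenvectors as already computed inputs. The function name `composite_similarity` is hypothetical. The sketch assumes that the eigenvalues are ordered so that the first one corresponds to λ1 in expression (3), and that the eigenvectors are of unit length; neither assumption is stated explicitly in the text.

```python
from typing import Sequence


def composite_similarity(g: Sequence[float],
                         eigenvalues: Sequence[float],
                         eigenvectors: Sequence[Sequence[float]]) -> float:
    """Expression (3): sum_j lambda_j (g^t u_j)^2 / (lambda_1 ||g||^2).

    eigenvalues[0] plays the role of lambda_1 (assumed to be the largest);
    eigenvectors[j] is assumed to be the unit vector paired with eigenvalues[j].
    """
    if len(eigenvalues) != len(eigenvectors) or not eigenvalues:
        raise ValueError("need one eigenvector per eigenvalue")
    norm_sq = sum(x * x for x in g)
    if norm_sq == 0.0 or eigenvalues[0] == 0.0:
        raise ValueError("similarity is undefined for this input")
    total = 0.0
    for lam, u in zip(eigenvalues, eigenvectors):
        projection = sum(a * b for a, b in zip(g, u))  # g^t u_j
        total += lam * projection * projection
    return total / (eigenvalues[0] * norm_sq)
```

With hypothetical eigenvalues 2 and 1 along the two coordinate axes, an input along the first axis scores 1.0 and an input along the second axis scores 0.5, which reflects the weighting by the learned distribution.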
As the distance scale, the Euclidean distance d, which is the most common distance scale, is defined as follows based on the registered characteristic quantity vector g and the newly extracted characteristic quantity vector g′:
$$d = \sqrt{(g - g')^2} \qquad (4)$$
The urban distance is also called the Manhattan distance, and is defined as follows based on the registered characteristic quantity vector g and the newly extracted characteristic quantity vector g′:
$$d = |g - g'| \qquad (5)$$
According to the urban distance, it is possible to calculate the distance at high speed.
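The two distance scales of expressions (4) and (5) can be compared with a minimal sketch. The function names are hypothetical. The sketch reads the square in expression (4) as the sum of squared component differences and the absolute value in expression (5) as the sum of absolute component differences, which is the usual interpretation of these distances.

```python
import math
from typing import Sequence


def euclidean_distance(g: Sequence[float], g_new: Sequence[float]) -> float:
    """Expression (4): square root of the summed squared differences."""
    if len(g) != len(g_new):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g, g_new)))


def urban_distance(g: Sequence[float], g_new: Sequence[float]) -> float:
    """Expression (5): sum of absolute differences (Manhattan distance)."""
    if len(g) != len(g_new):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(g, g_new))
```

The urban distance needs only subtractions, absolute values and additions, with no multiplication or square root, which is consistent with the remark that it can be calculated at high speed.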
The Mahalanobis distance $D^2$ is a distance that, as with the composite similarity, takes into consideration the dispersion of the registered characteristic quantity vector data, and it is presently considered to be the most preferable for the pattern classification. The Mahalanobis distance $D^2$ can be expressed as follows:
$$D^2 = d^{t} V^{-1} d \qquad (6)$$
In the above expression, it is assumed that $V^{-1}$ is the inverse matrix of a common variance/covariance matrix of classification category data $x_i = (x_{i0}, x_{i1}, x_{i2}, \ldots, x_{ip})$, where i is a classification category and p is the number of dimensions of the characteristic quantity, that d ($= x_i - \hat{x}_i$) is a difference matrix with respect to a classification category mean value $\hat{x}_i$, and that $d^{t}$ is its transpose.
The Mahalanobis distance $D^2$ is a distance normalized by the dispersion of each component of the classification category data. If the components of the classification category data are uncorrelated, the Mahalanobis distance is equivalent to the Euclidean distance computed after each component is normalized by its dispersion. Further, if the difference matrix d in the above expression (6) is taken as the difference between the mean values of two classification categories, the Mahalanobis distance becomes the Mahalanobis generalized distance between the classification categories.
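Expression (6) can be sketched without explicitly forming the inverse matrix, by solving V y = d and then taking the inner product of d and y. The helper `solve` (Gauss-Jordan elimination with partial pivoting) and the function name `mahalanobis_sq` are hypothetical and are chosen only to keep the example self-contained; a numerical library would normally be used instead.

```python
from typing import List, Sequence


def solve(matrix: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve matrix * y = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("variance/covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[row][k] -= factor * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mahalanobis_sq(x: Sequence[float],
                   mean: Sequence[float],
                   cov: Sequence[Sequence[float]]) -> float:
    """Expression (6): D^2 = d^t V^-1 d with d = x - mean."""
    d = [a - b for a, b in zip(x, mean)]
    y = solve(cov, d)                      # y = V^-1 d
    return sum(a * b for a, b in zip(d, y))
```

When the covariance matrix is diagonal, the result is the sum of squared differences, each divided by the corresponding variance; with a unit covariance matrix it coincides with the squared Euclidean distance.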
For example, assume that a given classification category C has classification characteristic quantities $x_{c1}$ and $x_{c2}$ and that there is a positive correlation between them; an iso-probability ellipse such as that shown in FIG. 1 can then be drawn. In FIG. 1, characteristic quantity data A and B are arranged so that they are equally distant from the classification category mean value G on the Euclidean distance scale. The Mahalanobis distance $D^2$, however, is constant along the locus of each iso-probability ellipse. Since the characteristic quantity data A and B do not lie on the locus of the same iso-probability ellipse, and A lies on an ellipse nearer to the classification category mean value G, the characteristic quantity data A is closer to G than the characteristic quantity data B on the Mahalanobis distance scale.
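The situation of FIG. 1 can be reproduced numerically with a small sketch. All numbers below are hypothetical and are not read from FIG. 1: the mean G is placed at the origin, the two characteristic quantities are given unit variance with a covariance of 0.8, A is placed along the direction of correlation and B across it. The function name `mahalanobis_sq_2d` is also hypothetical; it uses the closed-form inverse of a 2x2 matrix.

```python
import math
from typing import Sequence, Tuple


def mahalanobis_sq_2d(point: Tuple[float, float],
                      mean: Tuple[float, float],
                      cov: Sequence[Sequence[float]]) -> float:
    """D^2 = d^t V^-1 d for two characteristic quantities."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("variance/covariance matrix is singular")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # V^-1 = (1/det) * [[d, -b], [-c, a]]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


# Hypothetical values chosen for illustration only.
G = (0.0, 0.0)
V = [[1.0, 0.8], [0.8, 1.0]]     # positive correlation between the quantities
A = (1.0, 1.0)                   # along the major axis of the ellipse
B = (1.0, -1.0)                  # along the minor axis of the ellipse

euclid_A = math.hypot(A[0] - G[0], A[1] - G[1])
euclid_B = math.hypot(B[0] - G[0], B[1] - G[1])
maha_A = mahalanobis_sq_2d(A, G, V)
maha_B = mahalanobis_sq_2d(B, G, V)
```

In this sketch A and B have the same Euclidean distance from G, but `maha_A` is roughly 1.1 while `maha_B` is roughly 10, so A is much closer to G on the Mahalanobis scale.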
Among the various kinds of scales mentioned above, it is said that the Mahalanobis distance is a distance scale currently most preferable for the pattern classification. Furthermore, although a quantity of arithmetic operation processing is large and a large quantity of registered/learning patterns must be secured, the Mahalanobis distance has begun to be utilized in most pattern classification devices with the recent speed-up of the CPU.
The numeric information of the Mahalanobis distance or the like as a classification result can be obtained by performing pattern matching processing between a pattern dictionary created in advance and an input pattern. The pattern dictionary used in the pattern matching processing, and in particular the pattern dictionary for calculating the Mahalanobis distance, has as constituent elements the inverse matrix $V^{-1}$ of the common variance/covariance matrix V of the classification category data and the classification category mean value $\hat{x}_i$, as represented in the expression (6). These constituent elements are obtained beforehand from many known quantities of pattern characteristics registered in the pattern dictionary in advance and their classification category information.
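A pattern dictionary of this kind can be sketched as a pair consisting of the per-category mean values and the inverse of a common variance/covariance matrix. The names `build_dictionary`, `classify` and `invert`, the dictionary layout, and the use of a pooled within-category covariance with divisor N minus the number of categories are all assumptions made for this illustration; the text does not specify how the common matrix is estimated.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("variance/covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def build_dictionary(samples: Dict[str, List[Vector]]) -> dict:
    """Compute category means and the inverse pooled covariance matrix."""
    p = len(next(iter(samples.values()))[0])
    means = {c: [sum(x[k] for x in xs) / len(xs) for k in range(p)]
             for c, xs in samples.items()}
    pooled = [[0.0] * p for _ in range(p)]
    total = 0
    for c, xs in samples.items():
        total += len(xs)
        for x in xs:
            d = [x[k] - means[c][k] for k in range(p)]
            for a in range(p):
                for b in range(p):
                    pooled[a][b] += d[a] * d[b]
    dof = total - len(samples)          # assumed divisor: N - number of categories
    if dof <= 0:
        raise ValueError("not enough registered patterns")
    cov = [[v / dof for v in row] for row in pooled]
    return {"means": means, "inv_cov": invert(cov)}


def classify(dictionary: dict, x: Vector) -> str:
    """Return the category whose mean has the smallest Mahalanobis distance."""
    inv = dictionary["inv_cov"]

    def d2(mean: Vector) -> float:
        d = [a - b for a, b in zip(x, mean)]
        n = len(d)
        return sum(d[a] * inv[a][b] * d[b] for a in range(n) for b in range(n))

    return min(dictionary["means"], key=lambda c: d2(dictionary["means"][c]))
```

A usage example with hypothetical two-dimensional registered patterns:

```python
registered = {
    "pattern defect":   [[1.0, 1.0], [2.0, 1.5], [3.0, 3.0], [2.0, 2.5]],
    "foreign particle": [[8.0, 8.0], [9.0, 8.5], [10.0, 10.0], [9.0, 9.5]],
}
dictionary = build_dictionary(registered)
label = classify(dictionary, [2.5, 2.0])   # nearest mean is "pattern defect"
```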
In order to create the pattern dictionary for calculating the Mahalanobis distance (the inverse matrix $V^{-1}$ and the classification category mean value $\hat{x}_i$), either the registered quantities of pattern characteristics and their classification category information are simply used as they are, or some quantities of pattern characteristics which are customarily considered to be suitable for pattern classification are selected by trial and error.
For example, if the pattern dictionary for calculating the Mahalanobis distance is created by utilizing all the registered quantities of pattern characteristics, it is often the case that some of the quantities of pattern characteristics adversely affect the classification, thereby lowering the pattern classification accuracy (pattern classification ratio).
Moreover, the pattern dictionary for calculating the Mahalanobis distance could be created by using quantities of pattern characteristics with a high contribution, obtained by principal component analysis of the registered quantities of pattern characteristics, in preference to the quantities of pattern characteristics which adversely affect the pattern classification ratio. However, since the contribution obtained by principal component analysis only indicates which quantities of pattern characteristics efficiently represent the pattern, the contribution does not directly relate to the pattern classification ratio. That is, even if the quantities of pattern characteristics with a high contribution are selectively used, the pattern classification ratio is not necessarily improved.
It is also conceivable to take into account, as a factor affecting calculation of the Mahalanobis distance, the case in which the registered quantities of pattern characteristics do not follow a normal distribution, to approximate their distribution by one or a plurality of normal distributions, and to calculate the Mahalanobis distance with respect to each of those normal distributions. However, it is usually difficult to determine a plurality of normal distributions which approximate the distribution of the quantities of pattern characteristics, which are multi-dimensional data.