1. Field of the Invention
This invention relates to a method and apparatus for defect detection and classification. More particularly, the invention relates to an automated defect classification system using fuzzy logic to automatically analyze measurements of various features of the defect and to automatically classify the defect based on the analysis.
2. Description of the Prior Art
Visual defect inspection and classification have become essential parts of many electronics manufacturing processes. While defect detection is critical for ensuring product quality, defect classification provides the information necessary to correct process or product problems. Defect classification is the task of sorting the defects into a set of predefined, meaningful categories often related to the causes (e.g. foreign material) or the consequences (e.g. killer vs. cosmetic) of the defects. The classification data are commonly used in yield prediction, process diagnosis, and rework/scrap decisions.
Currently, many defect detection tools are available, each tailored for inspecting certain products with the objective of locating defects accurately while maximizing throughput. The output of these tools reveals little information about the defects themselves and hence is usually reviewed by human operators. In the defect review process, the operator first locates (redetects) the defect in the microscope field of view, then classifies the defect based on its appearance and context. This process is usually more time consuming than the initial detection. Hence it is customary to review and classify only a small sample of the detected defects. Moreover, operators tend to be inconsistent and defects are often misclassified. Accuracy rates as low as thirty to fifty percent are common in semiconductor manufacturing lines. Automating the classification process would reduce the operator work-load, allow more defects to be reviewed, and improve the accuracy of defect calls.
Many automated machine vision defect detection systems have been developed for manufacturing inspection and a few of these systems include defect classification capability. Although the reference comparison method generally used to detect defects is applicable to a wide variety of products, defect classification, when it is included in the automated inspection system, is limited in application to the target product. That is, the machine vision based classification is based on product specific parameters and cannot be ported from one application to another.
In cases such as sheet metal, paper and textile manufacture, the product is statistically homogeneous, so that anomalies can easily be segmented and sorted on the basis of size, shape, brightness and/or color. Examples of automatic defect classification in these industries can be found in U.S. Pat. No. 4,519,041 directed to real time automatic surface imperfection detection and classification; Cho, et al., "A computer vision system for automated grading of rough hardwood lumber using a knowledge based approach," International Conference on Systems, Man and Cybernetics, pp. 345-350, IEEE, 1990; Cho et al., "A neural network approach to machine vision systems for automated industrial inspection," International Joint Conference on Neural Networks, pp. 205-210, IEEE, 1991; and Giet et al., "Multiresolution image processing for rough defect classification," in Industrial Inspection II, pp. 214-224, SPIE, 1990. In the electronics industry, defect classification has been generally limited to printed circuit board inspection. Current implementations can sort defects into broad categories such as shorts, opens, pinholes and extraneous material. This can be accomplished by performing design rule checks on the simple (binary) pattern as shown by Mandeville, "Novel method for analysis of printed circuit images," IBM Journal of Research and Development, no. 1, pp. 73-86, 1985. Recently, systems have been developed to detect and classify defects on populated printed circuit boards, an example of which is described in Teoh, et al., "Automated visual inspection of surface mount pcb's," in 16th Annual Conference of IEEE Industrial Electronics Society, pp. 576-580, IEEE, 1990. Defect classification for mask inspection is disclosed in U.S. Pat. No. 4,587,617, directed to image inspection system for IC wafers. Only a few attempts to classify defects on integrated circuits have appeared in the literature. 
They are similar to printed circuit board inspection systems and limited to defects on wiring levels, Dralla et al., "Automatic classification of defects in semiconductor devices," Integrated Circuit Metrology, Inspection and Process Control IV, pp. 173-182, SPIE, 1990, or simple rectilinear patterns, Chi, et al., "Using the cesm shell to classify wafer defects from visual data," in Automated Inspection and High Speed Vision Architectures III, vol. 1197, pp. 66-77, SPIE, 1989. An exception is Rao et al., "A classification scheme for visual defects arising in semiconductor wafer inspection," Journal of Crystal Growth, vol. 103, no. 1-4, pp. 398-406, 1990, which classifies texture anomalies on silicon.
Despite these attempts at automatic defect classification, most inspection systems rely on manual review of defects. Since many defects will be unique to certain products and to certain stages in the product's manufacture, it would be very costly to develop specific defect classification systems for each inspection. Furthermore, inspection requirements are greatest in the early stages of product development. This means that classification tools must be developed rapidly, a requirement that adds to their cost and limits their flexibility.
Traditional approaches to classification fall mainly into two categories: rule-driven (top-down) and data-driven (bottom-up). The rule-driven approach seeks to incorporate expert knowledge. It is most often implemented in the form of a decision tree using binary ("deterministic") logic. This has the effect of drawing rigid boundaries around classes in the feature space. For example, a defect with an area of X units might be considered gross, while a defect with an area of X-1 units might be called not-gross. This approach fails to capture the uncertainty and imprecision that are characteristic of the term "gross defect". Another problem with the decision tree is that it encodes knowledge in a highly structured form. There is no provision for processing of conflicting rules, which may represent disagreement among experts. These weaknesses of the traditional AI approach often lead to adoption of the data-driven approach. A data-driven approach abandons the attempt to directly encode expert knowledge and relies instead on "learning" how to classify based on a corpus of already classified training data. There are three general methods that are widely used: discriminant algorithms, multilayer neural networks, and Bayesian theory.
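The rigid-boundary behavior of the binary decision rule described above can be illustrated with a minimal sketch. The function names, the threshold, and the linear ramp used for the graded alternative are assumptions chosen for illustration only; they are not taken from any cited system.

```python
def is_gross_crisp(area, threshold):
    """Binary rule: a defect is 'gross' exactly when its area reaches the threshold."""
    return area >= threshold


def gross_membership(area, low, high):
    """Graded alternative (hypothetical linear ramp): degree in [0, 1] to which
    a defect of the given area counts as 'gross'."""
    if area <= low:
        return 0.0
    if area >= high:
        return 1.0
    return (area - low) / (high - low)


# Two defects that differ by a single unit of area.
for area in (99, 100):
    print(area, is_gross_crisp(area, threshold=100),
          round(gross_membership(area, low=50, high=150), 2))
```

Under the binary rule the two nearly identical defects fall into opposite classes, whereas the graded measure assigns them nearly identical degrees of membership.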
Examples of algorithmic techniques include nearest-neighbor and linear discriminant (perceptron) methods. In each case, the algorithm is in essence a strategy for drawing rigid class boundaries in feature space based on the training data. Again, there is no notion of uncertainty built into these methods. Neural networks estimate class boundaries using stochastic approximation in the context of a computational architecture derived from neural biology. In principle, given enough data, the right arrangement of neurons, and enough time to converge to a solution, neural networks can learn even very complex, non-linear class boundaries. However, there are numerous practical problems with neural nets including availability of adequate training data and deciding on the correct architecture to use. Neural networks are also difficult to tune except by complete retraining. So far, they seem best suited to simple two-way classification or decision problems (e.g. is this object a bomb or not?).
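As a minimal sketch of the nearest-neighbor method mentioned above, the following assigns a feature vector the label of its closest training sample. The two-dimensional feature vectors and the function name are hypothetical; the class labels reuse the killer/cosmetic distinction only as an example.

```python
import math


def nearest_neighbor(train, x):
    """Return the label of the training sample closest to x (Euclidean distance).

    train: list of (feature_vector, label) pairs; x: feature vector.
    """
    closest = min(train, key=lambda sample: math.dist(sample[0], x))
    return closest[1]


# Hypothetical training data: one sample per class.
train = [((1.0, 1.0), "cosmetic"), ((5.0, 5.0), "killer")]

print(nearest_neighbor(train, (2.9, 2.9)))
print(nearest_neighbor(train, (3.1, 3.1)))
```

The two query points lie close together on either side of the midpoint between the training samples, yet receive different labels with no indication of how uncertain either call is.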
The only method mentioned so far which incorporates uncertainty in a fundamental way is Bayesian theory. Rather than drawing class boundaries in feature space, Bayesian classifiers estimate probability distributions for various events as a function of feature vectors. This combined with a priori probabilities of occurrence of each type of event yields a net probability for each event given an input feature vector. The Bayesian classifier can then choose the event with highest probability (thus, Bayesian classifiers are said to minimize the overall probability of classification error), or can simply report the calculated probabilities to the user.
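The computation described above can be written compactly with Bayes' rule. The notation is an assumption introduced here for illustration: $\mathbf{x}$ is the input feature vector, $c_i$ the $i$-th event (class), $P(c_i)$ its a priori probability, and $p(\mathbf{x} \mid c_i)$ the estimated distribution of feature vectors for that event.

```latex
P(c_i \mid \mathbf{x})
  = \frac{p(\mathbf{x} \mid c_i)\, P(c_i)}{\sum_{j} p(\mathbf{x} \mid c_j)\, P(c_j)},
\qquad
\hat{c} = \arg\max_{i} P(c_i \mid \mathbf{x})
```

Choosing $\hat{c}$ corresponds to selecting the event with highest probability; reporting the values $P(c_i \mid \mathbf{x})$ directly corresponds to the second option.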
Bayesian classifiers have proven successful in a number of applications. However, they typically rely on an explicit mathematical model for probability distributions, for example, normal (Gaussian) or logistic distribution functions. If the data does not fit the chosen model well, this approach is less effective. Also, while this approach captures uncertainty in the modeling of event arrivals as random variables, it does not deal with imprecision in the definition of classes. Thus, Bayesian theory is more difficult to apply in situations where class definitions are vague and non-exclusive (an event may legitimately belong to more than one class).
Yoda et al. describe a wafer defect detection and classification system in "An Automated Wafer Inspection System Using Pipelined Image Processing Techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10, No. 1, Jan. 1988. The classification part of the system takes the defect image and compares it to binary images representing each design level of the product. The features are collected by projecting the defect area onto each pattern level and computing the shape of the defect that resides on that level's pattern. The specific features are area and bounding box. These features are channeled into a simple rule based classifier that does not provide for uncertainty in the defect measures.
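The kind of feature extraction described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the implementation of the cited system: the defect and the pattern level are assumed to be binary masks of equal size stored as nested lists, and the function name is hypothetical.

```python
def defect_features_on_level(defect_mask, level_mask):
    """Project a defect onto one pattern level and measure the overlapping part.

    Returns (area, bounding_box), where area is the number of pixels set in both
    masks and bounding_box is (min_row, min_col, max_row, max_col), or None when
    the defect does not touch the level's pattern.
    """
    pixels = [
        (r, c)
        for r, row in enumerate(defect_mask)
        for c, value in enumerate(row)
        if value and level_mask[r][c]
    ]
    if not pixels:
        return 0, None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return len(pixels), (min(rows), min(cols), max(rows), max(cols))


# Hypothetical 3x3 masks: 1 marks a defect pixel or a patterned pixel.
defect = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]
level = [[0, 0, 1],
         [0, 1, 1],
         [1, 1, 1]]

print(defect_features_on_level(defect, level))
```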
There is a broad literature on using concepts of fuzziness as an uncertainty measure in classification applications. Much of this literature, however, deals with the incorporation of fuzziness into data-driven discriminant algorithms, such as fuzzy k-NN (k-nearest neighbor) algorithms. Keller, et al., "A fuzzy k-nearest neighbor algorithm," IEEE Transactions on Systems, Man and Cybernetics, vol. 15, no. 4, pp. 580-585, 1985. Such methods may perform somewhat better than discriminant algorithms which do not use fuzziness, but they retain most of the disadvantages of their non-fuzzy counterparts: they are totally dependent on the quality of their training data and they do not represent class vagueness in any realistic way. Fuzzy logic can also be used in a rule-driven fashion to encode expert knowledge. An early example of this in classification was the fuzzy decision tree, Chang et al., "Fuzzy decision tree algorithms," IEEE Transactions on Systems, Man and Cybernetics, vol. 7, no. 1, pp. 28-35, 1977. Again, fuzzy decision trees are an improvement over the bivalent type, but retain the disadvantage of being too dependent on hierarchical structure. While some human reasoning is structured in this way, much of it is not.
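To make the fuzzy k-NN idea mentioned above concrete, the following is a minimal sketch of inverse-distance-weighted class memberships in the style of that algorithm. It assumes crisply labeled training samples and a fuzzifier m greater than 1; the function name, the parameter defaults, and the small constant eps that guards against division by zero are choices made here for illustration.

```python
import math
from collections import defaultdict


def fuzzy_knn(train, x, k=3, m=2.0, eps=1e-9):
    """Return a dict mapping each class label to a membership degree in [0, 1].

    train: list of (feature_vector, label) pairs with crisp labels.
    Each of the k nearest neighbors votes with weight 1 / d**(2 / (m - 1)),
    and the votes are normalized so the memberships sum to 1.
    """
    neighbors = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    exponent = 2.0 / (m - 1.0)
    weights = [1.0 / (math.dist(f, x) ** exponent + eps) for f, _ in neighbors]
    total = sum(weights)
    memberships = defaultdict(float)
    for (_, label), weight in zip(neighbors, weights):
        memberships[label] += weight / total
    return dict(memberships)


# Hypothetical one-dimensional training data with two classes.
train = [((0.0,), "a"), ((1.0,), "a"), ((3.0,), "b")]
result = fuzzy_knn(train, (2.0,), k=3, m=2.0)
print(result)
```

The output is a graded membership per class rather than a single label, but the memberships are still derived entirely from the training samples, which is the dependence on training data noted above.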
Fuzzy inference architecture has been used in control problems and other types of decision support applications, for example risk analysis or evaluation of candidates for a job. Wang, "A Fuzzy Expert System for Remote Sensing Image Analysis," Digest International Geoscience and Remote Sensing Symposium, Vol. 2, pp. 848-851, 1989, discloses the use of fuzzy logic for remote sensing of geographical images.
In spite of the importance of automating classification, to date there are no systems available that could meet defect classification requirements in the semiconductor manufacturing area. The failure to develop automated classification is primarily due to the problems of characterizing defects in conventional machine vision systems. The problems arise from the unpredictable appearance of defects, their size range (varying from below the optical resolution limit to very large), the difficulty of determining the three dimensional characteristics of a defect from a two dimensional image, and the large amount of acceptable process variation in the manufacturing process which can easily lead to false defect classifications. Current defect detection systems lack the resolution to perform classification. A truly generic defect classification system which can be quickly tailored to a new application would offer a great advantage over the current application of specific solutions. The present invention provides such a system.