An image-based decision system processes an image and extracts information from it to decide whether objects of interest are present. Examples include systems that detect disease or defects, and systems that accept or reject measurement parameters such as dimensions, intensity, and structures. Image-based decision systems have broad application areas such as machine vision, non-contact gauging, inspection, robot guidance, and medical imaging. Learning or teaching is required to optimize an image-based decision system for non-trivial applications. In machine learning, an important attribute of good learning samples is that they completely represent the distribution of real-world data in which the final system will operate. In general, learning is most reliable when the learning samples follow a distribution similar to that of the data in the real application where the system will be used. Learning therefore often requires more time and data than are usually available, because application environments frequently change or include rare (but important) events.
Most prior art methods of machine learning rely on the assumption that the distribution of learning samples is identical or sufficiently close to the distribution found in the real world. However, there are many instances where obtaining learning samples that mimic the real-world distribution is not feasible or economical. In many cases, the negative learning samples (such as defects) are hard to obtain, and it is difficult to retrain the system when it is reconfigured to process a new set of data. Also, the collected defect samples are unlikely to show each type of defect with equal prevalence. Some types of defects might be omitted entirely from the training process.
For example, in a machine vision system that inspects semiconductor wafers for defects, the input image patterns change whenever the fabrication line switches to a different integrated circuit (IC) design or process level, since the wafer pattern varies with the IC design placed on the wafer. In this case, the learning system has to learn additionally on new learning samples to remain effective, which requires a cumbersome truth-labeling task for the new data and demands extensive human interaction. Human-provided truth may not be reliable either, especially for ambiguous learning samples, where humans have difficulty rendering consistent truth labels because they cannot assess the data objectively and quantitatively. This problem is exacerbated when either positive or negative learning samples are hard to obtain. In the above example, the number of defective wafer images is usually less than 2% of the total number of wafer images to be examined. Therefore, it takes an inordinate amount of time and effort to collect a set of learning samples large enough to cover a variety of defects (e.g., scratches, particles, contamination from coating, defocus, exposure, etching or development errors, and chemical mechanical planarization errors in different background conditions) and achieve high learning accuracy. The collected defect images are also unlikely to show each type of defect with equal prevalence, and some types of defects may be omitted entirely. Moreover, defects that do occur tend to look nearly identical, because the same underlying cause repeats them until it is corrected. Therefore, even when a large number of samples is obtained, they may all show essentially the same limited characteristics. The variability of the true application situation is therefore not well represented.
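As a rough illustration of the collection burden described above, the following sketch estimates how many wafer images would have to be inspected, on average, to gather a given number of defect images. It assumes that each image is independently defective with a fixed probability. The function name, the target counts, and the per-type shares are hypothetical; only the 2% figure comes from the text, where it is an upper bound.

```python
def expected_images_needed(target_defects: int, defect_rate: float) -> float:
    """Expected number of images to inspect to observe `target_defects` defects.

    Assumes independent images with a constant defect probability, so the
    number of inspected images is negative binomial with mean target / rate.
    """
    if target_defects < 0:
        raise ValueError("target_defects must be non-negative")
    if not 0.0 < defect_rate <= 1.0:
        raise ValueError("defect_rate must be in (0, 1]")
    return target_defects / defect_rate


# 2% is the upper bound quoted in the text; the target of 1,000 is hypothetical.
overall = expected_images_needed(1000, 0.02)

# Hypothetical shares of each defect type among defective images.
type_shares = {"scratch": 0.6, "particle": 0.3, "defocus": 0.1}

# Images needed to collect 100 examples of each individual defect type.
per_type = {
    name: expected_images_needed(100, 0.02 * share)
    for name, share in type_shares.items()
}

print(f"all defects: about {overall:,.0f} images")
for name, count in per_type.items():
    print(f"{name}: about {count:,.0f} images")
```

Under these assumptions, collecting 1,000 defect images requires inspecting about 50,000 images on average, and the rarest defect type dominates the cost of building a balanced sample set.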
Where a sufficient number of learning samples is not available, incremental learning might be used. In incremental learning, the system learns on new incoming data as they are encountered. However, when positive samples are dominant and negative samples are scarce (or vice versa), learning takes a long time to reach a state mature enough to be useful.
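The behavior described above can be sketched with a very simple incremental learner. The text does not name a specific algorithm, so a nearest-class-mean classifier is used here purely for brevity; the class name, labels, and feature values are hypothetical. The sketch shows that until a sample of the scarce class arrives, the learner cannot recognize that class at all.

```python
from typing import Dict, List, Optional, Sequence


class IncrementalNearestMean:
    """Nearest-class-mean classifier updated one sample at a time."""

    def __init__(self) -> None:
        self.means: Dict[str, List[float]] = {}
        self.counts: Dict[str, int] = {}

    def update(self, features: Sequence[float], label: str) -> None:
        """Fold one labeled sample into the running mean of its class."""
        n = self.counts.get(label, 0) + 1
        mean = self.means.get(label, [0.0] * len(features))
        # Running mean update: m_new = m_old + (x - m_old) / n
        self.means[label] = [m + (x - m) / n for m, x in zip(mean, features)]
        self.counts[label] = n

    def predict(self, features: Sequence[float]) -> Optional[str]:
        """Return the label of the closest class mean, or None if untrained."""
        if not self.means:
            return None

        def squared_distance(label: str) -> float:
            return sum((x - m) ** 2 for x, m in zip(features, self.means[label]))

        return min(self.means, key=squared_distance)


model = IncrementalNearestMean()

# Hypothetical stream: 49 normal samples arrive before the first defect.
for _ in range(49):
    model.update([0.1], "normal")

# No defect has been seen yet, so a defect-like sample is labeled "normal".
before = model.predict([0.9])

model.update([0.9], "defect")

# After a single defect sample, the same input can be recognized.
after = model.predict([0.9])
print(before, after)
```

In this toy stream the learner stays blind to defects for the first 49 updates, which mirrors how a scarce class delays the point at which an incrementally trained system becomes useful.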
Another known method of overcoming this problem is a form of analytical learning called explanation-based learning [Mitchell, T. M., Machine Learning, WCB/McGraw-Hill, 1997, Chapter 11, pp. 307–330]. In this method, the user provides additional information to help the system narrow down the search space. However, this method requires the intervention of experts who are very well acquainted with the learning system. This is burdensome and does not usually provide consistent results.