Fabricating semiconductor devices such as logic and memory devices typically includes processing a substrate such as a semiconductor wafer using a large number of semiconductor fabrication processes to form various features and multiple levels of the semiconductor devices. For example, lithography is a semiconductor fabrication process that involves transferring a pattern from a reticle to a resist arranged on a semiconductor wafer. Additional examples of semiconductor fabrication processes include, but are not limited to, chemical-mechanical polishing (CMP), etch, deposition, and ion implantation. Multiple semiconductor devices may be fabricated in an arrangement on a single semiconductor wafer and then separated into individual semiconductor devices.
Inspection processes are used at various steps during a semiconductor manufacturing process to detect defects on wafers to promote higher yield in the manufacturing process and thus higher profits. Inspection has always been an important part of fabricating semiconductor devices such as integrated circuits (ICs). However, as the dimensions of semiconductor devices decrease, inspection becomes even more important to the successful manufacture of acceptable semiconductor devices because smaller defects can cause the devices to fail. For instance, as the dimensions of semiconductor devices decrease, detection of defects of decreasing size has become necessary since even relatively small defects may cause unwanted aberrations in the semiconductor devices.
As design rules shrink, however, semiconductor manufacturing processes may be operating closer to the limits of their performance capability. In addition, smaller defects can have an impact on the electrical parameters of the device as the design rules shrink, which drives more sensitive inspections. Therefore, as design rules shrink, the population of potentially yield-relevant defects detected by inspection grows dramatically, and the population of nuisance defects detected by inspection also increases dramatically. As a result, more and more defects may be detected on the wafers, and correcting the processes to eliminate all of the defects may be difficult and expensive. As such, determining which of the defects actually have an effect on the electrical parameters of the devices and the yield may allow process control methods to be focused on those defects while largely ignoring others. Furthermore, at smaller design rules, process-induced failures may, in some cases, tend to be systematic. That is, process-induced failures tend to occur at predetermined design patterns that are often repeated many times within the design. Elimination of spatially systematic, electrically relevant defects is important because eliminating such defects can have a significant overall impact on yield. Whether or not defects will affect device parameters and yield often cannot be determined from the inspection, review, and analysis processes described above since these processes may not be able to determine the position of the defect with respect to the electrical design.
One method to detect defects is to use computer vision. In computer vision, a model, such as a convolutional neural network (CNN), may be used to identify defects. A CNN may be provided with a variety of images from a wafer and a set of known defects. One of the most common tasks is to fit a model to a set of training data, with the goal of making reliable predictions on unseen test data. Usually, at least several hundred examples of each type of defect are needed. Very often this much data is not available, or collecting it takes too long.
In addition, it is possible to overfit the CNN. In overfitting, a statistical model describes random error or noise instead of the underlying relationship. For example, FIG. 1 illustrates a plurality of images 10 showing wafer noise in difference images of adjacent dies. Overfitting occurs when a model is excessively complex, such as when it has too many parameters relative to the number of observations. A model that has been overfitted has poor predictive performance, as it overreacts to minor fluctuations in the training data.
Likewise, underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Underfitting would occur, for example, when fitting a linear model to non-linear data. Such a model would have poor predictive performance.
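By way of illustration only, the underfitting case described above may be sketched as follows. The quadratic data, the noise level, and the names `linear_mse` and `quadratic_mse` are assumptions chosen for this example and are not taken from any particular inspection system.

```python
import numpy as np

# Hypothetical non-linear data: a quadratic trend with slight noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = x**2 + rng.normal(scale=0.01, size=x.size)

# Fit a linear model (too simple for this data) and a quadratic model.
linear_coeffs = np.polyfit(x, y, deg=1)
quadratic_coeffs = np.polyfit(x, y, deg=2)

def mse(coeffs):
    """Mean squared error of a fitted polynomial on the data."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

linear_mse = mse(linear_coeffs)
quadratic_mse = mse(quadratic_coeffs)

# The linear model cannot capture the curvature, so its error stays
# far above that of the quadratic model, even on the data it was fit to.
print(f"linear MSE:    {linear_mse:.4f}")
print(f"quadratic MSE: {quadratic_mse:.6f}")
```

In this sketch the linear fit performs poorly even on the data used to fit it, which distinguishes underfitting from overfitting.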
The possibility of overfitting exists because the criterion used for training the model is not the same as the criterion used to judge the efficacy of a model. In particular, a model is typically trained by maximizing its performance on some set of training data. However, its efficacy is determined not by its performance on the training data but by its ability to perform well on unseen data. Overfitting occurs when a model begins to “memorize” training data rather than “learning” to generalize from a trend. As an extreme example, if the number of parameters is the same as or greater than the number of observations, a simple model or learning process can perfectly predict the training data simply by memorizing the training data in its entirety, but such a model will typically fail drastically when making predictions about new or unseen data, since the simple model has not learned to generalize at all.
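The extreme case described above, in which the number of parameters equals the number of observations, may be sketched as follows. The linear trend, the noise level, the sample sizes, and all names in the code are assumptions made for this example only.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def noisy_line(x):
    """Hypothetical underlying trend (a line) plus random noise."""
    return 2.0 * x + rng.normal(scale=0.2, size=x.size)

# 12 training observations and 200 unseen test observations.
x_train = np.linspace(-1.0, 1.0, 12)
y_train = noisy_line(x_train)
x_test = rng.uniform(-1.0, 1.0, size=200)
y_test = noisy_line(x_test)

# Degree 11 gives 12 parameters for 12 observations, so the model can
# pass through every training point, i.e. memorize the training data.
memorizer = Polynomial.fit(x_train, y_train, deg=11)
# Degree 1 matches the shape of the underlying trend.
simple = Polynomial.fit(x_train, y_train, deg=1)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

memorizer_train_mse = mse(memorizer, x_train, y_train)
memorizer_test_mse = mse(memorizer, x_test, y_test)
simple_test_mse = mse(simple, x_test, y_test)

print("memorizer, training data:", memorizer_train_mse)
print("memorizer, unseen data:  ", memorizer_test_mse)
print("simple model, unseen data:", simple_test_mse)
```

The training error of the memorizing model is essentially zero, yet its error on unseen data is expected to exceed that of the simpler model because it has fit the noise rather than the trend.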
The potential for overfitting depends not only on the number of parameters and the amount of data but also on the conformability of the model structure with the data shape, and on the magnitude of model error compared to the expected level of noise or error in the data.
In order to avoid overfitting, it is necessary to use additional techniques, such as data augmentation. Data augmentation takes existing data, such as existing wafer images, and applies mathematical functions to the data in order to create new, but similarly indicative images. For example, currently used data augmentation techniques include rotation, translation, zooming, flipping, and cropping of images.
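The conventional augmentation techniques listed above may be sketched, under stated assumptions, as follows. The function names are hypothetical, images are assumed to be two-dimensional grayscale arrays, translation is assumed to wrap around at the image border, and zooming is limited to a nearest-neighbor factor of two for brevity.

```python
import numpy as np

def rotate90(image, k=1):
    """Rotate counterclockwise by k quarter turns."""
    return np.rot90(image, k)

def translate(image, dy, dx):
    """Shift by (dy, dx) pixels; pixels wrap around (an assumption)."""
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

def flip(image, horizontal=True):
    """Mirror left-right, or up-down when horizontal is False."""
    return np.fliplr(image) if horizontal else np.flipud(image)

def crop(image, top, left, height, width):
    """Cut out a height-by-width window starting at (top, left)."""
    return image[top:top + height, left:left + width]

def zoom2x(image):
    """Enlarge 2x by pixel repetition, then keep the central region."""
    h, w = image.shape
    enlarged = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)
    top, left = h // 2, w // 2
    return enlarged[top:top + h, left:left + w]

def augment(image):
    """Return one new image per technique from a single source image."""
    h, w = image.shape
    return [
        rotate90(image),
        translate(image, dy=1, dx=2),
        flip(image),
        crop(image, 0, 0, h - 1, w - 1),
        zoom2x(image),
    ]

# Usage with a small stand-in for a wafer image.
image = np.arange(16).reshape(4, 4)
augmented = augment(image)
print(len(augmented), "augmented images from one source image")
```

Each function derives a new image from an existing one without acquiring additional data, which is the sense in which the data set is augmented.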
However, these techniques cannot easily be used in the field of defect inspection. For example, rotation has only limited value as wafers can only be inspected in one or two orientations (0 and 90 degrees). Zoom is constant during the inspection process and thus is also of limited value. Translation, flipping, and cropping of images can be used, but these augmentations are often insufficient to generate enough augmentation data, especially when it comes to making the CNN robust to die-to-die or wafer-to-wafer process variation.
Furthermore, the prior art data augmentation techniques fall especially short when dealing with random wafer noise as illustrated in the difference images 10 of adjacent dies in FIG. 1. Augmenting the input data set with meaningful, entirely random wafer noise is difficult, but such noise should be taken into account when dealing with random process variation, which is one of the most challenging wafer noise sources.