The present invention relates to methods and apparatus for optically detecting defects in or on a smooth surface of a substrate such as a silicon wafer, and for classifying the defects in terms of defect type and size.
Optical inspection methods are frequently used for inspecting the quality of a smooth surface of a substrate such as a silicon wafer, computer disk, or the like. In most such inspection systems, a beam of laser light is directed onto the surface, and the light scattered and reflected from the surface is collected and converted into electrical signals that are analyzed to infer the presence and size of certain defects on the surface. At least in the case of optical inspection of silicon wafers that are used as the starting material for making integrated circuit chips, the types of defects of major concern include particles of foreign materials on the surface, pits formed in the surface, scratches in the surface, and others.
Particles on the wafer surface can interfere with the lithography process by which lines of electrically conductive material are formed on the surface. As a general rule, any particle whose diameter is larger than half the width of the electrical lines to be laid onto the surface constitutes an unacceptable defect. If there are too many such particles, the wafer must be rejected. Currently, integrated circuits are being made with line widths as small as 0.25 μm (250 nm), so that particles larger than 125 nm in diameter occurring on the wafer surface would be cause for rejecting the wafer, while particles smaller than 125 nm would be tolerable. The semiconductor industry is quickly moving towards production of circuits composed of 0.18 μm and then 0.15 μm lines, which means that much smaller particles will soon cause concern.
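The half-line-width rule of thumb reduces to a simple threshold, which makes it easy to see how the tolerable particle size shrinks from 125 nm to 90 nm to 75 nm as line widths move from 0.25 μm to 0.18 μm to 0.15 μm. A minimal sketch; the function names are hypothetical, and a real acceptance criterion would also count how many such particles are present.

```python
def max_tolerable_diameter_nm(line_width_nm: float) -> float:
    """Largest tolerable particle diameter under the half-line-width rule of thumb."""
    return line_width_nm / 2.0


def is_killer_particle(diameter_nm: float, line_width_nm: float) -> bool:
    """True if the particle is larger than half the line width (an unacceptable defect)."""
    return diameter_nm > max_tolerable_diameter_nm(line_width_nm)


for width in (250.0, 180.0, 150.0):
    limit = max_tolerable_diameter_nm(width)
    print(f"{width:.0f} nm lines -> particles above {limit:.0f} nm are unacceptable")
```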
Wafer inspection systems must be calibrated before they can accurately determine the diameter of a particle. Calibration is typically done by intentionally depositing particles of various known diameters on the wafer surface and inspecting the wafer with the inspection apparatus, so that the scattered light intensities produced by the different particle sizes can be correlated with those sizes. These calibration particles are usually spheres made of polystyrene latex (PSL).
One difficulty that has been encountered in wafer inspection processes is that identically sized particles of different materials can produce substantially different scattered light intensities. Stated differently, two particles made of different materials and having substantially different diameters may produce virtually the same measured scattered light intensity. For example, it has been found that a silicon particle of a given diameter produces a much larger scatter intensity than a PSL particle of the same diameter. In fact, among the various materials that commonly appear as particles on a wafer surface, PSL tends to be one of the weakest scatterers. Thus, after the inspection apparatus has been calibrated with PSL particles, it will tend to overestimate the diameters of silicon particles and of particles of many other materials. Wafers may accordingly be rejected as having particles larger than half the line width even though the particles are in reality smaller than half the line width. The accuracy with which particles can be sized by light scattering can therefore be greatly increased if the particle material can be identified.
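The overestimation can be illustrated with a toy model. Assume, purely for illustration, that in the small-particle regime the signal scales as S = c·d^6 (Rayleigh-like scaling), with a material-dependent factor c, and ignore the substrate interaction that real scattering models must include. A PSL-calibrated tool then reports the diameter d_app satisfying c_PSL·d_app^6 = c_material·d_true^6. The 64× scatter ratio below is a hypothetical value, not a measured one.

```python
def apparent_psl_diameter(true_diameter_nm: float, scatter_ratio: float, exponent: float = 6.0) -> float:
    """Diameter reported by a PSL-calibrated tool for a particle of another material.

    Toy model (assumption): S = c * d**exponent, with scatter_ratio = c_material / c_PSL.
    Solves c_PSL * d_app**exponent == c_material * d_true**exponent for d_app.
    """
    return true_diameter_nm * scatter_ratio ** (1.0 / exponent)


threshold_nm = 125.0  # half of a 250 nm line width
d_true = 70.0         # hypothetical particle, actually below the threshold
d_reported = apparent_psl_diameter(d_true, scatter_ratio=64.0)  # hypothetical strong scatterer
print(f"true {d_true} nm, reported {d_reported:.1f} nm, rejected: {d_reported > threshold_nm}")
```

Under these assumptions a 70 nm particle that scatters 64 times more strongly than PSL is reported as roughly 140 nm and trips the 125 nm limit, even though it is actually tolerable.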
Another advantage to the semiconductor industry in being able to identify particle material is that this information provides a strong clue as to the source of the contamination. Because particle contamination has to be reduced to a level where useful product can be produced, finding and eliminating contamination sources quickly is economically important.
For light of a given wavelength, every material has an index of refraction, which indicates how much the speed of light is reduced within the material, and an absorption coefficient, which is generally indicative of how opaque the material is to the light. The combination of the index of refraction and absorption coefficient, which are known as the material constants, is unique for each different material. The combinations of these material constants can be roughly separated into four groups: (1) dielectrics such as PSL, SiO2, and Al2O3 (low index of refraction and zero absorption coefficient); (2) semiconductors such as silicon (large index of refraction and small absorption coefficient); (3) gray metals such as tungsten (large index of refraction and large absorption coefficient); and (4) good conductors such as silver (small index of refraction and large absorption coefficient).
The particle material, together with the particle shape and the material constants of the substrate surface (which are known), completely and uniquely defines the pattern of light scattered by the particle for any given light source. Moreover, for particles whose average diameter is less than about one-fifth of the wavelength of the illuminating light, the particle shape does not play a significant role in determining the scatter pattern. Thus, for visible light and particles smaller than about 100 nm, knowledge of the average particle diameter and of the relevant material constants is enough to calculate the scatter pattern for a given scattering geometry. This fact has allowed the development of scattering models that predict scatter patterns for a given set of conditions. These models have been experimentally confirmed and the results published.
What would be desirable is a system and method for solving the more difficult inverse problem. That is, it would be desirable to be able to determine the particle material and average particle diameter from a knowledge of the scatter pattern. Heretofore, methods have been developed for determining average particle diameter by analyzing the scatter pattern, for example as described in commonly owned U.S. Pat. No. 5,712,701, which is hereby incorporated herein by reference. However, as noted above, the accuracy of such methods depends on the calibration of the system, and currently the calibration must be performed using PSL spheres, which have substantially different material constants from some of the other materials that can appear as particles on a wafer.
Methods for identifying particle material have been proposed. For instance, U.S. Pat. No. 5,037,202 to Batchelder et al. discloses methods and apparatus in which two parallel light beams that are initially mutually coherent but of different polarizations are focused onto a focal plane (such as the surface of a wafer) such that they are displaced apart from each other at the focal plane. After the beams are reflected from the surface, a further optical system intercepts the beams and combines them so that a particle-induced phase shift in one of the beams is manifested by a change in the elliptical polarization of the combined beams. A first detector is responsive to the combined beam's intensity along a first polarization axis to produce a first output, and a second detector is responsive to the combined beam's intensity along a second polarization axis to produce a second output. The first and second outputs are added to provide an extinction signal and are subtracted to provide a phase shift signal. The phase shift and extinction are correlated with the index of refraction of the particle material, and hence the identity of the material purportedly can be determined based on the phase shift and extinction values. The size of the particle purportedly can be inferred from its position on a curve of extinction versus phase shift. Thus, in Batchelder's system and method, information about the particle is inferred by analyzing the specularly reflected beams. A disadvantage of this approach is that the reflected light is relatively insensitive to changes in particle properties, such that small particles (e.g., particles on the order of 100 nm or smaller) produce quite small changes in the specularly reflected beams that can be difficult to measure accurately. Accordingly, the Batchelder approach may not be optimal for identifying small particles of the size that begin to cause problems in integrated circuit manufacturing.
U.S. Pat. No. 5,515,163 to Kupershmidt et al. discloses methods and apparatus in which a polarized laser beam is intensity modulated at a first frequency and is split into two orthogonally polarized beams, and the two beams are phase shifted relative to each other at a second frequency. The two phase-shifted beams are directed onto the surface being inspected, and light scattered by particles at an angle to the two beams is detected. The detected light is synchronously demodulated to determine the amplitude of the scattered light at the frequency of intensity modulation and the amplitude and phase of the scattered light at the frequency of phase modulation. These quantities purportedly can be correlated to size and refraction index of particles to permit identification of particles. Kupershmidt's method involves complicated calculations, and the measurements require sampling over a number of modulation cycles in order to obtain accurate measurements for a given scanned portion of the surface being inspected. Accordingly, scanning of the entire surface would likely be relatively slow.
The assignee of the present application has developed a method and apparatus for identifying the material of which a particle is made, as described in commonly assigned U.S. Pat. No. 6,122,047, the entire disclosure of which is incorporated herein by reference. The method involves measuring the scatter pattern produced by light scattering from a particle, and comparing the measured scatter pattern with a plurality of predetermined scatter patterns produced by particles of various known materials and sizes, so as to identify the predetermined scatter pattern that most nearly matches the measured one. The scatter pattern is defined by signals from a plurality of light collectors positioned in different locations with respect to the particle and incident light beam.
The present invention represents an improvement over the method and apparatus disclosed in U.S. Pat. No. 6,122,047, and is applicable not only for identifying particle material and size, but also for identifying and sizing other types of defects such as pits, scratches, subsurface defects (e.g., voids), etc. In accordance with the invention, a preferred method for classifying defects occurring at or near a surface of a smooth substrate proceeds as follows:
(a) The method begins by defining a plurality M of different idealized types of defects, labeled m=1 to M (e.g., spherical particle, conical pit, etc.), that can occur at the surface of the substrate, such that a given defect occurring at the surface can be categorized into one of the M defect types and can be described in terms of size, at least approximately, by a size parameter dm.
(b) Next, for each of the M different idealized types of defects, a database is constructed containing a plurality N of different sets of data, labeled n=1 to N. Each set of data comprises a relationship between the magnitude S of a signal from a light collector and the size parameter d. The N different sets of data for each defect type correspond to N different predetermined test configurations in which a beam of light having predetermined characteristics is impinged on the surface of the substrate at a predetermined incident angle and light emanating from the surface is collected by the light collector at a predetermined location above the surface and the intensity of the collected light is measured. Each of the N different test configurations differs from each of the other configurations in some respect that produces an independent relationship between a signal magnitude and a defect size. The N different predetermined test configurations are the same for each idealized defect type. Thus, there are M different databases each containing N different sets of data. For example, if the defect types comprise spherical particles, conical pits, ellipsoidal particles, and mounds (i.e., M=4), and there are three different test configurations represented by three light collectors located in different locations above the substrate surface (i.e., N=3), there would be four different databases each containing three data sets of signal magnitude versus defect size.
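The database layout of step (b), M defect types each holding N curves of S versus d, can be sketched as a dictionary of tabulated curves. This is a minimal illustration, assuming a common size grid; `build_databases` and `toy_model` are hypothetical names, and `toy_model` is an arbitrary power law standing in for a scattering model or empirical calibration.

```python
from typing import Callable, Dict, List, Tuple

# One S-versus-d curve, stored as (sizes, signals) sampled on a common size grid.
Curve = Tuple[List[float], List[float]]


def build_databases(
    defect_types: List[str],
    n_configs: int,
    model: Callable[[str, int, float], float],
    sizes: List[float],
) -> Dict[str, List[Curve]]:
    """Return M databases (one per defect type), each holding N curves.

    `model(defect_type, config_index, d)` is a placeholder for either a
    scattering calculation or an empirical calibration measurement.
    """
    return {
        m: [(list(sizes), [model(m, n, d) for d in sizes]) for n in range(n_configs)]
        for m in defect_types
    }


def toy_model(defect_type: str, n: int, d: float) -> float:
    """Hypothetical stand-in: a power law whose prefactor depends on type and exponent on collector."""
    prefactor = {"sphere": 1.0, "conical_pit": 0.3, "ellipsoid": 0.8, "mound": 0.5}[defect_type]
    exponent = 6.0 - 0.5 * n  # each collector gets a different size dependence
    return prefactor * d ** exponent


types = ["sphere", "conical_pit", "ellipsoid", "mound"]  # M = 4
grid = [float(d) for d in range(20, 201, 10)]          # 20 nm to 200 nm
db = build_databases(types, 3, toy_model, grid)         # N = 3 collectors
print(len(db), [len(db[m]) for m in types])             # 4 [3, 3, 3, 3]
```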
(c) The substrate is tested with each of the N different predetermined test configurations so as to derive N signal magnitudes S1, . . . , SN for a defect to be sized and classified.
(d) Each of the M different databases is consulted to determine a defect size d corresponding to each of the N measured signal magnitudes S1, . . . , SN. Thus, N different sizes d1 to dN are determined for each of the M different idealized defect types.
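Step (d) amounts to reading each tabulated curve backwards: given a measured signal, find the size that produces it. A minimal sketch using linear interpolation, assuming each curve is monotonically increasing (curves with dips are discussed further on); the function names and sample curves are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def size_from_signal(sizes: List[float], signals: List[float], s: float) -> Optional[float]:
    """Invert one tabulated S-versus-d curve by linear interpolation.

    Assumes `signals` increases monotonically with `sizes`; returns None when
    `s` falls outside the tabulated range.
    """
    if s < signals[0] or s > signals[-1]:
        return None
    i = bisect_left(signals, s)
    if signals[i] == s:
        return sizes[i]
    # s lies strictly between table entries i-1 and i.
    d0, d1 = sizes[i - 1], sizes[i]
    s0, s1 = signals[i - 1], signals[i]
    return d0 + (d1 - d0) * (s - s0) / (s1 - s0)


def sizes_for_defect_type(
    curves: Sequence[Tuple[List[float], List[float]]], measured: List[float]
) -> List[Optional[float]]:
    """Step (d) for one defect type: one inferred size per test configuration."""
    return [size_from_signal(d, sig, s) for (d, sig), s in zip(curves, measured)]


curves = [([10.0, 20.0, 30.0], [1.0, 4.0, 9.0]), ([10.0, 20.0, 30.0], [2.0, 8.0, 18.0])]
print(sizes_for_defect_type(curves, [6.5, 13.0]))  # [25.0, 25.0]
```

Repeating this for every one of the M databases yields the N sizes d1 to dN per idealized defect type.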
(e) For each of the M different defect types, an average size ⟨d⟩ is calculated based on the N different sizes d1 to dN, and based on this average size ⟨d⟩, the respective database is used to determine the signal magnitudes that would be produced by the respective defect type having the size ⟨d⟩. These signal magnitudes are denoted ⟨S1⟩ to ⟨SN⟩. In general, these signal magnitudes will differ from the measured signal magnitudes S1 to SN, with the degree of variance between them being generally indicative of how closely the defect being analyzed resembles the respective idealized defect type.
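Step (e) runs the database in the forward direction: average the inferred sizes, then read off the signal each configuration would give at that average size. A minimal sketch, assuming tabulated curves with ascending sizes; it uses a plain arithmetic mean for ⟨d⟩ (the signal-weighted average described later in this section is the preferred alternative), and `signal_at_size` and `predicted_signals` are hypothetical names.

```python
from bisect import bisect_left
from statistics import fmean
from typing import List, Tuple

Curve = Tuple[List[float], List[float]]  # (ascending sizes, signals)


def signal_at_size(curve: Curve, d: float) -> float:
    """Read S(d) from a tabulated curve by linear interpolation, clamped at the ends."""
    sizes, signals = curve
    if d <= sizes[0]:
        return signals[0]
    if d >= sizes[-1]:
        return signals[-1]
    i = bisect_left(sizes, d)
    d0, d1 = sizes[i - 1], sizes[i]
    s0, s1 = signals[i - 1], signals[i]
    return s0 + (s1 - s0) * (d - d0) / (d1 - d0)


def predicted_signals(curves: List[Curve], sizes_found: List[float]) -> Tuple[float, List[float]]:
    """Average the N inferred sizes, then return <d> and the N signals <S_n> at that size."""
    d_avg = fmean(sizes_found)  # simple arithmetic mean (assumption for this sketch)
    return d_avg, [signal_at_size(c, d_avg) for c in curves]


curves = [([10.0, 20.0, 30.0], [1.0, 4.0, 9.0]), ([10.0, 20.0, 30.0], [2.0, 8.0, 18.0])]
d_avg, preds = predicted_signals(curves, [24.0, 26.0])
print(d_avg, preds)  # 25.0 [6.5, 13.0]
```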
(f) Thus, for each idealized defect type, a deviation parameter σ is calculated representing a combined deviation between the measured signal magnitudes S1 to SN and the determined signal magnitudes ⟨S1⟩ to ⟨SN⟩, so as to derive a plurality of deviation parameters σ1 to σM corresponding to the M different idealized defect types.
(g) The defect being analyzed is classified as being of the idealized type having the smallest deviation parameter σ. The size of the defect is determined based on the average size ⟨d⟩ corresponding to that idealized type.
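Steps (d) through (g) can be sketched end to end. To keep the example short, the databases are hypothetical analytic power laws, S = a·d^p for each defect type and configuration, so that inversion has a closed form; real databases would be tabulated from scattering models or calibration data. The sketch uses the preferred choices stated later in the text (signal-weighted average with α = 1, root-sum-square deviation with β = 2). `TOY_DB` and `classify` are illustrative names.

```python
from typing import Dict, List, Tuple

# Hypothetical toy databases: for defect type m and configuration n,
# S = a[n] * d ** p[n], with (a, p) listed per type.
TOY_DB: Dict[str, Tuple[List[float], List[float]]] = {
    "sphere": ([1.0, 2.0, 0.5], [6.0, 6.0, 6.0]),
    "pit": ([0.2, 1.0, 1.5], [4.0, 5.0, 6.0]),
}


def classify(measured: List[float], alpha: float = 1.0, beta: float = 2.0) -> Tuple[str, float, Dict[str, float]]:
    """Return (best defect type, its average size <d>, all deviation parameters)."""
    sigmas: Dict[str, float] = {}
    sizes: Dict[str, float] = {}
    for m, (a, p) in TOY_DB.items():
        # (d) size implied by each measured signal under this defect type
        d = [(s / an) ** (1.0 / pn) for s, an, pn in zip(measured, a, p)]
        # (e) signal-weighted average size, then the signals that size would produce
        w = [s ** alpha for s in measured]
        d_avg = sum(wi * di for wi, di in zip(w, d)) / sum(w)
        predicted = [an * d_avg ** pn for an, pn in zip(a, p)]
        # (f) combined deviation between predicted and measured signals
        sigmas[m] = sum(abs(ps - s) ** beta for ps, s in zip(predicted, measured)) ** (1.0 / beta)
        sizes[m] = d_avg
    # (g) the idealized type with the smallest deviation wins
    best = min(sigmas, key=sigmas.get)
    return best, sizes[best], sigmas


# Simulate the three signals a 50 (arbitrary units) spherical particle would give.
a_sph, p_sph = TOY_DB["sphere"]
measured = [an * 50.0 ** pn for an, pn in zip(a_sph, p_sph)]
best, size, sigmas = classify(measured)
print(best, round(size, 3))  # sphere 50.0
```

For a true sphere all three inferred sizes agree, so the predicted signals reproduce the measured ones and σ for "sphere" is essentially zero, while the "pit" hypothesis yields widely scattered sizes and a large σ.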
In an alternative method in accordance with the invention, steps (a) through (d) above are performed, and the defect is classified as being of the idealized type having the smallest spread between the plurality of sizes d1 to dN. The size of the defect is determined based on an average of the sizes d1 to dN.
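The alternative method skips the forward lookup and compares the inferred sizes directly. The text does not fix how "spread" is measured, so this sketch assumes the population standard deviation; a max-minus-min range or a spread relative to the mean would fit the description equally well. The function name and input values are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Dict, List, Tuple


def classify_by_spread(inferred: Dict[str, List[float]]) -> Tuple[str, float]:
    """Pick the defect type whose N inferred sizes agree most closely.

    `inferred[m]` holds d1..dN obtained by inverting each of the N curves for type m.
    Spread is measured by the population standard deviation (an assumption).
    """
    best = min(inferred, key=lambda m: pstdev(inferred[m]))
    return best, fmean(inferred[best])  # size reported as the mean of d1..dN


# Hypothetical inferred sizes (nm) for one defect under three idealized types.
inferred = {
    "sphere": [72.0, 75.0, 74.0],
    "pit": [40.0, 110.0, 65.0],
    "mound": [90.0, 60.0, 81.0],
}
print(classify_by_spread(inferred))
```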
The plurality N of test configurations can be provided in various ways. In one embodiment, a plurality of light collectors are positioned above the surface of the substrate to collect light scattered into different regions of the space above the surface. The plurality of test configurations are thus effected simultaneously when a single light beam is impinged on a given point of the substrate surface.
Alternatively, the plurality of test configurations can be effected by changing a property of the incident light beam or a property of the scattered light received at a light collector. For instance, the incident angle of the light beam can be varied to produce the plurality of test configurations, or the polarization of the incident beam can be varied. Another possibility is to vary the polarization of the light received at the light collector. The method used for effecting the plurality of test configurations is not critical, as long as the different configurations produce discernible differences in the characteristics of the light that rebounds from the substrate surface and defect so that such differences can be correlated with defect type and size.
A significant advantage of the present invention is that the method does not depend on using any particular scanner configuration, but rather can be adapted to any configuration simply by constructing the databases in accordance with the scanner configuration. The databases can be provided either analytically based on mathematical models, or empirically by testing known defect types and sizes.
Furthermore, the invention is not negatively affected by the well-known phenomenon of "dips" in the scatter distribution (i.e., scattered light intensity versus scattering angle or equivalent) that are produced by certain types of defects, particularly silicon particles. In fact, the presence of these dips in the data actually enhances the ability to identify defect type because not all defect types exhibit such dips, and those that do exhibit dips have different dip characteristics. At any rate, because of the dip phenomenon, the signal magnitude S from a given light collector, when plotted against particle diameter d, can yield more than one diameter for a given signal magnitude. This presents no problem with the present invention, however; for any S versus d relationship in the database that has more than one size parameter d for a given signal magnitude S, each size parameter d corresponding to the signal magnitude S is simply treated as if it were from an additional test configuration. Thus, a deviation parameter σ is calculated based on each of the diameters and is compared to the other deviation parameters to determine the smallest deviation parameter.
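When a curve has a dip, inverting it means collecting every size at which the tabulated curve passes through the measured signal, not just the first one. A minimal sketch assuming piecewise-linear interpolation between table entries; each returned size would then enter the scoring as described above. The function name and sample curve are hypothetical.

```python
from typing import List


def all_sizes_for_signal(sizes: List[float], signals: List[float], s: float) -> List[float]:
    """Return every size at which a (possibly non-monotonic) tabulated curve equals s."""
    found: List[float] = []
    for i, (d, sv) in enumerate(zip(sizes, signals)):
        if sv == s:
            found.append(d)  # the curve passes exactly through a table node
        if i + 1 < len(sizes) and (sv - s) * (signals[i + 1] - s) < 0:
            # strict sign change: s is crossed inside segment i, so interpolate
            d1, s1 = sizes[i + 1], signals[i + 1]
            found.append(d + (d1 - d) * (s - sv) / (s1 - sv))
    return found


# Hypothetical curve with a dip between 20 nm and 30 nm.
sizes = [10.0, 20.0, 30.0, 40.0]
signals = [1.0, 5.0, 3.0, 8.0]
print(all_sizes_for_signal(sizes, signals, 4.0))  # [17.5, 25.0, 32.0]
```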
The average diameter ⟨d⟩ for each idealized defect type in the database is preferably calculated as follows:
$$\langle d \rangle = \frac{(S_1)^{\alpha} d_1 + (S_2)^{\alpha} d_2 + \dots + (S_N)^{\alpha} d_N}{(S_1)^{\alpha} + (S_2)^{\alpha} + \dots + (S_N)^{\alpha}}$$
The parameter α is a constant. Using different values for α allows the average size to be calculated in different ways. If α=0 is chosen, then the average size ⟨d⟩ is just the simple arithmetic average; however, using α=1 is a more preferred approach that has yielded better accuracy for a number of defect types.
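The effect of α on the average size formula can be checked directly. A minimal sketch with hypothetical signals and inferred sizes: with α = 0 every configuration counts equally, while with α = 1 the size inferred from the strongest signal pulls the average toward itself.

```python
from typing import Sequence


def weighted_average_size(signals: Sequence[float], sizes: Sequence[float], alpha: float = 1.0) -> float:
    """<d> = sum(S_n**alpha * d_n) / sum(S_n**alpha).

    alpha = 0 gives the plain arithmetic mean; alpha = 1 weights each inferred
    size by the signal magnitude it was derived from.
    """
    weights = [s ** alpha for s in signals]
    return sum(w * d for w, d in zip(weights, sizes)) / sum(weights)


S = [4.0, 1.0, 1.0]      # hypothetical signals from three collectors
d = [80.0, 60.0, 70.0]   # hypothetical sizes (nm) inferred from each signal
print(weighted_average_size(S, d, alpha=0.0))  # 70.0
print(weighted_average_size(S, d, alpha=1.0))  # 75.0
```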
The deviation parameter for each defect type preferably is calculated based on a summation of differences between each measured signal magnitude S and the corresponding determined signal magnitude ⟨S⟩ that corresponds to the average defect size ⟨d⟩. For instance, the deviation parameter σ can be calculated based on the formula:
$$\sigma = \left[ \left| \langle S_1 \rangle - S_1 \right|^{\beta} + \left| \langle S_2 \rangle - S_2 \right|^{\beta} + \dots + \left| \langle S_N \rangle - S_N \right|^{\beta} \right]^{1/\beta}$$
The parameter β is a constant. Using different values for β allows the deviation parameter to be found in different ways. If β=1 is chosen, then the deviation parameter is the sum of the absolute values of the differences between the signals that were measured and the signals associated with the average size parameter ⟨d⟩. A preferred value for β, however, is 2, such that the deviation parameter is the square root of the sum of the squares of the differences. This approach has yielded better accuracy for a number of defect types. Of course, it will be recognized that there are other ways in which a deviation parameter could be calculated in accordance with the invention, and the invention is not limited to any particular formula.
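The two values of β discussed above can be compared on a small hypothetical example: β = 1 sums the absolute differences, and β = 2 gives the square root of the sum of squared differences.

```python
from typing import Sequence


def deviation(predicted: Sequence[float], measured: Sequence[float], beta: float = 2.0) -> float:
    """sigma = (sum_n |<S_n> - S_n|**beta) ** (1/beta)."""
    return sum(abs(p - s) ** beta for p, s in zip(predicted, measured)) ** (1.0 / beta)


pred = [10.0, 20.0, 30.0]   # hypothetical <S_1>..<S_3> at the average size
meas = [13.0, 16.0, 30.0]   # hypothetical measured S_1..S_3
print(deviation(pred, meas, beta=1.0))  # 7.0  (3 + 4 + 0)
print(deviation(pred, meas, beta=2.0))  # 5.0  (sqrt(9 + 16 + 0))
```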
Once the deviation parameters σ1 to σM corresponding to the M different idealized defect types have been calculated, preferably, several of the smallest deviation parameters are identified and, for each of these deviation parameters, a relative probability value P is calculated, representing a relative probability that the defect in question belongs to the defect type associated with the particular deviation parameter. For example, if the three smallest deviation parameters are denoted σ1, σ2, and σ3, a probability value for σ1 is calculated based on the formula:
$$P_1 = \frac{(1/\sigma_1)^{\delta}}{(1/\sigma_1)^{\delta} + (1/\sigma_2)^{\delta} + (1/\sigma_3)^{\delta}}$$
Probability parameters P2 and P3 would be calculated by analogous formulas. The parameter δ is a constant. Using different values for δ allows the relative probability to be found in different ways. It has been found for many defect types that δ=1 works well. However, other values for δ can be used instead, and indeed an entirely different formula could be used for deriving a probability parameter in accordance with the invention. Preferably, for each defect type corresponding to the three smallest deviation parameters, the defect type, probability value, and defect size are reported.
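The relative probabilities for the three best-matching defect types follow directly from the formula for P1 and its analogues. A minimal sketch, assuming every retained σ is strictly positive (an exact match with σ = 0 would need separate handling); the function name and deviation values are hypothetical.

```python
from typing import Dict, List, Tuple


def relative_probabilities(
    sigmas: Dict[str, float], delta: float = 1.0, keep: int = 3
) -> List[Tuple[str, float]]:
    """P_i = (1/sigma_i)**delta / sum_j (1/sigma_j)**delta over the `keep` smallest sigmas.

    Assumes every retained sigma is strictly positive.
    """
    ranked = sorted(sigmas.items(), key=lambda kv: kv[1])[:keep]
    inverse = [(1.0 / sigma) ** delta for _, sigma in ranked]
    total = sum(inverse)
    return [(name, w / total) for (name, _), w in zip(ranked, inverse)]


# Hypothetical deviation parameters for four idealized defect types.
sigmas = {"sphere": 1.0, "conical_pit": 2.0, "mound": 4.0, "scratch": 10.0}
for name, p in relative_probabilities(sigmas):
    print(f"{name}: {p:.3f}")
```

With δ = 1 the three retained types receive probabilities in the ratio 1 : 1/2 : 1/4, i.e. 4/7, 2/7, and 1/7; raising δ sharpens the preference for the smallest σ.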
The invention also provides an apparatus for classifying defects in terms of type and size. The apparatus includes a storage medium storing the databases of signal magnitude versus size for each test configuration and defect type, a light collector system for collecting light that is scattered and/or reflected from the substrate and any defect thereon, and a computer connected with the light collector system and storage medium and operable to receive signals from the light collector system and access the databases to determine the defect type and size present on the substrate surface. Preferably, the apparatus includes a scanning system for effecting relative movement between the substrate being inspected and the incident light beam so that the beam is scanned over the entire substrate surface.