(This specification occasionally makes reference to prior published documents. A numbered list of these references can be found at the end of this section, under the sub-heading “References”.)
In integrated circuit manufacture, the accurate measurement of the microstructures being patterned onto semiconductor wafers is highly desirable. Optical measurement methods are typically used for high-speed, non-destructive measurement of such structures. With such methods, a small spot on a measurement sample is illuminated with optical radiation comprising one or more wavelengths, and the sample properties over the measurement spot are determined by measuring characteristics of radiation reflected or diffracted by the sample (e.g., reflection intensity, polarization state, or angular distribution).
This disclosure relates to the measurement of a sample comprising a diffractive structure formed on or in a substrate, wherein lateral material inhomogeneities in the structure give rise to optical diffraction effects. If the lateral inhomogeneities are periodic with a period significantly smaller than the illuminating wavelengths, then diffracted orders other than the zeroth order may all be evanescent and not directly observable, or may be scattered outside the detection instrument's field of view. But the lateral structure geometry can nevertheless significantly affect the zeroth-order reflectivity, making it possible to measure structure features much smaller than the illuminating wavelengths.
A variety of measurement methods applicable to diffractive structures are known in the prior art. Reference 7 reviews a number of these methods. The most straightforward approach is to use a rigorous, theoretical model based on Maxwell's equations to calculate a predicted optical signal characteristic of the sample (e.g., reflectivity) as a function of sample measurement parameters (e.g., film thickness and linewidth), and to adjust the measurement parameters in the model to minimize the discrepancy between the theoretical and measured optical signal (Ref's 10, 14). (Note: In this context the singular term “characteristic” may denote a composite entity such as a vector or matrix. The components of the characteristic might, for example, represent reflectivities at different wavelengths or collection angles.) The measurement process comprises the following steps: First, a set of trial values of the measurement parameters is selected. Then, based on these values a computer-representable model of the measurement sample structure (including its optical materials and geometry) is constructed. The electromagnetic interaction between the sample structure and illuminating radiation is numerically simulated to calculate a predicted optical signal characteristic, which is compared to the measured signal characteristic. An automated fitting optimization algorithm iteratively adjusts the trial parameter values and repeats the above process to minimize the discrepancy between the measured and predicted signal characteristics. (The optimization algorithm might typically minimize the mean-square error of the signal characteristic components.)
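The fitting loop described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method of any cited reference: `simulate_signal` is a hypothetical closed-form stand-in for a rigorous electromagnetic solver, the wavelength grid and the index value 1.46 are arbitrary, and a simple pattern search stands in for whatever optimization algorithm a real instrument would use.

```python
import math

# Assumed wavelength grid (nm); one signal component per wavelength.
WAVELENGTHS_NM = [400.0 + 25.0 * k for k in range(13)]


def simulate_signal(params):
    """Hypothetical stand-in for a rigorous electromagnetic simulation.

    params = [thickness_nm, linewidth_nm]; returns one value per wavelength.
    """
    thickness_nm, linewidth_nm = params
    return [
        0.3
        + 0.2 * math.cos(4.0 * math.pi * 1.46 * thickness_nm / wl)
        + 0.001 * linewidth_nm * (500.0 / wl)
        for wl in WAVELENGTHS_NM
    ]


def mean_square_error(predicted, measured):
    """Mean-square discrepancy over the signal components."""
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)


def fit_parameters(measured, initial, step_sizes, tol=1e-6, max_iter=10000):
    """Pattern search: try +/- steps on each parameter, halve steps on failure.

    Every trial point costs one full simulation, which is the bottleneck
    the text describes.
    """
    params = list(initial)
    steps = list(step_sizes)
    best = mean_square_error(simulate_signal(params), measured)
    for _ in range(max_iter):
        improved = False
        for i in range(len(params)):
            for sign in (1.0, -1.0):
                trial = list(params)
                trial[i] += sign * steps[i]
                err = mean_square_error(simulate_signal(trial), measured)
                if err < best:
                    params, best, improved = trial, err, True
                    break
        if not improved:
            steps = [s / 2.0 for s in steps]
            if max(steps) < tol:
                break
    return params, best
```

For example, generating a synthetic "measured" signal with `simulate_signal([100.0, 50.0])` and calling `fit_parameters(measured, [95.0, 45.0], [1.0, 1.0])` should recover parameter values close to the ones used to generate the signal. The simulation is called once per trial point, so the total cost scales with the number of optimizer iterations.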
The above process can provide very accurate measurement capability, but the computational burden of computing the structure geometry and applying electromagnetic simulation within the measurement optimization loop makes this method impractical for many real-time measurement applications. A variety of alternative approaches have been developed to avoid the computational bottleneck, but usually at the expense of compromised measurement performance.
One alternative approach is to replace the exact theoretical model with an approximate model that represents the optical signal characteristic as a linear function of measurement parameters over some limited parameter range. There are several variants of this approach, including Inverse Least Squares (ILS), Principal Component Regression (PCR), and Partial Least Squares (PLS) (Ref's 1–5, 7, 11, 15). The linear coefficients of the approximate model are determined by a multivariate statistical analysis technique that minimizes the mean-square error between exact and approximate data points in a “calibration” data set. (The calibration data may be generated either from empirical measurements or from exact theoretical modeling simulations. This is done prior to measurement, so the calibration process does not impact measurement time.) The various linear models (ILS, PCR, PLS) differ in the type of statistical analysis method employed.
There are two fundamental limitations of the linear models: First, the linear approximation can only be applied over a limited range of measurement parameter values; and second, within this range the approximate model does not generally provide an exact fit to the calibration data points. (If the calibration data is empirically determined, one may not want the model to exactly fit the data, because the data could be corrupted by experimental noise. But if the data is determined from a theoretical model it would be preferable to use an approximation model that at least fits the calibration data points.) These deficiencies can be partially remedied by using a non-linear (e.g., quadratic) functional approximation (Ref. 7). This approach mitigates, but does not eliminate, the limitations of linear models.
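The second limitation can be made concrete with a small sketch. It fits a single-parameter linear model to noise-free calibration points by ordinary least squares and reports the largest residual at the calibration points themselves. The function `exact_signal` is a hypothetical, mildly quadratic signal chosen only for illustration, and the plain least-squares fit is a simplification rather than an implementation of ILS, PCR, or PLS.

```python
def fit_linear_model(xs, ys):
    """Ordinary least-squares fit of y ~ a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def exact_signal(x):
    """Hypothetical 'exact' signal with slight curvature (assumption)."""
    return 0.5 + 0.002 * x - 0.00001 * x * x


def calibration_misfit(xs):
    """Largest absolute residual of the linear fit at the calibration points."""
    ys = [exact_signal(x) for x in xs]
    a, b = fit_linear_model(xs, ys)
    return max(abs(a * x + b - y) for x, y in zip(xs, ys))
```

With calibration points at 40, 45, 50, 55, and 60, `calibration_misfit` returns a nonzero value even though the calibration data contain no noise: the linear model cannot reproduce the curvature, so it does not pass through its own calibration points.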
The parameter range limit of functional (linear or non-linear) approximation models can be extended by the method of “range splitting”, wherein the full parameter range is split into a number of subranges, and a different approximate model is used for each subrange (Ref. 7). The method is illustrated conceptually in FIG. 1 (cf. FIG. 2 in Ref. 7), which represents the relationship between a measurement parameter x, such as a linewidth parameter, and an optical signal characteristic y, such as the zeroth-order sample reflectivity at a particular collection angle and wavelength. (In practice one is interested in modeling the relationship between multiple measurement parameters, such as linewidths, film thicknesses, etc., and multiple signal components, such as reflectivities at different wavelengths or collection angles. However, the concepts illustrated in FIG. 1 are equally applicable to the more general case.) A set of calibration data points (e.g., point 101) is generated, either empirically or by theoretical modeling. The x parameter range is split into two (or more) subranges 102 and 103, and the set of calibration points is separated into corresponding subsets 104 and 105, depending on which subrange each point is in. A statistical analysis technique is applied to each subset to generate a separate approximation model (e.g., a linear model) for each subrange, such as linear model 106 for subrange 102 and model 107 for subrange 103.
Aside from the limitations inherent in the functional approximation models, the range-splitting method has additional deficiencies. Although the functional approximation is continuous and smooth within each subrange, it may exhibit discontinuities between subranges (such as discontinuity 108 in FIG. 1). These discontinuities can create numerical instabilities in optimization algorithms that estimate measurement parameters from optical signal data. The discontinuities can also be problematic for process monitoring and control because small changes in process conditions could result in large, discontinuous jumps in measurements.
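The discontinuity between subranges can be reproduced in a few lines. The sketch below fits an independent least-squares line to each of two subranges and evaluates both lines at the shared boundary. The cubic `exact_signal`, the boundary location, and the calibration points are illustrative assumptions and are not taken from FIG. 1.

```python
def fit_linear_model(xs, ys):
    """Ordinary least-squares fit of y ~ a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def exact_signal(x):
    """Hypothetical nonlinear signal (assumption for illustration)."""
    return x ** 3


def build_split_model(boundary, low_points, high_points):
    """Fit one linear model per subrange; return the piecewise model."""
    low = fit_linear_model(low_points, [exact_signal(x) for x in low_points])
    high = fit_linear_model(high_points, [exact_signal(x) for x in high_points])

    def model(x):
        a, b = low if x < boundary else high
        return a * x + b

    return model, low, high


def boundary_jump(boundary, low, high):
    """Difference between the two subrange models at the boundary."""
    return (high[0] * boundary + high[1]) - (low[0] * boundary + low[1])
```

Calling `build_split_model(1.0, [0.1, 0.3, 0.5, 0.7, 0.9], [1.1, 1.3, 1.5, 1.7, 1.9])` and then `boundary_jump(1.0, low, high)` yields a jump of roughly -0.4 at x = 1, although the underlying signal is smooth there. An optimizer or process monitor working across that boundary sees a step that does not exist in the sample.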
Another drawback of the range-splitting model is the large number of required calibration points and the large amount of data that must be stored in the model. In the FIG. 1 illustration, each subrange uses a simple linear approximation model of the form

$$y \cong a\,x + b \qquad \text{Eq. 1}$$

wherein a and b are calibration coefficients. At least two calibration points per subrange are required to determine a and b (generally, more than two are used to provide good statistical sampling over each subrange), and two coefficients (a and b) must be stored for each subrange. If there are M subranges the total number of calibration points must be at least 2M, and the number of calibration coefficients is 2M. Considering a more general situation in which there are N measurement parameters $x_1, x_2, \ldots, x_N$, the linear approximation would take the form

$$y \cong a_1 x_1 + a_2 x_2 + \cdots + a_N x_N + b \qquad \text{Eq. 2}$$

If the range of each parameter is split into M subranges, the number of separate linear approximation models required to cover all combinations of parameter subranges would be $M^N$, and the number of calibration parameters per combination ($a_1, a_2, \ldots, a_N, b$) would be $N+1$. Thus the total number of calibration coefficients (and the minimum required number of calibration data points) would be $(N+1)\,M^N$. For example, FIG. 2 illustrates a parameter space spanned by two parameters, $x_1$ and $x_2$. The $x_1$ range is split into three subranges 201, 202, and 203, and the $x_2$ range is split into three subranges 204, 205, and 206. For this case, N=2, M=3, the number of $x_1$ and $x_2$ subrange combinations 207 . . . 215 is $3^2 = 9$, and the number of linear calibration coefficients would be $(2+1)\cdot 3^2 = 27$. Generalizing further, if the optical signal characteristic (y) comprises multiple signal components (e.g., for different wavelengths), the number of calibration coefficients will increase in proportion to the number of components.
Furthermore, if a nonlinear (e.g., quadratic) subrange model is used, the number of calibration points and coefficients would be vastly larger.
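The growth in calibration data can be tabulated with a short sketch. The linear count follows the (N+1) coefficients per subrange combination described above. The quadratic count assumes a full second-order polynomial in N parameters, which has (N+1)(N+2)/2 coefficients; that choice of nonlinear model is an assumption made here for illustration.

```python
def linear_coefficient_count(num_params, num_subranges, num_components=1):
    """Coefficients for one linear model per subrange combination.

    Each of the num_subranges**num_params combinations needs
    num_params + 1 coefficients per signal component.
    """
    return num_components * (num_params + 1) * num_subranges ** num_params


def quadratic_coefficient_count(num_params, num_subranges, num_components=1):
    """Same count, assuming a full quadratic polynomial per combination."""
    per_model = (num_params + 1) * (num_params + 2) // 2
    return num_components * per_model * num_subranges ** num_params
```

For N = 2 and M = 3, `linear_coefficient_count(2, 3)` gives 27, matching the FIG. 2 example, while `quadratic_coefficient_count(2, 3)` gives 54. Both counts grow exponentially with the number of parameters and scale in proportion to the number of signal components.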
Another measurement approach, Minimum Mean Square Error analysis (MMSE, Ref's 2–9, 11, 13, 15), provides a simple alternative to the range-splitting method described above. With this approach, a database of pre-computed theoretical optical signal characteristics representing a large variety of measurement structures is searched, each entry is compared to a sample's measured optical signal, and the best-fitting entry (in terms of a mean-square-error fitting criterion) determines the measurement result. (The above-noted references relate primarily to scatterometry and spectroscopy, but MMSE-type techniques have also been applied in the context of ellipsometry; see Ref's 12 and 16.) The MMSE method is capable of modeling strong nonlinearities in the optical signal. But this method, like range-splitting, can exhibit problematic discontinuities in the measurement results due to the database's discrete parameter sampling.
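A database search of this kind, and the quantization it introduces, can be sketched as follows. The signal model `simulate_signal`, the wavelength grid, and the 1 nm thickness sampling are hypothetical choices for illustration and do not come from the cited references.

```python
import math

# Assumed wavelength grid (nm); one signal component per wavelength.
WAVELENGTHS_NM = [400.0 + 25.0 * k for k in range(13)]


def simulate_signal(thickness_nm):
    """Hypothetical stand-in for a pre-computed theoretical signal."""
    return [
        0.3 + 0.2 * math.cos(4.0 * math.pi * 1.46 * thickness_nm / wl)
        for wl in WAVELENGTHS_NM
    ]


def mean_square_error(predicted, measured):
    """Mean-square discrepancy over the signal components."""
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)


def build_library(thickness_grid):
    """Pre-compute (parameter, signal) pairs before any measurement."""
    return [(t, simulate_signal(t)) for t in thickness_grid]


def mmse_search(library, measured):
    """Return the library parameter whose signal best fits the measurement."""
    best_entry = min(library, key=lambda entry: mean_square_error(entry[1], measured))
    return best_entry[0]
```

With a library built on a grid of 90 nm to 110 nm in 1 nm steps, a signal simulated at 100.4 nm is reported as 100 nm and one simulated at 100.6 nm as 101 nm. The reported value can only take grid values, so a small change in the sample can produce a full grid-step jump in the result.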
All of these prior-art methods entail a compromise between measurement resolution and accuracy. The MMSE approach is not limited by any assumed functional form of the optical signal, and can therefore have good accuracy. But measurement resolution is fundamentally limited by the parameter sampling density. The functional approximation models, by contrast, are capable of “interpolating” between calibration data points, in the sense that the modeled signal is a continuous and smooth function of measurement parameters across the calibration range; hence such models can have essentially unlimited measurement resolution. However, the term “interpolation” is a misnomer in this context because the functional models do not accurately fit the calibration data points, and their accuracy is limited by the misfit. (For example, Ref. 11 reports a fit accuracy of 5–10 nm for linewidth and thickness parameters.)