Manufacturing semiconductor devices involves depositing and patterning several layers overlaying each other. For example, gate interconnects and gates of an integrated circuit are formed at different lithography steps in the manufacturing process. The tolerance of alignment of these patterned layers is less than the width of the gate.
Overlay is defined as the displacement of a patterned layer from its ideal position aligned to a layer patterned earlier on the same wafer. Overlay is a two-dimensional vector (Δx, Δy) in the plane of the wafer. Overlay is a vector field, i.e., the value of the vector depends on the position on the wafer. Perfect overlay and zero overlay are used synonymously. Overlay and overlay error are used synonymously. Depending on the context, overlay may signify a vector or one of the components of the vector.
Overlay metrology provides the information that is necessary to correct the alignment of the stepper-scanner and thereby minimize overlay error on subsequent wafers. Overlay errors, detected on a wafer after exposing and developing the photoresist, can be corrected by removing the photoresist and repeating the lithography step on a corrected stepper-scanner. If the measured error is minor, parameters for subsequent steps of the lithography process could be adjusted based on the overlay metrology to avoid excursions.
Most prior overlay metrology methods use metrology targets that are etched or otherwise formed into or on the various layers during the same plurality of lithography steps that form the patterns for circuit elements on the wafer. One typical pattern, called “box-in-box,” consists of two concentric squares, formed on a lower and an upper layer, respectively. “Bar-in-bar” is a similar pattern with just the edges of the “boxes” demarcated, and broken into disjoint line segments (“Specification For Overlay-Metrology Test Patterns For Integrated-Circuit Manufacture”, specification SEMI P28-96, Semiconductor Equipment and Materials International, San Jose, Calif., 1996). The outer bars are associated with one layer and the inner bars with another. Typically one is the upper pattern and the other is the lower pattern, e.g., outer bars on a lower layer, and inner bars on the top. However, with advanced processes the topographies are complex and not truly planar, so the designations “upper” and “lower” are ambiguous. Typically they correspond to earlier and later in the process. The squares or bars are formed by lithographic and other processes used to make planar structures, e.g., chemical-mechanical planarization (CMP). Currently, the patterns for the boxes or bars are stored on lithography masks and projected onto the wafer. Other methods for putting the patterns on the wafer are possible, e.g., direct electron beam writing from computer memory, and imprint lithography.
In one form of the prior art, a high performance microscope imaging system combined with image processing software estimates overlay error for the two layers. The image processing software uses the intensity of light at a multitude of pixels. Obtaining the overlay error accurately requires a high quality imaging system and means of focusing the system. One requirement for the optical system is very stable positioning of the optical system with respect to the sample. Relative vibration would blur the image and degrade the performance. This is a difficult requirement to meet for overlay metrology systems that are integrated into a process tool, like a lithography track. High-acceleration wafer handlers in the track cause vibration. The tight space requirements for integration do not favor bulky isolation strategies.
As disclosed in U.S. Patent Application Ser. No. 2002/0158193 (incorporated in this document by reference) one approach to overcoming these difficulties is to incorporate overlay metrology targets that comprise diffraction gratings within semiconductor wafers. The targets are measured using scatterometry to perform overlay metrology. Several different grating configurations are described for the overlay targets. The simplest embodiment uses two grating stacks, one for x-alignment and one for y (each grating stack comprising two grating layers, one in the lower layer, the other in the upper layer). An alternative embodiment uses two line grating stacks each for x and y (four grating stacks total). Still another embodiment uses three line grating stacks in combination to simultaneously measure both x and y alignment. (See also PCT publication WO 02/25723A2, incorporated herein by reference).
In FIG. 1A, one possible implementation for an overlay target is shown and generally designated 100. Target 100 includes two grating stacks labeled 102X and 102Y. Grating stack 102X is used to measure overlay in the x-direction while grating stack 102Y is used to measure overlay in the y-direction. Target 100 is typically included in an unused wafer portion (such as within a scribe line). This prevents overlay target 100 from interfering with devices included on the semiconductor wafer.
FIG. 1B shows the structural details of grating stack 102X (and, by analogy grating stack 102Y). As shown, grating stack 102X includes an upper grating 104U and a lower grating 104L. Gratings 104U and 104L have the same pitch 106 (in this document, period, spatial period, and pitch are used synonymously). Grating 104U is formed in an upper layer 108U and grating 104L is formed in a lower layer 108L. Upper and lower layers 108 may be separated by one or more intermediate layers 110.
To describe alignment between layers 108, FIG. 1B shows a symmetry plane 112U (for grating 104U and layer 108U) and symmetry plane 112L (for grating 104L and layer 108L). Symmetry plane 112U is offset from symmetry plane 112L by offset 114 (i.e., offset 114 is equal to x(112U)−x(112L)), the difference between the x-coordinates of the symmetry planes 112U and 112L. The value of offset 114 when the lithography is in perfect alignment is the offset bias of the grating stack 102X.
Offset bias is synonymously called reticle offset because it is produced by introducing an offset into the data that is written to the reticle set. Reticles are transparent, patterned plates. The pattern on the reticle is transferred to the wafer by lithography. The offset bias is produced by shifting the pattern of grating 104U in the reticle for the layer 108U with respect to the pattern of grating 104L in the reticle for the layer 108L, or vice versa.
An offset bias that is not an integer multiple of pitch/2 enables distinguishing the sign of the overlay. Symmetry planes 112 in FIG. 1B are not uniquely defined since there is one such symmetry plane for each line in gratings 104U and 104L. The magnitude of the offset bias is understood to be the least distance between any choice of symmetry plane 112U in grating 104U and any choice of symmetry plane 112L in grating 104L. For a stack of two overlaying line gratings, the best value for offset bias is equal to pitch/4 or −pitch/4. The term symmetric line grating is defined by the following property: The unit cell of a symmetric line grating can be selected in a way that renders the unit cell substantially invariant under reflection with respect to a plane that is perpendicular to the direction of the pitch. Small geometric imperfections, such as line edge roughness, that do not significantly affect optical measurements are not construed to break the symmetry.
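The two rules stated above (least distance between symmetry planes, and a bias that is not an integer multiple of pitch/2) can be illustrated with a minimal sketch. The function names, the 400 nm pitch, and the numerical tolerance are assumptions made for illustration only; they do not come from the cited references.

```python
import math

def offset_bias_magnitude(x_upper, x_lower, pitch):
    """Least distance between any upper and any lower symmetry plane.

    Symmetry planes repeat once per line, i.e. every `pitch`, so the raw
    difference is wrapped into [0, pitch) and then folded to [0, pitch/2].
    """
    d = (x_upper - x_lower) % pitch
    return min(d, pitch - d)

def resolves_overlay_sign(bias, pitch, tol=1e-9):
    """True if `bias` is not an integer multiple of pitch/2 (within tol)."""
    half = pitch / 2.0
    r = math.fmod(abs(bias), half)
    return min(r, half - r) > tol * pitch

pitch = 400.0  # nm, hypothetical value
print(offset_bias_magnitude(700.0, 0.0, pitch))  # 100.0, i.e. pitch/4
print(resolves_overlay_sign(pitch / 4, pitch))   # True
print(resolves_overlay_sign(pitch / 2, pitch))   # False
```

Picking symmetry planes that are several lines apart does not change the result, which is why the magnitude of the offset bias is well defined even though the planes themselves are not unique.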
Overlay measurements are obtained by measuring the optical responses of grating stacks 102X and 102Y, typically in sequence. The optical response can be measured by spectroscopic reflectometry, or spectroscopic ellipsometry, which do not spatially resolve the grating lines in grating stacks 102X and 102Y. Overlay measurements are then calculated from the optical measurements by regression.
In FIG. 2A, another possible implementation for an overlay target is shown and generally designated 200. Overlay target 200 includes two grating stacks for each direction in which overlay is to be measured. Grating stacks 202X and 202X′ are used for measurements in the x direction. Grating stacks 202Y and 202Y′ are used for measurements in the y direction. The use of two grating stacks per direction offers significantly more robust measurement of overlay when compared to the implementations of FIGS. 1A and 1B.
FIG. 2B shows the structural details of grating stacks 202X and 202X′ (and, by analogy grating stacks 202Y and 202Y′). As shown, grating stack 202X includes an upper grating 204U and a lower grating 204L. Grating stack 202X′ includes an upper grating 204U′ and a lower grating 204L′. Gratings 204U, 204L, 204U′ and 204L′ have the same pitch 106. Gratings 204U and 204U′ are formed in an upper layer 208U and gratings 204L and 204L′ are formed in a lower layer 208L. Upper and lower layers 208 may be separated by one or more intermediate layers 210. Patterned layers 208L and 208U may be formed on the same layer sequentially, in which case there are no intermediate layers 210. For example, both gratings may be etched at the zero-level on a silicon wafer to qualify a lithography projector. There may be zero or more layers between the substrate of the wafer and patterned layer 208L.
When layers 208U and 208L are in perfect alignment, grating stacks 202X and 202X′ are reflections of each other with respect to the x-axis. Grating stack 202X′ can be obtained from grating stack 202X by the following transformation: (x′, y′)=(c1−x, c2+y) where c1 and c2 are constant distances. Similarly, under perfect alignment, grating stacks 202Y and 202Y′ are related by reflection with respect to the y-axis. Grating stack 202Y′ can be obtained from grating stack 202Y by the following transformation: (x′, y′)=(c3+x, c4−y) where c3 and c4 are constant distances.
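As a small illustration of the transformation (x′, y′)=(c1−x, c2+y), the sketch below mirrors the symmetry planes of an upper and a lower grating and shows that the signed offset (upper minus lower) changes sign. The pitch, the plane positions, and the constants c1 and c2 are arbitrary values assumed for the example.

```python
def reflect_in_x(x, y, c1, c2):
    """Map a point of stack 202X to stack 202X' via (x', y') = (c1 - x, c2 + y)."""
    return (c1 - x, c2 + y)

pitch = 400.0                      # nm, hypothetical
x_lower, x_upper = 0.0, pitch / 4  # symmetry planes of 202X, offset +pitch/4

# Mirror both planes into the neighbouring stack (c1 and c2 chosen arbitrarily).
xl_p, _ = reflect_in_x(x_lower, 0.0, c1=5000.0, c2=0.0)
xu_p, _ = reflect_in_x(x_upper, 0.0, c1=5000.0, c2=0.0)

offset = x_upper - x_lower         # +pitch/4
offset_prime = xu_p - xl_p         # -pitch/4: the reflection flips the sign
print(offset, offset_prime)        # 100.0 -100.0
```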
To describe alignment between layers 208, FIG. 2B shows two symmetry planes for grating stack 202X. These are labeled 212U (for upper grating 204U) and 212L (for lower grating 204L). FIG. 2B also shows two symmetry planes for grating stack 202X′. These are labeled 212U′ (for upper grating 204U′) and 212L′ (for lower grating 204L′). Offset 214 is x(212U)−x(212L). Offset 214′ is x(212U′)−x(212L′). At perfect alignment, the value of offset 214 is pitch/4 and the value of offset 214′ is −pitch/4. The value of offset 214 at perfect overlay is called the offset bias of grating stack 202X. At perfect alignment, grating stacks 202X and 202X′ have the same optical properties when they are viewed by a polarization insensitive reflectometer. When the upper layer is shifted in the x-direction by an overlay Δx smaller than pitch/4 in magnitude, the magnitude of offset 214 becomes (pitch/4+Δx) and the magnitude of offset 214′ becomes (pitch/4−Δx). This breaks the reflection symmetry of grating stacks 202X and 202X′ and their optical responses differ. Optical measurements from grating stacks 202X and 202X′ are fitted simultaneously with a model of the grating stacks 202X and 202X′ to regress the offset Δx (Huang et al., “Scatterometry-Based Overlay Metrology,” Proc. SPIE Vol. 5038, pp. 126–137, SPIE Bellingham, Wash., 2003):
$$\min_{\Delta x} \sum_{\lambda} \left\{ \left[ R(\lambda, \Delta x, \text{Meas. at } (202X)) - R(\lambda, \Delta x, \text{Model } (202X)) \right]^{2} + \left[ R(\lambda, \Delta x, \text{Meas. at } (202X')) - R(\lambda, \Delta x, \text{Model } (202X')) \right]^{2} \right\} \qquad \text{Eq. 1}$$
The summation in Eq. 1 is over the wavelengths (λ) at which measurements are taken. In the model-based regression, the offsets 214 and 214′ depend solely on the unknown overlay Δx. All other parameters, such as thicknesses of deposited layers, line widths, and heights, are common to the models of grating stacks 202X and 202X′ since the two grating stacks are next to each other and are subject to the same process conditions. The minimization above is with respect to Δx and the other parameters of the model, such as layer thicknesses, which are not shown in the equation for brevity. The quantity that is minimized may be a weighted sum of squares of the residual. Using two gratings with different offset biases doubles the number of measurements without adding any unknown parameters over what is used in the basic approach described in FIGS. 1A and 1B. Therefore, regression applied to measurements at two grating stacks with different offset biases yields a more robust estimate of the overlay. The offset in the y-direction, Δy, is found by a similar but separate regression applied to the measurements at grating stacks 202Y and 202Y′.
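The structure of the regression in Eq. 1 can be sketched as follows. This is only an illustration under strong assumptions: `toy_reflectance` is a hypothetical closed-form stand-in for a rigorous diffraction model and is not physical, the pitch and wavelength grid are invented, a single film thickness plays the role of the shared process parameters, and a brute-force grid search replaces whatever optimizer a real system would use.

```python
import math

PITCH = 400.0                                        # nm, hypothetical
WAVELENGTHS = [400.0 + 10.0 * i for i in range(41)]  # nm, hypothetical grid

def toy_reflectance(wl, offset, thickness):
    """Toy stand-in for a rigorous diffraction model (NOT physical).

    It is even and pitch-periodic in `offset`, as expected for symmetric
    gratings seen by an unpolarized normal-incidence reflectometer.
    """
    return 0.3 + 0.1 * (math.cos(2 * math.pi * offset / PITCH)
                        * math.cos(4 * math.pi * thickness / wl))

def model_spectra(dx, thickness):
    """Spectra of 202X (offset pitch/4 + dx) and 202X' (offset -pitch/4 + dx)."""
    r_x = [toy_reflectance(wl, PITCH / 4 + dx, thickness) for wl in WAVELENGTHS]
    r_xp = [toy_reflectance(wl, -PITCH / 4 + dx, thickness) for wl in WAVELENGTHS]
    return r_x, r_xp

def eq1_cost(dx, thickness, meas_x, meas_xp):
    """Sum over wavelengths of squared residuals for both stacks (Eq. 1)."""
    mod_x, mod_xp = model_spectra(dx, thickness)
    return (sum((m - r) ** 2 for m, r in zip(meas_x, mod_x))
            + sum((m - r) ** 2 for m, r in zip(meas_xp, mod_xp)))

def fit_overlay(meas_x, meas_xp, dx_grid, thickness_grid):
    """Brute-force search. Thickness is shared by both stacks; dx is the only
    parameter that moves the two offsets."""
    _, dx, t = min((eq1_cost(dx, t, meas_x, meas_xp), dx, t)
                   for t in thickness_grid for dx in dx_grid)
    return dx, t

# Synthetic "measurement" generated from the same toy model.
meas_x, meas_xp = model_spectra(dx=7.5, thickness=500.0)
dx_grid = [-50.0 + 0.5 * i for i in range(201)]
thickness_grid = [480.0 + 2.0 * i for i in range(21)]
dx_fit, t_fit = fit_overlay(meas_x, meas_xp, dx_grid, thickness_grid)
print(dx_fit, t_fit)  # 7.5 500.0
```

Because both spectra enter one cost function with a shared thickness, the second stack adds 41 data points without adding an unknown, which is the robustness argument made above.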
Another prior-art approach (Huang et al., “Scatterometry-Based Overlay Metrology,” Proc. SPIE Vol. 5038, pp. 126–137, SPIE Bellingham, Wash., 2003) uses a simple algorithm, called linear differential estimation, to obtain overlay from the measurements at 202X and 202X′. When overlay is zero, these targets appear identical to a normal-incidence unpolarized reflectometer because 202X′ is identical to 202X rotated by 180° in the plane of the wafer. An unpolarized, normal-incidence reflectometer is insensitive to the angular orientation of the target in the plane of the wafer. When the overlay is nonzero, the reflection symmetry is broken and the optical properties of stacked gratings 202X and 202X′ differ (assuming nonzero offset bias). Stacked grating 202X at overlay=+Δx has reflection-symmetry with stacked grating 202X′ at overlay=−Δx:
$$R(\lambda, \Delta x, \text{at } (202X)) = R(\lambda, -\Delta x, \text{at } (202X')) \qquad \text{Eq. 2}$$
In Eq. 2, R(λ, Δx, at (202X′)) is the reflectance spectrum of stacked grating 202X′ when the value of overlay is Δx. When the overlay is small, the difference between the optical properties of the stacked gratings is proportional to the overlay:
$$\begin{aligned} \Delta R &= R(\lambda, \Delta x, \text{at } (202X)) - R(\lambda, \Delta x, \text{at } (202X')) \\ &= R(\lambda, \Delta x, \text{at } (202X)) - R(\lambda, -\Delta x, \text{at } (202X)) \\ &\cong 2\, \frac{\partial R(\lambda, \Delta x, \text{at } (202X))}{\partial (\Delta x)}\, \Delta x \end{aligned} \qquad \text{Eq. 3}$$
The maximum-likelihood estimate of Δx based on the linear model (Eq. 3) is a linear estimator:
$$\begin{aligned} \Delta x_{\text{est}} &= L^{T}\, \Delta R_{\text{measured}} \\ &= \sum_{\lambda} L(\lambda) \left[ R(\lambda, \Delta x, \text{at } (202X)) - R(\lambda, \Delta x, \text{at } (202X')) \right] \end{aligned} \qquad \text{Eq. 4}$$
The spectrum L(λ), called the estimator, is obtained before measurements are made. L(λ) is obtained from either measured or calculated differential optical responses (ΔR) of the grating stacks 202X and 202X′. L(λ) is preferably obtained from measurements on multiple pairs of grating stacks 202X and 202X′, each with a different known offset written to the reticle. L(λ) is then obtained by solving a linear least squares problem.
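One possible way to train and apply such an estimator is sketched below. It is not taken from the cited paper. It assumes a hypothetical toy reflectance function in place of real measurements, invented training offsets of ±5 nm and ±10 nm, and equal-variance uncorrelated noise, under which the estimator reduces to the least-squares slope spectrum s(λ) divided by its squared norm.

```python
import math

PITCH = 400.0                                        # nm, hypothetical
WAVELENGTHS = [400.0 + 10.0 * i for i in range(41)]  # nm, hypothetical grid

def toy_reflectance(wl, offset, thickness):
    """Toy stand-in for measured reflectance (NOT physical)."""
    return 0.3 + 0.1 * (math.cos(2 * math.pi * offset / PITCH)
                        * math.cos(4 * math.pi * thickness / wl))

def delta_r(dx, thickness):
    """Differential response R(202X) - R(202X') at overlay dx."""
    return [toy_reflectance(wl, PITCH / 4 + dx, thickness)
            - toy_reflectance(wl, -PITCH / 4 + dx, thickness)
            for wl in WAVELENGTHS]

def train_estimator(known_offsets, thickness):
    """Fit delta_R(lambda) ~ s(lambda) * dx by least squares over the training
    pairs, then form L = s / (s . s), so that L . (s * dx) = dx."""
    spectra = [delta_r(d, thickness) for d in known_offsets]
    sum_d2 = sum(d * d for d in known_offsets)
    slope = [sum(d * spec[i] for d, spec in zip(known_offsets, spectra)) / sum_d2
             for i in range(len(WAVELENGTHS))]
    norm2 = sum(s * s for s in slope)
    return [s / norm2 for s in slope]

def estimate_overlay(estimator, delta_r_measured):
    """Eq. 4: weighted sum of the differential spectrum over wavelengths."""
    return sum(l * r for l, r in zip(estimator, delta_r_measured))

L = train_estimator([-10.0, -5.0, 5.0, 10.0], thickness=500.0)
dx_est = estimate_overlay(L, delta_r(7.5, 500.0))
print(round(dx_est, 2))  # close to the true overlay of 7.5 nm
```

Once L(λ) is stored, each measurement costs only one dot product, which is the appeal of the method compared with the model-based regression of Eq. 1.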
Prior Art: Disadvantage of Obtaining Overlay by Fitting the Measurement with a Rigorous Model of Diffraction
The approach described by Eq. 1 (fitting the optical response of grating stacks by a rigorous model of electromagnetic wave scattering) presents a practical difficulty. The optical index of refraction, a complex number, must be known at all measurement wavelengths for all materials that make up the metrology target. Optical properties of materials are typically measured on uniform film samples that are incrementally deposited on blank wafers. Preparing such samples and measuring their refractive indices as a function of wavelength is time consuming. The optical properties of some blanket films can differ from the properties of the same materials deposited during the actual manufacturing process.
A second difficulty is that the geometric model of the profiles of the lower and upper gratings may fail to represent the actual sample. The geometric model has adjustable parameters. Varying the adjustable parameters spans a set of profiles. However, the actual profile can be outside the set spanned by varying model parameters if the profile has features that are not anticipated by the user. In that case, recovery involves imaging the cross section of the sample by transmission or reflection scanning electron microscopy (SEM), which is a destructive and time-consuming process. The parameterization of the geometric model is changed accordingly until the model predicts the optical response of the grating stack.
Determining the optical properties and a proper parameterization of the geometric model is a significant setup effort that needs to be completed before the measurements can start. This is a disadvantage compared to the prior art that is based on processing images of targets such as bar-in-bar targets.
Prior Art: Disadvantages of Obtaining Overlay by Linear Differential Estimation
The linear estimation method described in Eqs. 3–4 has a weakness. The coefficient of the term that is linear in Δx, namely 2∂R(λ,Δx,at(202X))/∂(Δx), is not a constant spectrum. It depends on the thicknesses of layers and the profiles of grating lines. Therefore, L(λ) in Eq. 4 is valid only for a narrow range of process parameters such as layer thicknesses. If the process deviates by more than 5%, either from batch to batch or across a wafer, then Eq. 4 can give erroneous estimates of overlay.
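The dependence on process conditions can be made explicit with a short worked relation. This is a sketch under stated assumptions: Eq. 4 is taken to be a weighted sum over wavelengths, the differential response is taken to be linear in Δx, and the symbols S_0(λ) (linear coefficient under the training conditions) and S(λ) (linear coefficient under the actual process conditions) are introduced here for illustration only.

```latex
\begin{align*}
\Delta R(\lambda) &\approx S(\lambda)\,\Delta x,
  \qquad S(\lambda) = 2\,\frac{\partial R(\lambda,\Delta x,\text{at }202X)}{\partial(\Delta x)} \\
\text{training condition:}\quad & \sum_{\lambda} L(\lambda)\,S_0(\lambda) = 1 \\
\widehat{\Delta x} &= \sum_{\lambda} L(\lambda)\,\Delta R(\lambda)
  \approx \Delta x \sum_{\lambda} L(\lambda)\,S(\lambda) \\
\widehat{\Delta x} - \Delta x &\approx \Delta x \sum_{\lambda} L(\lambda)\,\bigl[S(\lambda) - S_0(\lambda)\bigr]
\end{align*}
```

Under these assumptions the estimation error is proportional to the overlay itself and to the change in the linear coefficient, which is why an estimator trained under one set of layer thicknesses degrades when the process drifts.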
A second difficulty with the linear differential estimator described in Eqs. 3–4 is that obtaining (training) L(λ) by actual measurements requires multiple targets, each with a known offset written to the reticle. Providing such targets presents a logistics problem. Multiple pairs of grating stacks take up a prohibitively large area if they are provided at each measurement site. Providing them in one place on the wafer may not sufficiently address thickness and line width variations across the wafer, and it reduces the efficiency of the lithography process. Providing them on a sacrificial wafer does not address wafer-to-wafer variations and reduces the efficiency of production.
Prior Art: Disadvantage of Large Measurement Time and Footprint of Target
Spectroscopic reflectometers and ellipsometers have relatively small (on the order of 0.1) numerical apertures. Otherwise, spectral features of the sample would lose their contrast and sharpness. Consequently, the measurement spot, i.e., the spatial resolution, of such instruments is on the order of 40 μm. Therefore, each of the grating stacks 202X, 202X′, 202Y, and 202Y′ in FIG. 2 must have at least a 40 μm by 40 μm footprint on the wafer. A spectroscopic ellipsometer or reflectometer would have to measure grating stacks 202X, 202X′, 202Y, and 202Y′ sequentially. Therefore, the scatterometry-based prior art requires at least four times more area on the wafer and four times more measurement time compared to the imaging-based prior art.
Prior Art: The Color-Box Technique
Heimann (“The Color-Box alignment vernier: a sensitive lithographic alignment vernier read at low magnification,” Optical Engineering, July 1990, Vol. 29, No. 7, p. 828–836) describes an overlay metrology target that consists of a total of 26 grating stacks (13 grating stacks for each of the x and y directions). Heimann used triply redundant grating stacks (a total of 78), each grating stack occupying a 20 μm by 20 μm area on the wafer. Each grating stack has a different offset written to the reticle, changing in increments of Pitch/16. A low-magnification (5×) microscope objective is used to image the grating stacks without resolving the grating lines. Each grating stack appears to have a uniform color; hence Heimann calls the grating stacks color boxes. The color depends on the offset between the upper and lower gratings in the stack. Overlay is determined by finding the color box around which the colors of the neighboring boxes are symmetrically distributed. This technique does not involve any diffraction computation and offers a large depth of focus. The optics required for the color-box measurement are of lower cost and more robust compared to the optics required for the imaging-based prior art that uses bar-in-bar targets.
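The symmetry search can be sketched as follows. This is an illustrative reading of the color-box idea, not Heimann's published procedure: each box is reduced to a single scalar color value, the color is modeled by a hypothetical even, pitch-periodic function of the total grating shift, and the sign convention (total shift = designed offset + overlay) is an assumption. The pitch value, the function names, and the `min_pairs` parameter are likewise hypothetical.

```python
import math

def find_symmetric_center(colors, min_pairs=3):
    """Return the index of the box whose neighbors are most mirror-symmetric in color.

    Boxes with fewer than min_pairs neighbors on either side are skipped,
    because too few pairs make the symmetry score unreliable.
    """
    n = len(colors)
    best_index, best_score = None, math.inf
    for c in range(n):
        pairs = min(c, n - 1 - c)
        if pairs < min_pairs:
            continue
        # Mean absolute color difference between mirrored neighbors of box c.
        score = sum(abs(colors[c - k] - colors[c + k]) for k in range(1, pairs + 1)) / pairs
        if score < best_score:
            best_index, best_score = c, score
    return best_index

pitch_nm = 1600.0                      # hypothetical grating pitch
step_nm = pitch_nm / 16                # offset increment between adjacent boxes
offsets_nm = [(i - 6) * step_nm for i in range(13)]  # 13 boxes, centered on zero offset
true_overlay_nm = 200.0                # hypothetical overlay to be recovered

def toy_color(shift_nm):
    """Hypothetical color model: even and periodic in the total shift."""
    return math.cos(2 * math.pi * shift_nm / pitch_nm)

colors = [toy_color(offset + true_overlay_nm) for offset in offsets_nm]
center = find_symmetric_center(colors)
# The symmetric box is the one whose designed offset cancels the overlay.
estimated_overlay_nm = -offsets_nm[center]
```

In this sketch the result is quantized to the offset increment of Pitch/16; any finer resolution would need interpolation between boxes, which is not shown here.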