Manufacturing semiconductor devices involves depositing and patterning several layers overlaying each other. For example, gate interconnects and gates of an integrated circuit are formed at different lithography steps in the manufacturing process. The tolerance of alignment of these patterned layers is less than the width of the gate.
Overlay is defined as the displacement of a patterned layer from its ideal position aligned to a layer patterned earlier on the same wafer. Overlay is a two dimensional vector (Δx, Δy) in the plane of the wafer. Overlay is a vector field, i.e., the value of the vector depends on the position on the wafer. Perfect overlay and zero overlay are used synonymously. Overlay and overlay error are used synonymously. Depending on the context, overlay may signify a vector or one of the components of the vector.
Overlay metrology provides the information that is necessary to correct the alignment of the stepper-scanner and thereby minimize overlay error on subsequent wafers. Overlay errors, detected on a wafer after exposing and developing the photoresist, can be corrected by removing the photoresist and repeating the lithography step on a corrected stepper-scanner. If the measured error is minor, parameters for subsequent steps of the lithography process could be adjusted based on the overlay metrology to avoid excursions.
Most prior overlay metrology methods use built-in test patterns etched or otherwise formed into or on the various layers during the same lithography steps that form the patterns for circuit elements on the wafer. One typical pattern, called “box-in-box,” consists of two concentric squares, formed on a lower and an upper layer, respectively. “Bar-in-bar” is a similar pattern with just the edges of the “boxes” demarcated, and broken into disjoint line segments. The outer bars are associated with one layer and the inner bars with another. Typically, one is the upper pattern and the other is the lower pattern, e.g., outer bars on a lower layer and inner bars on the top. However, with advanced processes the topographies are complex and not truly planar, so the designations “upper” and “lower” are ambiguous. Typically, they correspond to earlier and later steps in the process. The squares or bars are formed by lithographic and other processes used to make planar structures, e.g., chemical-mechanical planarization (CMP). Currently, the patterns for the boxes or bars are stored on lithography masks and projected onto the wafer. Other methods for putting the patterns on the wafer are possible, e.g., direct electron beam writing from computer memory.
In one form of the prior art, a high performance microscope imaging system combined with image processing software estimates overlay error for the two layers. The image processing software uses the intensity of light at a multitude of pixels. Obtaining the overlay error accurately requires a high quality imaging system and means of focusing the system. One requirement for the optical system is very stable positioning of the optical system with respect to the sample. Relative vibration would blur the image and degrade the performance. This is a difficult requirement to meet for overlay metrology systems that are integrated into a process tool, like a lithography track. High-acceleration wafer handlers in the track cause vibration. The tight space requirements for integration preclude bulky isolation strategies.
As disclosed in U.S. Patent Application Serial No. 2002/0158193 (incorporated in this document by reference) one approach to overcoming these difficulties is to incorporate special diffraction gratings, known as targets, within semiconductor wafers. The targets are measured using scatterometry to perform overlay metrology. Several different grating configurations are described for the overlay targets. The simplest embodiment uses two grating stacks, one for x-alignment and one for y (each grating stack comprising two grating layers). An alternative embodiment uses two line grating stacks each for x and y (four grating stacks total). Still another embodiment uses three line grating stacks in combination to simultaneously measure both x and y alignment. (See also PCT publication WO 02/25723A2, incorporated herein by reference).
In FIG. 1A, one possible implementation for an overlay target is shown and generally designated 100. Target 100 includes two test patterns labeled 102X and 102Y. Test pattern 102X is used to measure displacement in the x-direction while test pattern 102Y is used to measure displacement in the y-direction. Target 100 is typically included in an unused wafer portion (such as within a scribe line). This prevents overlay target 100 from interfering with devices included on the semiconductor wafer.
FIG. 1B shows the structural details of test pattern 102X (and, by analogy, test pattern 102Y). Each test pattern is a stack of gratings. As shown, test pattern 102X includes an upper grating 104U and a lower grating 104L. Gratings 104U and 104L have the same pitch 106 (in this document, period, spatial period, and pitch are used synonymously). Grating 104U is formed in an upper layer 108U and grating 104L is formed in a lower layer 108L. Upper and lower layers 108 may be separated by one or more intermediate layers 110.
To describe alignment between layers 108, FIG. 1B shows a symmetry plane 112U (for grating 104U and layer 108U) and a symmetry plane 112L (for grating 104L and layer 108L). Symmetry plane 112U is displaced from symmetry plane 112L by offset 114, the difference between their x-coordinates: offset 114 = x(112U)−x(112L). The value of offset 114 when the lithography is in perfect alignment is the offset bias of grating stack 102X. An offset bias that is neither zero nor any other integer multiple of pitch/2 makes it possible to distinguish the sign of the overlay. Symmetry planes 112 in FIG. 1B are not uniquely defined, since there is one such symmetry plane for each line in gratings 104U and 104L. The magnitude of the offset bias is therefore understood to be the least distance between any choice of symmetry plane 112U in grating 104U and any choice of symmetry plane 112L in grating 104L. For a test pattern that consists of two stacked (overlaying) symmetric line gratings, the best value for the offset bias is pitch/4. A symmetric line grating is defined by the following property: its unit cell can be selected so that the unit cell is substantially invariant under reflection with respect to a plane perpendicular to the direction of the pitch. Small geometric imperfections, such as line edge roughness, that do not significantly affect optical measurements are not construed to break the symmetry.
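As a minimal sketch of the offset-bias definition above, the following Python reduces the separation of two chosen symmetry planes to the least distance, and checks the pitch/2 condition for resolving the sign of the overlay. It assumes, as stated for FIG. 1B, one symmetry plane per grating line, so each family of planes repeats with the pitch; the function names and the 600 nm pitch are hypothetical illustration values.

```python
def offset_bias_magnitude(x_upper: float, x_lower: float, pitch: float) -> float:
    """Least distance between any upper and any lower symmetry plane.

    Assumes one symmetry plane per grating line, so each family of planes
    repeats with the grating pitch (as stated for FIG. 1B).
    """
    d = x_upper - x_lower
    # Choosing a different line in either grating changes d by a whole
    # number of pitches; keep the representative closest to zero.
    return abs(d - pitch * round(d / pitch))


def bias_resolves_sign(bias: float, pitch: float, tol: float = 1e-9) -> bool:
    """True if the bias is neither zero nor another integer multiple of pitch/2."""
    r = bias / (pitch / 2.0)
    return abs(r - round(r)) > tol


pitch = 600.0  # nm, hypothetical
print(offset_bias_magnitude(1350.0, 0.0, pitch))  # 150.0, i.e. pitch/4
print(bias_resolves_sign(pitch / 4, pitch))       # True
print(bias_resolves_sign(pitch / 2, pitch))       # False
```

Picking the planes 1350 nm apart and picking them 150 nm apart describe the same stack, which is why the magnitude is defined as the least distance.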
Overlay measurements are obtained by measuring the optical responses of test patterns 102X and 102Y, typically in sequence. The optical response can be measured by spectroscopic reflectometry, or spectroscopic ellipsometry, which do not spatially resolve the grating lines in test patterns 102X and 102Y. Overlay measurements are then calculated from the optical measurements by regression.
In FIG. 2A, another possible implementation for an overlay target is shown and generally designated 200. Overlay target 200 includes two test patterns for each direction in which overlay is to be measured. Test patterns 202X and 202X′ are used for measurements in the x direction. Test patterns 202Y and 202Y′ are used for measurements in the y direction. As will be shown, the use of two test patterns per direction offers significantly more robust measurement of overlay when compared to the implementations of FIGS. 1A and 1B.
FIG. 2B shows the structural details of test patterns 202X and 202X′ (and, by analogy, test patterns 202Y and 202Y′). As shown, test pattern 202X includes an upper grating 204U and a lower grating 204L. Test pattern 202X′ includes an upper grating 204U′ and a lower grating 204L′. Gratings 204U, 204L, 204U′ and 204L′ have the same pitch. Gratings 204U and 204U′ are formed in an upper layer 208U and gratings 204L and 204L′ are formed in a lower layer 208L. Upper and lower layers 208 may be separated by one or more intermediate layers 210. Patterned layers 208L and 208U may be formed on the same layer sequentially, in which case there are no intermediate layers 210. For example, both gratings may be etched at the zero-level on a silicon wafer to qualify a lithography projector. There may be zero or more layers between the substrate of the wafer and patterned layer 208L.
When layers 208U and 208L are in perfect alignment, test patterns 202X and 202X′ are mirror images of each other in the x-direction: test pattern 202X′ can be obtained from test pattern 202X by the transformation (x′,y′)=(c1−x,c2+y), where c1 and c2 are constant distances. Similarly, under perfect alignment, test patterns 202Y and 202Y′ are mirror images of each other in the y-direction: test pattern 202Y′ can be obtained from test pattern 202Y by the transformation (x′,y′)=(c3+x,c4−y), where c3 and c4 are constant distances.
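The reflection relation between test patterns 202X and 202X′ can be checked numerically with a sketch that represents each grating by the x-coordinates of its line centers. Because the lines are assumed symmetric, the transformation x′=c1−x maps line centers onto line centers, so offsets can be read from them. The helpers grating_stack, mirror_x and signed_offset, the five-line gratings, and the 600 nm pitch are all hypothetical.

```python
def grating_stack(bias, overlay, pitch, n_lines=5):
    """Hypothetical test pattern: sorted line-center x-coordinates of both gratings."""
    lower = [n * pitch for n in range(n_lines)]
    upper = [n * pitch + bias + overlay for n in range(n_lines)]
    return upper, lower


def mirror_x(xs, c1):
    """Apply x' = c1 - x; symmetric lines map center to center."""
    return sorted(c1 - x for x in xs)


def signed_offset(upper, lower, pitch):
    """x(upper plane) - x(lower plane), reduced to [-pitch/2, pitch/2)."""
    d = upper[0] - lower[0]  # both lists are sorted and pitch-periodic
    return (d + pitch / 2) % pitch - pitch / 2


pitch = 600.0  # nm, hypothetical
up, low = grating_stack(pitch / 4, 0.0, pitch)       # 202X at perfect overlay
up_m, low_m = mirror_x(up, 0.0), mirror_x(low, 0.0)  # its mirror image
print(signed_offset(up, low, pitch), signed_offset(up_m, low_m, pitch))  # 150.0 -150.0

# With overlay dx, 202X and 202X' (bias -pitch/4) are no longer mirror images.
dx = 20.0
print(signed_offset(*grating_stack(pitch / 4, dx, pitch), pitch),
      signed_offset(*grating_stack(-pitch / 4, dx, pitch), pitch))  # 170.0 -130.0
```

The mirror image of 202X has offset −pitch/4, which is the offset bias of 202X′; a common overlay then shifts both offsets in the same direction.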
To describe alignment between layers 208, FIG. 2B shows two symmetry planes for test pattern 202X. These are labeled 212U (for upper grating 204U) and 212L (for lower grating 204L). FIG. 2B also shows two symmetry planes for test pattern 202X′. These are labeled 212U′ (for upper grating 204U′) and 212L′ (for lower grating 204L′). Offset 214 is x(212U)−x(212L). Offset 214′ is x(212U′)−x(212L′). At perfect alignment, the value of offset 214 is pitch/4 and the value of offset 214′ is −pitch/4. The value of offset 214 at perfect overlay is called the offset bias of grating stack (test pattern) 202X. At perfect alignment, test patterns 202X and 202X′ therefore have the same optical properties when they are viewed by a polarization-insensitive reflectometer. When the upper layer is shifted in the x-direction by an overlay Δx smaller than pitch/4 in magnitude, the magnitude of offset 214 becomes (pitch/4+Δx) and the magnitude of offset 214′ becomes (pitch/4−Δx). This breaks the reflection symmetry of test patterns 202X and 202X′, and their optical responses differ. The difference in the optical responses, such as the difference of reflectance spectra, R(λ, 202X)−R(λ, 202X′), is proportional to Δx for small overlay (where λ denotes wavelength). The overlay Δx can be estimated from the difference spectra with a simple linear operator. Alternatively, the optical measurements from test patterns 202X and 202X′ are fitted simultaneously with a model of test patterns 202X and 202X′ to regress the overlay Δx:
$$\min_{\Delta x} \sum_{\lambda} \left\{ \left[ R\big(\lambda, \Delta x, \text{Meas. at } (202X)\big) - R\big(\lambda, \Delta x, \text{Model}(202X)\big) \right]^2 + \left[ R\big(\lambda, \Delta x, \text{Meas. at } (202X')\big) - R\big(\lambda, \Delta x, \text{Model}(202X')\big) \right]^2 \right\} \qquad \text{Eq. 1}$$
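The text does not specify the linear operator; one plausible choice, sketched below under stated assumptions, is to project the difference spectrum onto a sensitivity spectrum (the slope of the difference with respect to Δx at perfect alignment), which amounts to a one-parameter least-squares fit. The response function toy_reflectance is a hypothetical stand-in, not a physical model: it is merely even and pitch-periodic in the grating offset, as symmetric line gratings require, and the wavelengths and pitch are illustrative.

```python
import math

PITCH = 600.0                                        # nm, hypothetical
WAVELENGTHS = [400.0 + 25.0 * i for i in range(13)]  # nm, hypothetical


def toy_reflectance(lam, offset, pitch=PITCH):
    """Hypothetical R(lambda) of a stack of two symmetric line gratings.

    Even and pitch-periodic in the offset between the gratings, as symmetry
    requires; the coefficients are arbitrary illustrative spectra.
    """
    phi = 2.0 * math.pi * offset / pitch
    a = 0.30 + 0.05 * math.sin(lam / 90.0)
    b = 0.04 * math.cos(lam / 70.0)
    c = 0.02 * math.sin(lam / 50.0)
    return a + b * math.cos(phi) + c * math.cos(2.0 * phi)


def difference_spectrum(dx, pitch=PITCH):
    """R(lambda, 202X) - R(lambda, 202X') for overlay dx; biases are +/-pitch/4."""
    return [toy_reflectance(lam, pitch / 4 + dx, pitch)
            - toy_reflectance(lam, -pitch / 4 + dx, pitch)
            for lam in WAVELENGTHS]


def sensitivity(pitch=PITCH, h=1e-3):
    """Slope of the difference spectrum with respect to dx at dx = 0 (central difference)."""
    plus, minus = difference_spectrum(h, pitch), difference_spectrum(-h, pitch)
    return [(p - m) / (2.0 * h) for p, m in zip(plus, minus)]


def linear_overlay_estimate(diff, sens):
    """Least-squares linear operator: dx = <s, D> / <s, s>."""
    return sum(s * d for s, d in zip(sens, diff)) / sum(s * s for s in sens)


sens = sensitivity()
for true_dx in (0.0, 2.0, -5.0, 100.0):
    est = linear_overlay_estimate(difference_spectrum(true_dx), sens)
    print(f"true {true_dx:7.1f} nm -> estimated {est:7.2f} nm")
```

In this toy, the estimate tracks small overlays closely but underestimates large ones (roughly 83 nm for a 100 nm overlay), because the difference spectrum is only proportional to Δx near perfect alignment.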
In the model-based regression, the offsets 214 and 214′ depend solely on the unknown overlay Δx. All other parameters, such as thicknesses of deposited layers, line widths, and line heights, are common to the models of test patterns 202X and 202X′, since the two test patterns are next to each other and are subject to the same process conditions. The minimization above is with respect to Δx and other parameters of the model, such as layer thicknesses, which are not shown in the equation for brevity. The quantity that is minimized may be a weighted sum of squares or any other norm of the residual. Using two grating stacks with different offset biases doubles the number of measurements without adding any unknown parameters beyond those used in the basic approach described in FIGS. 1A and 1B. Therefore, regression applied to measurements at two grating stacks with different offset biases yields a more robust estimate of the overlay. The overlay in the y-direction, Δy, is found by a similar but separate regression applied to the measurements at test patterns 202Y and 202Y′.
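Eq. 1 can be illustrated with a toy joint regression in which the two test patterns share every parameter except their offset bias. In this sketch the shared parameter is a single hypothetical scale factor k on a baseline spectrum, standing in for quantities such as layer thicknesses; the model functions are illustrative, not physical. Instead of a general nonlinear least-squares solver, it uses a simple substitute: a grid search over Δx in [−pitch/4, pitch/4], with k solved in closed form at each grid point because the model is linear in k.

```python
import math

PITCH = 600.0                                        # nm, hypothetical
WAVELENGTHS = [400.0 + 25.0 * i for i in range(13)]  # nm, hypothetical


def baseline(lam):
    """Hypothetical spectral shape scaled by the shared parameter k."""
    return 0.30 + 0.05 * math.sin(lam / 90.0)


def grating_term(lam, offset, pitch=PITCH):
    """Hypothetical even, pitch-periodic dependence on the grating offset."""
    phi = 2.0 * math.pi * offset / pitch
    return (0.04 * math.cos(lam / 70.0) * math.cos(phi)
            + 0.02 * math.sin(lam / 50.0) * math.cos(2.0 * phi))


def model(lam, dx, k, bias, pitch=PITCH):
    """Toy model of one test pattern: offset = bias + dx, k shared by both."""
    return k * baseline(lam) + grating_term(lam, bias + dx, pitch)


def fit_overlay(meas_x, meas_xp, pitch=PITCH, step=0.5):
    """Minimize the Eq. 1 sum of squares over dx (grid) and k (closed form)."""
    best = None
    for i in range(int(round(0.5 * pitch / step)) + 1):
        dx = -pitch / 4 + i * step
        a, y = [], []
        # Stack residual terms of 202X (bias +pitch/4) and 202X' (bias -pitch/4).
        for bias, meas in ((pitch / 4, meas_x), (-pitch / 4, meas_xp)):
            for lam, m in zip(WAVELENGTHS, meas):
                a.append(baseline(lam))
                y.append(m - grating_term(lam, bias + dx, pitch))
        # For fixed dx the model is linear in k: ordinary least squares.
        k = sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)
        sse = sum((yi - k * ai) ** 2 for ai, yi in zip(a, y))
        if best is None or sse < best[0]:
            best = (sse, dx, k)
    return best[1], best[2]


# Synthetic, noise-free "measurements" at a known overlay and shared parameter.
true_dx, true_k = 12.5, 1.1
meas_x = [model(lam, true_dx, true_k, PITCH / 4) for lam in WAVELENGTHS]
meas_xp = [model(lam, true_dx, true_k, -PITCH / 4) for lam in WAVELENGTHS]
dx_hat, k_hat = fit_overlay(meas_x, meas_xp)
print(dx_hat, round(k_hat, 6))  # 12.5 1.1
```

Summing the residuals of both patterns in one cost is what the braces in Eq. 1 express: the second pattern adds a full set of data points but no new unknowns.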
Simultaneously regressing measurements at two grating stacks, where the offset biases of the grating stacks differ by pitch/2, shares two limitations of the basic approach described in FIGS. 1A and 1B. The first limitation is the range of unambiguous overlay measurements. Both approaches give ambiguous results when overlay exceeds ±pitch/4 for symmetric line gratings. FIG. 3a shows test patterns 202X and 202X′ when overlay is Δx=−pitch/4. In this case offset 214 is zero and offset 214′ is −pitch/2. FIG. 3b shows test patterns 202X and 202X′ when overlay is Δx=+pitch/4; offset 214 is pitch/2 and offset 214′ is zero. Let R(λ, Δx) denote the optical response of test pattern 202X when the upper test pattern layer is displaced from perfect alignment by Δx in the x-direction. Because the response of a stack of symmetric line gratings is an even, pitch-periodic function of offset 214 = pitch/4+Δx, symmetry gives:
$$R\left(\lambda, \frac{\text{pitch}}{4} + \Delta x\right) = R\left(\lambda, \frac{\text{pitch}}{4} - \Delta x\right), \qquad R\left(\lambda, -\frac{\text{pitch}}{4} + \Delta x\right) = R\left(\lambda, -\frac{\text{pitch}}{4} - \Delta x\right) \qquad \text{Eq. 2}$$
This limits the measurement range to half a period of the grating stack. The second limitation of the prior art follows from the two equations above: The sensitivity of the optical properties to overlay is zero when Δx=±pitch/4:
$$\frac{\partial R}{\partial (\Delta x)}\left(\lambda, \pm\frac{\text{pitch}}{4}\right) = 0 \qquad \text{Eq. 3}$$
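The symmetry relations of Eq. 2 and the vanishing sensitivity of Eq. 3 can be checked numerically on a hypothetical response function for test pattern 202X. The function is built to be even and pitch-periodic in the offset pitch/4+Δx, as assumed for symmetric line gratings; its coefficients, the 500 nm wavelength, and the finite-difference step are arbitrary illustration values.

```python
import math

PITCH = 600.0  # nm, hypothetical


def response_202x(lam, dx, pitch=PITCH):
    """Hypothetical R(lambda, dx) of test pattern 202X (offset = pitch/4 + dx).

    Built as an even, pitch-periodic function of the offset, the property
    that a stack of symmetric line gratings has.
    """
    phi = 2.0 * math.pi * (pitch / 4 + dx) / pitch
    return (0.30 + 0.04 * math.cos(lam / 70.0) * math.cos(phi)
            + 0.02 * math.sin(lam / 50.0) * math.cos(2.0 * phi))


def d_response(lam, dx, h=1e-3):
    """Central-difference estimate of dR/d(dx), per nm."""
    return (response_202x(lam, dx + h) - response_202x(lam, dx - h)) / (2.0 * h)


lam, shift = 500.0, 37.0
# Eq. 2: mirror symmetry about dx = +pitch/4 and about dx = -pitch/4.
print(abs(response_202x(lam, PITCH / 4 + shift)
          - response_202x(lam, PITCH / 4 - shift)) < 1e-12)   # True
print(abs(response_202x(lam, -PITCH / 4 + shift)
          - response_202x(lam, -PITCH / 4 - shift)) < 1e-12)  # True
# Eq. 3: zero sensitivity at +/-pitch/4, but not at perfect alignment.
for dx in (-PITCH / 4, 0.0, PITCH / 4):
    print(f"dx = {dx:7.1f} nm  dR/d(dx) = {d_response(lam, dx):+.2e} per nm")
```

The derivative at Δx=±pitch/4 comes out as zero up to rounding, while at perfect alignment it does not; this is the dead-zone behavior that FIG. 4 illustrates.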
FIG. 4 shows the computed reflectance of a particular test pattern 202X as a function of overlay (Δx) at four different wavelengths. At each wavelength, the partial derivative of reflectance with respect to overlay is zero when the overlay is ±pitch/4, as indicated by the vertical dashed lines in FIG. 4. Test patterns 202X and 202X′, individually and in combination, therefore have dead zones in the vicinity of overlay = ±pitch/4.
FIG. 5 shows the results of the regression applied to an actual measurement. The horizontal axis is the known overlay, and the vertical axis is the overlay estimated by scatterometry using a pair of test pattern stacks for each direction. The measurement breaks down in the neighborhood of the dead zones at Δx=±pitch/4. When the actual overlay is between pitch/4 and pitch/2, the estimated overlay becomes (pitch/2)−(actual overlay), consistent with Eq. 2.
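The folding seen in FIG. 5 follows from Eq. 2 and can be reproduced with a toy model: if the search for Δx is confined to the unambiguous range [−pitch/4, pitch/4], an actual overlay between pitch/4 and pitch/2 is matched exactly by pitch/2 minus that overlay. The response function, wavelengths, grid step, and pitch below are hypothetical; the brute-force grid search stands in for a real regression.

```python
import math

PITCH = 600.0                                        # nm, hypothetical
WAVELENGTHS = [400.0 + 25.0 * i for i in range(13)]  # nm, hypothetical


def toy_reflectance(lam, offset, pitch=PITCH):
    """Hypothetical even, pitch-periodic response of a symmetric grating stack."""
    phi = 2.0 * math.pi * offset / pitch
    return (0.30 + 0.04 * math.cos(lam / 70.0) * math.cos(phi)
            + 0.02 * math.sin(lam / 50.0) * math.cos(2.0 * phi))


def spectra(dx, pitch=PITCH):
    """Responses of 202X (bias +pitch/4) and 202X' (bias -pitch/4) at overlay dx."""
    return ([toy_reflectance(lam, pitch / 4 + dx, pitch) for lam in WAVELENGTHS]
            + [toy_reflectance(lam, -pitch / 4 + dx, pitch) for lam in WAVELENGTHS])


def estimate_overlay(meas, pitch=PITCH, step=0.5):
    """Grid search of the Eq. 1 cost over the unambiguous range [-pitch/4, pitch/4]."""
    n = int(round(0.5 * pitch / step))
    candidates = [-pitch / 4 + i * step for i in range(n + 1)]
    return min(candidates,
               key=lambda dx: sum((m - r) ** 2
                                  for m, r in zip(meas, spectra(dx, pitch))))


for actual in (40.0, 210.0):  # 210 nm lies between pitch/4 and pitch/2
    print(actual, estimate_overlay(spectra(actual)))  # prints 40.0 40.0, then 210.0 90.0
```

The 210 nm case is not a fitting failure: its spectra are identical to those at 90 nm, so no regression restricted to symmetric gratings can tell the two apart.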
Prior art teaches that this limitation can be avoided by making the grating layers asymmetric, for example by having two lines of distinct widths and two spaces of distinct widths in the unit cell (one period) of the grating layer. Using asymmetric lines increases the number of unknown parameters of the model since the widths of the two lines can change independently according to process variations. This increases the computational burden and makes the measurement less robust.