Atomic layer deposition (ALD) can be characterized as a variant of chemical vapor deposition (CVD) in which a wafer substrate surface is sequentially exposed to reactive chemical precursors, with each precursor pulse separated from the next by an inert purge gas period. Many descriptions exist of ALD processes and procedures that use various reactive precursor chemistries and both thermal and plasma-assisted approaches. See, e.g., T. Suntola, Materials Science Reports, v. 4, no. 7, p. 266 et seq. (December 1989); M. Ritala & M. Leskela, “Deposition and Processing of Thin Films” in Handbook of Thin Film Materials, v. 1, ch. 2 (2002); J. W. Klaus et al., “Atomic Layer Deposition of Tungsten Using Sequential Surface Chemistry with a Sacrificial Stripping Reaction”, Thin Solid Films, v. 360, pp. 145-153 (2000); S. Imai & M. Matsumura, “Hydrogen atom assisted ALE of silicon”, Appl. Surf. Sci., v. 82/83, pp. 322-326 (1994); S. M. George et al., “Atomic layer controlled deposition of SiO2 and Al2O3 using ABAB . . . binary reaction sequence chemistry”, Appl. Surf. Sci., v. 82/83, pp. 460-467 (1994); M. A. Tischler & S. M. Bedair, “Self-limiting mechanism in the atomic layer epitaxy of GaAs”, Appl. Phys. Lett., v. 48, no. 24, p. 1681 (1986). Several commercial applications of ALD technology have been reported, such as the deposition of Al2O3 for advanced DRAM capacitors (see M. Gutsche et al., “Capacitance Enhancements techniques for sub 100 nm trench DRAMs”, IEDM 2001, p. 411 (2001)), and the patent literature contains many descriptions of ALD reactor architectures. See, e.g., U.S. Pat. Nos. 4,389,973; 5,281,274; 5,855,675; 5,879,459; 6,042,652; 6,174,377; 6,387,185; and 6,503,330. Both single-wafer and batch reactors are in use, and some embodiments include plasma capabilities.
The ALD process has many advantages over conventional CVD and PVD (physical vapor deposition) methods for producing thin films: it can provide much higher film quality and far superior step coverage. ALD is therefore expected to become an important technique in the fabrication of next-generation semiconductor devices. However, ALD's low wafer throughput has always been an obstacle to its widespread adoption in industry. The film deposition rate (FDR) is the product of the ALD deposition rate (Å/cycle) and the reciprocal of the cycle time (cycles/unit time). With typical cycle times on the order of 3-6 sec/cycle, typical film growth rates are therefore on the order of 10-20 Å/min, corresponding to roughly 1 Å/cycle. Thus a 50 Å thick film can be deposited with a throughput of only up to approximately 15 wafers per hour in a single-wafer ALD reactor.
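The FDR relation above can be sketched numerically as follows. This is a minimal illustration, assuming a growth per cycle of 1 Å/cycle (the value implied by 10-20 Å/min at 3-6 sec/cycle); the function names and the `overhead_s` parameter (a hypothetical per-wafer handling time outside the ALD cycles) are illustrative, not taken from the source.

```python
import math


def film_deposition_rate(gpc_a_per_cycle: float, cycle_time_s: float) -> float:
    """FDR in Å/min: growth per cycle (Å/cycle) times cycles per minute."""
    return gpc_a_per_cycle * (60.0 / cycle_time_s)


def wafers_per_hour(
    thickness_a: float,
    gpc_a_per_cycle: float,
    cycle_time_s: float,
    overhead_s: float = 0.0,
) -> float:
    """Single-wafer throughput for a film of the given thickness.

    overhead_s is a hypothetical per-wafer time (transfer, heating, etc.)
    that is not part of the ALD cycles themselves.
    """
    cycles = math.ceil(thickness_a / gpc_a_per_cycle)  # whole ALD cycles needed
    deposition_s = cycles * cycle_time_s
    return 3600.0 / (deposition_s + overhead_s)


if __name__ == "__main__":
    gpc = 1.0  # assumed Å/cycle, consistent with 10-20 Å/min at 3-6 s/cycle
    for cycle_time in (3.0, 6.0):
        print(
            f"{cycle_time:.0f} s/cycle: "
            f"FDR = {film_deposition_rate(gpc, cycle_time):.0f} Å/min, "
            f"50 Å film -> {wafers_per_hour(50.0, gpc, cycle_time):.0f} wafers/h "
            "(deposition only)"
        )
```

With these inputs the deposition-limited throughput for a 50 Å film spans 12-24 wafers per hour; any nonzero per-wafer overhead lowers it further.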
Most attempts to improve the throughput of ALD processes have involved process controls that rapidly switch between exposure and purge, using computer-controlled, electrically driven pneumatic valves that pulse precursors with a precision of tens of milliseconds. Others have tried to improve throughput using shorter precursor pulse and purge times, as well as different process temperatures and pressures. It has also been recommended that reactors have “small” volumes, to facilitate precursor purging, and heated walls, to avoid the undesired retention of precursors such as water or ammonia through the ALD cycle (see Ritala & Leskela, supra). However, the basic ALD sequence of alternating pulse and purge steps has not materially changed, and no substantial throughput improvements using the above methods have been reported.
Attempts to increase the film deposition rate within conventionally practiced ALD are limited by the long purges used to achieve the desired film performance. To understand why, consider that the heart of ALD technology is the self-limiting and self-passivating nature of each precursor's reactions on the heated wafer substrate surface. In the ideal case, each self-limiting chemical half-reaction (e.g., the metal and non-metal reactions) progresses toward a saturated deposition thickness per ALD cycle and follows exponential or Langmuir kinetics. An ALD cycle is the sum of the periods during which the wafer substrate is exposed to each precursor and the purge periods that remove excess precursor and reaction byproducts after each such exposure. Suntola's seminal patent (U.S. Pat. No. 4,389,973) described the diffusive nature of the pulsed chemical precursors. Gaseous diffusion broadens each precursor pulse, which places a fundamental limit on how short the interval between pulses can be while still avoiding undesirable CVD reactions. The more diffusive the conditions in the ALD apparatus, the longer the purge intervals required to maintain the desired precursor pulse separation during the ALD cycle and achieve near-ideal ALD film growth. Furthermore, an initiation process is key to a continuous startup of the overall ALD process; for example, surface preparation can be carried out to saturate the Si wafer surface with hydroxyl (Si—OH) groups.
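The link between diffusive broadening and the minimum pulse interval can be illustrated with a deliberately simple model. The sketch below assumes each pulse is a Gaussian concentration profile that spreads by one-dimensional Fickian diffusion (variance growing as σ0² + 2Dt) while riding on a carrier flow of constant velocity; it ignores convective (Taylor) dispersion, wall adsorption, and reactor geometry. All function names, parameters, and numeric values are hypothetical.

```python
import math


def pulse_sigma_cm(sigma0_cm: float, diffusivity_cm2_s: float, elapsed_s: float) -> float:
    """Spatial standard deviation of a Gaussian pulse after 1D Fickian spreading."""
    # Variance of a diffusing Gaussian grows as sigma0^2 + 2*D*t.
    return math.sqrt(sigma0_cm**2 + 2.0 * diffusivity_cm2_s * elapsed_s)


def min_pulse_interval_s(
    sigma0_cm: float,
    diffusivity_cm2_s: float,
    elapsed_s: float,
    carrier_velocity_cm_s: float,
    max_overlap: float = 1e-3,
) -> float:
    """Smallest time between pulse centers so that, at the midpoint between two
    equal Gaussian pulses, each pulse has fallen to max_overlap of its peak."""
    sigma = pulse_sigma_cm(sigma0_cm, diffusivity_cm2_s, elapsed_s)
    # Solve exp(-(d/2)^2 / (2 sigma^2)) = max_overlap for the center spacing d.
    spacing_cm = 2.0 * sigma * math.sqrt(2.0 * math.log(1.0 / max_overlap))
    # Pulses ride on the carrier flow, so spatial spacing maps to time via velocity.
    return spacing_cm / carrier_velocity_cm_s


if __name__ == "__main__":
    for d in (0.1, 1.0, 10.0):  # hypothetical diffusivities, cm^2/s
        t = min_pulse_interval_s(sigma0_cm=1.0, diffusivity_cm2_s=d,
                                 elapsed_s=0.5, carrier_velocity_cm_s=100.0)
        print(f"D = {d:5.1f} cm^2/s -> minimum pulse interval ≈ {t * 1000:.0f} ms")
```

Under these assumptions the required interval grows with the square root of the diffusivity, which mirrors the qualitative point that more diffusive conditions demand longer purges.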
The self-limiting reactions of the ALD process yield a deposition rate (e.g., measured in Å/cycle) that increases with exposure dose (or with exposure time, for a given precursor flux) until it reaches saturation. Saturation is marked by the onset of a regime in which further increases in precursor exposure dose produce no further increase in the ALD growth rate. For some precursors, such as H2O and NH3, saturation is instead marked by the onset of a substantially slower increase of the growth rate with further increase of the exposure dose; this behavior is frequently referred to as “soft saturation”. We refer to the ALD deposition rate (in Å/cycle) as the maximum saturated ALD deposition rate when the exposure doses of both precursors are sufficient to achieve saturation.
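The saturation behavior described above can be sketched with first-order Langmuir uptake, where each half-reaction's coverage after a dose is 1 - exp(-dose/scale). Two further simplifications are assumptions made only for illustration: the cycle's growth is taken as proportional to the product of the two half-reaction coverages, and soft saturation of one precursor (e.g., H2O or NH3) is modeled as a small linear term in its dose. The names `coverage`, `growth_per_cycle`, and `saturation_dose` and all parameter values are hypothetical.

```python
import math


def coverage(dose: float, dose_scale: float) -> float:
    """First-order Langmuir uptake: fraction of reactive sites converted after a dose."""
    return 1.0 - math.exp(-dose / dose_scale)


def growth_per_cycle(
    dose_a: float,
    dose_b: float,
    gpc_max: float,
    scale_a: float,
    scale_b: float,
    soft_slope_b: float = 0.0,
) -> float:
    """Growth per cycle (Å/cycle) for a two-precursor A/B cycle.

    Assumptions: growth scales with the product of the two half-reaction
    coverages, and 'soft saturation' of precursor B is a small linear term
    in its dose.
    """
    gpc = gpc_max * coverage(dose_a, scale_a) * coverage(dose_b, scale_b)
    return gpc + soft_slope_b * dose_b


def saturation_dose(dose_scale: float, fraction: float = 0.95) -> float:
    """Dose at which a half-reaction reaches the given coverage fraction."""
    return dose_scale * math.log(1.0 / (1.0 - fraction))


if __name__ == "__main__":
    for dose in (0.5, 1.0, 3.0, 6.0, 12.0):
        hard = growth_per_cycle(dose, dose, gpc_max=1.0, scale_a=1.0, scale_b=1.0)
        soft = growth_per_cycle(dose, dose, 1.0, 1.0, 1.0, soft_slope_b=0.01)
        print(f"dose {dose:5.1f}: hard {hard:.3f}  soft {soft:.3f} Å/cycle")
```

In the "hard" case the growth per cycle levels off at `gpc_max` once both doses are several times their scale; with a nonzero `soft_slope_b` it keeps creeping upward, which is the soft-saturation signature.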
Conventional ALD operation is typically carried out at the maximum saturated ALD deposition rate. Further, conventional ALD operation allows for and encourages “over-dosing” of both chemical precursors, so that the exposure during each precursor pulse is more than sufficient to ensure saturation of that precursor's half-reaction over all regions of the substrate. This approach has been the practice of record for ALD technology since 1977 and is often cited, for example, in the review articles by Ritala & Leskela, supra, and by Sneh (O. Sneh et al., “Equipment for Atomic Layer Deposition and Applications for Semiconductor Processing,” Thin Solid Films, v. 402/1-2, pp. 248-261 (2002)). In this overdosed ALD method, gas dynamics and kinetics play a minor role (see id., indicating that self-limiting growth ensures precursor fluxes do not need to be uniform over the substrate), and saturation is eventually obtained at all points on the substrate.
The current ALD practice of over-dosing is inherently inefficient and places many limitations on the performance of commercial ALD systems. For example, in the overdose approach, precursor continues to be applied to regions of the substrate where the film has already reached saturation, because saturation has not yet been achieved in other areas. The excess precursor is wasted, adding chemical cost. Additionally, the purge part of the ALD cycle is burdened with removing more precursor from the reactor than is needed for global film coverage. The excess, unreacted precursors can then react in parts of the ALD apparatus downstream of the wafer surface, such as the pumping conduits and the pump, causing undesirable deposition on these components and increasing the need for cleaning. In some cases, this undesired deposition outside the reactor chamber can even cause component failure.
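The overdose penalty can be quantified with a toy model in which different substrate regions receive different relative precursor fluxes and each follows Langmuir uptake. The pulse must last until the lowest-flux region saturates, so every other region is overdosed by the ratio of its flux to the minimum flux. Here "wasted" means only the dose a region receives beyond its own saturation exposure; precursor that never reaches the substrate is ignored. The functions `saturation_time` and `overdose_report`, the 99% saturation criterion, and the flux values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def saturation_time(rel_flux: float, dose_scale: float, fraction: float) -> float:
    """Time for a region with relative flux rel_flux to reach the coverage fraction,
    assuming Langmuir uptake: coverage = 1 - exp(-rel_flux * t / dose_scale)."""
    return dose_scale * math.log(1.0 / (1.0 - fraction)) / rel_flux


def overdose_report(
    rel_fluxes: Sequence[float], dose_scale: float = 1.0, fraction: float = 0.99
) -> Tuple[float, List[float], float]:
    """Return (pulse length, per-region overdose factors, wasted dose fraction).

    The pulse must last until the lowest-flux region saturates; every other
    region keeps receiving precursor after it has already saturated.
    """
    times = [saturation_time(f, dose_scale, fraction) for f in rel_fluxes]
    pulse_s = max(times)  # the slowest region sets the exposure time
    needed = [f * t for f, t in zip(rel_fluxes, times)]  # dose each region needs
    delivered = [f * pulse_s for f in rel_fluxes]  # dose each region receives
    overdose = [pulse_s / t for t in times]
    wasted = 1.0 - sum(needed) / sum(delivered)
    return pulse_s, overdose, wasted


if __name__ == "__main__":
    # Hypothetical relative fluxes for three substrate regions.
    pulse, factors, waste = overdose_report([2.0, 1.5, 1.0])
    print(f"pulse = {pulse:.2f} (dose_scale units), overdose factors = {factors}, "
          f"wasted = {waste:.0%}")
```

With relative fluxes of 2.0, 1.5, and 1.0, the overdose factors equal the flux ratios and one third of the delivered dose is excess; a perfectly uniform flux would waste none.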
Clearly, the more the precursors are overdosed, the more these effects degrade the performance of the ALD apparatus. They contribute to extended equipment downtime for maintenance, which is unacceptable in a production environment. Furthermore, the additional exposure time needed to cover the whole substrate while the first-exposed regions are already being overdosed adds to the diffusion broadening of the precursor pulses, which further lengthens the purges needed to reduce the gas-phase coexistence of the precursors to a useful minimum. This, in turn, increases the time to complete each ALD cycle and thus lowers the film deposition rate and wafer throughput.