1. Field of the Invention
This invention relates to improvements in semiconductor non-volatile memory transistors, and more specifically improvements in the gate design and processing of non-volatile transistors used in electrically erasable, electrically programmable read-only memories.
2. Description of the Related Art
Until recently, the gate structure of non-volatile memory transistors has been designed in a similar manner to that of conventional CMOS insulated gate field effect transistors (IGFETs) or MOSFETs. The difference between CMOS IGFETs and non-volatile IGFETs is primarily that non-volatile IGFETs include an added charge storage layer embedded in the gate dielectric. The charge storage layer is either a conductive element, such as a polycrystalline silicon (poly) floating gate, or a non-conductive element such as a dielectric which is capable of trapping charge. Older types of CMOS transistors have typically used a heavily doped N-type gate material for both N- and P-channel transistors in order to simplify processing and to achieve low poly resistivity. With the advent of deep sub-micron CMOS technology, a greater emphasis has been placed on reduced temperature processing, deeply scaled transistor geometries, and silicided polycrystalline silicon gates. This emphasis has led to changes in the gate structure that affect the doping of the poly.
In older technologies, the poly was typically doped by furnace diffusion processes using POCl₃ or phosphine gas to produce a heavily doped N-type material. In newer technologies with channel length geometries at 0.7 microns and below, the furnace diffusion doping processes have been replaced with ion-implantation or low temperature in-situ doping during the poly deposition. These newer doping methods, which allow for substantially reduced thermal processing while doping the poly gate, are necessary to produce deeply scaled transistor geometries. Further, these newer doping methods allow for better doping control in the poly which is useful in facilitating the formation of a metal-silicide layer on top of the poly. Also, in newer technologies, it has been advantageous to use both N- and P-type doped poly, rather than simply N-type poly. Using P-type poly allows deeply scaled P-type MOS transistors to operate more efficiently at lower channel lengths due to the elimination of a buried channel that is usually required with N-type poly. Thus, N-type poly gates are often used in today's N-channel MOSFETs and P-type poly gates are often used in today's advanced P-channel MOSFETs. In these modern devices, the gate doping type is matched to the source and drain junction doping type.
Until recently, there has been no advantage in using different criteria for choosing a gate doping type for non-volatile memory devices from those used to choose the doping type for conventional MOSFETs. The choice has been primarily motivated by a desire to save costs by being compatible with processes used to produce conventional MOSFET devices. As a result, more recently developed doping methods and doping types for conventional MOSFETs have been applied to the construction of non-volatile memory transistors. Specifically, advanced N-channel non-volatile memory transistors are constructed using an N-type poly gate, advanced P-channel transistors are constructed using a P-type poly gate, and doping levels in both are often lower than what was used in the past.
FIG. 1 shows memory transistor 10, an N-channel non-volatile insulated gate field effect transistor which includes a charge storage layer 32 embedded in its gate dielectric, according to the prior art. The charge storage layer 32 is typically surrounded by at least a top dielectric 31 and a bottom dielectric 33 and resides between the N-type gate 12 and the channel 15 of the transistor. Channel 15 resides in the P-type silicon bulk 11 between the N-type source 14 and N-type drain 16 regions. The charge storage layer 32 is either a "floating gate", typically of doped polycrystalline silicon, or a dielectric material capable of trapping charge carriers such as silicon nitride, silicon oxynitride, silicon-rich silicon dioxide, or a ferroelectric material. The thickness of the gate dielectric, the composite of layers 31, 32 and 33, is typically dielectrically equivalent to 150 Å to 200 Å of silicon dioxide, although thinner dielectrics are currently under investigation. Note that transistor 10 could optionally include a silicide layer on top of the N-type gate 12.
FIG. 2 shows memory transistor 10', a P-channel non-volatile insulated gate field effect transistor which includes a charge storage layer 32 embedded in its gate dielectric, according to the prior art. The charge storage layer 32 is typically surrounded by at least a top dielectric 31 and a bottom dielectric 33 and resides between the P-type gate 12' and the channel 15' of the transistor. Channel 15' resides in the N-type silicon bulk 11' between the P-type source 14' and P-type drain 16' regions. The charge storage layer 32 is either a "floating gate", typically of doped polycrystalline silicon, or a dielectric material capable of trapping charge carriers such as silicon nitride, silicon oxynitride, silicon-rich silicon dioxide, or a ferroelectric material. The thickness of the gate dielectric, the composite of layers 31, 32 and 33, is typically dielectrically equivalent to 150 Å to 200 Å of silicon dioxide, although thinner dielectrics are currently under investigation. Note that transistor 10' could optionally include a silicide layer on top of the P-type gate 12'.
The amount and polarity of charge residing in the charge storage layer 32 affect the conductivity of the non-volatile transistor. The words "programmed" and "erased" are used here to describe two possible conductivity states that non-volatile transistors can achieve under two different charge storage conditions. It is recognized that the designation of the words "programmed" and "erased" is purely arbitrary and that these terms can be selected to represent different meanings depending on the application. Here, however, the terms "erased" and "programmed" are used in reference to relative levels of conductance. The terms "erased" or "erase", and "programmed" or "program" are used to describe the "on" and "off" states, respectively. The primary difference between these two states is the level of conductance in the non-volatile transistor while under read biases. An "on" state results when the non-volatile transistor is conductive and an "off" state results when the non-volatile transistor is non-conductive, or at least less conductive than a predetermined range of conductance that represents the "on" state. Further, the term "write" is used to describe an operation that intentionally sets the threshold voltage of a non-volatile memory transistor, either to the erase state or to the program state.
Unfortunately, we have discovered that matching the doping type of the gate to that of the source and drain junctions is not necessarily the optimal choice for building modern non-volatile memory transistors, so the effects of using opposite gate and junction doping in non-volatile memory transistors are now being explored. The problem is that the traditional choice can lead to slow program timing and can reduce the scalability of a non-volatile transistor. These problems have not been a factor in devices that have been in production to date. However, as non-volatile device channel length geometries scale to 0.7 micron and below, where the effective gate dielectric thickness is 170 Å or less, the effects of gate doping become critical to the operation of the non-volatile transistor, as discussed below.
Non-volatile memory transistors oftentimes are written by placing a relatively high voltage on the gate with respect to the transistor channel. For example, a large negative potential (-10 to -20 volts) is placed on gate 12 relative to the channel in order to erase transistor 10 by way of quantum mechanical tunneling. Likewise a large positive potential (+10 to +20 volts) is placed on the gate 12 relative to the channel in order to program transistor 10, again using quantum mechanical tunneling. Similar voltage magnitudes, but opposite polarities, are applied to the gate 12' to erase and program transistor 10'. The large magnitude of the applied voltage is needed in order to both shorten tunneling distances and to lower tunneling barriers. This bias method enables charging the charge storage layer in a reasonable amount of time, typically within hundreds of microseconds to seconds.
The tunneling charge transport in transistors 10 and 10' is described by way of equations known in the industry as Fowler-Nordheim Tunneling, Modified Fowler-Nordheim Tunneling, Direct Tunneling and Trap-Assisted Tunneling. These equations accurately predict that the rate of tunneling charge transport, or tunneling current, into charge storage layer 32 is an exponential function of the electric field across the dielectric through which it is tunneling. The tunnel current that primarily affects write speed is the tunnel current through dielectric 33. So the time it takes to either erase or program a transistor 10 or 10' is a very strong function of the electric field imposed on dielectric layer 33 during a write operation, either erase or program.
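The field sensitivity described above can be illustrated with a minimal sketch of the basic Fowler-Nordheim form, J = A·E²·exp(−B/E). The constants `A_FN` and `B_FN` below are assumed, order-of-magnitude values for a silicon/silicon dioxide barrier and are not taken from this specification; the function names are likewise hypothetical. The sketch covers only the plain Fowler-Nordheim case, not the modified, direct or trap-assisted mechanisms.

```python
import math

# Assumed, illustrative Fowler-Nordheim constants (not from the specification).
A_FN = 1.0e-6   # prefactor, A/V^2
B_FN = 2.5e8    # exponent constant, V/cm


def fn_current_density(e_field):
    """Tunnel current density in A/cm^2 for a dielectric field in V/cm."""
    if e_field <= 0.0:
        return 0.0
    return A_FN * e_field ** 2 * math.exp(-B_FN / e_field)


def current_ratio(e_nominal, fractional_loss):
    """Ratio of the current at e_nominal to the current at a reduced field."""
    reduced = e_nominal * (1.0 - fractional_loss)
    return fn_current_density(e_nominal) / fn_current_density(reduced)


if __name__ == "__main__":
    # Effect of losing 10% of the field at two assumed operating points.
    for e_nominal in (1.0e7, 5.0e6):
        ratio = current_ratio(e_nominal, 0.10)
        print(f"E = {e_nominal:.1e} V/cm: current falls by a factor of {ratio:.1f}")
```

Under these assumed constants, a 10% reduction in field lowers the tunnel current by more than an order of magnitude, and the penalty grows as the nominal field decreases.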
Additionally, the electric field created by the gate voltage terminates in the channel region of the bulk and in the poly. When the field terminates in the poly, it does so by either forming an accumulated layer of free charge carriers or by forming a depletion layer at the interface between the poly and the top of the gate dielectric. When the free carriers are accumulated, the poly acts nearly like a metallic electrode and very little voltage is dropped within the poly. However, when the free carriers in the poly are repelled from the interface to form a depletion layer, a significant amount of voltage can be dropped in the poly depletion layer.
As shown in FIG. 3, a positive bias is applied to the poly gate 12 relative to the channel 15 to program transistor 10. In this example, ten volts is applied to gate 12 while the source, drain and bulk are held at ground. The voltage difference between the gate 12 and the bulk 11 creates an electric field that passes through layers 31, 32 and 33 and creates a depletion layer 20 in the gate poly and a depletion layer 21 to form the channel 15 in the bulk. The applied gate-to-bulk voltage creates depletion layer 20 because the electric field attracts the free electrons in the N-type poly gate 12 toward the electrode and away from the interface between the gate 12 and the top dielectric 31. Likewise, the electric field created by the gate-to-bulk bias repels the free holes from the interface between the P-type bulk and the bottom dielectric 33, forming bulk depletion 21.
The voltage difference between the gate electrode and the bulk electrode is called Vpp. Vpp is nearly equal to the sum of the voltages dropped across 20, 31, 32, 33, and 21; namely

$$V_{pp} \approx \Delta V_{\mathrm{Poly}} + \Delta V_{\mathrm{Top\_Ox}} + \Delta V_{\mathrm{Storage\_Layer}} + \Delta V_{\mathrm{Bottom\_Ox}} + \Delta V_{\mathrm{Bulk}}.$$
Unfortunately, the voltage drop in the poly, $\Delta V_{\mathrm{Poly}}$, provides no value in forming the conditions required to tunnel charge through the dielectric layer 33. In fact, when the depletion layer is present, the tunnel characteristics are much like what would be expected if the device were formed with a metallic gate electrode and the write voltage was lower by the amount dropped in the poly depletion. Since the tunnel current is an exponential function of the electric field across the dielectric, the write speed can be significantly degraded by the poly depletion voltage loss.
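As a rough guide to the size of the poly depletion loss, the standard depletion approximation can be written out. This is a sketch under stated assumptions rather than a relation given in this specification: the poly is taken as uniformly doped at concentration $N_{\mathrm{Poly}}$, no inversion layer forms in the poly, charge stored in layer 32 is neglected, and $E_{ox}$ denotes the field in the gate dielectric expressed for an equivalent thickness of silicon dioxide. The symbols $Q_{dep}$, $N_{\mathrm{Poly}}$, $E_{ox}$, $\varepsilon_{ox}$ and $\varepsilon_{Si}$ are introduced here for illustration only.

```latex
% Depletion charge per unit area in the poly (depletion approximation)
Q_{dep} = \sqrt{2\, q\, \varepsilon_{Si}\, N_{\mathrm{Poly}}\, \Delta V_{\mathrm{Poly}}}

% Gauss's law at the interface between the poly and the top dielectric
Q_{dep} = \varepsilon_{ox}\, E_{ox}

% Combining the two relations
\Delta V_{\mathrm{Poly}} \approx \frac{\left(\varepsilon_{ox}\, E_{ox}\right)^{2}}{2\, q\, \varepsilon_{Si}\, N_{\mathrm{Poly}}}
```

In this approximation the loss grows with the square of the dielectric field and inversely with the poly doping, which is why lighter gate doping and higher write fields both increase the voltage lost in the poly.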
The circumstances are quite different when transistor 10 is being erased as shown in FIG. 4. When a negative bias is applied to the N-type poly gate 12 relative to the channel 15 of the N-channel transistor 10, the electric field serves to accumulate the free carriers 24 (electrons) in the poly 12 at the interface between the poly 12 and the top dielectric 31. In this case, there is typically negligible electric field lost in the poly and the erase speed is not degraded by voltage lost in the poly. Further, independent of the gate structure, the erase bias serves to accumulate holes in the channel at the interface between the bulk and the bottom dielectric 33, forming accumulation 23. As a result, negligible voltage is typically dropped in the bulk and so the erase condition creates the ideal result of the Vpp being dropped only over layers 31, 32 and 33; namely

$$V_{pp} = \Delta V_{\mathrm{Top\_Ox}} + \Delta V_{\mathrm{Storage\_Layer}} + \Delta V_{\mathrm{Bottom\_Ox}}.$$
Under good program conditions, as best seen in FIG. 3, little or no voltage is dropped in the N-type poly gate 12. Preferably the voltage drop in the gate, $\Delta V_{\mathrm{Poly}}$, will be much less than the write voltage, Vpp, applied to the gate. This was readily achieved in older technologies when the doping in the poly gate 12 was high, typically ≥10²⁰/cm³, and the layers 31, 32 and 33 were relatively thick, equivalently ≥170 Å of SiO₂. However, in more modern technologies which use a moderately or lightly doped poly gate 12 and which have relatively thin layers 31, 32 and 33, the voltage drop in the poly depletion layer 20 can be quite substantial, as shown in FIG. 5. In this case, there can be a significant amount of the applied voltage lost in the poly and the time required to program transistor 10 can be greatly increased.
The voltage lost in the poly depletion layer during a program operation has been calculated as shown in FIG. 6 for an applied voltage of 10 volts. In this plot, the percent of the applied voltage dropped in the poly is indicated on the vertical axis. The horizontal axis on the right hand side marks the doping concentration in the poly. The horizontal axis on the left hand side indicates the thickness of the gate dielectric in values equivalent to a thickness of SiO₂. The shaded bands in the contour plot show domains where the percentage of applied voltage dropped in the poly lies within a 5% range. Six bands of applied voltage drop in the poly are shown; specifically 0 to 5%, 5% to 10%, 10% to 15%, 15% to 20%, 20% to 25% and 25% to 30%.
As shown in FIG. 6, the percentage of applied voltage that is dropped in the poly increases rapidly as the doping concentration falls below 10²⁰/cm³. Once the concentration reaches 10¹⁸/cm³, the percentage of applied voltage that is dropped in the poly reaches about 20%. With further reductions in doping concentration below 10¹⁸/cm³, the percentage of applied voltage that is dropped in the poly increases only gradually. This lack of sensitivity occurs because an inversion layer forms in the poly at the lower concentrations.
The exponential dependence of tunneling current on linear changes in electric field causes the program time to increase by about an order of magnitude for every 10% decrease in voltage across the gate dielectric. The percentage of applied voltage that is dropped in the poly is less than 10% for thicknesses as low as 57 Å of equivalent gate dielectric when the poly doping concentration is 10²⁰/cm³ or higher. However, the percentage voltage drop exceeds 10% for a poly doping concentration of 10¹⁹/cm³ when the gate dielectric thickness falls below only 170 Å. This result greatly limits the range of either the poly doping or the equivalent gate dielectric thickness if program speeds cannot be compromised, which is often the case.
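The trends quoted above can be explored with a small sketch. It assumes a simplified depletion model that is not stated in this specification: the applied voltage is split only between the poly depletion layer and an oxide-equivalent dielectric, and the bulk drop, flat-band voltage, stored charge and poly inversion are all ignored. The function names and the material constants are assumptions chosen for illustration. The second function simply encodes the rule of thumb from the text (about one order of magnitude in program time per 10% of voltage lost).

```python
import math

Q = 1.602e-19          # elementary charge, C
EPS0 = 8.854e-14       # vacuum permittivity, F/cm
EPS_OX = 3.9 * EPS0    # permittivity of SiO2, F/cm
EPS_SI = 11.7 * EPS0   # permittivity of silicon, F/cm


def poly_drop_fraction(vpp, t_eq_angstrom, n_poly):
    """Fraction of vpp dropped in the poly depletion layer (simplified model).

    vpp: applied gate-to-bulk voltage in volts
    t_eq_angstrom: oxide-equivalent gate dielectric thickness in angstroms
    n_poly: poly doping concentration in cm^-3
    """
    t_eq = t_eq_angstrom * 1e-8                      # angstrom -> cm
    a = EPS_OX ** 2 / (2.0 * Q * EPS_SI * n_poly)    # poly drop = a * E^2
    # Voltage budget vpp = a*E^2 + t_eq*E; take the positive root for E.
    e_ox = (-t_eq + math.sqrt(t_eq ** 2 + 4.0 * a * vpp)) / (2.0 * a)
    return a * e_ox ** 2 / vpp


def program_time_penalty(fraction_lost):
    """Rule of thumb from the text: about 10x slower per 10% of voltage lost."""
    return 10.0 ** (fraction_lost / 0.10)


if __name__ == "__main__":
    for n_poly, t_eq in [(1e20, 57.0), (1e19, 170.0), (1e19, 100.0)]:
        f = poly_drop_fraction(10.0, t_eq, n_poly)
        print(f"N = {n_poly:.0e} /cm^3, t = {t_eq:.0f} A: "
              f"{100.0 * f:.1f}% lost, ~{program_time_penalty(f):.0f}x slower")
```

Under these assumptions, the two cases quoted from FIG. 6 (10²⁰/cm³ at 57 Å and 10¹⁹/cm³ at 170 Å, both at 10 volts) come out at roughly 9% and 10% respectively. Because inversion in the poly is ignored, the sketch overestimates the loss at lower doping concentrations and should not be read as a reproduction of FIG. 6.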
Likewise, when a positive bias is applied to the P-type poly gate 12' of P-channel transistor 10' relative to channel 15', the electric field serves to accumulate the free carriers (holes) in the poly 12' at the interface between the poly 12' and the top dielectric 31. In this case, which is an erase condition, there is very little electric field lost in the poly and the erase speed is not degraded by voltage dropped in the poly. However, when a negative bias is applied to the poly gate 12' relative to the channel 15', the electric field serves to repel the free carriers in the poly 12' from the interface and a depletion layer forms in the poly 12' above the top dielectric 31. In this case, which is a program condition, there can be a significant amount of the applied voltage lost in the poly in modern structures and the time required to program transistor 10' can be greatly increased.
While writing a non-volatile memory transistor, it is desirable to achieve the fastest possible program time without compromising product yield and reliability. The program time directly affects the rate at which data can be stored in a non-volatile memory product. Erase speed is not so critical because data is being erased, not stored. Sections of the memory product can be erased in a background manner, long before those sections are selected to store data. Unfortunately, prior art embodiments are constructed to favor fast erase speeds and not fast program speeds as effective gate dielectrics scale to thicknesses of 170 Å and below.
Further, voltage drops in the poly significantly reduce the sensitivity of the write speed to variations in the thicknesses of dielectric layers 31, 32 and 33. This is advantageous in establishing insensitivity to manufacturing induced thickness variations. However, this lack of sensitivity also makes it difficult to scale the program voltage of transistors 10 and 10'. As the layers 31, 32, and 33 are reduced in thickness, the fraction of the applied voltage that is dropped in the poly depletion increases. Eventually, reductions in the thickness of layers 31, 32 and 33 have a diminishing impact on reducing the program voltage. This can occur once the layers 31, 32 and 33 produce a gate dielectric that is dielectrically equivalent in thickness to 170 Å or less of silicon dioxide. Thus, the non-volatile transistor becomes difficult to scale to take advantage of lower program voltages for deeply scaled technologies. Also, the lack of sensitivity to variation in the thickness of layers 31, 32 and 33 is replaced by a sensitivity to variations in doping in the poly, which is traditionally less controllable.
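The diminishing return from dielectric scaling can be sketched numerically. The example below assumes the same simplified depletion model (uniform poly doping, no poly inversion, bulk drop and stored charge neglected), which is an assumption rather than something stated in this specification. It asks what applied voltage is needed to hold a fixed target field in the dielectric as the equivalent thickness is reduced; the target field of 5.3×10⁶ V/cm and the function name `required_vpp` are chosen only for illustration.

```python
Q = 1.602e-19          # elementary charge, C
EPS0 = 8.854e-14       # vacuum permittivity, F/cm
EPS_OX = 3.9 * EPS0    # permittivity of SiO2, F/cm
EPS_SI = 11.7 * EPS0   # permittivity of silicon, F/cm


def required_vpp(e_target, t_eq_angstrom, n_poly):
    """Return (vpp, poly_drop) in volts needed to hold e_target in the dielectric.

    e_target: target dielectric field in V/cm
    t_eq_angstrom: oxide-equivalent gate dielectric thickness in angstroms
    n_poly: poly doping concentration in cm^-3
    """
    t_eq = t_eq_angstrom * 1e-8  # angstrom -> cm
    # Poly depletion drop depends on the field, not on the thickness.
    poly_drop = (EPS_OX * e_target) ** 2 / (2.0 * Q * EPS_SI * n_poly)
    return e_target * t_eq + poly_drop, poly_drop


if __name__ == "__main__":
    for t_eq in (170.0, 85.0):
        vpp, poly_drop = required_vpp(5.3e6, t_eq, 1e19)
        print(f"t = {t_eq:.0f} A: Vpp = {vpp:.2f} V, "
              f"poly share = {100.0 * poly_drop / vpp:.1f}%")
```

In this model, halving the dielectric thickness does not halve the required program voltage, because the poly depletion term stays fixed for a given field while its share of the applied voltage grows.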
So without further innovation, deeply scaled non-volatile transistors for 0.7 micron technologies and below that have effective gate dielectric thicknesses of 170 Å and below will provide faster erase speed, rather than faster program speed. Further, the scalability of program voltages by using conventional dielectric scaling methods is limited, making it difficult to integrate such transistors into a low-voltage CMOS process flow as device geometries reduce to below 0.7 microns.