1. Field of the Invention
The invention relates to techniques for incorporating high-order or abstract human knowledge into adaptive learning apparatus, to improve system learning efficiency and accuracy and improve the system's ability to generalize. More particularly, the invention involves adaptive fuzzy membership filters and neural networks using the same. An embodiment is described for the control of processes whose spectra reveal process input parameters.
2. Description of Related Art
A neural network typically comprises a plurality of input neurons, which may simply be isolation buffers and which are sometimes referred to as "first level neurons", to receive input excitation signals. The outputs of the input neurons are coupled to selected ones of the inputs of a plurality of second level neurons via synaptic interconnection circuits or synapses. Different ones of the input signals combine with different weights at the inputs of the second level neurons, and the particular weight to be accorded each signal is governed by a setting in the synaptic interconnection circuit which couples that signal to that neuron. The weighted combination of signals is typically a weighted sum of such signals, and may further be transformed by a transfer function which is typically (but not necessarily) nonlinear. A common nonlinear transformation for this purpose is the sigmoid function. The outputs of the second level neurons may themselves further be coupled through another set of synaptic interconnection circuits, with respective weights, to inputs of respective ones of a plurality of third level neurons. In this case the second level neurons are often referred to as "hidden" neurons. The outputs of the third level neurons may be provided as the outputs of the overall network, in which case these neurons may be referred to as "output neurons", or they may be coupled to yet additional layers of neurons by additional selectively weighted synaptic interconnection circuits. In addition, in some neural network architectures, outputs of some of the neurons are fed back to the inputs of a prior level.
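As an illustration only, the layered structure described above can be sketched in software as a feedforward pass. This is a minimal sketch that assumes full connectivity between adjacent levels, a sigmoid transfer function, no feedback connections, and one bias (threshold) term per neuron. The function names `sigmoid`, `layer_forward` and `forward` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, a common nonlinear transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Compute the outputs of one level of neurons.

    weights[j][i] is the synaptic weight coupling input i to neuron j.
    Each neuron forms a weighted sum of its inputs, adds its bias
    (an assumed threshold term), then applies the sigmoid.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Propagate input excitations through successive levels (no feedback)."""
    signal = list(inputs)  # first-level neurons act as simple isolation buffers
    for weights, biases in layers:  # hidden level(s), then output level
        signal = layer_forward(signal, weights, biases)
    return signal
```

For example, `forward(x, [(w_hidden, b_hidden), (w_out, b_out)])` evaluates a three-level network with one hidden level. In a hardware implementation, each entry of a weight matrix would correspond to the setting of one synaptic interconnection circuit.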
Neural networks may have a fixed interconnection pattern and fixed synaptic weights, or the synaptic weights may be made variable. If the synaptic weights are made variable, then the network may be given the capacity to "learn". Alternatively, the learning process may be simulated off-line, and once the synaptic interconnection weights are determined, they can be transferred into a hardware chip, for example by laser direct write.
Background information on neural networks, including a survey of various architectures for neural networks, may be found in "DARPA Neural Network Study", Armed Forces Communications and Electronics Association International Press, November 1988, and in R. P. Lippmann, "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine, April 1987, both incorporated herein by reference. A neural network may be constructed in any known manner, but the techniques described in U.S. patent application Ser. No. 07/894,391, filed Jun. 5, 1992, entitled "Process for Forming Synapses in Neural Networks and Resistor Therefor", by inventor Chi Yung Fu, incorporated herein by reference, are preferred. Alternatively, a neural network can be implemented in software on a computer.
A problem arises when it is desired to use a neural network to analyze a large number of input channels. A large number of input channels implies a very large number of parameters (interconnection weights in a conventional neural network) which need to be adjusted during the learning process. The number of parameters is usually on the order of (N1+1)*N2+(N2+1)*N3, where N1, N2 and N3 are the number of neurons in levels 1, 2 and 3, respectively. This results in a lengthy training period. Also, while neural networks can learn, in many situations the learning process could be greatly simplified, and the result would be more general, if human or abstract knowledge could be incorporated into the system before the system begins to learn. The higher the form of human or abstract knowledge, the more capable the system would become. For example, if knowledge of symmetry can be incorporated, then the analysis of the system would be greatly simplified. The question is then how to incorporate such abstract knowledge into a learning system.
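The parameter count above can be checked with a short calculation. The following is a minimal sketch that assumes a fully connected feedforward network in which each neuron has one weight per neuron in the preceding level plus one bias term, which is the origin of the "+1" in each factor. The function name `parameter_count` and the level sizes in the example are hypothetical.

```python
def parameter_count(level_sizes):
    """Number of adjustable weights in a fully connected feedforward network.

    level_sizes lists N1, N2, N3, ... . Each neuron in level k+1 receives one
    weight per neuron in level k plus one bias term, giving (N_k + 1) * N_{k+1}
    for each pair of adjacent levels.
    """
    return sum(
        (n_in + 1) * n_out for n_in, n_out in zip(level_sizes, level_sizes[1:])
    )


# Hypothetical example: 1024 spectral channels, 20 hidden neurons, 5 outputs.
print(parameter_count([1024, 20, 5]))  # (1025 * 20) + (21 * 5) = 20605
```

Nearly all of these parameters come from the first factor, (N1+1)*N2. For this reason, reducing the effective number of input channels is the most direct way to shorten training.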
The above problems become very apparent where the neural network is used to analyze a spectrum produced by, for example, a plasma in plasma-based manufacturing equipment (although they are by no means limited to this situation). Such equipment is used, for example, in the formation or etching of layers in a semiconductor manufacturing process. It is also used in the deposition of thin films used, for example, in superconducting circuits. It is also used in the production of active matrix liquid crystal displays, and in the thin films required for magnetic and optical storage media.
A plasma is a collection of electrons, radicals, positive and negative ions, neutral atoms, molecules and molecular fragments. The plasma excitation can be caused by applied RF or microwave frequency power, or by other methods as well. The plasma can also be enhanced by coupling with a magnetic field.
Plasmas are used in semiconductor and thin film manufacturing, primarily in several types of process steps. In plasma-enhanced chemical vapor deposition (PECVD), input gases are reacted in a glow discharge to form a plasma, which reacts chemically at a subject surface (e.g., a wafer) to deposit a desired material thereon. Plasma enhanced (assisted) depositions are described in Sze, "VLSI Technology", 2d. Ed. (1988), primarily at pp. 262-266. The entire Sze reference is incorporated by reference herein. In a "sputtering" technique, a plasma is formed in a manner similar to that in PECVD, but the plasma is attracted to a target. The plasma bombards the target with high enough energy to loosen particles of the target material, such as aluminum, which then deposit on the subject surface. Sputtering deposition is described primarily at p. 387 of Sze.
In a plasma etching technique, the plasma generates highly reactive fragments and radicals which react with the surface material to form volatile products. These volatile products leave the surface, resulting in etching. In reactive ion etching (RIE), in addition to the chemical reactions, ions are accelerated toward the surface material and either react directly with the material or assist in the reaction by a radical, thus enhancing the etching process. Reactive plasma etching is described primarily at pp. 184-232 of Sze, and RIE is described in Sze, primarily at pp. 213-215 and 396-398. In all of these techniques, it is important that the flow rate of the gases provided to form the plasma, the plasma power applied to form the plasma, the biasing of the subject surface, the temperature of the subject surface, and the pressure inside the chamber containing the plasma and the subject surface, be carefully controlled so that a plasma having exactly the desired characteristics is formed.
In the past, plasma-based processing equipment has used individual closed-loop control of the input parameters such as pressure, power and gas flow, to try to maintain them each at a value known to produce a desired plasma. This type of control is indirect since it is not the measured parameters that individually control the outcome of the process, but the combined effect of all the parameters. Such control systems also fail to take into account calibration errors in the controls, as well as other, uncontrolled, sources of gas which may affect the plasma. For example, in oxygen-reactive sputtered deposition, useful for example to deposit Al₂O₃, the oxygen flowing through the mass flow controller into the chamber may not be the only source of oxygen. Oxygen may also be out-gassing from the interior walls of the chamber. Extra sources of gas such as out-gassing are not accounted for in the case of conventional process control. Thus, although each input parameter to the plasma-based process step is under closed-loop control, the overall process step may not be entirely closed-loop. Yields may thereby be reduced, and manufacturing costs increased.
The conventional technique is also very difficult to model for process optimization. For a conventional plasma processing step, fundamental process modeling requires a detailed understanding and application of plasma physics which, though making substantial progress, is not yet readily available in a manufacturing environment. Thus, process optimization requires extensive experimentation on actual equipment and is typically statistically based. In addition, moving a process to new or different equipment requires parameter adjustments and substantial downtime, thereby discouraging equipment upgrades and complicating the replacement of worn equipment.
The inadequacies of the conventional technique for controlling input parameters sometimes result in bad runs which process engineers must try to rescue. At present, if a process deviates from specification during processing, the resulting processing errors are corrected either by rework, in which the process is stopped and the erroneous step is redone, or by "feed forward", in which the process continues and adjustments are made in subsequent process steps to compensate for the error. Both options cause logistical and scheduling problems in the operation of the fabrication line. Additionally, because the conventional technique accepts errors rather than preventing them, it renders the reliability of the product uncertain.
Because the conventional technique is not entirely closed-loop, a metrology step is often included after a plasma processing step to measure such results as the thickness of deposited material, resistivity, etc. In the evolving cluster tool concept, equipment is grouped under a vacuum and wafers are transferred between processing systems by robotic arms. Cluster tools typically have a limited number of ports for processing stations. The need to include metrology steps therefore restricts the number of pieces of processing equipment which may be placed in a cluster.
Apart from the actual wafer processing problems caused by conventional process control equipment, maintenance and repair of the equipment is another major manufacturing issue. Overmaintenance is costly since it unnecessarily increases equipment downtime, but undermaintenance risks faulty products or low yield. At present, repair of processing equipment is typically performed after the equipment fails, which is undesirable because downtime for unscheduled repair can cause significant logistical and schedule problems.
Further, the conventional plasma-based processing technique cannot be used to deposit certain materials or materials with certain properties since the processing for such materials is difficult to control.
The related application mentions that when particles in a plasma relax to a less excited state, they emit energy in a portion of the electromagnetic spectrum which lies mostly in the extended optical frequency range (including far IR and deep UV). There is a one-to-one correspondence between a given plasma and the input conditions (e.g., gas flow, pressure, plasma excitation frequency and power) for a particular system configuration under which it was produced, but the particular correspondence is in most cases not known and varies from one piece of equipment to another. As discussed in Sze, plasma spectra have been used in the past to determine the presence or absence of particular neutral and ionic species by correlating an experimental spectral series with a previously determined spectral series. Relative concentrations of species were obtainable in this manner, although minor variations were typically too subtle for a process engineer or even a plasma specialist to detect. Plasma spectra have also been used for "endpoint detection", i.e., determining when a plasma processing step is complete. This is possible in an etching step, for example, when the complete removal of an etched layer eliminates the contribution which the etched layer provided to the composition of the plasma. See also Malchov, "Characterization of Plasma Processes with Optical Emission Spectroscopy", SPIE Vol. 1392, Advanced Techniques for Integrated Circuit Processing (Oct. 4, 1990), pp. 498-505, incorporated by reference herein.
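The correlation of an experimental spectral series with previously determined spectral series can be illustrated with a simple similarity score. This is a minimal sketch that assumes the Pearson correlation coefficient as the score and assumes all spectra are sampled on the same wavelength channels. The function names, the reference labels and the intensity values in the test are hypothetical. Other correlation measures could equally be used.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length spectral series."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat series carries no spectral shape to compare
    return cov / math.sqrt(var_a * var_b)


def best_match(measured, references):
    """Return the label of the reference spectrum most correlated with the measurement.

    references maps a hypothetical species label to its previously
    determined spectral series.
    """
    return max(references, key=lambda label: pearson(measured, references[label]))
```

Such a score readily identifies dominant spectral features. However, it is poorly suited to the minor variations mentioned above, which is part of the motivation for the trained analysis that follows.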
According to the related application, roughly described, the characteristics of the plasma in a plasma-based manufacturing process step are monitored directly and in real time by observing the spectrum which it produces. One or more of the process input parameters are controlled or adjusted in response to any deviation of the spectrum beyond a narrow range. This approach is advantageous because the success of the processing step depends on the characteristics of the ultimate plasma, rather than on the separately controlled input conditions in response to which the plasma is formed. If, for example, one of the flow controllers in a conventional system is out of calibration, then it may be maintaining an incorrect flow condition while reporting back that its flow is at the target value. The plasma spectrum will be slightly different, however, and a system which monitors the plasma directly rather than merely an input parameter will be able to compensate for such miscalibration.
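The control rule described above, in which adjustments are made only when the spectrum deviates beyond a narrow range, can be sketched as one iteration of a deadband controller. This is a minimal sketch under stated assumptions. The related application uses a neural network to generate the control signals, whereas here a fixed linear gain matrix is substituted purely for illustration. The names `control_step`, `tolerance` and `gain` are hypothetical.

```python
def control_step(measured, target, tolerance, gain):
    """One iteration of spectrum-based closed-loop correction.

    measured, target: intensities for each monitored spectral channel.
    tolerance: half-width of the acceptable band around each target value.
    gain: hypothetical linear map from channel deviations to input-parameter
          adjustments; gain[p][c] relates channel c to parameter p.
    Returns one adjustment per input parameter, all zero when every channel
    is within the band.
    """
    deviation = [m - t for m, t in zip(measured, target)]
    if all(abs(d) <= tolerance for d in deviation):
        return [0.0] * len(gain)  # spectrum within the narrow range: no action
    # Drive the adjustments against the observed deviation.
    return [-sum(g * d for g, d in zip(row, deviation)) for row in gain]
```

Because the loop is closed on the spectrum rather than on the flow-controller readback, a miscalibrated controller shows up as a spectral deviation and is corrected.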
As mentioned, the differences between the spectra produced by plasmas are extremely subtle, usually too subtle to be detected by a process engineer or plasma specialist. Additionally, the actual correspondence between the spectrum produced by a plasma and its input parameters is generally not known at present. According to another aspect of the invention, an artificial neural network is used to analyze the plasma spectrum and generate the control signals necessary to adjust one or more of the input parameters as necessary. Neural networks are very effective in identifying small signal changes in a very noisy environment, and can learn the relationships between input parameters and plasma spectra without any requirement that they be derived in advance.
The monitoring of plasma spectra is an advantageous way of controlling the process step since highly reliable optical sensing techniques (e.g., optical spectrometers) are available and are routinely used in semiconductor processing (e.g., to determine when to end a process step). Further, such sensors can be external to the processing chamber and thus avoid perturbing the process step. Optical sensing is also extremely fast, thus allowing real-time monitoring and control of single-wafer processing.
By monitoring the plasma itself at the reaction site, closed-loop control is provided which automatically compensates for otherwise uncontrolled sources of gases, miscalibrations of input parameter controls, and other sources of error not adequately addressed in conventional systems. Processing mistakes are thus prevented by correcting the input parameters in real time so that the results satisfy the target criteria, and the amount of rework and feed forward needed to correct a processing error is minimized. Additionally, since the process is controlled to satisfy the target criteria of the plasma itself, most metrology equipment may be unnecessary.
Direct monitoring of plasma characteristics also permits a trend analysis of the adjustment signals provided to the input parameter controls. Such trends can predict which components actually require servicing. Thus, servicing and equipment downtime can be scheduled just in time, and, in conjunction with the scheduling of maintenance on other equipment in a fabrication line, can avoid both undermaintenance and overmaintenance.
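One simple form of such trend analysis fits a straight line to the recent history of an adjustment signal and estimates when that signal will reach a limit. The limit might be, for example, the end of an actuator's range or a service threshold. This is a minimal sketch that assumes one sample per run and a linear drift. The names `linear_trend`, `runs_until_limit` and `limit` are hypothetical, and a practical system might use a more robust model.

```python
def linear_trend(values):
    """Least-squares slope and intercept of values sampled at t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def runs_until_limit(adjustments, limit):
    """Estimate how many more runs remain before the adjustment reaches `limit`.

    Returns None when the trend is flat or is moving away from the limit.
    """
    slope, intercept = linear_trend(adjustments)
    if slope == 0:
        return None
    t_hit = (limit - intercept) / slope
    remaining = t_hit - (len(adjustments) - 1)  # measured from the latest run
    return remaining if remaining >= 0 else None
```

An estimate of the remaining runs for each component could then be merged with the maintenance schedules of other equipment on the line.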
Further, intelligent adaptive control can be used to deposit materials that were previously not achievable in a plasma-based processing step because the processing for such materials is difficult to control.
Though not exclusively, intelligent adaptive control may be applied most advantageously in back-end plasma processing since back-end interconnects strongly influence chip yield, performance and reliability. Applying the technique to back-end inter-chip-level interconnects can also impact packaging technology. The technique can be applied in III-V or II-VI processing as well as in silicon-based processing.
Thus the analysis of a plasma spectrum during a manufacturing process step is a desirable application for a neural network. But the application is not limited to plasma spectra. It is also desirable to use a neural network to analyze other types of optical and non-optical spectra, such as a mass spectrum produced by a residual gas analyzer (RGA), an emission spectrum such as that produced by laser-induced fluorescence, and spectra produced by colorimeters, photometers, spectrophotometers, atomic absorption spectrometers and other techniques of absorption spectroscopy. There are, of course, many non-spectrum-based applications for neural networks as well.
However, adequate spectral analysis often requires observation of a range of wavelengths divided into a large number of signal channels. If all such signal channels are provided as inputs to a conventional neural network, then the problems mentioned above of large numbers of interconnection weights arise.
Accordingly, it is an object of the invention to provide techniques to manage a large number of inputs to a neural network by using abstract human knowledge as a constraint. The techniques provided herein can significantly reduce the number of parameters for training, as well as take advantage of abstract knowledge already known about the system being analyzed. As will be seen, the techniques can be used also without a large number of inputs and, indeed, even without a neural network.