The present invention relates in general to fully automated systems and methods for operating industrial equipment, such as automated semiconductor fabrication facilities, and relates in particular to automated systems and methods for resolving error conditions and other issues which occur in manufacturing facilities which utilize automated manufacturing execution systems, material control systems and real-time dispatching systems.
Computer Integrated Manufacturing (CIM) systems used in automated IC fabrication facilities (fabs), such as a facility for processing 200 mm or 300 mm wafers, may include the following:
    (1) an automated manufacturing execution system (MES) such as IBM's SiView Standard MES (from IBM Japan Industrial Solution Co., Ltd. (iiSC));
    (2) an automated material handling system (AMHS) such as the Muratec material control system (MCS) from Murata Machinery, Ltd. and/or an automated reticle handling system (ARHS); and
    (3) an automated real time dispatcher (RTD) such as those available from IBM SiView or Brooks Automation/AutoSimulation Inc.
Still other companies provide MES, AMHS and RTD systems which could be used in place of any one of those named above in an IC fab. In all such automated facilities, the basic goals are generally the same: to operate the overall facility with a high degree of efficiency, quality and flexibility, in order to maximize productivity and return-on-investment. Oftentimes, this in turn requires optimizing product mix and output, while minimizing downtime.
When designing and building a complex automated factory, such as a fully automated 300 mm semiconductor device fabrication facility (fab), it is known to choose suppliers and vendors by evaluating their systems and components against the planned requirements of the fab using a “Best Of Breed” (BOB) process. Using this process, each system, application, or component that is believed best in the industry for the fab is selected, subject of course to availability, compatibility and cost constraints, to help achieve the objectives listed in the previous paragraph. Also, equipment in an automated plant is at times selected with a view toward meeting future requirements and/or plant expansion. Naturally, the designers and engineers responsible for bringing such an automated plant on line must deal with the inherent problems associated with integrating the many disparate pieces of equipment and their control systems, as well as the overall control systems (such as the MES, MCS and RTD), all of which may collectively come from many different suppliers, into a functioning, cohesive automated plant.
In these kinds of automated IC fabrication facilities, error conditions, problems and other issues such as the continued performance of the equipment and systems (e.g., percentage up-time) can arise when an automated fab is running, particularly in fully automatic mode. In the SiView Standard MES, this mode is sometimes called “Full Auto3 Mode”. As in all manufacturing operations, issues will arise, such as how long a tool, carrier or other piece of equipment can be expected to run before it requires service or preventive maintenance. Inevitably, error conditions, problems and other issues arise that are not scheduled, but nevertheless must be dealt with as part of running such a facility. In the modern automated factory, such as the fab facility, these issues can include a variety of conditions or problems, which are typically documented by automatic error reporting systems for later manual analysis and follow-up. The hope and expectation is that with some further study by attending fab support personnel (such as skilled technicians and engineers), the root causes of the various errors and other issues can be determined and corrected, thereby improving overall plant efficiency, reducing cycle times, increasing yields, and improving tool availability and uptime.
Oftentimes, after support personnel have investigated, appropriate corrective actions can indeed be taken to clear an error and/or eliminate or remedy a current problem. Sometimes corrective actions can be taken on the spot. At other times, usually when the solution is not readily apparent or when considerable time and/or resources will be needed to implement a solution, the corrections may be put off until a later date or time. While it is desirable to correct error conditions so as to end, or at least reduce the frequency of, such errors or problems, at times all that the attending personnel can presently do is take the tool, equipment, carrier and/or lot off-line or put it on hold, so that the specific entity or object is not available to the real time dispatcher. The corrective actions needed may include resetting the tool or station or restocking the tool or station with needed supplies or raw materials. Other corrective actions may include equipment adjustment, repair, process changes and/or preventive maintenance. Actions taken to return the affected portion of the fab to productive status may also include removal of the affected work in progress (WIP), and removal or replacement of carriers, tools or tool components. A tool or station may be taken completely off-line for later debug or repair, especially when there are other identical tools or stations nearby to continue to process the carriers or other work in progress.
The problems which can be encountered in a highly automated fab environment are at least as varied and likely much more numerous than the different types of equipment and processes which are being carried out in the fab. A number of problems may relate to minor glitches or bugs in the automated material handling equipment or in their control systems, or in the interactions between control systems. An exemplary but non-exhaustive list of problems or other issues which might occur in connection with the transport of automated carriers such as front-opening unified pods (FOUPs) in such an automated manufacturing facility may include the following, which are each typically assigned a numeric code for convenient reference:
TABLE 1
Problem Codes
Code    Description
−201    Reject, Duplicated TrJobBID (Transport Job ID).
−202    Reject, Unknown CarrierID.
−203    Reject, the Carrier ID already exists in another location.
−204    Reject, Unknown source location.
−205    Reject, Unknown destination location.
−206    Reject, Destination is full.
−207    Reject, Source is not available.
−208    Reject, Destination is not available.
−209    Reject, Route from source to destination is not available.
−210    Reject, Expected Start Time violation.
−211    Reject, Expected Stop Time violation.
−212    Reject, The carrier belongs to another owner.
−213    Reject, Batch transfer, at least one request has been rejected.
−214    Reject, Pickup procedure for former job at the equipment port is not yet completed.
−215    Reject, carrier is in an unknown state.
While these automation problems do not affect product quality, they nevertheless can slow production.
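For reference, the codes of Table 1 lend themselves to a simple lookup. The following is a minimal sketch in Python, offered only as an illustration. The class name `TransportReject`, its member names and the helper `describe` are hypothetical and are not taken from any MES, MCS or RTD product; only the numeric values and their meanings come from Table 1.

```python
from enum import IntEnum


class TransportReject(IntEnum):
    """Reject codes of Table 1. Member names are illustrative assumptions."""
    DUPLICATE_JOB_ID = -201
    UNKNOWN_CARRIER_ID = -202
    CARRIER_ID_EXISTS_ELSEWHERE = -203
    UNKNOWN_SOURCE = -204
    UNKNOWN_DESTINATION = -205
    DESTINATION_FULL = -206
    SOURCE_NOT_AVAILABLE = -207
    DESTINATION_NOT_AVAILABLE = -208
    ROUTE_NOT_AVAILABLE = -209
    START_TIME_VIOLATION = -210
    STOP_TIME_VIOLATION = -211
    CARRIER_OWNED_BY_ANOTHER = -212
    BATCH_REQUEST_REJECTED = -213
    PICKUP_NOT_COMPLETED = -214
    CARRIER_STATE_UNKNOWN = -215


def describe(code: int) -> str:
    """Return a readable label for a reported code, or flag it as unlisted."""
    try:
        return TransportReject(code).name.replace("_", " ").lower()
    except ValueError:
        # Table 1 is non-exhaustive, so unknown codes are reported, not rejected.
        return f"unlisted code {code}"
```

Because the list in Table 1 is described as non-exhaustive, the sketch reports an unrecognized code rather than raising an error.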
As one example of how a problem can arise, consider the following. In the Full Auto3 mode, the RTD system, which includes various dispatching scripts and logic rules for each equipment ID which is being utilized, may well try to execute Start Lot Reservations even if one of the above errors is detected or encountered by the MES and/or the MCS. The RTD and MES are not programmed to logically check for or to try to resolve any of these errors. The existing systems, which have been integrated and programmed in order to carry out certain expected functions, normally are not set up to deal in any systematic way with unpredictable errors, problems or other issues that may arise, other than to report their occurrence. This is particularly true with regard to unexpected interoperability problems that can arise between multiple systems, applications and/or pieces of equipment in the fab. These kinds of automation problems are compounded when different vendors or suppliers are responsible for different parts of the overall automated fab. Further, to our knowledge, current fab automation systems do not provide for handling (i.e., dealing with) and, most importantly, resolving or recovering from, on an automatic basis, such errors that occur between various systems and/or between the various pieces of equipment supplied by numerous vendors and semiconductor equipment suppliers.
The handling of such occasional errors and other seemingly random issues is a nontrivial task. Typically, trained engineers, programmers, and technicians carry out both real time and planned investigations as they try to resolve issues by diagnosing the problems at hand, and, if their time permits, determining the underlying root cause(s), and analyzing and implementing practical corrective action(s). Even for known problems, trained operators, technicians or engineers are required to intervene and interact manually with the equipment and the control systems to resolve the error conditions or other problems. Oftentimes, they take the tool or other equipment out of auto mode and manually manipulate things or use the tool's interface to reset the tool, or sometimes even physically move objects, such as a FOUP or reticle, or operate valves or doors, etc., in order to get the automated equipment back into production. In other words, some manual action or manual reset activity is typically required to resolve the problem and to restore the affected equipment or WIP carriers to their status of being ready to operate again in a fully automatic mode, or to take them off-line until repaired in order to get them out of the way so they do not hold up production.
There are some drawbacks to using line personnel and manual intervention to resolve almost every error, problem and other issue in an automated fab. One factor justifying the added cost of developing a fully automated fab is the reliability of the AMHS to take the place of line personnel in loading and unloading a fully loaded 300 mm FOUP, which can weigh as much as 25 lbs (11.3 kg). Another is that the weight of the carriers may prove difficult for some workers to handle manually on a regular basis, which has led to widespread use of some form of mechanical assist, cart, or automation to load and unload the FOUPs to and from the load ports of the tools. Also, in a large fab, there are many pieces of equipment of many different types. At any given time, fab personnel may be engaged in other activities, and cannot immediately service the equipment or carrier that has just gone down. Line personnel may be involved in other important tasks or information exchanges with other operators, technicians or engineers, or with other CIM systems or controls themselves, or with other problems such as implementing corrective action or taking preventive steps elsewhere. Or they may be located some considerable distance away from the equipment which now requires attention. They may even be absent from the area, e.g., due to training classes, personal breaks or lunch. In addition, the problem areas to be manually inspected and resolved may be located well above or below the floor line or in other difficult-to-reach locations. Also, the personnel on call who are supposed to attend to the problems with the machinery or systems may not yet have had the training or experience to deal with the particular kinds of problems that have just arisen.
Moreover, it is difficult for any one person to be a master of all possible corrective procedures and tasks that may arise with regard to the great variety of complex equipment and integrated CIM systems and applications found in the typical automated fab or other similarly complex automated manufacturing facility.
The typical IC fab includes very expensive equipment. Also, the memory, logic and ASIC chips being fabricated on the silicon wafers, which are sometimes referred to as work in process (WIP), are often quite costly as well. Accordingly, any downtime or partial stoppages of critical processes (i.e., those that represent the typical bottlenecks to maximum production) reduce the overall productivity of the fab and thus often end up being quite costly. Thus, unplanned downtime is generally to be avoided wherever possible. However, part of the nature of the error conditions and other problems which can cause such downtime is that they are often unexpected and are quite varied. Specific problems which occur with significant frequency are typically subjected to a concentrated investigation and analysis, followed by manual effort to remedy such problems by eliminating the root cause(s) once and for all. Most suppliers of the automated equipment also go to considerable lengths to ensure that their individual pieces of equipment are robust and reliable. They have worked out many of the known issues with their equipment, which often leaves the infrequently occurring, seemingly random or truly oddball problems as the typical kinds of errors or other issues that crop up on the factory floor. Under such circumstances, it is often difficult to determine the cause(s) of such problems. For example, a certain reported error condition may have a few different possible causes. Accordingly, the conventional practice is to restore the equipment and the WIP carriers to operational status quickly, and then later deal with such relatively infrequent or isolated error conditions as time permits, often through painstaking manual investigation and analysis.
This follow-up work is often done by well-trained personnel, who may consult the historical data which accumulates relative to these error conditions in order to hopefully understand them, to identify root causes thereof, and to determine what corrective action to take.
Accordingly, in order to reduce downtime and to restore tools, equipment and carriers to fully functioning automated status quickly, we have recognized it would be highly desirable to find an automated way to deal with as many as is practical of the myriad errors, problems and other issues which can arise from time to time in an automated manufacturing facility. This would be particularly desirable in complex manufacturing facilities that employ several different types of CIM systems and applications, and many different kinds of complex equipment and tools. In that kind of environment, the act of restarting multiple automatic systems for that part of the plant which has gone down can be a time-consuming and highly complex task in itself, even for trained engineers, programmers or line personnel.