The present invention relates to industrial process controls and more particularly to complex, multiprocessor control systems which have redundancy capabilities to provide for continued operation of an industrial process even though a failure may occur in some portion of the control system.
Many industrial processes are so complex that the most cost-effective way to design a control system for them is to base the design on the use of multiple, large, central processors (CPUs) which share the workload. One theoretical alternative is to use numerous hard-wired microprocessors, but design complexity makes this approach impractical or undesirable.
The industrial user of control systems places high priority on the availability of the production equipment used for process operations because process availability is a direct measure of productivity. It is desirable that multiprocessor control systems be structured to stop production when the status of the process or its control system becomes unsafe or otherwise undesirable. However, to provide the best process availability, the multiprocessor control system should also be structured to avoid unnecessary process trips. Specifically, a control system fault should not cause a process trip if the fault can be circumvented.
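The trip policy stated above can be illustrated with a minimal sketch. It is not taken from the invention itself: the names `Fault`, `Action`, and `decide_action`, and the two boolean flags, are hypothetical and serve only to show the ordering of the decisions (trip when the status is unsafe, otherwise circumvent the fault when possible).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE = "continue"        # no fault, keep running
    RECONFIGURE = "reconfigure"  # circumvent the fault, keep the process on line
    TRIP = "trip"                # stop production


@dataclass(frozen=True)
class Fault:
    # Hypothetical fault record; the fields are assumptions for illustration.
    unit: str
    process_unsafe: bool   # status of process or control system is unsafe
    can_circumvent: bool   # remaining equipment can take over the duty


def decide_action(fault: Optional[Fault]) -> Action:
    """Return the action for a detected fault, or CONTINUE if there is none."""
    if fault is None:
        return Action.CONTINUE
    if fault.process_unsafe:
        # Safety takes precedence over availability.
        return Action.TRIP
    if fault.can_circumvent:
        # Avoid an unnecessary process trip.
        return Action.RECONFIGURE
    # A fault that cannot be circumvented leaves no safe way to continue.
    return Action.TRIP
```

In this sketch a trip is the default for any fault that is not explicitly known to be circumventable, which reflects the priority given to process safety over availability.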
One past practice in the control art has been to provide manual operator override in the event of a failure in the automatic control. Another practice has been to provide a redundant backup automatic control system which is bumplessly switched into operation as the active control if a fault occurs on line in the designated primary automatic control. Process continuity and availability are thereby maintained.
Generally, these approaches have been acceptable for single CPU systems, but they are inadequate for multi-CPU shared-core systems. For example, in a typical four-CPU system, even the simplest restart, without system reconfiguration, typically requires the operator to sequence twelve push buttons. The provision of a duplicate multi-CPU system to serve as a backup is not a viable alternative due to the size and cost of usual multiprocessor systems. The use of a backup also does not take advantage of the inherent redundancy which exists in a multiprocessor system.
In past data processing practice, computer faults have been handled by stopping the computer and, if the shutdown was due to a program alarm rather than a disabling hardware failure, automatically restarting it through its bootstrapping operation so that it resumes its duty. In the data processing art, however, no on-line control activity has to be maintained while the computer system recovers from a fault.
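The prior data processing practice just described can be sketched as follows. This is an illustrative assumption, not a description of any particular system: `ShutdownCause`, `recover`, and the `bootstrap` callback are hypothetical names.

```python
from enum import Enum
from typing import Callable


class ShutdownCause(Enum):
    PROGRAM_ALARM = "program_alarm"
    HARDWARE_FAILURE = "hardware_failure"


def recover(cause: ShutdownCause, bootstrap: Callable[[], None]) -> bool:
    """Restart a stopped computer only when the stop was a program alarm.

    Returns True if an automatic restart was attempted, False if the
    computer is left stopped because of a hardware failure.
    """
    if cause is ShutdownCause.PROGRAM_ALARM:
        bootstrap()  # hypothetical hook standing in for the bootstrapping operation
        return True
    return False
```

Nothing in this sketch keeps a process under control while the computer is stopped, which is the shortcoming of this practice for on-line industrial control.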
The known prior art thus provides no direction for automatically achieving high process safety and availability where an industrial process control system is structured with multiple central processors and where a discrete backup system does not exist because of prohibitive costs.
It is desirable that a multiprocessor computer control system have a high level of reliability, be easily maintained, be fault tolerant, and, in the event of a failure, be quickly recoverable. Generally, system reliability can be incorporated into system hardware and software from the early design stages by the application of good design practices which limit fault propagation, by the use of redundant logic for critical subsystems, and by the application of available error checks to trap faults before they can induce a system failure.
The design of the overall multiprocessor control system configuration should also generally take into account any single point of failure and provide the necessary backup equipment to sustain operation when such a failure occurs. Reliability and maintenance may also be enhanced by the use of modular construction which provides definable interfaces and simplifies the development of individual modules.
While system maintenance time is reduced by good mechanical design and construction, a significantly larger percentage of the cost of maintenance is expended in diagnosing a failure than in repairing it. Thus, as a result of the high level of complexity involved in large industrial production systems, it is desirable to have a multiprocessor control system diagnose its own failures and identify the equipment or module which has failed.