The present invention is directed, in general, to computing and processing systems and, more specifically, to systems and methods for distinguishing a device failure from a failure to communicate with the device.
Automated plant control systems include a comprehensive set of algorithms, or software-definable process control routines, to control and monitor various processes within, for instance, a manufacturing facility. The control systems can be tailored to satisfy a wide range of process requirements globally or within specified portions of the facility. Conventionally, the control systems include a variety of modules, each having its own processor and firmware, linked together by communication buses to result in a distributed process control system. The distributed nature of the system affords high performance with the capability to expand the system incrementally to satisfy growth or modifications in the facility.
In a real-time process control system, processing can be distributed such that two controllers, coupled together, perform the same operation in parallel. Because they parallel the same operation or process, these controllers are referred to as “dual redundant process controllers.” Dual redundant process controllers operate such that one of the controllers (designated the “primary controller”) is always in a lead state, meaning that it has actual control of all or part of the system. The other process controller (the “secondary controller”) mirrors the primary controller’s processes but is not in actual control of the system. In effect, the secondary controller parallels the primary controller in all aspects of operation and data storage and remains ready to take over should the primary controller fail. If such a failure occurs in the primary controller, actual control (the “lead state”) of that part or all of the system should be assumed by the secondary controller. Once the secondary controller asserts the lead state, the former primary controller can no longer operate in the lead state, and the secondary controller becomes the primary controller for that part or all of the real-time process system.
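The lead-state rules above can be captured in a small toy model. The sketch below is a hypothetical single-process illustration: the names `Controller`, `RedundantPair`, and `fail_primary` are invented here and do not come from any controller firmware. It makes the hand-over atomic, which two physically separate controllers cannot do; in a real system, the secondary must decide on its own when to assert the lead state.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    name: str
    lead: bool = False    # True only while this controller has actual control
    healthy: bool = True


class RedundantPair:
    """Hypothetical model of a dual redundant controller pair."""

    def __init__(self) -> None:
        self.a = Controller("A", lead=True)  # starts as the primary controller
        self.b = Controller("B")             # starts as the secondary controller

    def primary(self) -> Controller:
        return self.a if self.a.lead else self.b

    def secondary(self) -> Controller:
        return self.b if self.a.lead else self.a

    def fail_primary(self) -> None:
        # The primary fails and can no longer hold the lead state; the
        # secondary asserts the lead state and becomes the new primary.
        old, new = self.primary(), self.secondary()
        old.healthy = False
        old.lead = False
        new.lead = True
        self.check_singularity()

    def check_singularity(self) -> None:
        # Exactly one controller may hold the lead state at any time.
        assert self.a.lead != self.b.lead, "lead-state singularity violated"
```

For example, after `pair = RedundantPair()` and `pair.fail_primary()`, `pair.primary().name` is `"B"` and the singularity check still passes.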
Normally, each of the dual redundant process controllers contains a processor and firmware and is linked to the overall system. The processor could be, for example, one of the i960Hx series of superscalar RISC processors commercially available from the Intel Corporation. The processor usually resides on a local bus, which also includes local random access memory (“RAM”), memory for program storage, and hardware for monitoring and controlling external functions. Firmware is a computer program stored persistently in a read-only memory (“ROM”) associated with the processor. The primary activity on the local bus is control and management of the controller through firmware execution by the central processing unit (“CPU”). Additionally, the primary and secondary controllers are normally interconnected by circuitry such as coaxial or fiber-optic cable. This interconnection allows the controllers to communicate operational states, to mirror the activity of the lead-state controller on the secondary controller, and to transfer any data that should be stored on the secondary controller.
The fundamental and critical requirement of real-time process systems using dual redundant process controllers is singularity of operation for the lead-state controller over at least the part of the system it controls. One and only one of the dual redundant controllers can be in actual control (have the lead state) of all or part of the system at any time. If lead-state singularity is not preserved, the process system could receive competing or conflicting commands from the primary and secondary controllers, which could lead to a system lockup, overload, shutdown, or other catastrophic process-system failure. In large manufacturing facilities or plants, a failure of a process controller could be very costly in many ways, including downtime for equipment and personnel, probable loss or destruction of raw materials, and the subsequent expense of restarting the process. In fact, avoiding such a catastrophic system failure is so important that it is the very reason for the concept of redundant controllers, and absorbing the additional cost of redundant controllers is now a necessary consideration rather than an exception.
Because lead-state singularity is critical for dual redundant process controllers, the reliability of communications between the two controllers is paramount. In the event of a failure of the primary controller, the primary and secondary controllers must be able to transfer lead-state control from the primary controller to the secondary controller intelligently, promptly, and effectively, allowing the process system to continue without interruption, or with as little interruption as possible.
A problem that arises from the critical nature of lead-state singularity is how the secondary controller can correctly determine when to assert the lead state. As discussed above, it is paramount for process-system integrity that the secondary controller make this determination correctly. Failures can be of more than one type, and not every type requires the secondary controller to assert lead-state control.
If the failure is an inter-controller communications failure, such as a broken connector cable, the primary controller remains viable and should remain in the lead state. The secondary controller should be intelligent enough to recognize that, because the primary controller has not failed, it need not and should not attempt to assume control. On the other hand, if a device failure occurs in the primary controller, the secondary controller must know that its partner has failed and must immediately assert the lead state. In both scenarios, the basic requirement remains that the two controllers must never both attempt to control the system at the same time. If the secondary controller cannot distinguish a device failure from a failure to communicate with its partner, it could assert lead-state control while the primary controller is still active, compromising the lead-state singularity of the dual controllers and jeopardizing process-system integrity.
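Why the two scenarios are hard to tell apart can be shown with a short sketch. It assumes, purely for illustration, that the secondary monitors the primary only through periodic messages (a “heartbeat”) over the inter-controller link; the names `Scenario`, `heartbeat_seen`, and `correct_action` are hypothetical. Both a cable break and a dead primary silence the heartbeat, yet they call for opposite responses.

```python
from enum import Enum


class Scenario(Enum):
    LINK_BROKEN = "inter-controller communications failure (e.g. cable break)"
    PRIMARY_FAILED = "device failure in the primary controller"


def heartbeat_seen(scenario: Scenario) -> bool:
    # With heartbeat-only monitoring, the secondary observes silence in both
    # scenarios: a broken cable and a failed primary look identical.
    return False


def correct_action(scenario: Scenario) -> str:
    # What the secondary *should* do in each scenario, per the text above.
    if scenario is Scenario.PRIMARY_FAILED:
        return "assert lead state"
    return "remain secondary"


# Group the required actions by what the secondary can actually observe.
observations = {}
for s in Scenario:
    observations.setdefault(heartbeat_seen(s), set()).add(correct_action(s))

# One observation maps to two different required actions, so no decision
# rule based on the heartbeat alone can be correct in both scenarios.
ambiguous = any(len(actions) > 1 for actions in observations.values())
```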
Ideally, if the secondary controller could know that a device failure has occurred, the transition from the primary controller to the secondary controller could be effected deterministically, thereby preserving system integrity. Thus, it is advantageous for the secondary controller to be able to distinguish a device failure in the primary controller from a failure of the communication link between the two controllers.
Previous attempts to make inter-controller communication reliable and to distinguish between failure scenarios have added hardware to establish alternate communication paths. Alternate paths were thought to solve the problem, but the additional hardware brought problems of its own: the cost of additional devices, added complexity, and possibly reduced reliability, because the new hardware created new and possibly undetectable failure scenarios for the controllers. In effect, this solution introduced more problems than it solved and could defeat its intended purpose.
Another problem encountered in transitioning lead-state control effectively is enabling the primary controller to report its own device failure to the secondary controller upon a sudden loss of power. Reporting between the controllers should occur even during a sudden failure: even when the primary controller suffers a sudden power failure, enough power should remain for it to send the critical failure notification, and the secondary controller should have enough time to respond and assert itself as the lead-state controller. Under sudden power-loss conditions, hardware sometimes loses power within several microseconds.
Attempts have also been made to address the sudden power-loss problem by way of alternate communication paths and/or default states, as in existing Honeywell equipment. Besides the need for additional hardware, the biggest shortcoming of these previous methods is the time needed to detect and respond to such a failure: current methods require hundreds of microseconds or longer to recognize and address it. Although such a period may seem minuscule to most people, in a real-time process system in a manufacturing facility it could seriously or even fatally impact the whole process.
If the primary controller could detect and report a device failure to the secondary controller within a few microseconds, the secondary controller could receive and assess the failure notification and respond effectively by asserting the lead state while preserving lead-state singularity. If such a lead-state transition could be accomplished without additional inter-device hardware, the efficiency and reliability of dual redundant process controllers would be enhanced, and the integrity of the process system would be better preserved. Further, the manufacturing and processing industries could take advantage of a more reliable and cost-efficient system of dual redundant process controllers.
Therefore, what is needed in the art is a more reliable, efficient, and cost-effective way for dual redundant real-time process controllers to discriminate intelligently between inter-controller failure modes in a distributed control system, so as to avoid the significant consequences of impaired plant processes.
To address the above-discussed deficiencies of the prior art, it is a primary object of the present invention to provide a positive way to distinguish device failures from inter-device communication failures.
In the attainment of the above primary object, the present invention provides, for use in a control system containing first and second devices coupled together for inter-device communication, a circuit for and method of distinguishing a failure of the first device from a failure of the inter-device communication and a real-time process control system employing the circuit or the method. In one embodiment, the circuit includes: (1) a device failure signal generator, associated with and separately powered from the first device, that communicates a device failure signal to the second device upon occurrence of a predetermined condition and (2) a device failure signal detector, associated with the second device, that detects the device failure signal, detection of the device failure signal contraindicating a failure of the inter-device communication.
The present invention therefore introduces the broad concept of affirmatively initiating inter-device communication upon occurrence of a predetermined condition (to be explained below) to, in effect, test the inter-device communication. If the second device receives a device failure signal, it can assume that inter-device communication is intact. In many cases, the second device can further assume that the first device has failed.
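The inference described above can be sketched as a decision function for the second device. This is an illustrative assumption, not the claimed circuit: `secondary_decision` and its inputs are hypothetical names, and treating silence *without* a device failure signal as a communication failure is the reading this sketch adopts, on the premise that a failed first device would have produced the signal.

```python
def secondary_decision(link_traffic_ok: bool, failure_signal_detected: bool) -> str:
    """Hypothetical inference rule for the secondary controller."""
    if failure_signal_detected:
        # Receiving the device failure signal shows that the path from the
        # first device is intact (contraindicating a communication failure);
        # in many cases the first device can then be assumed to have failed.
        return "assert lead state"
    if link_traffic_ok:
        # Normal operation: the primary is alive and the link works.
        return "remain secondary"
    # Silence with no device failure signal: treat it as a failure of the
    # inter-device communication and leave the primary in the lead state.
    return "remain secondary; report link failure"
```

A practical design would also need to distinguish a software-commanded test of the generator from a genuine failure; this sketch does not model that case.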
The ability to discriminate between device failures and inter-device communication failures is particularly advantageous when the first and second devices are primary and secondary controllers in the control system. Therefore, in one embodiment of the present invention, the first and second devices are primary and secondary controllers of the control system, respectively. Of course, the first device may be a sensor, a controllable device or other piece of equipment in the control system. The present invention is advantageously employable to distinguish failure modes from one another in a wide variety of applications.
In one embodiment of the present invention, the predetermined condition is a loss of power to the first device. Alternatively, the predetermined condition may be receipt of a software command (thereby allowing the device failure signal generator and detector to be tested). Those skilled in the art may readily perceive other conditions under which the device failure signal generator may be prompted to operate.
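The two predetermined conditions named above can be combined into a single trigger. In the sketch below, “loss of power” is modeled, as an assumption, by the device supply voltage falling below a brown-out threshold; the function name and the threshold parameter are hypothetical.

```python
def generator_should_transmit(supply_volts: float, brownout_volts: float,
                              test_command: bool) -> bool:
    """Hypothetical trigger for the device failure signal generator.

    Fires on either predetermined condition: loss of power to the first
    device (modeled as the supply dropping below an assumed brown-out
    threshold) or receipt of a software command used to test the
    generator and detector.
    """
    power_lost = supply_volts < brownout_volts
    return power_lost or test_command
```

Because the software command drives the same path as a real power loss, exercising it checks the generator and detector without removing power.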
In one embodiment of the present invention, the device failure signal contains a predetermined data pattern. In an embodiment to be illustrated and described, the predetermined data pattern repeats, allowing the device failure signal detector affirmatively to recognize the pattern and thereby distinguish the pattern from noise. Of course, the device failure signal may be any signal whatsoever, and does not need to carry data.
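A repeating pattern can be recognized with a fixed-length window that shifts in one received bit at a time, much like a hardware shift register. The sketch below is an illustration under assumptions: the class name, the bit-string representation, and the choice to require a fixed number of back-to-back copies are hypothetical, not the detector of the illustrated embodiment.

```python
from collections import deque


class PatternDetector:
    """Streaming detector: feed one received bit at a time."""

    def __init__(self, pattern: str, repeats: int) -> None:
        # Requiring several consecutive copies of a known pattern makes it
        # unlikely that random line noise is mistaken for the failure signal.
        self.target = pattern * repeats
        self.window = deque(maxlen=len(self.target))

    def feed(self, bit: str) -> bool:
        # Shift the new bit into the fixed-length window (the oldest bit
        # drops out) and report a match when the window equals the target.
        self.window.append(bit)
        return "".join(self.window) == self.target
```

For example, with `PatternDetector("0110", 3)` and the stream `"1101" + "0110" * 3 + "00"`, `feed` returns `True` only at index 15, when the third copy of the pattern completes.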
In one embodiment of the present invention, the device failure signal generator is powered by a power supply that derives power from a power supply of the first device. In the embodiment to be illustrated and described, the first device is contained in a module in a rack centrally powered by a main power supply, and the device failure signal generator is co-located in the module. If the first device is dislodged in the rack such that it loses power, the power supply for the device failure signal generator likewise loses power, but it has enough residual hold-up capacity to power the generator for a period of time sufficient to allow the device failure signal detector to receive and recognize the device failure signal.
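Whether the hold-up capacity is “sufficient” reduces to comparing two times: how long the generator’s supply can run after input power is lost, and how long the repeated pattern takes to send. The sketch below assumes capacitive hold-up feeding a roughly constant-power load that stops working below a minimum voltage, so the usable time is $t = C(V_0^2 - V_{min}^2)/(2P)$. All component values are hypothetical and chosen only for illustration.

```python
def holdup_time_s(capacitance_f: float, v_start: float, v_min: float,
                  load_w: float) -> float:
    """Time a capacitor can supply a constant-power load while its voltage
    falls from v_start to v_min: t = C * (v_start**2 - v_min**2) / (2 * P)."""
    usable_energy_j = 0.5 * capacitance_f * (v_start**2 - v_min**2)
    return usable_energy_j / load_w


def transmit_time_s(pattern_bits: int, repeats: int, bit_rate_bps: float) -> float:
    """Time needed to send the repeated failure pattern once."""
    return pattern_bits * repeats / bit_rate_bps


# Hypothetical values: 100 uF falling from 5 V to 3 V at a 0.5 W load, and an
# 8-bit pattern repeated 16 times at 10 Mbit/s.
t_hold = holdup_time_s(100e-6, 5.0, 3.0, 0.5)
t_send = transmit_time_s(8, 16, 10e6)
print(f"hold-up {t_hold * 1e3:.2f} ms, transmit {t_send * 1e6:.1f} us")
# prints "hold-up 1.60 ms, transmit 12.8 us"
```

With these assumed numbers, the hold-up time exceeds the transmission time by two orders of magnitude, leaving margin for the detector to recognize the pattern.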
In one embodiment of the present invention, the inter-device communication occurs over a selected one of: (1) an electrical conductor and (2) an optical fiber. Those skilled in the art will recognize, however, that the device failure signal can be communicated from the first device to the second device over any one of a wide variety of physical media.
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.