The provision of redundant systems in areas where uninterrupted operation is required is well known. Such uninterrupted operation may be desired for economic reasons, safety reasons, or both. Redundant control systems are found in many industrial settings, including chemical plants and utility power systems. Redundant control systems are also employed in real-time computer and communications systems, for example in the on-line data processing systems used in banking and telephone switching. Critical systems are also often located in remote, hazardous, or otherwise difficult-to-access locations. In such cases redundancy not only improves operational reliability but also allows system maintenance and software or firmware upgrades to be performed remotely, because operations can be assigned to the system that is not currently being upgraded.
The required level of reliability can often be achieved by providing redundancy in the important components of a system. Limiting redundancy to the components along critical paths reduces the cost of achieving the required reliability. Controllers are one such component: they perform critical operations and are therefore often provided with redundancy.
A system of redundant controllers will include a primary controller that is in control at any given time and one or more secondary controllers that are available to take control of the system. It is common in the art to refer to the primary controller as the “active” controller while the secondary or redundant controllers are known as “standby” controllers.
Standby controllers are generally run in one of two modes with respect to the active controller. In the first mode the standby is run synchronously with the active controller: the two controllers execute the same programs and have identical status at any given time. This allows fast, seamless switching of control between controllers, because the two controllers are executing the same instructions, i.e. no time is required to bring the standby controller to the same point as the active controller. However, minor ‘natural’ asynchronicities are often encountered during normal operation, reducing the advantages of a synchronous system. Alternatively, the active and standby controllers can be run asynchronously. The standby controller then no longer runs in step with the active controller; rather, it is often run with a certain lag behind the active controller. While this second approach requires more time for the standby to assume operations from the active controller, running the controllers asynchronously is easier.
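To make the cost of the asynchronous mode concrete, the following Python sketch models a standby that replays the active controller's inputs a fixed number of steps behind. The `Controller` and `LaggingStandbyPair` classes, the running-total "state", and the `lag` parameter are hypothetical illustrations, not part of any described system; the only point is that on switchover the standby must first consume its backlog, so takeover time grows with the lag.

```python
from collections import deque


class Controller:
    """Hypothetical controller whose state is a running total of processed inputs."""

    def __init__(self, name):
        self.name = name
        self.state = 0
        self.step = 0

    def execute(self, value):
        # Apply one input and advance the step counter.
        self.state += value
        self.step += 1


class LaggingStandbyPair:
    """Assumed model: the active controller runs immediately, the standby replays
    the same inputs `lag` steps behind (asynchronous mode)."""

    def __init__(self, lag):
        self.lag = lag
        self.active = Controller("A")
        self.standby = Controller("B")
        self.pending = deque()  # inputs executed by the active but not yet by the standby

    def feed(self, value):
        self.active.execute(value)
        self.pending.append(value)
        # The standby only consumes an input once it is more than `lag` steps behind.
        while len(self.pending) > self.lag:
            self.standby.execute(self.pending.popleft())

    def switchover(self):
        """Bring the standby up to date, then swap roles.
        Returns the number of steps the standby had to catch up."""
        catch_up = len(self.pending)
        while self.pending:
            self.standby.execute(self.pending.popleft())
        self.active, self.standby = self.standby, self.active
        return catch_up


pair = LaggingStandbyPair(lag=3)
for v in range(1, 11):
    pair.feed(v)
```

With `lag=3`, after ten inputs the standby has processed only seven of them, so `switchover()` must replay three inputs before the standby can take control. A synchronous pair corresponds to `lag=0`, where the backlog is always empty.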
In any system employing redundancy there is a need both to determine when an error has occurred in the operation of the active controller and to switch control from the active to the standby controller when such an error has occurred. Fault detection and switching of control can be performed by either software-based or hardware-based methods. The choice between these approaches generally reflects the role of the redundant controller within the computer system. Software-based methods are often used when the redundant controller is the main processing element of the computer; in these situations the output data of the active controller is compared to that of the standby controller(s). It is also possible to monitor signals from the active controller. This is often done with hardware-based methods, in which a change in a monitored signal initiates a change in the operation of the active controller.
Software-based methods often focus on comparing output data from the controllers. In one example the controllers are run asynchronously, with the data of the leading controller placed in a buffer. When the lagging controller reaches the point in its operation that corresponds to the buffered data from the leading controller, the two sets of data are compared in a comparator. In another example of software-based failure detection and control, the controllers are run synchronously: each controller places its output data on one of two buses, and the two outputs are compared. Finally, where more than two controllers are operating, the data of all the controllers is compared. If two or more of the controllers agree, they are determined to be operating correctly, while a controller that does not agree is deemed to be operating incorrectly.
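The buffered comparison and the multi-controller voting described above can be sketched in Python as follows, assuming each controller emits exactly one output per step and outputs arrive in step order. `LagComparator` buffers the leading controller's outputs and checks each one when the lagging controller reaches the same step; `majority_vote` applies a strict-majority rule, which for three controllers is the same as "two or more in agreement". All names are hypothetical and the code is illustrative only.

```python
from collections import Counter, deque


class LagComparator:
    """Hypothetical comparator: buffers the leading controller's outputs until the
    lagging controller reaches the same step (asynchronous operation)."""

    def __init__(self):
        self.buffer = deque()  # (step, output) pairs from the leading controller

    def record_lead(self, step, output):
        self.buffer.append((step, output))

    def check_lag(self, step, output):
        """Compare the lagging controller's output with the buffered lead output.

        Returns True on agreement, False on mismatch. Assumes one output per step,
        in order; raises IndexError if the lead has not yet produced this step.
        """
        lead_step, lead_output = self.buffer.popleft()
        if lead_step != step:
            raise ValueError(f"step mismatch: lead {lead_step}, lag {step}")
        return lead_output == output


def majority_vote(outputs):
    """Given {controller_name: output}, return (agreed_output, dissenters).

    Assumed rule: a value is accepted only if a strict majority of controllers
    report it; otherwise agreed_output is None and every controller is suspect.
    """
    value, count = Counter(outputs.values()).most_common(1)[0]
    if count * 2 <= len(outputs):
        return None, sorted(outputs)
    dissenters = sorted(name for name, out in outputs.items() if out != value)
    return value, dissenters
```

For example, with outputs `{"A": 5, "B": 5, "C": 7}` the vote accepts `5` and flags controller `C` as operating incorrectly.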
The second general category of failure detection and switching circuits uses hardware-based methods, i.e. methods in which a control circuit monitors status signals from the controllers. These status signals indicate the state of each controller's operation and are used to determine whether a switchover of active control is required. The control circuits that monitor the controllers often employ some form of control logic and a timer circuit to which a controller must regularly respond to indicate that it is still active and in control. Such hardware-based systems are generally simpler than software-based systems and use signals from the monitored controllers as their input.
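The timer-based monitoring described above behaves like a watchdog. The sketch below models such a control circuit in Python under assumed conditions: time is passed in explicitly so the behaviour is deterministic, only the controller currently in control may reset ("kick") the timer, and a missed deadline swaps the active and standby roles. `WatchdogSwitch`, `kick`, `poll`, and `timeout` are hypothetical names; a real monitor of this kind would be a hardware circuit rather than software.

```python
class WatchdogSwitch:
    """Hypothetical software model of a timer-based control circuit.

    The active controller must kick the watchdog at least once every `timeout`
    time units; otherwise control passes to the standby controller.
    """

    def __init__(self, timeout, active="A", standby="B", now=0.0):
        self.timeout = timeout
        self.active = active
        self.standby = standby
        self.last_kick = now

    def kick(self, controller, now):
        # Assumption: only the controller currently in control can reset the timer.
        if controller == self.active:
            self.last_kick = now

    def poll(self, now):
        """Called periodically by the control logic. Switches over if the active
        controller has been silent longer than `timeout`; returns the controller
        in control after the check."""
        if now - self.last_kick > self.timeout:
            self.active, self.standby = self.standby, self.active
            # Give the newly active controller a full timeout period.
            self.last_kick = now
        return self.active


wd = WatchdogSwitch(timeout=1.0)
wd.kick("A", 0.5)
```

Resetting `last_kick` on switchover prevents the newly active controller from being declared failed immediately, before it has had a chance to respond to the timer.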
Both hardware- and software-based monitor and control systems are often used in dual-processing environments. In such environments the processors are usually mounted on the same board and are not designed to be inserted or removed while the system is operating, so the monitor and control system does not account for these situations. Such systems therefore do not provide the functionality required by rack-based computer systems, in which redundant controllers may be located on separate boards that are likely to be removed during operation of the computer.
Therefore, there is a need for a system and method for operating dual redundant controllers in a rack-based computer system. The system and method should provide a stable handover of control with minimal downtime, as is required in the broadcast industry.