Fault tolerant computer systems have been used for some time in large computer installations to ensure that data processing continues even when otherwise "fatal" errors occur. Such systems can include redundant processors and other components which operate simultaneously and in parallel such that, if one unit fails, the other immediately takes over. It will be apparent that all commands and data which are transmitted to one unit must also be transmitted to the other. Moreover, all responses generated by one unit should correspond with the responses generated by the other unit. When responses differ, an error or fault is indicated and a diagnostic procedure must be run to determine the location of the fault and the appropriate corrective action to be taken, including taking the faulty unit off-line and bringing the other unit on-line. Extensive inter-unit communication has typically been required in redundant systems to ensure that all commands are received and processed in parallel. Such communication imposes a significant overhead burden on the units and reduces the efficiency of the total system, which in turn increases the cost of the system. Consequently, fault tolerant, redundant systems have been more commonly used for high end, critical applications.
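By way of illustration only, the duplicate-and-compare arrangement described above can be sketched in a few lines of Python. The names used here (RedundantPair, healthy_unit, faulty_unit) are hypothetical and are not taken from any particular system; each unit is modeled simply as a function that maps a command to a response, and the diagnostic procedure that would locate the fault is assumed to exist elsewhere and is not shown.

```python
from typing import Callable, List

# Assumption: a "unit" is modeled as a function from a command to a response.
Unit = Callable[[str], str]


class RedundantPair:
    """Sends every command to two units and compares their responses."""

    def __init__(self, first: Unit, second: Unit) -> None:
        self.units: List[Unit] = [first, second]
        self.mismatches: List[str] = []  # commands whose responses differed

    def execute(self, command: str) -> str:
        # Every command transmitted to one unit is also transmitted to the other.
        responses = [unit(command) for unit in self.units]
        if responses[0] != responses[1]:
            # A difference only indicates that a fault exists somewhere;
            # locating it would require a separate diagnostic (not shown).
            self.mismatches.append(command)
            raise RuntimeError(f"response mismatch for command {command!r}")
        return responses[0]


def healthy_unit(command: str) -> str:
    """Hypothetical unit that responds correctly."""
    return command.upper()


def faulty_unit(command: str) -> str:
    """Hypothetical unit that always returns a wrong response."""
    return "ERR"
```

In this sketch, a pair built from two healthy units returns the common response, whereas a pair containing the faulty unit raises an error and records the offending command. Even in so small a model, each command requires a dispatch to both units and a comparison of their responses, which suggests where the overhead noted above arises.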
However, with the reduction in the prices of powerful computer technology, fault tolerant systems have become more affordable. For example, in an automated data storage and retrieval library system, such as the 3495 Tape Library Dataserver developed by International Business Machines Corporation, the library manager, which coordinates the functions of all of the individual components in the library, is a conventional personal computer. As such, its cost is a relatively small portion of the total cost of the library and a fault tolerant, redundant library manager has become an economically viable option desired by many customers. However, until now, the requirement for extensive inter-unit communications has not diminished.