In high-reliability computer disk storage systems, it is desirable to have redundancy in all the physical parts that make up a subsystem, to reduce the potential for loss of data and downtime upon failure of a part. The use of dual disk storage controllers, each having its own memory, provides several major benefits to a disk storage system. For example, (1) a redundant copy of stored information is retained to allow for recovery in the case of failure or loss of one controller or its memory; (2) repair of a disabled controller is feasible due to the failover capabilities of the secondary controller; and (3) greater system uptime is achieved through the secondary controller being available.
With the desire for more performance out of these redundant subsystems, caching and the use of memory as temporary storage have become commonplace. Keeping these duplicate physical memories in synchronization can be difficult. Some disk systems use a latent (delayed or massive update) process to create this duplication, but that approach tends to add expense, is very complex to manage, reduces performance, and limits the accuracy of recovery from failures. Another approach (the one used in this invention) is to use a real-time mirrored memory process that creates and retains an accurate duplicate as the data is written. The use of real-time, synchronized, redundant memory (mirrored memory) in dual controllers can improve speed and accuracy in the case of a failover from one controller to the other.
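By way of illustration only, the difference between a latent update and a real-time mirrored write can be sketched in a few lines of Python. The class name `MirroredMemory` and its methods are hypothetical and are not taken from this disclosure. The sketch models both controller memories as byte arrays in a single process, whereas actual mirroring would be performed by controller hardware.

```python
class MirroredMemory:
    """Toy model of two controller memories kept in lockstep (hypothetical)."""

    def __init__(self, size):
        self.local = bytearray(size)   # memory on this controller
        self.remote = bytearray(size)  # memory on the peer controller

    def write(self, offset, data):
        # Real-time mirroring: both copies are updated before the write
        # is reported complete, so neither copy lags behind the other.
        end = offset + len(data)
        if offset < 0 or end > len(self.local):
            raise IndexError("write outside mirrored region")
        self.local[offset:end] = data
        self.remote[offset:end] = data
        # Report success only if the two copies agree over the written span.
        return self.local[offset:end] == self.remote[offset:end]

    def in_sync(self):
        # True when the two memories hold identical contents.
        return self.local == self.remote
```

Under a latent process, by contrast, the update of the second copy would be deferred, leaving an interval during which a failover could expose stale data.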
Given that mirrored memory is used as the underlying duplication process, one could further control the additional costs required for communication by forming a communication path inside that mirrored memory. However, this use of redundant memory as a communication path brings with it a wealth of problems in making that communication path robust. The problems to overcome include: (1) how are the redundant memories kept consistent and the communication kept working in light of hardware anomalies; (2) how are address decoding or data bit problems between the memories resolved; (3) how are messages synchronized between the controllers; (4) how are redundant copies of the memory protected from unintentional corruption created by the communication process between the controllers; (5) how is the consistency of mirrored or shared memory managed; (6) how is a communication process error or total communication breakdown managed; and (7) when hardware failures are known, what processes are appropriate and how can such be safely communicated to the other controllers.
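To make problems (3) and (4) concrete, the following minimal sketch shows one conventional way a message might be framed in a region of mirrored memory so that a reader can detect a new message and reject a corrupted one. It is an assumption offered for illustration, not the method of this disclosure. The header layout and the names `HEADER`, `post_message`, and `read_message` are hypothetical, and a CRC-32 is used simply because it is readily available.

```python
import struct
import zlib

# Hypothetical header: sequence number, payload length, CRC-32 of payload.
HEADER = struct.Struct("<IHI")

def post_message(mailbox, seq, payload):
    """Write a framed message into a mailbox region of mirrored memory."""
    if HEADER.size + len(payload) > len(mailbox):
        raise ValueError("payload too large for mailbox")
    start = HEADER.size
    mailbox[start:start + len(payload)] = payload
    # The header is written last so that a new sequence number is never
    # visible before the payload it describes.
    HEADER.pack_into(mailbox, 0, seq, len(payload), zlib.crc32(payload))

def read_message(mailbox, last_seq):
    """Return (seq, payload) for a new message, or None if nothing is new."""
    seq, length, crc = HEADER.unpack_from(mailbox, 0)
    if seq == last_seq:
        return None  # no message posted since the last read
    if HEADER.size + length > len(mailbox):
        raise ValueError("mailbox header corrupted")
    payload = bytes(mailbox[HEADER.size:HEADER.size + length])
    if zlib.crc32(payload) != crc:
        raise ValueError("mailbox payload corrupted")
    return seq, payload
```

In this sketch the sequence number addresses message synchronization, and the checksum lets the receiving controller detect a data bit error rather than act on a damaged message. The remaining problems listed above, such as a total communication breakdown, are not addressed by the sketch.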
Given the foregoing problems, and the potential for total loss of communication between controllers should one controller fail, it is not generally taught in the prior art to use mirrored memory as a communication and control path between controllers in a multiple controller system. Rather, prior art methods and channels of communication include Small Computer System Interface (SCSI), RS-232 links, local area networks (LANs), or the like.
Accordingly, objects of the present invention are to provide a communication and control system for real-time, synchronous, mirrored memory controllers in a dual controller disk storage system and to provide a method for using such mirrored memory as a robust communication path between the controllers.