In many environments requiring a highly available system, disk drives need to be shared among two or more machines, such that if one machine fails, one of the others can provide access to the data stored on the shared disk drives. In such environments, it is crucial that the data on the disk drives not be accessed by multiple machines simultaneously, because simultaneous access can cause serious data corruption. The present invention provides an arbitration scheme for shared disk drives that has fewer failure modes than previous systems.
The present invention is superior to the prior art for disk ownership arbitration in two areas.
Typical prior art for arbitration among N servers for access to a set of disks involves a heartbeat exchanged between two servers: when the primary server crashes, the secondary server detects the loss of heartbeat and takes over for the failed primary. This art has three weaknesses that the present invention solves.
First, the prior art typically works only with two servers, rather than with an arbitrary number of servers.
Second, the prior art uses a separate communication path to determine which server should access the disk, and that path can fail independently of the communication path used for accessing the disk itself. By using the disk as the communications mechanism, the present invention reduces the chances of such an independent failure.
Third, most prior art can assign two servers access to the set of disks when there is a partition in the communications network and both servers are actually still connected to the disks. In the present invention, even with a network partition, it is guaranteed that at most one server is granted access to the arbitrated set of disks.
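The third weakness can be illustrated with a minimal sketch. The function names, the dictionary of votes, and the strict-majority rule below are assumptions made for illustration only; they are not taken from the invention's actual on-disk format or protocol. The sketch contrasts a heartbeat-only scheme, in which each side of a partition concludes that its peer has failed, with a hypothetical vote read from the shared disk, in which at most one candidate can hold a strict majority.

```python
from collections import Counter

def heartbeat_owners(servers, network_up):
    """Heartbeat-only failover (simplified model of the prior art).

    A server that stops hearing its peer's heartbeat assumes the peer is dead.
    """
    if network_up:
        return {servers[0]}   # the primary keeps ownership
    # Network partition: every server misses the heartbeat and takes over,
    # even though all of them can still reach the disks.
    return set(servers)

def disk_vote_owner(votes, n_servers):
    """Hypothetical disk-based vote (illustrative assumption).

    votes maps each server that wrote a vote to the candidate it named.
    A candidate owns the disks only with a strict majority of all n_servers.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    # Two different candidates cannot both exceed half of n_servers,
    # so at most one owner is ever returned.
    return candidate if count > n_servers // 2 else None

if __name__ == "__main__":
    print(heartbeat_owners(["A", "B"], network_up=False))
    print(disk_vote_owner({"A": "A", "B": "A", "C": "B"}, 3))
    print(disk_vote_owner({"A": "A", "B": "B"}, 2))
```

In this model the partitioned heartbeat scheme yields two owners, whereas the vote yields either a single owner or none; a tie grants access to no server rather than to both.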
Another prior art device, Ubik, by Transarc Corporation [1989], used a similar voting mechanism, run over an IP network, to elect a synchronization server for a distributed database. However, the Ubik system did not use an array of disk blocks as a communication medium, but instead used network packets exchanged over an IP network.