Many modern computing environments use an architecture with master computers accessing data stores on behalf of client computers. The reasons for such an architecture include securing access to the data, coordinating access to the data store, and translating client requests into a format that the data store can comprehend.
A data store is a combination of systems and methods for accessing, writing, modifying, and storing data. One example of a data store is a computer running MySQL. Another example of a data store is a web server such as a computer running Apache. A third example of a data store is a computer running a network file system. A fourth example is a computer with a file system.
Some data stores use a technique called journaling. Journaling can help prevent data store corruption. A journal is a history of the changes that are made to a data store. If the data store fails, for example through a hard disk crash or a network file system going offline, the data that should be in the data store can be quickly recovered from the journal and the failed data store. Journaling, journals, and journal files are all known to those practiced in the arts of computer system administration, computer operating systems, computer file systems, and data recovery.
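As an illustration of the journaling idea described above, the following is a minimal sketch, not the implementation used by any particular data store. The names JournaledStore, put, delete, and recover are hypothetical, the journal is held in memory rather than in a journal file, and recovery is simplified to replaying the journal alone, without consulting the failed data store.

```python
import json


class JournaledStore:
    """Toy key-value data store that records every change in a journal."""

    def __init__(self):
        self.journal = []  # ordered history of changes (a file in practice)
        self.data = {}     # the contents of the data store

    def put(self, key, value):
        # Record the change in the journal before applying it to the store.
        self.journal.append(json.dumps({"op": "put", "key": key, "value": value}))
        self.data[key] = value

    def delete(self, key):
        self.journal.append(json.dumps({"op": "delete", "key": key}))
        self.data.pop(key, None)


def recover(journal):
    """Rebuild the store contents by replaying the journal in order."""
    data = {}
    for line in journal:
        entry = json.loads(line)
        if entry["op"] == "put":
            data[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            data.pop(entry["key"], None)
    return data


store = JournaledStore()
store.put("a", 1)
store.put("b", 2)
store.delete("a")
store.put("b", 3)

# Simulate losing the store and rebuilding its contents from the journal.
recovered = recover(store.journal)
```

Because the journal holds the changes in the order they were made, replaying it yields the same contents the store held before the failure.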
In high reliability environments, a computing system, such as a compute cluster or database distributed through a group of hosts, must be constantly available. Computers, like all equipment, occasionally fail. Highly reliable systems typically use redundant components to ensure that failure of one component does not cause the entire system to fail. For example, a data store can be stored on more than one computer. If one computer fails, others are available to take its place. A client accessing the data store can be unaware of data store hardware failures because the client access succeeds. A RAID, or redundant array of inexpensive disks, is often used in high reliability environments. A RAID appears externally as a single disk drive, but is actually a group of disk drives acting in concert. Certain kinds of RAID can continue operating while a disk drive fails, is removed, and is replaced. A journaled data store that is stored on RAID is unlikely to be corrupted.
Some computing environments contain such enormous numbers of computers that failure and data corruption can occur regularly regardless of journaling and redundant data store components. Recall the master computer accessing the data store on behalf of client computers. The master can also maintain a journal for the data store. Computers occasionally fail or otherwise disappear. A computer has “disappeared” when the other computers cannot communicate with it. When a master disappears, it can be replaced by a new master. In many cases, an inactive master is already prepared and is immediately available. For example, a shadow master continuously attempts to duplicate the internal state of the master and is always prepared to assume the master's responsibilities. A skeleton master, unlike a shadow master, requires some initialization before it can assume the master's responsibilities.
One way to speed the initialization of a new master is for the old master to maintain a checkpoint file. The checkpoint file is essentially a snapshot of the master's internal state. The skeleton master can load the checkpoint file and thereby duplicate that master's state at the time the snapshot was taken.
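The checkpoint mechanism described above can be sketched as follows. This is a hedged illustration only: the Master class, its register, write_checkpoint, and from_checkpoint methods, the JSON file format, and the contents of the state are all assumptions made for the example, not details taken from any actual master.

```python
import json
import os
import tempfile


class Master:
    """Toy master whose internal state can be snapshotted to a checkpoint file."""

    def __init__(self):
        # Hypothetical internal state: a counter and a name-to-host mapping.
        self.state = {"next_id": 0, "locations": {}}

    def register(self, name, host):
        self.state["locations"][name] = host
        self.state["next_id"] += 1

    def write_checkpoint(self, path):
        # Write to a temporary file and rename it into place, so a reader
        # never observes a partially written snapshot.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp_path, path)

    @classmethod
    def from_checkpoint(cls, path):
        # A skeleton master initializes itself from the snapshot.
        master = cls()
        with open(path) as f:
            master.state = json.load(f)
        return master


checkpoint_path = os.path.join(tempfile.mkdtemp(), "master.ckpt")

old_master = Master()
old_master.register("file1", "hostA")
old_master.write_checkpoint(checkpoint_path)

# This change happens after the snapshot, so the checkpoint does not contain it.
old_master.register("file2", "hostB")

skeleton = Master.from_checkpoint(checkpoint_path)
```

The skeleton master ends up with the state as of the snapshot. Changes made after the checkpoint was written are absent from the file and would have to be recovered from another source, such as the journal.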
The problem is that, after a new master has assumed the duties of a master that disappeared, the old master can reappear. The old master could have been temporarily unavailable because of computer network problems. The old master could have been unresponsive for a number of reasons. Regardless, the old master is back and there are suddenly two master computers trying to maintain the journal. The journal becomes corrupted and then the data store becomes corrupted.
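The failure mode described above can be illustrated with a small simulation. It is a sketch under stated assumptions: the JournalingMaster class, the replay function, and the keys and values written are hypothetical, and the shared journal is modeled as an in-memory list rather than a journal file.

```python
class JournalingMaster:
    """Toy master that appends every change to a shared journal."""

    def __init__(self, name, journal, state=None):
        self.name = name
        self.journal = journal           # shared by every master
        self.state = dict(state or {})   # this master's private view

    def put(self, key, value):
        self.journal.append((self.name, key, value))
        self.state[key] = value


def replay(journal):
    """Rebuild data store contents by replaying the journal in order."""
    data = {}
    for _writer, key, value in journal:
        data[key] = value
    return data


journal = []

old = JournalingMaster("old", journal)
old.put("owner", "client-1")

# The old master disappears; a new master takes over from the same state.
new = JournalingMaster("new", journal, state=old.state)
new.put("owner", "client-2")
new.put("quota", 10)

# The old master reappears, unaware that it was replaced, and keeps writing.
old.put("quota", 5)

replayed = replay(journal)
```

In this run the replayed journal yields an owner of "client-2" and a quota of 5, a combination that neither master believes to be the current state. The journal no longer describes a single consistent history.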