The invention generally relates to a journaling method for write transactions to mass storage, such as an array of disk drives.
A redundant array of inexpensive disks (RAID), referred to herein as a “RAID array,” is often selected as the mass storage for a computer system due to the array's ability to preserve data even if one of the disk drives of the array should fail. As an example, in an arrangement called RAID4, data may be stored across three disk drives of the array, with a fourth, dedicated drive of the array serving as a parity drive. Due to the inherent redundancy that is presented by this storage technique, the data from any three of the four drives may be used to rebuild the data on the remaining drive. In an arrangement known as RAID5, the parity information is not stored on a dedicated disk drive, but rather, the parity information is distributed across all drives of the array. Other RAID techniques are also commonly used.
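The rebuild property described above may be illustrated with a minimal sketch. The sketch assumes that the parity block is the bytewise XOR of the data blocks, which is the conventional choice for RAID4 and RAID5 but is not stated in the text above; the helper name `xor_blocks` and the sample block contents are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Return the bytewise XOR of equally sized blocks (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of a RAID4-style array: three data drives plus a dedicated
# parity drive. The block contents are arbitrary sample values.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(*data)

# Simulate the loss of data drive 1 and rebuild its block from the
# three surviving drives (two data drives and the parity drive).
rebuilt = xor_blocks(data[0], data[2], parity)
assert rebuilt == data[1]
```

Because XOR is its own inverse, the same computation recovers any single missing block of the stripe, whether that block held user data or parity.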
The RAID array may be part of a cluster environment, an environment in which two or more file servers share the RAID array. For purposes of ensuring data consistency, only one of these file servers accesses the RAID array at a time. In this manner, when granted exclusive access to the RAID array, a particular file server may perform the read and write operations necessary to access the RAID array. After the particular file server finishes its access, another file server may be granted exclusive access to the RAID array. For purposes of establishing a logical-to-physical interface between the file servers and the RAID array, one or more RAID controllers typically are used. As examples of the various possible arrangements, a single RAID controller may be contained in an enclosure that houses the RAID array, or alternatively, each file server may have an internal RAID controller. In the latter case, the internal RAID controller may be a card that is plugged into a card connector slot of the file server.
For the case where the file server has an internal RAID controller, the file server is described herein as accessing the RAID array. However, it is understood that in these cases, it is actually the RAID controller card of the server that is accessing the RAID array. Using the term “server” in this context, before a particular server accesses the RAID array, the file server that currently is accessing the RAID array closes all open read and write transactions. Hence, under normal circumstances, whenever a file server is granted access to the RAID array, all data on the shared disk drives of the array are in a consistent state.
As noted above, the RAID array is designed to permit the recovery of the data on one of the disk drives of the array should a drive fail. However, a situation may occur in which a file server that owns the access right to the RAID array fails during its access to the array. For example, one of the servers, while accessing the RAID array, may fail due to a power failure. In response to this failure, the cluster management software (part of the server operating system) on one of the remaining servers of the cluster elects a suitable server to replace the failed server.
However, if the file server fails during a critical point of the access, inconsistency between the user data and parity data that the server has stored in the array during the access may occur. For example, in order for the file server to write to the RAID array a block of user data that is passed to the file server, the server performs five steps: 1. the server reads the corresponding old block of user data from the RAID array; 2. the server reads the old block of parity data from the RAID array; 3. using the old parity data together with the old and new user data, the server calculates the block of new parity data; 4. the server writes the block of new user data to the RAID array; and 5. the server writes the block of new parity data to the RAID array. Disruption of the file server while the server is writing the new user data or the new parity data may present potential problems later on, for example, when a member disk drive of the array fails and an attempt is made to rebuild user data on the failed drive from the parity information. The parity inconsistency in this scenario may therefore eventually lead to data corruption.
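The five-step write and the consequence of its interruption may be sketched as follows. This is a simplified, in-memory illustration under stated assumptions: parity is assumed to be the bytewise XOR of the data blocks, a stripe is modeled as a plain dictionary, and every name used (`write_block`, `rebuild`, `make_stripe`, `ServerFailure`, the `fail_before_parity_write` flag) is hypothetical rather than part of any described system.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized blocks (assumed parity function)."""
    return bytes(x ^ y for x, y in zip(a, b))


class ServerFailure(Exception):
    """Hypothetical stand-in for a server failing in the middle of a write."""


def make_stripe() -> dict:
    """Build a consistent stripe: three data blocks plus their XOR parity."""
    data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
    parity = xor_blocks(xor_blocks(data[0], data[1]), data[2])
    return {"data": data, "parity": parity}


def write_block(stripe: dict, index: int, new_data: bytes,
                fail_before_parity_write: bool = False) -> None:
    old_data = stripe["data"][index]                   # step 1: read old user data
    old_parity = stripe["parity"]                      # step 2: read old parity
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)  # step 3
    stripe["data"][index] = new_data                   # step 4: write new user data
    if fail_before_parity_write:
        raise ServerFailure("failed between steps 4 and 5")
    stripe["parity"] = new_parity                      # step 5: write new parity


def rebuild(stripe: dict, failed_index: int) -> bytes:
    """Rebuild one data block from the parity and the surviving data blocks."""
    result = stripe["parity"]
    for i, block in enumerate(stripe["data"]):
        if i != failed_index:
            result = xor_blocks(result, block)
    return result


# A completed write leaves the stripe consistent: drive 2 can be rebuilt.
clean = make_stripe()
write_block(clean, 0, b"\x7f\x7f")
assert rebuild(clean, 2) == clean["data"][2]

# An interrupted write leaves stale parity: rebuilding drive 2 later
# silently yields data that differs from what drive 2 actually held.
torn = make_stripe()
try:
    write_block(torn, 0, b"\x7f\x7f", fail_before_parity_write=True)
except ServerFailure:
    pass
assert rebuild(torn, 2) != torn["data"][2]
```

In the interrupted case the block that was never written to (drive 2) is the one that is rebuilt incorrectly, which illustrates why the inconsistency may go unnoticed until a member drive fails.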
Thus, there is a continuing need for an arrangement that addresses one or more of the problems that are stated above.