1. Field of the Invention
This invention relates in general to fault-tolerant storage systems, and more particularly, to a method and apparatus for establishing and maintaining, without logging, the status of membership sets of redundant copies of data used in mirrored read and write input/output.
2. Description of Related Art
A typical digital computer system includes one or more mass storage subsystems for storing data to be processed. In typical mass storage subsystems, the data is stored on disks. Each disk is divided into a plurality of tracks at selected radial distances from the center, and each track is divided into sectors defining particular angular regions. Each track and set of one or more sectors comprises a block in which data is stored.
Since stored data may be unintentionally corrupted or destroyed, systems have been developed that create multiple copies of stored data, usually on separate storage devices, so that if the data on one of the devices or disks is damaged, the data may be recovered from one or more of the remaining copies.
One such application includes distributed processing systems that are made up of intelligent workstations adapted to access central databases at source locations. In many of these systems, a given workstation may require access to a single data object from some source database more than once. Each repeated access duplicates effort by the systems managing the database and the network. To reduce this duplication of effort and to provide increased fault tolerance, it is desirable to maintain replicas of data objects.
Further, both mirrored disk systems and RAID (Redundant Array of Independent Disks) disk systems have been used to provide fault tolerant disk systems for On-Line Database Transaction Processing (OLTP). In a RAID array, the information at corresponding block locations on several disks is used to create a parity block on another disk. In the event of failure, any one of the disks in a RAID array can be reconstructed from the others in the array. RAID architectures require fewer disks for a specified storage capacity, but mirrored disks generally perform better.
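By way of illustration only, the parity scheme described above can be sketched with bytewise exclusive-OR (XOR), which is the parity function commonly used in parity-based RAID levels. The function names below are hypothetical, and the sketch assumes all blocks have equal length; it is not a description of any particular RAID implementation.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    # zip(*blocks) yields, for each byte offset, a tuple of the ints at that offset.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: List[bytes]) -> bytes:
    """Compute the parity block stored on the additional disk."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks: List[bytes]) -> bytes:
    """Rebuild the single missing block from all surviving data blocks plus parity.

    Because x ^ x == 0, XOR-ing every surviving block (parity included)
    cancels everything except the missing block.
    """
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"disk0", b"disk1", b"disk2"]   # corresponding blocks on three disks
    parity = make_parity(data)
    # Simulate losing the second disk and rebuilding it from the others.
    rebuilt = reconstruct([data[0], data[2], parity])
    print(rebuilt == data[1])
```

Only one block per stripe can be rebuilt this way, which matches the statement that any one disk in the array can be reconstructed from the others.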
Mirroring is a technique for keeping synchronized copies of data on behalf of data managers or applications. Mirroring increases the availability of data by allowing access to it as long as one copy is available. To provide mirroring within a system component, the system needs to track the set of copies that are current. For example, a Logical Volume Manager (LVM) allows users to select and combine disk space from one or more physical disks to create logical volumes, or virtual disk partitions, which can have greater capacity, increased availability, and higher performance than a single drive. When a logging subsystem is present, appropriate log records written to the log can be used to track which copies are current. Often, however, no logging subsystem is available for this purpose.
While mirrored storage provides several advantages, including increased read performance and better fault tolerance, the use of this technology has normally been confined to high-end systems. This is because it was considered to be expensive both in terms of the extra storage required, and in terms of the processing necessary to implement it. Recently, many companies have begun to sell mirrored storage devices which appear to the system to be simple SCSI devices. This, coupled with trends towards smaller systems and dramatic decreases in the cost of storage devices, has made it practical to provide mirroring in small systems as well as large. The result is a need for a simple mirroring technique that can be efficiently implemented either in the file system, or in the device controller of a SCSI device.
Applications and data managers can increase the availability of their data by having the system maintain several copies in a synchronized, or mirrored, manner. As stated above, access to the data is provided as long as one copy of it is available. When several disks are used in a mirrored fashion, the disks holding the current data must be determined following a total failure, e.g., a loss of power. In a two-disk system, this is only a minor problem since the operator can indicate which disk is current. However, this is more difficult when more disks are introduced, and is impractical when the target environment is the consumer market. In these situations, automatic recovery should be used whenever possible.
Consequently, a system must differentiate between current and stale copies of the data to access appropriate copies of the data when failures occur. There are several methods in use for managing mirrored storage. Perhaps the simplest is to designate one copy as the primary copy. This method has the advantage of simplicity, but limits availability and necessitates manual intervention in the event of a failure.
Two other well-known strategies are to use a quorum algorithm, or to write the mirrored set membership to a log. While some form of quorum consensus provides automatic recovery, requiring a quorum reduces the availability of the system (for example, at least three drives are required). Using a log has the advantage of providing excellent availability, but has the disadvantage of the added complexity of maintaining the log. Furthermore, a log is often not available.
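By way of illustration only, the availability cost of a quorum can be sketched with a simple majority rule. The sketch assumes, hypothetically, that each copy carries a per-copy generation counter and that every write reaches a majority of copies, so any reachable majority overlaps the last write. The names `Copy`, `has_quorum`, and `select_current` are illustrative and are not taken from any particular system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Copy:
    name: str
    generation: int   # hypothetical counter advanced on each update to this copy
    reachable: bool   # whether the copy is accessible after the failure


def has_quorum(copies: List[Copy]) -> bool:
    """True when a strict majority of all configured copies is reachable."""
    up = sum(1 for c in copies if c.reachable)
    return up > len(copies) // 2


def select_current(copies: List[Copy]) -> Optional[List[str]]:
    """Return the reachable copies holding the newest generation.

    Returns None when no majority is reachable; automatic recovery must
    then wait (or fall back to manual intervention).
    """
    if not has_quorum(copies):
        return None
    up = [c for c in copies if c.reachable]
    newest = max(c.generation for c in up)
    return [c.name for c in up if c.generation == newest]


if __name__ == "__main__":
    # Two drives, one lost: 1 of 2 is not a majority, so recovery blocks.
    print(select_current([Copy("a", 5, True), Copy("b", 5, False)]))
    # Three drives, one lost: 2 of 3 is a majority, so recovery proceeds.
    print(select_current([Copy("a", 5, True), Copy("b", 4, True), Copy("c", 5, False)]))
```

With two drives the loss of either one leaves no majority. This is why a quorum scheme needs at least three drives to survive a single failure automatically.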
It can be seen then that there is a need for an apparatus and method to establish and maintain the status of membership sets that does not rely on logging to represent the set of copies that are active.
It can also be seen that there is a need to update the status information in response to configuration changes to maintain the correct set of current copies.