This invention relates generally to computer data storage systems, and more particularly to maintaining volume configuration data.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright © 1999 Microsoft Corporation, All Rights Reserved.
As computer systems have evolved, so have the availability and configuration of data storage devices, such as magnetic or optical disks. For example, these storage devices can be connected to the computer system via a bus, or they can be connected to the computer system via a wired or wireless network. In addition, the storage devices can be separate or co-located in a single cabinet.
A storage volume is a software abstraction of the underlying storage devices and is commonly the smallest self-contained unit of storage exposed by an operating system and administered by a file system. Storage volumes abstract the physical topology of the storage devices and may be a fraction of a disk, a whole disk or even multiple disks that are bound into a contiguous range of logical blocks.
Volumes are constructed from one or more extents, with each extent being a contiguous storage address space presented by the underlying storage device. An extent is typically characterized by the size of the address space and a starting offset for the address space from a base of the media. Volume mapping is the process of mapping the contiguous address space presented by the volume onto the usually non-contiguous storage address spaces of the underlying extents. Volume mappings are implemented either on a specialized hardware controller, referred to as a hardware volume provider, or in software by a software volume provider.
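By way of illustration only, the mapping of a volume's contiguous address space onto its extents can be sketched as follows for a simple concatenated volume. The names `Extent` and `map_block`, the block-based units, and the disk identifiers are assumptions made for this example and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Extent:
    """Hypothetical description of one extent of a volume."""
    disk: str    # identifier of the underlying storage device (assumed)
    start: int   # starting offset from the base of the media, in blocks
    size: int    # size of the extent's address space, in blocks


def map_block(extents: List[Extent], logical_block: int) -> Tuple[str, int]:
    """Map a logical block of a concatenated volume to (disk, physical block)."""
    if logical_block < 0:
        raise ValueError("logical block must be non-negative")
    remaining = logical_block
    for extent in extents:
        if remaining < extent.size:
            # The block falls inside this extent.
            return extent.disk, extent.start + remaining
        # Skip past this extent's share of the logical address space.
        remaining -= extent.size
    raise ValueError("logical block lies beyond the end of the volume")


# An 800-block volume built from two non-contiguous extents.
volume = [Extent("disk0", 1000, 500), Extent("disk1", 0, 300)]
```

In this sketch, logical blocks 0 through 499 resolve to `disk0` and blocks 500 through 799 resolve to `disk1`, although the volume presents a single contiguous range to its user.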
Volume mappings may be used to increase the fault tolerance, performance, or capacity characteristics of the underlying storage devices. For example, a technique for improving fault tolerance, known as mirroring or plexing a disk, uses multiple disks. When data is written to one disk the data is also written to a second disk; thus the second disk is a "mirror image" of the first disk. If one disk should fail the other disk is still available for use and has an exact copy of the information on the first disk.
In addition, RAID numbers are often used to identify storage volume mappings. A RAID, or Redundant Array of Independent Disks, provides the ability to lose an extent without losing volume data. Access to the volume may be slower or more costly, but is not interrupted by the failure of the underlying extent. RAID1 implements mirroring. RAID3 and above all implement some sort of stripe-with-parity scheme; the different numbers indicate the arrangement of the data and check-data (or parity) extents. Striping is a mechanism where data for a file or file system is distributed among several different disks.
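As a minimal sketch of how check-data permits the loss of an extent, the following example computes byte-wise XOR parity across equally sized data blocks and rebuilds one missing block from the survivors. XOR parity is a common choice for such schemes; the function names `xor_parity` and `rebuild_missing` are assumptions made for this example, and no particular RAID layout is implied.

```python
from functools import reduce
from typing import List


def xor_parity(blocks: List[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: List[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_parity(surviving + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(data)
```

Because XOR is its own inverse, combining the parity block with the surviving data blocks yields the contents of the one block that was lost.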
Volume providers commonly group logical volumes into what are known as "diskpacks" in order to simplify volume management. The diskpack then is a collection of logical volumes and the underlying disks. Diskpacks provide transitive closure for the volumes contained in the diskpack and may provide group sanity checking to ensure volume configuration correctness.
Two important and related aspects of logical volume management are establishing the sanity of a volume before it is exposed to a computer system requesting access to the volume, and maintaining volume configuration data.
An exemplary situation in which volume providers need a mechanism to determine the sanity of volume configuration is the occurrence of a hardware failure. For example, when only one of two disks comprising a concatenated volume is operational, the volume provider must indicate to the file system or other data manager that the volume is not capable of handling I/O requests to the areas of the volume that reside on the missing or non-operational disk.
A second example occurs when only one member disk of a mirror set is discovered at system initialization. In this case the volume provider should have a mechanism for determining if the discovered member is stale, i.e. contains data which is out-of-date with respect to that contained on the undiscovered member.
In order to determine the sanity of volumes and diskpacks, current systems providing logical volume management typically replicate configuration data to either all or to a majority of the disks comprising a diskpack. The configuration data includes such information as the identity of all of the disks comprising the diskpack and a log of volume state changes. The volume provider typically uses the configuration data first to determine that a diskpack is sane prior to attempting to determine the sanity of any volume contained within the group. A common diskpack sanity algorithm is to require that at least a majority of the disks comprising the diskpack are present.
One problem with replicating volume information on each disk of a diskpack is that replication doesn't scale well. As more disks are added to the diskpack, there is more overhead involved in maintaining and replicating the configuration information, and in ensuring that the configuration data between disks in the diskpack is consistent.
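The majority rule described above can be sketched as follows. The function name `diskpack_is_sane` and the use of string disk identifiers are assumptions made for this example; the sketch treats "a majority" as strictly more than half of the member disks.

```python
from typing import Iterable


def diskpack_is_sane(member_ids: Iterable[str], discovered_ids: Iterable[str]) -> bool:
    """Return True when more than half of the diskpack's member disks are present."""
    members = set(member_ids)
    # Only disks that actually belong to the diskpack count toward the majority.
    present = members & set(discovered_ids)
    return 2 * len(present) > len(members)


pack = ["d0", "d1", "d2", "d3", "d4"]
```

With the five-disk pack shown, discovering any three member disks satisfies the rule, while discovering only two does not.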
A second problem is that the volume and disk configuration is statically enforced through the replication of data throughout the volumes and disks in a diskpack. The configuration is statically enforced because the explicit configuration information is read from the volumes, and is not determined dynamically.
Therefore, there is a need in the art for a system of maintaining volume configuration data that scales well as the number of disks and volumes in a diskpack grows. In addition, there is a need for such a system that allows for the dynamic discovery of the addition of new volumes and new disks to diskpacks as the disks are brought on-line.
The above-mentioned shortcomings, disadvantages and problems are addressed by the present invention, which will be understood by reading and studying the following specification.
In one such system for maintaining volume consistency, a data structure containing an epoch number is placed on each of the extents that comprise a volume. Each time a volume configuration change is made, the epoch number is incremented in all of the currently online extents. When a disk volume is discovered by a logical volume manager, the logical volume manager compares the epoch number on the extents. If the epoch numbers are consistent, the volume is exposed as online and made available to applications. If the epoch numbers are not consistent, then at least one extent contains stale data. One aspect of the system is that the volume may still be exposed even if an extent is stale, if the volume can be reconstructed without any data corruption.
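The epoch comparison described above can be illustrated with the following minimal sketch. It assumes that, because the epoch number is incremented on every configuration change, the highest epoch found among the discovered extents is the most recent one and any extent carrying a lower value is stale. The function name `classify_extents` and the extent identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def classify_extents(epochs: Dict[str, int]) -> Tuple[bool, List[str]]:
    """Compare the epoch numbers of a volume's extents.

    Returns (consistent, stale), where `stale` lists the extents whose
    epoch number is lower than the highest epoch number discovered.
    """
    if not epochs:
        raise ValueError("no extents were discovered")
    latest = max(epochs.values())
    stale = sorted(name for name, epoch in epochs.items() if epoch < latest)
    return (not stale, stale)
```

For example, `classify_extents({"e0": 7, "e1": 7})` reports a consistent volume, whereas `classify_extents({"e0": 7, "e1": 6})` reports `e1` as stale. Whether a volume with a stale extent may still be exposed depends on whether it can be reconstructed without data corruption, a policy decision that this sketch leaves out.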
A further aspect of the system is that the epoch number can be reported to a cluster services component. This allows for a wider variety of consistency checking and volume exposure policies. The cluster services component can verify that the epoch number on the extents is truly the latest epoch number, thus providing for increased system reliability.
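As a sketch of why an external record of the epoch number is useful, consider the case in which every discovered extent carries the same epoch number, yet that number is older than the one known to the cluster services component. The function name `epochs_are_current` and the idea of passing the cluster's recorded epoch as a plain integer are assumptions made for this example; the interface of the cluster services component is not specified here.

```python
from typing import Dict


def epochs_are_current(extent_epochs: Dict[str, int], cluster_epoch: int) -> bool:
    """Return True only if every discovered extent carries the cluster's epoch."""
    if not extent_epochs:
        return False
    return all(epoch == cluster_epoch for epoch in extent_epochs.values())
```

Here `epochs_are_current({"e0": 6, "e1": 6}, 7)` returns `False`: the extents agree with each other, but the comparison with the externally recorded epoch shows that they all predate the latest configuration change.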
The volume configuration data management system and methods summarized above, and various other aspects of the system will be described in detail in the next section.
The present invention describes systems, clients, servers, methods, and computer-readable media of varying scope. In addition to the aspects and advantages of the present invention described in this summary, further aspects and advantages of the invention will become apparent by reference to the drawings and by reading the detailed description that follows.