A storage server is a special-purpose processing device used to store and retrieve data on behalf of one or more client devices (“clients”), which may access and/or process the data. A storage server can be used, for example, to provide multiple users with access to shared data or to back up mission-critical data.
A storage server may provide different levels of access to data. For example, a file server is a storage server that provides file-level access to data. A file server operates on behalf of one or more clients to store and manage shared files in a set of mass storage devices, such as magnetic or optical storage-based disks or tapes. The mass storage devices may be organized into one or more volumes of Redundant Array of Inexpensive Disks (RAID). The data may be organized, managed, and/or accessed as data files. Another example of a storage server is a device that provides clients with block-level access to stored data, rather than file-level access. The data in such a system may be organized, managed, and/or accessed as data blocks, which may include more or less information than a file. A storage server may also be able to provide clients with both file-level access and block-level access.
A storage server may have access to multiple mass storage devices, or persistent/non-volatile storage devices, which may be managed based on a logical or virtual organization. For example, a storage server may represent a group of storage devices (e.g., hard disks) as a logical aggregate of storage devices. The aggregate may be managed to store data in volumes contained within the aggregate. The aggregate may in turn be further logically broken down into plexes containing RAID groups. The RAID groups may have multiple disks. While particular terminology is used herein as a reference point to describe particular organizations and/or functions, the terminology shall not be construed as limiting, but rather as provided by way of example. Where particular terminology is referred to (e.g., an aggregate, a plex), these terms are to be understood as merely examples of data structure abstractions that may be substituted with equivalent or similar data structures that may be referred to by other terms.
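By way of illustration only, the logical organization described above (an aggregate containing volumes and plexes, plexes containing RAID groups, and RAID groups containing disks) may be sketched in Python as follows. The class names, field names, and sample identifiers such as “aggr0” are hypothetical and are chosen solely for this example; they are not taken from any particular storage server implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Disk:
    name: str  # hypothetical disk identifier


@dataclass
class RaidGroup:
    name: str
    disks: List[Disk] = field(default_factory=list)


@dataclass
class Plex:
    name: str
    raid_groups: List[RaidGroup] = field(default_factory=list)


@dataclass
class Aggregate:
    name: str
    plexes: List[Plex] = field(default_factory=list)
    volumes: List[str] = field(default_factory=list)  # volume names only, for brevity

    def disks(self) -> List[Disk]:
        """Flatten the hierarchy into the list of disks backing this aggregate."""
        return [d for p in self.plexes for rg in p.raid_groups for d in rg.disks]


# Example: one aggregate, one plex, two RAID groups, three disks.
aggr = Aggregate(
    name="aggr0",
    plexes=[
        Plex("plex0", [
            RaidGroup("rg0", [Disk("d1"), Disk("d2")]),
            RaidGroup("rg1", [Disk("d3")]),
        ])
    ],
    volumes=["vol0"],
)
disk_names = [d.name for d in aggr.disks()]
```

In this sketch, a volume stored in the aggregate may span all three disks, which is one way such abstractions may allow a volume to hold more data than fits on a single disk.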
Data storage can be organized into multiple layers of abstraction to provide fault tolerance, as individual disks can (and do) fail. The abstraction layers also allow a volume or aggregate to store larger quantities of data than can fit on a single disk. However, having multiple disks in a single volume creates the problem of keeping track of which disks are part of which RAID groups, plexes, and aggregates, especially as disks are added to and failed out of the aggregates. A disk or other storage device may have a dedicated area that holds a RAID label and/or other metadata, which makes it possible to track the disk and to determine which disks belong to which RAID groups, plexes, and aggregates. The process of determining the logical data structure to which a disk belongs may be referred to as “RAID assimilation.” Details regarding RAID assimilation and RAID labels may be found in U.S. patent application Ser. No. 10/105,872, filed on Mar. 20, 2002, and entitled “RAID Assimilation Method and Apparatus,” by Steven Rodrigues and Dave Hitz.
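As a minimal sketch of the idea of RAID assimilation described above, and not of the method of the referenced application, the following Python example groups disks into aggregates, plexes, and RAID groups using only the label read from each disk. The label fields and the function name are assumptions made for illustration; an actual RAID label may carry additional or different metadata.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class RaidLabel(NamedTuple):
    """Hypothetical on-disk label naming the structures a disk belongs to."""
    aggregate: str
    plex: str
    raid_group: str


def assimilate(labels: Dict[str, RaidLabel]) -> Dict[str, Dict[str, Dict[str, List[str]]]]:
    """Build an aggregate -> plex -> RAID group -> disks mapping from disk labels."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for disk, label in sorted(labels.items()):
        tree[label.aggregate][label.plex][label.raid_group].append(disk)
    # Convert the nested defaultdicts into plain dictionaries.
    return {
        aggr: {plex: dict(groups) for plex, groups in plexes.items()}
        for aggr, plexes in tree.items()
    }


labels = {
    "disk1": RaidLabel("aggr0", "plex0", "rg0"),
    "disk2": RaidLabel("aggr0", "plex0", "rg0"),
    "disk3": RaidLabel("aggr0", "plex0", "rg1"),
    "disk4": RaidLabel("aggr1", "plex0", "rg0"),
}
layout = assimilate(labels)
```

Because the grouping depends entirely on the label contents, a disk whose label is wrong or unreadable cannot be placed correctly in such a mapping.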
Sometimes a RAID label or other metadata on a disk that defines the disk's logical data structure associations can become corrupted (e.g., out of date, having unexpected values, having incorrect data, etc.), causing the disk to be treated as a “broken disk” and/or to be lost from the aggregate with which it is associated. If a disk is determined to be physically sound, the disk label may be declared “out of date,” and the disk may be turned into a “hot spare” in the system. If either the disk is physically damaged or the disk is physically sound but its label information is corrupted, the disk may not be available to its associated RAID group (e.g., the RAID group that contained the disk). If a RAID group loses a threshold number of disks, the RAID group and its associated plex are rendered inoperable. In certain cases, if the label data is not just out of date but has unexpected values, the disk's aggregate may be marked as “failed assimilation,” meaning that the aggregate cannot be assimilated into the system and made available for serving data. In this case the aggregate may be taken out of service. In all of these scenarios, the data stored on the aggregate is unavailable to users.
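The outcomes described above may be summarized, purely as a hedged sketch, by the following Python example. The enumeration, the function names, the returned strings, and the particular threshold value are hypothetical; the example assumes a RAID group that becomes inoperable once the number of lost disks reaches a configured threshold (for instance, a threshold of two for a single-parity group that tolerates one lost disk).

```python
from enum import Enum


class LabelState(Enum):
    VALID = "valid"
    OUT_OF_DATE = "out_of_date"
    UNEXPECTED = "unexpected_values"


def disk_disposition(physically_sound: bool, label: LabelState) -> str:
    """Map a disk's physical and label state to one of the outcomes described above."""
    if not physically_sound:
        return "broken_disk"
    if label is LabelState.OUT_OF_DATE:
        return "hot_spare"  # sound disk with a stale label is lost from its RAID group
    if label is LabelState.UNEXPECTED:
        return "aggregate_failed_assimilation"
    return "member"


def raid_group_operable(lost_disks: int, loss_threshold: int) -> bool:
    """A RAID group remains operable while fewer than loss_threshold disks are lost."""
    return lost_disks < loss_threshold


# Assumed example: a group with a loss threshold of two disks.
one_lost = raid_group_operable(lost_disks=1, loss_threshold=2)
two_lost = raid_group_operable(lost_disks=2, loss_threshold=2)
```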
Recovering this data often requires users to modify the RAID labels of the affected disk(s). Traditionally, modification of RAID labels involved the use of several system management commands to examine and modify the data in the RAID labels of one or more disks. However, to run these commands, the system was traditionally required to be in a mode in which it neither assumed that the RAID labels of disks were in any particular state, nor attempted to modify the state of the RAID labels on disks. Such an operating mode may be referred to as a maintenance mode, in which the system does not persistently recognize or act on RAID groups, plexes, aggregates, or volumes. The system may temporarily recognize or act on such structures in maintenance mode if a management command needs to operate on one of these structures or entities. If a command specifies operation(s) on one of these entities, the code implementing that command may run assimilation code to determine the makeup of RAID groups, plexes, and aggregates, perform the operation(s), and then reverse the assimilation. Reversing the assimilation causes the system to cease to recognize the RAID groups, plexes, and aggregates between commands. For example, if a data access command were received for data on a disk while the system is in maintenance mode, the system would not know to which RAID group the disk belongs. Maintenance mode thus allows service personnel to examine the state of the RAID labels, modify the RAID labels, examine the new state of the system, and continue modifying the RAID labels until the system is again operable.
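The per-command sequence described above (assimilate, perform the operation, reverse the assimilation) may be sketched in Python as follows. This is an illustrative assumption about how such a sequence could be structured, not a description of any actual maintenance-mode implementation; the class, method, and disk names are hypothetical, and the label is simplified to a single RAID group name per disk.

```python
from contextlib import contextmanager
from typing import Callable, Dict, List, Optional


class MaintenanceSystem:
    """Hypothetical system that recognizes RAID groups only while a command runs."""

    def __init__(self, labels: Dict[str, str]):
        self.labels = dict(labels)  # disk name -> RAID group name from its label
        self.raid_groups: Optional[Dict[str, List[str]]] = None  # unknown between commands

    @contextmanager
    def temporary_assimilation(self):
        # Assimilate: derive RAID group membership from the current labels.
        groups: Dict[str, List[str]] = {}
        for disk, group in sorted(self.labels.items()):
            groups.setdefault(group, []).append(disk)
        self.raid_groups = groups
        try:
            yield groups
        finally:
            # Reverse the assimilation so nothing is recognized between commands.
            self.raid_groups = None

    def run_command(self, operation: Callable[[Dict[str, List[str]]], object]):
        with self.temporary_assimilation() as groups:
            return operation(groups)


system = MaintenanceSystem({"d1": "rg0", "d2": "rg0", "d3": "rg1"})
members = system.run_command(lambda groups: groups["rg0"])
state_between_commands = system.raid_groups
```

In this sketch, the membership of “rg0” is visible only inside the command; once the command returns, the system again holds no view of RAID groups, which corresponds to the inability to serve data access commands in maintenance mode.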
From the foregoing it will be understood that, in maintenance mode, user access to data stored in volumes on the storage server is curtailed. Thus, running the system in maintenance mode reduces data availability. Further, because placing the system into maintenance mode and restoring the system from maintenance mode have typically required rebooting the system, there may be a significant performance cost associated with the transitions into and out of maintenance mode. Note that a system including multiple aggregates may have some aggregates that are operable and could be serving data, except for the bad luck of being on a system that had to be placed into maintenance mode to deal with an aggregate with one or more faulty disks. From a practical standpoint, attempts to recover lost data volumes are typically performed during off-hours because of the cost of making the unrelated volumes unavailable for access, resulting in costly overtime for service personnel and extending the data unavailability period for the down aggregates.