Storage devices are employed to store data that are accessed by computer systems. Examples of storage devices include volatile and non-volatile memory, floppy drives, hard disk drives, tape drives, optical drives, and other types of storage units. A storage device may be locally attached to an input/output (I/O) channel of a computer. For example, a hard disk drive may be connected to a computer's disk controller.
A storage device may also be accessible over a network. Examples of such a storage device include network attached storage (NAS) and storage area network (SAN) devices. A storage device may be a single stand-alone component or may include a system of storage devices such as in the case of Redundant Array Of Inexpensive Disks (RAID) groups and some Direct Access Storage Devices (DASD).
Disk storage is typically implemented as one or more storage “volumes” (i.e., data volumes) that are formed by physical storage disks and define an overall logical arrangement of the storage space. Each volume is typically associated with its own file system. The storage disks within a volume may be organized as one or more RAID groups. Therefore, a data volume is a logical collection to which the disks belong.
A spare disk pool has spare disks that may be used as replacement storage disks. When a data volume is destroyed, the disks in the volume are placed in a spare disk pool. The data on those disks are left intact, except for the configuration information that identified each disk as part of a volume, which now identifies the disk as a spare disk. Such a disk is said to be an “unzeroed spare”. Unzeroed spare disks may have data on them that comprise part of the data stored on a volume, and these data are not currently accessible because the volume was destroyed. A volume can be destroyed for a number of possible reasons. As an example, a volume can be destroyed if the user accidentally or maliciously uses a command (e.g., “vol destroy” command) to destroy the volume.
When a volume is being destroyed, the configuration information on all the disks that were part of that volume is updated to indicate that each disk is no longer part of a volume and is now a spare disk. The user data is not erased at this point. Only the configuration information is updated on the disks so that the system now recognizes those disks as spare disks.
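The behavior described above can be illustrated with a minimal Python sketch. It is not the Data ONTAP implementation; the `Disk` class, its fields, and the `destroy_volume` function are hypothetical names introduced only to show that destroying a volume rewrites configuration information while leaving user data untouched.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Disk:
    """Hypothetical model of a disk: user data plus configuration information."""
    name: str
    user_data: bytes
    volume: Optional[str] = None  # owning volume; None marks the disk as a spare

    @property
    def is_spare(self) -> bool:
        return self.volume is None


def destroy_volume(volume_name: str, disks: List[Disk], spare_pool: List[Disk]) -> List[Disk]:
    """Mark every disk of the volume as a spare; user data is deliberately not erased."""
    for disk in disks:
        if disk.volume == volume_name:
            disk.volume = None       # only the configuration information changes
            spare_pool.append(disk)  # the disk is now an unzeroed spare
    return spare_pool


# Illustrative usage with made-up disk names and contents.
disks = [Disk("d0", b"blockA", "vol1"), Disk("d1", b"blockB", "vol1")]
pool = destroy_volume("vol1", disks, [])
```

After the call, both disks sit in the spare pool and still hold their original bytes, which is why the data on an unzeroed spare is in principle recoverable.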
Hardware or firmware/software problems can also cause volumes to be destroyed. For example, certain firmware downloads onto particular types of disks can cause disk size shrinkage, which causes the RAID disk labels for that disk to be lost. A RAID disk label is a form of metadata for the volume and is stored on the disk. The label may include various information such as, for example, volume information (e.g., name of the volume, raid tree id (identifier) of the volume, and other volume information), plex boundaries information (e.g., the number of plexes), and RAID Group boundaries information (e.g., the number of plexes). When the RAID label is lost or corrupted on a disk in the volume, the volume is destroyed or left incomplete or partial, and the volume cannot be brought online or restored intact. A partial volume is one that exists but not all of its data is accessible (due to, for example, missing disks or disk failures).
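As a rough illustration of how label loss leads to a destroyed or partial volume, the following Python sketch models a label as a small record and classifies a volume by how many valid labels remain. The `RaidLabel` class and `volume_state` function are hypothetical, and the `expected_disks` field is an assumption added for the example; the actual label layout is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RaidLabel:
    """Hypothetical per-disk label holding volume metadata."""
    volume_name: str
    raid_tree_id: str
    num_plexes: int
    expected_disks: int  # assumption: the label records how many disks the volume has


def volume_state(labels: List[Optional[RaidLabel]], volume_name: str) -> str:
    """Classify a volume from its disk labels; None stands for a lost or corrupted label."""
    valid = [lab for lab in labels if lab is not None and lab.volume_name == volume_name]
    if not valid:
        return "destroyed"  # no disk still identifies itself as part of the volume
    if len(valid) < valid[0].expected_disks:
        return "partial"    # the volume exists, but some of its disks are missing
    return "complete"


# Illustrative usage with a made-up three-disk volume.
label = RaidLabel("vol1", "tree-42", 1, 3)
healthy = volume_state([label, label, label], "vol1")
one_lost = volume_state([label, None, label], "vol1")
all_lost = volume_state([None, None, None], "vol1")
```

In this sketch a single lost label is enough to move the volume from complete to partial, which mirrors the situation in which a volume cannot be brought online until its labels are repaired.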
As another example related to hardware problems, if a printed circuit board (PCB) in a disk is replaced by another PCB, and the disk is then inserted back into the data storage system, the storage operating system may not accept the disk. Therefore, a volume recovery process has to be performed to permit assimilation of the disk into the data storage system.
In one previous approach, RAID labels (where volume configuration information is stored) were fairly easy to understand and easy to manually modify in a product known as Data ONTAP™ which is commercially available from NETWORK APPLIANCE, INCORPORATED. The volume configuration information indicates the configuration of the data volume (e.g., number of disks, plexes and other configuration data as described below). When a volume is destroyed for a particular reason, customer support engineers can guide the user through a label editing session and the user can manually change the RAID labels in order to recover the destroyed volume. The label of each individual disk in the destroyed volume is edited, and RAID assimilation is then performed to complete the recovery of the volume. However, this label editing session is time consuming and error prone for the user, and the user is also required to know the volume associated with each disk and the disk mapping.
In a subsequent version of DataONTAP (version 6.2) from NETWORK APPLIANCE, INCORPORATED, the RAID labels were designed for program robustness and error-checking, rather than for ease of editing by hand. While these RAID labels were well suited for their intended purposes, volume recovery by users became very difficult, which typically leads to frustration for users and for the customer support engineers. The current method to recover a destroyed volume is by use of the “label buildtree” command, which is available in the “maintenance mode” in the current DataONTAP product. The label buildtree command accepts manual input for volume configuration, rather than automatically saving and restoring the volume configurations. A user can boot into the maintenance mode and, using documentation on the previous composition of the lost volume, type in a command string that recovers the lost volume. This documentation must contain accurate records about the volume prior to the volume being destroyed, and the records may include the identification of the disks in the volume, plex information, and the disk mapping in the RAID groups. This current method therefore requires that the user have the documentation of the volume's configuration before the data loss, and also requires either that the disk names have not changed or that any disk name changes are known to the user. Furthermore, the process of typing in all of the numbers and names can be confusing, time consuming, and error prone for a user. This method also assumes that the information entered by the user is correct, as incorrect information may prevent the recovery of the lost volume or result in data corruption of the volume. Finally, this method requires booting into the maintenance mode.
Therefore, the current technology is limited in its capabilities and suffers from at least the above constraints and deficiencies.