This invention relates generally to data integrity systems for backup and archiving that comprise multiple groups of physical storage devices, e.g., RAID groups, abstracted as virtual storage devices, and more particularly to a method of migrating data and replacing a physical storage device online in such a data integrity system.
Storage systems used for backup and archive of data have architectures that ensure the long-term integrity and recoverability of the data. Typically, these storage systems use groups or arrays of physical storage disks organized as RAID groups, in which data is striped across different physical disks of a RAID group with parity for fault detection and correction. The file system virtualizes the RAID groups as logical storage devices. The system architecture may also employ data integrity techniques such as writing changed data only to new data blocks and never overwriting existing blocks, since overwriting in place can accidentally destroy existing good data and cause data corruption. As a result, the amount of stored data grows continually as new data is written, ultimately requiring that the storage capacity of the system be increased. Capacity may be increased either by adding storage disks to the system, or by increasing the effective capacity of existing disks through data compression, e.g., data deduplication, and/or disk hardware upgrades. Moreover, certain types or models of physical disks sometimes exhibit reliability problems and must be replaced.
Disk arrays comprising a collection of physical disks that form a RAID group are generally housed within an enclosure referred to as a "shelf" and associated with a disk controller. The number of physical disks on a shelf depends, in part, upon the RAID level used in the storage system and the size of the data blocks striped across the disks. RAID 6, for example, is advantageous because it uses block-level striping with double distributed parity, which permits data recovery after the loss of any two disks. A RAID 6 array may comprise, for instance, twelve to sixteen disks, including two parity disks, and a shelf may have one or more spare disks that can be swapped in place of a failed disk. In the event of a disk upgrade within a disk group, an entire shelf of disks must be replaced with a new shelf. This requires that all data on the disks of the shelf being replaced be migrated to the new shelf, that the file system be upgraded for the new shelf, and that the old shelf be physically removed. Many storage systems map disk groups into a single contiguous linear address space. Removing and replacing a shelf in such systems requires that the system be shut down, the data be moved to a new shelf, the old shelf be removed, and the file system be rebuilt. This can be a complex, error-prone task. Furthermore, the data migration to the new shelf must be handled in ways that ensure data integrity, i.e., that the data is accurately migrated and that it can be recovered in the event of a crash during migration so that no data loss occurs.
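To illustrate why a single contiguous linear address space complicates shelf removal, the following minimal Python sketch concatenates hypothetical shelves into one address space and translates a linear block address into a shelf and offset. The function `locate`, the shelf names, and the 100-block sizes are illustrative assumptions only, not the mapping used by any particular storage system.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(linear_addr, shelves):
    """Map a linear block address to (shelf name, offset within shelf).

    Hypothetical model: `shelves` is an ordered list of
    (name, size_in_blocks) pairs whose capacities are concatenated,
    in order, into one contiguous linear address space.
    """
    sizes = [size for _, size in shelves]
    # starts[i] is the first linear address owned by shelf i.
    starts = [0] + list(accumulate(sizes))[:-1]
    if not 0 <= linear_addr < sum(sizes):
        raise ValueError(f"address {linear_addr} is outside the linear space")
    # Find the last shelf whose starting address is <= linear_addr.
    i = bisect_right(starts, linear_addr) - 1
    return shelves[i][0], linear_addr - starts[i]


before = [("shelf_A", 100), ("shelf_B", 100), ("shelf_C", 100)]
after = [("shelf_A", 100), ("shelf_C", 100)]  # shelf_B removed

print(locate(250, before))  # ('shelf_C', 50)
print(locate(150, before))  # ('shelf_B', 50)
print(locate(150, after))   # ('shelf_C', 50)
```

In this model, removing shelf_B does more than free its blocks: every linear address beyond it now resolves to a different physical location, and the highest addresses no longer exist. Any file system metadata that records linear addresses would therefore have to be rewritten, which is one reason such systems are typically taken offline and rebuilt when a shelf is replaced.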
It is desirable to provide systems and methods that address the foregoing and other problems with online replacement of physical storage devices in a virtual storage system, including online data migration and data recovery upon a crash, and it is to these ends that the present invention is directed.