This invention relates generally to the field of disk storage systems, and more particularly to transforming data between various disk storage data formats.
Modern computer systems can persistently store huge amounts of data on physical disks. It is not unusual for a single disk to store gigabytes of data, and large systems can have hundreds, if not thousands, of disks. Users of these systems demand continuous, fault-tolerant access to the data. However, from time to time as systems expand and modernize, it is necessary to transform the data to a different format. This is inevitable, and it is a problem because most prior art systems require extra disks to store copies of the data during the transformation so that, should a fault occur, the data can be recovered. This increases the cost of the system.
There are other problems with large databases. The performance of disk devices is limited by physical constraints, such as the speed at which disks can rotate, and heads can move. Clearly, transforming large amounts of data stored on many disks is a costly and time-consuming process. It is a purpose of the present invention to decrease cost, and improve performance for large-scale data transformations.
Most modern, mid-range to high-end disk storage systems are arranged as redundant arrays of independent disks (RAID). A number of RAID levels are known. RAID-0 "stripes" data across the disks. RAID-1 includes sets of N data disks and N mirror disks for storing copies of the data disks. RAID-3 includes sets of N data disks and one parity disk. RAID-4 also includes sets of N+1 disks, however, data transfers are performed in multi-block operations. RAID-5 distributes parity data across all disks in each set of N+1 disks. RAID levels 10, 30, and 50 are hybrid levels that combine features of level 0, with features of levels 1, 3, and 5.
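As an illustration of the parity used by the levels with N+1 disks, the following minimal sketch assumes bytewise XOR parity, which is the customary scheme but is not stated in this text. The names `xor_blocks` and `rebuild` are hypothetical and chosen only for this example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized blocks (assumed parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(surviving_blocks, parity):
    """Recover the one missing data block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])

# Three data disks (N = 3), one block each, plus one parity block.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Simulate losing the second disk and rebuilding its block.
recovered = rebuild([data[0], data[2]], parity)
```

Because XOR is its own inverse, any single lost block in a set can be recomputed from the remaining N blocks, which is the fault tolerance a migration must not weaken.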
A key feature in all modern RAID controllers is the ability to transform data from one RAID level, e.g., RAID-3, to another RAID level, e.g., RAID-5 or RAID-10, and certainly to RAID levels yet to be defined in the future. This is called RAID level migration. In the past, RAID level transformation was done off-line. This meant that no user data transfers were permitted during the transformation. In other words, users of the system were denied access to stored data, perhaps for hours, while the data was transformed from a starting level to a final level.
Today, RAID systems are the core of most large-scale databases and file systems used worldwide. Users of such systems, local and remote, demand continuous access to the stored data. In a global data processing environment, where access is frequently by the Internet, and can happen at any time, scheduled "down-time" is intolerable.
Therefore, modern RAID controllers allow RAID level migration while users continue to access data. This is known as on-line RAID level migration (ORLM). Various methods of accomplishing this task are known. The key attributes of a good ORLM strategy are: the transformation should be totally transparent to the users, i.e., the RAID system is never taken off-line, and the system's performance does not degrade; and levels of fault-tolerance are maintained during the transformation, in both the starting and final RAID level.
In the prior art, RAID level migration typically requires separate disk space for a temporary storage or "backing" area, usually in the format of the starting RAID level. This area has the same fault tolerance as the minimum fault-tolerance of the starting RAID level. Using the temporary storage area for ORLM has at least two serious performance problems.
The first is due to the physical nature of how disk drives are constructed and operate. Disk read/write heads are mounted on arms driven linearly or radially by electrical pulses to stepper motors or voice coils to move across various tracks. The improvement in "seek" time seems to have leveled, and even the fastest disks require about 1 millisecond to move track-to-track, and the average seek time latency is an order of magnitude greater. The constant movement of the heads between the tracks used for the temporary storage area and the tracks used for the user data causes a noticeable degradation in performance.
Second, the data need to be copied twice: first from the starting RAID set to the temporary storage area, and then again from the temporary storage area to the final RAID set. Consequently, such an ORLM strategy is undesirable: not only is the user subjected to degraded performance, but the degraded performance can also last for hours, if not days.
Therefore, there is a need for an improved on-line RAID level transformation strategy that does not require a temporary storage area so that the performance of the system during the transformation does not degrade, and the amount of time that is required for the transformation is reduced.
A primary objective of the present invention is to provide a method and system for changing the RAID level while allowing user data access, without copying any data to a temporary storage area.
Another objective of the present invention is to perform RAID level migration without causing any reduction in fault tolerance.
Another objective of the present invention is to perform RAID level migration while minimizing the performance impact on users who are accessing the array while the RAID level migration takes place.
Another objective of the present invention is to perform RAID level migration in a shorter amount of time than RAID level migration schemes that require copying of data to a temporary storage area.
In accordance with the invention, the data are transformed in an optimal manner with a single copy operation while users concurrently access the data, without a reduction in fault-tolerance and with less of a performance impact.
More particularly, a fault tolerant method transforms physically contiguous data in-place on a disk by partitioning the physically contiguous data into an empty region physically adjacent to data regions including a first data region and a last data region, the first and last data regions at opposing ends of the physically contiguous data regions.
The physically contiguous data are transformed in an order beginning with the first data region and ending with the last data region. The transforming performs the steps of first locking and reading the first data region, second, transforming the first data region, third, writing and unlocking the transformed first data region to the empty region, and fourth, declaring the first data region as the empty region while declaring the empty region as the first region. The first through fourth steps are repeated for each data region, until completion, to transform the physically contiguous data in-place on the disk.
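The four steps above can be sketched as follows. This is a minimal in-memory model rather than a controller implementation: the disk is assumed to be a Python list whose first element is the empty region and whose remaining elements are the data regions in order, and the names `migrate_in_place`, `regions`, and `transform` are hypothetical, introduced only for illustration.

```python
from threading import Lock

def migrate_in_place(regions, transform):
    """Transform every data region using a single sliding empty region.

    Assumption: regions[0] is the empty region (None) and regions[1:] are the
    data regions, first to last. Returns the final index of the empty region.
    """
    locks = [Lock() for _ in regions]   # one lock per region (illustrative)
    empty = 0
    for src in range(1, len(regions)):
        with locks[src]:
            data = regions[src]              # step 1: lock and read the region
            converted = transform(data)      # step 2: transform it
            regions[empty] = converted       # step 3: write to the empty region
        # leaving the "with" block unlocks the region
        # Step 4: the old source is declared the new empty region. The model
        # clears it for readability; on a disk this would only be a relabeling.
        regions[src] = None
        empty = src
    return empty

# Example: three data regions, with upper-casing standing in for the
# change of storage format.
disk = [None, "a", "b", "c"]
final_empty = migrate_in_place(disk, str.upper)
```

In this model each region is copied exactly once, and a source region is relabeled as empty only after its transformed copy has been written, so no temporary backing area is needed. After the loop the empty region has moved from one end of the contiguous range to the other.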