The present invention relates generally to a data storage system. More particularly, the present invention relates to a reliable and workload-adaptive data storage system controller.
Typically, in computing applications, data storage systems consist of devices such as hard disk drives, floppy drives, tape drives, compact disks and the like. These devices are known as storage devices. With an increase in the amount of computation, the amount of data to be stored has increased greatly. This has led to an increase in the demand for larger storage capacity in the storage devices. Consequently, production of high capacity storage devices has increased in the past few years. However, large storage capacities demand reliable storage devices with reasonably high data transfer rates. Moreover, the storage capacity of a single storage device cannot be increased beyond a limit. Hence, various data storage system configurations and geometries are commonly used to meet the growing demand for increased storage capacity.
A configuration of the data storage system to meet the growing demand involves the use of multiple smaller storage devices. Such a configuration permits redundancy of stored data. Redundancy ensures data integrity in case of device failures. In many such data storage systems, recovery from common failures can be automated within the data storage system itself using data redundancy and error correcting codes. However, such data redundancy schemes may be an overhead to the data storage system. These data storage systems are typically referred to as Redundant Array of Inexpensive (Independent) Disks (RAID). The 1988 publication by David A. Patterson, et al., from University of California at Berkeley, titled “A Case for Redundant Arrays of Inexpensive Disks (RAID)”, reviews the fundamental concepts of RAID technology.
Patterson's publication defines five “levels” of standard RAID geometries. The simplest array defined in Patterson's publication is a RAID 1 system. This system comprises one or more disks for storing data and an equal number of additional “mirror” disks for storing copies of the data. The other RAID levels, as defined in Patterson's publication, are identified as RAID 2, 3, 4 and 5 systems. These systems segment data into smaller portions for storage across several data disks. In these systems, one or more additional disks are utilized for overhead storage. Examples of overhead storage include storage of error check and parity information. The choice of RAID level depends upon reliability and performance capability required for a storage application. The extent of fault tolerance determines the reliability shown by the storage device. The input/output (I/O) rate of data is a measure of the performance of a storage device.
The various RAID levels are distinguished by their relative performance capabilities as well as their overhead storage requirements. For example, a RAID level 1 mirrored storage system requires more overhead storage than RAID levels 2, 3, 4 and 5, which utilize error correcting codes or XOR parity to provide requisite data redundancy. RAID level 1 requires 100% overhead since it duplicates all stored data, while RAID level 5 requires overhead equal to 1/N of the storage capacity used for storing data, where N is the number of storage units, like data disk drives, used in the RAID set.
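The overhead comparison above can be illustrated with a short calculation. The following is a minimal sketch, not part of any cited system; the function names are hypothetical, and it assumes that N counts the data disk drives in the RAID set and that overhead is expressed as a fraction of the capacity used for storing data.

```python
def mirror_overhead_fraction() -> float:
    """RAID level 1 duplicates all stored data, so overhead is 100%."""
    return 1.0


def parity_overhead_fraction(n_data_disks: int) -> float:
    """RAID level 5 overhead as a fraction of data capacity.

    Assumption: n_data_disks is N, the number of data disk drives
    in the RAID set, so the overhead is 1/N.
    """
    if n_data_disks < 1:
        raise ValueError("a RAID set needs at least one data disk")
    return 1.0 / n_data_disks


def raw_capacity_needed(data_capacity: float, overhead_fraction: float) -> float:
    """Total capacity (data plus overhead) required to store data_capacity."""
    return data_capacity * (1.0 + overhead_fraction)


if __name__ == "__main__":
    data_tb = 8.0  # hypothetical amount of data to store, in terabytes
    print("RAID 1:", raw_capacity_needed(data_tb, mirror_overhead_fraction()))
    print("RAID 5, N=4:", raw_capacity_needed(data_tb, parity_overhead_fraction(4)))
```

Under these assumptions, storing the same amount of data with RAID level 5 over four data disks needs 25% additional capacity, against 100% for RAID level 1.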
The RAID levels are configured in the storage system using a controller module. This module forms an interface between the storage application and the disk drives. The controller module shields the storage application from details relating to the organization and the redundancy of data across an array of disk drives. The controller makes the storage system appear as a single disk drive having larger storage capacity. The controller may distribute the data across many smaller drives. Most of the RAID controller systems provide large cache memory structures in order to further improve the performance of the data storage system. The storage application requests blocks of data to be read or written and the RAID controller manipulates the array of disk drives and the cache memory as required.
There exist a number of patents dealing with improvements and modifications in RAID controllers. One such patent is U.S. Pat. No. 6,279,138, titled “System for Changing the Parity Structure of a Raid Array”, assigned to International Business Machines Corporation, Armonk, N.Y. This patent relates to a method for altering the structure of parity groups, e.g., altering the RAID level or number of storage devices included in the RAID array in the event of the failure of the primary controller system when dual controllers are in use.
Another patent dealing with improvements in RAID controllers is U.S. Pat. No. 6,601,138, titled “Apparatus System and Method for N-Way Raid Controller having Improved Performance Fault Tolerance”, assigned to International Business Machines Corporation, Armonk, N.Y. The structure and the method disclosed in this patent permit more than two controllers to work together under an underlying message passing protocol, to improve system performance and fault tolerance. However, all of the RAID controllers work on the same RAID set. Use of multiple RAID sets would help in further improving the system performance.
Attempts have been made to provide adaptive RAID technology for the storage systems. FasFile™ RAID, a product from Seek Systems Inc., uses adaptive RAID technology. FasFile™ uses RAID levels 1 and 5 to optimize speed and conserve disk capacity. Furthermore, attempts have been made to enhance the RAID performance by distributing the data proportionally across various disks connected to a RAID controller. U.S. Pat. No. 6,526,478 titled “Raid LUN Creation using Proportional Disk Mapping”, assigned to LSI Logic Corporation, Milpitas, Calif., provides a method and system for creating logical units in a RAID system. This patent provides an improvement in performance by providing a method for dividing a logical unit number (LUN) into a plurality of segments or stripes that are distributed across various drives under the RAID controller. However, the maximum data transfer rate cannot be more than that of the RAID controller. The LUN is typically a unique identifier that designates a logical volume which can include logical storage to which logical objects and files may be allocated and correspondingly accessed as would be visible to an operating system and/or storage application.
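The general idea of dividing a logical unit into stripes distributed across drives can be sketched as follows. This is a minimal illustration of plain round-robin striping only; it is not the proportional mapping method of the cited patent, and the function name and parameters are hypothetical.

```python
from typing import Tuple


def locate_block(logical_block: int, n_drives: int,
                 blocks_per_stripe_unit: int) -> Tuple[int, int]:
    """Map a logical block of a LUN to (drive index, block offset on that drive).

    Assumption: stripe units are assigned to drives in simple
    round-robin order, with equal-sized units on every drive.
    """
    if logical_block < 0 or n_drives < 1 or blocks_per_stripe_unit < 1:
        raise ValueError("invalid striping parameters")
    # Which stripe unit the block falls in, counted across the whole LUN.
    stripe_unit = logical_block // blocks_per_stripe_unit
    # Round-robin choice of drive for that stripe unit.
    drive = stripe_unit % n_drives
    # How many full rows of stripe units precede this one on the drive.
    row = stripe_unit // n_drives
    offset = row * blocks_per_stripe_unit + logical_block % blocks_per_stripe_unit
    return drive, offset


if __name__ == "__main__":
    for block in (0, 8, 35):
        print(block, locate_block(block, n_drives=4, blocks_per_stripe_unit=8))
```

Because consecutive stripe units land on different drives, a large sequential request can be serviced by several drives at once, although all of the traffic still passes through the single RAID controller.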
In addition to the RAID technique, a number of other techniques for increasing storage capacity exist in the art. One such technique involves incorporating multiple disk drives in the data storage system. A larger amount of energy is required to operate the system because of the multiple disk drives. Moreover, the reliability of the system decreases because of the increased heat generated by the multiple disk drives in the system. Additionally, a limited power supply imposes constraints on the system, whereby all disk drives cannot be powered on simultaneously. A power-constrained system requires that as few drives as possible be powered up, thereby further constraining the number of drives in the active RAID set. This problem is examined in U.S. patent application Ser. No. 10/607,932, titled “Method and Apparatus for Power-Efficient High-Capacity Scalable Storage System”, filed on Sep. 12, 2002 and assigned to Copan Systems Inc., which describes an optimal power managed RAID scheme and which is incorporated by reference as if set forth herein in its entirety.
An assumption made in existing data storage system configurations is a fixed workload profile, such as a fixed transaction volume size, a fixed target input/output (I/O) rate and so on. Thus, these data storage systems define their data organization statically at the time of the initial storage controller configuration. This configuration suffices if the workload profile varies little from the one assumed in the static configuration. However, if the workload profile changes, for example if the size of transaction volumes varies or the I/O rates differ, then the RAID organization has to be redefined. In this case, all the old data needs to be remapped onto the new data and disk configuration. Therefore, in a large-scale storage system where a large number of hosts are supported with, possibly, different workload profiles, a single RAID organization is not adequate to meet the performance requirements of all hosts.
Most of the existing techniques for increasing the capacity of storage devices are limited to altering the RAID levels and providing multiple storage controllers. These techniques provide adaptive support for a limited storage capacity. However, most of them are incapable of handling varying workload profiles, and they do not provide any support for system constraints such as limited power. Moreover, these techniques do not use multiple RAID sets and combinations of different RAID levels to provide greater degrees of flexibility in applications that have varying transaction volume sizes and varying levels of performance.
From the above discussion it is evident that there is a need for a solution for optimizing performance of the data storage system by providing different data organization schemes to handle varying workload profiles. The solution should be able to handle hundreds of drives for providing large-scale storage capacity, while maintaining performance and reliability. Further, there is a need for a data storage system to work under the given system constraints. The data storage system should also distribute data across an array of RAID controllers to enhance the data storage system performance.