1. Field of the Invention
The present invention relates in general to improved data storage systems and in particular to an improved method and system for reading stored data from a data storage system. Still more particularly, the present invention relates to an improved method and system for fetching stored data from multiple disks that takes advantage of contiguous groups of data organized on the disks.
2. Description of Related Art
As the performance of microprocessor and semiconductor memory technology increases, there is a need for data storage systems with comparable performance improvements. In addition, as the performance of data storage systems is enhanced, there is a need for improved reliability of the stored data. In 1988, Patterson, Gibson, and Katz published “A Case for Redundant Arrays of Inexpensive Disks (RAID),” International Conference on Management of Data, pp. 109-116, June 1988. This paper laid the foundation for redundant arrays of inexpensive disks that not only improve the data transfer rate and data I/O rate relative to a comparable single disk, but also provide error correction at lower cost in data storage systems.
Storage systems for computers require fast access and high reliability. A RAID provides a low-cost means of both reducing access times and increasing the reliability of data. A set of disks is grouped together in an array, and pages of data written to the array are “striped,” or written across each of the disks.
Striping consists of dividing a sequential block of data into pages of equal size and writing successive pages to successive drives in round-robin fashion. For example, if a RAID has three disks, the first three pages fill the first stripe, and the fourth page begins the second stripe, starting again with the first disk. This is known as RAID 0; it does not increase reliability through redundancy, but it reduces access times by a factor approaching the number of drives in the array. RAID 1 (sometimes called RAID Level 1) is a technique that provides “mirroring” of data, which increases reliability and reduces access time by increasing availability. In mirroring, pages are written to more than one drive, which makes recovery from data failures possible and also increases the likelihood that the data will be available when requested (rather than the host having to wait for the single drive containing the data to become available).
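The round-robin placement just described can be illustrated with a short Python sketch. It assumes that pages and disks are numbered from zero and that each stripe holds exactly one page per disk; the function name `raid0_location` is hypothetical.

```python
def raid0_location(page: int, num_disks: int) -> tuple[int, int]:
    """Map a zero-based logical page number to (disk, stripe) under round-robin striping."""
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    # Consecutive pages go to consecutive disks; after the last disk,
    # placement wraps back to disk 0 on the next stripe.
    return page % num_disks, page // num_disks


if __name__ == "__main__":
    for page in range(6):
        disk, stripe = raid0_location(page, 3)
        print(f"page {page}: disk {disk}, stripe {stripe}")
```

With three disks, page 3 (the fourth page) maps to disk 0 of stripe 1, matching the example above.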
In some RAID 1 implementations, mirroring and striping are combined. The mirrored data, comprising the second set of pages, is striped on the stripes alternating with those holding the primary data. Each mirrored page is also shifted one drive in the sequence from its corresponding primary page, so that a mirrored page and its primary page are never on the same drive.
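One way to read the mirrored, striped layout described above is sketched below in Python. It assumes zero-based numbering, one page per disk per stripe, primary copies on even-numbered stripes, and mirror copies on the following odd-numbered stripe shifted one drive to the right with wrap-around. The function `mirrored_locations` is hypothetical and only illustrates the placement rule; actual controllers may lay data out differently.

```python
def mirrored_locations(page: int, num_disks: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return ((disk, stripe) of the primary copy, (disk, stripe) of the mirror copy).

    Assumed layout: primary pages fill even stripes round-robin; each group of
    mirrored pages fills the next (odd) stripe, shifted one drive to the right.
    """
    if num_disks < 2:
        raise ValueError("mirroring needs at least two disks")
    group, position = divmod(page, num_disks)
    primary = (position, 2 * group)
    # Shift by one drive (wrapping around) so the mirror never shares
    # a drive with its primary copy.
    mirror = ((position + 1) % num_disks, 2 * group + 1)
    return primary, mirror


if __name__ == "__main__":
    for page in range(6):
        primary, mirror = mirrored_locations(page, 3)
        print(f"page {page}: primary {primary}, mirror {mirror}")
```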
Mirrored data is typically read by alternating reads between the drives that hold copies of it. This provides faster access to the data, because the second drive can accept a read command while the first drive is still busy executing the previous one.
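The benefit of alternating reads can be estimated with a deliberately simplified model, sketched here in Python: all reads are queued at once, every read takes the same time, each drive serves one read at a time, and every drive holds a copy of every requested page. The function `completion_time` and its parameters are hypothetical, and the model ignores seek and rotational delays.

```python
def completion_time(num_reads: int, num_drives: int, read_time: float) -> float:
    """Time to finish num_reads equal-cost reads issued round-robin to num_drives idle drives."""
    if num_drives < 1:
        raise ValueError("num_drives must be at least 1")
    busy_until = [0.0] * num_drives
    for i in range(num_reads):
        drive = i % num_drives        # alternate between drives holding a copy
        busy_until[drive] += read_time  # each drive works through its own queue
    return max(busy_until)


if __name__ == "__main__":
    print(completion_time(4, 1, 10.0))  # one drive
    print(completion_time(4, 2, 10.0))  # two mirrored drives, alternating
```

Under these assumptions, four 10 ms reads take 40 ms on one drive but 20 ms when alternated across two mirrored drives.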
A RAID includes an array of disks that is typically accessed by a host, such as a computer system, as a unified storage device. These disks may be magnetic disks, optical disks, or other storage devices designed to provide long-term storage for data. A RAID controller may be a hardware and/or software tool that provides an interface between the host and the array of disks. Preferably, the RAID controller manages the array of disks for storage and retrieval and can access the disks of the RAID separately. The disks included in the array may be any type of data storage device that can be controlled by the RAID controller when grouped in the array.
The RAID controller is typically configured to access the array of disks as defined by a particular “RAID level.” The RAID level specifies how the data is distributed across the disk drives and how error correction is accomplished. In the paper noted above, the authors describe five RAID levels (RAID Level 1 through RAID Level 5). Since the publication of the paper, additional RAID levels have been designated.
RAID levels are typically distinguished by the benefits they provide. Three key benefits a RAID level may provide are fault tolerance, data availability, and high performance. Fault tolerance is typically achieved through an error correction method that ensures that information can be reconstructed in the event of a disk failure. Data availability allows the data array to continue operating with a failed component; it is typically achieved through redundancy. Finally, high performance is typically achieved through simultaneous access to multiple disk drives, which allows I/O and data transfer requests to be serviced faster.
Error correction is implemented in some RAID levels by storing additional parity data with the original data. Parity data may be used to recover data lost because of a disk failure. Parity data is typically stored either on one or more disks dedicated solely to error correction or distributed over all of the disks within an array.
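The text does not specify how parity is computed; a common scheme, used for example in RAID levels 3 through 5, is bytewise exclusive-OR across the data pages of a stripe. The Python sketch below assumes that scheme and equal-sized pages; `xor_pages` is a hypothetical helper name.

```python
def xor_pages(pages: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length pages (assumed parity scheme)."""
    if not pages:
        raise ValueError("need at least one page")
    length = len(pages[0])
    if any(len(p) != length for p in pages):
        raise ValueError("pages must all be the same length")
    result = bytearray(length)
    for page in pages:
        for i, byte in enumerate(page):
            result[i] ^= byte
    return bytes(result)


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_pages(data)  # would be written to the parity location
    # Simulate losing data[1]: XOR of the surviving pages and the parity rebuilds it.
    rebuilt = xor_pages([data[0], data[2], parity])
    assert rebuilt == data[1]
    print(parity.hex(), rebuilt.hex())
```

Because XOR is its own inverse, any single missing page in the stripe can be rebuilt the same way.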
In a redundant storage scheme, data is stored on multiple disks of the array. Redundancy allows the storage system to continue operating with a failed component while the lost data is being rebuilt through the error correction method. Additionally, redundant data is more useful than long-term back-up data, because back-up data is typically outdated when it is needed, whereas redundant data is current.
Disk arrays are preferably configured to include logical drives which divide the physical drives in the disk array into logical components which may be viewed by the host as separate drives. Each logical drive includes a cross section of each of the physical drives and is assigned a RAID level. For example, a RAID system may include 10 physical drives in the array, and those 10 drives may contain 20 logical drives.
A host computer requests data from the data storage system. The storage system typically contains a cache that holds data recently read from the disks and, sometimes, prefetched data. Prefetched data is data that is predicted to be needed based on prior read requests: in a prefetch system, data that has not been specifically requested, but is calculated as likely to be requested, is read into the cache. Typically, data requests are divided into read commands, each of which may request a fixed amount of data. Often, the host computer requests sequential data through a series of read requests for successive portions of the data. In standard operation, upon receiving a read command, the RAID controller checks its cache for the requested data. If the requested data is available in the cache, a cache hit occurs and the data is supplied to the host computer from the cache. If the requested data is not available in the cache, a cache miss occurs and a read command is issued to the physical drive to retrieve the requested data into the cache. The command may be, for example, a Small Computer System Interface (SCSI) command or another command appropriate to the particular drive interface type.
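The hit/miss path described above, combined with a simple prefetch rule, can be sketched in Python. The prefetch policy (read ahead a fixed number of pages when two consecutive requests are sequential), the unbounded dictionary cache with no eviction, and the names `ReadCache`, `read_page`, and `prefetch_pages` are all assumptions for illustration; `read_page` stands in for the command issued to the drive.

```python
from typing import Callable, Optional


class ReadCache:
    """Controller-side page cache with read-on-miss and sequential prefetch (illustrative)."""

    def __init__(self, read_page: Callable[[int], bytes], prefetch_pages: int = 0):
        self._read_page = read_page          # hypothetical stand-in for a drive read command
        self._prefetch_pages = prefetch_pages
        self._cache: dict[int, bytes] = {}   # no eviction, for brevity
        self._last_page: Optional[int] = None
        self.hits = 0
        self.misses = 0

    def read(self, page: int) -> bytes:
        if page in self._cache:
            self.hits += 1                   # cache hit: serve from the cache
        else:
            self.misses += 1                 # cache miss: fetch from the drive
            self._cache[page] = self._read_page(page)
        # Assumed prefetch rule: a request for the page right after the previous
        # one is treated as sequential, so read ahead into the cache.
        if self._last_page is not None and page == self._last_page + 1:
            for ahead in range(page + 1, page + 1 + self._prefetch_pages):
                if ahead not in self._cache:
                    self._cache[ahead] = self._read_page(ahead)
        self._last_page = page
        return self._cache[page]


if __name__ == "__main__":
    cache = ReadCache(lambda p: bytes([p]), prefetch_pages=2)
    for p in range(4):
        cache.read(p)
    print(cache.hits, cache.misses)  # prints: 2 2
```

The first two sequential requests miss; once the sequential pattern is detected, the read-ahead turns the next requests into hits.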
It would therefore be desirable to devise a method that will improve the performance of storage device arrays. It would further be desirable to devise a computer program product wherein such a method may be performed on a computer system. In addition, it would be desirable to devise a redundant storage device array with improved performance.
It is therefore one object of the present invention to provide an improved data storage device for a computer system.
It is another object of the present invention to provide a method to improve the access and throughput of a redundant array of direct access storage devices.
It is still another object of the present invention to provide a redundant direct access storage device array with improved access and throughput.
The foregoing objects are achieved in a method for accessing pages in a redundant direct access storage array, wherein accesses to the pages of a multi-page request are grouped into a single command for each contiguous group of pages on each drive. The pages in a request may be grouped by examining existing commands to determine whether a page is available contiguously with pages that have already been added to an existing command, extending that command if so, and creating a new command if not. A device implementing the invention may be embodied in a control means contained within a RAID 1 storage array. The invention may also be embodied in a computer program product having machine-readable instructions for carrying out the above method.
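The grouping procedure summarized above can be sketched in Python. The sketch assumes a hypothetical mirrored layout (primary copies round-robin on even-numbered stripes, mirror copies on the next odd-numbered stripe shifted one drive), with each stripe occupying one block on every drive, and it extends commands only at their tail. When a page cannot extend any existing command, the sketch opens a new command at the page's primary copy; that choice is also an assumption. The names `ReadCommand`, `page_locations`, and `group_request` are illustrative and not taken from any real controller interface.

```python
from dataclasses import dataclass, field


@dataclass
class ReadCommand:
    """One read command covering consecutive blocks on a single drive."""
    drive: int
    start_block: int
    pages: list[int] = field(default_factory=list)  # page held in each block, in order

    def next_block(self) -> int:
        # Block immediately after the last one this command already reads.
        return self.start_block + len(self.pages)


def page_locations(page: int, num_disks: int) -> list[tuple[int, int]]:
    """(drive, block) of the primary copy, then the mirror copy (assumed layout)."""
    group, position = divmod(page, num_disks)
    return [(position, 2 * group), ((position + 1) % num_disks, 2 * group + 1)]


def group_request(pages: list[int], num_disks: int) -> list[ReadCommand]:
    """Group a multi-page request into one command per contiguous run on each drive."""
    if num_disks < 2:
        raise ValueError("a mirrored array needs at least two disks")
    commands: list[ReadCommand] = []
    for page in pages:
        locations = page_locations(page, num_disks)
        # Extend the first existing command whose next block holds a copy of this page.
        for command in commands:
            if (command.drive, command.next_block()) in locations:
                command.pages.append(page)
                break
        else:
            # No contiguous command exists: start a new one at the primary copy.
            drive, block = locations[0]
            commands.append(ReadCommand(drive, block, [page]))
    return commands


if __name__ == "__main__":
    for command in group_request(list(range(6)), num_disks=3):
        print(command)
```

For a six-page request on three drives, this yields three commands instead of six: drive 0 reads blocks 0 through 3 (pages 0, 2, 3, and 5, using the mirror copies of pages 2 and 5), and drive 1 receives two single-block commands for pages 1 and 4.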
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.