The present invention relates to the field of a Redundant Array of Independent Disks (RAID) storage system, and more particularly to upgrading firmware on the disks in the RAID without deactivating the server coupled to the RAID storage system.
As the performance of microprocessor and semiconductor memory technology improves, there is a need for improved data storage systems with comparable performance enhancements. Additionally, in enhancing the performance of data storage systems, there is a need for improved reliability of data stored. In 1988, Patterson, Gibson and Katz published a paper, A Case for Redundant Arrays of Inexpensive Disks (RAID), International Conference on Management of Data, pgs. 109-116, June 1988. This paper laid the foundation for the use of redundant arrays of independent disks that would not only improve the data transfer rate and data I/O rate over a comparable single disk access, but would also provide error correction at a lower cost in data storage systems.
RAID may include an array of disks which may be coupled to a network server. The server, e.g., file server, database server, web server, may be configured to receive a stream of requests (Input/Output (I/O) requests) from clients in a network system to read from or write to particular disks in the RAID. The I/O requests may also be issued from an application within the server. The server may comprise a RAID controller which may be a hardware and/or software tool for providing an interface between the server and the array of disks. The server may forward the I/O requests to the RAID controller which may retrieve or store the requested data. Typically, the RAID controller manages the array of disks for storage and retrieval and views the disks of the RAID separately. The disks included in the array may be any type of data storage systems which may be controlled by the RAID controller when grouped in the array.
The RAID controller may typically be configured to access the array of disks as defined by a particular “RAID level.” The RAID level may specify how the data is distributed across the disk drives and how error correction is accomplished. In the paper noted above, the authors describe five RAID levels (RAID Level 1-RAID Level 5). Since the publication of the paper, additional RAID levels have been designated.
RAID levels are typically distinguished by the benefits included. Three key benefits which may be included in a RAID level are fault tolerance, data availability and high performance. Fault tolerance may typically be achieved through an error correction method which ensures that information can be reconstructed in the event of a disk failure. Data availability may allow the data array to continue to operate with a failed component. Typically, data availability may be achieved through a method of redundancy. Finally, high performance may typically be achieved by simultaneous access to multiple disk drives which results in faster I/O and data transfer rates.
Error correction may be accomplished, in many RAID levels, by utilizing additional parity data stored with the original data. Parity data may be utilized to recover lost data due to disk failure. Parity data may typically be stored on one or more disks dedicated for error correction only or distributed over all of the disks within an array.
By the method of redundancy, data may be stored in multiple disks of the array. Redundancy is a benefit in that redundant data allows the storage system to continue to operate with a failed component while data is being replaced through the error correction method. Additionally, redundant data is more beneficial than backup data because backup data is typically outdated when needed whereas redundant data is current when needed.
In many RAID levels, redundancy may be incorporated through data interleaving which distributes the data over all of the data disks in the array. Data interleaving is usually in the form of data “striping” in which data to be stored is broken down into blocks called “stripe units” which are then distributed across the array of disks. Stripe units are typically predefined as a bit, byte, block or other unit. Stripe units are further broken into a plurality of sectors where all sectors are an equivalent predefined size. A “stripe” is a group of corresponding stripe units, one stripe unit from each disk in the array. Thus, “stripe size” is equal to the size of a stripe unit times the number of data disks in the array.
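As an illustration of the striping arithmetic described above, the following is a minimal sketch that assumes plain round-robin placement of stripe units across the data disks, with no parity. The function names `locate_block` and `stripe_size` and the parameter `blocks_per_unit` are hypothetical and do not appear in the description.

```python
def locate_block(logical_block: int, blocks_per_unit: int, data_disks: int):
    """Map a logical block number to (stripe, disk, offset).

    Assumes round-robin placement: consecutive stripe units go to
    consecutive data disks, wrapping to the next stripe after the last disk.
    """
    unit_index = logical_block // blocks_per_unit  # stripe unit counted across the array
    offset = logical_block % blocks_per_unit       # block position inside that stripe unit
    stripe = unit_index // data_disks              # row of corresponding stripe units
    disk = unit_index % data_disks                 # data disk holding the stripe unit
    return stripe, disk, offset


def stripe_size(unit_size: int, data_disks: int) -> int:
    """Stripe size is the stripe unit size times the number of data disks."""
    return unit_size * data_disks
```

For example, with four blocks per stripe unit and three data disks, logical block 13 falls in the second stripe, on the first data disk, at the second block of its stripe unit.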
In an example, RAID level 5 utilizes data interleaving by striping data across all disks and provides for error correction by distributing parity data across all disks. For each stripe, all stripe units are logically combined with each of the other stripe units to calculate parity for the stripe. Logical combination may be accomplished by an exclusive or (XOR) of the stripe units. For N physical drives, N−1 of the physical drives will receive a stripe unit for the stripe and the Nth physical drive will receive the parity for the stripe. For each stripe, the physical drive receiving the parity data rotates such that all parity data is not contained on a single disk.
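The XOR parity calculation and the rotation of the parity drive may be sketched as follows. This is an illustrative example only: the helper names `xor_units` and `parity_disk` are hypothetical, and the particular rotation shown (parity moving one drive to the left per stripe) is an assumption, since the description does not specify a rotation order.

```python
from functools import reduce


def xor_units(units):
    """XOR equal-length stripe units (bytes objects) together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def parity_disk(stripe: int, n_disks: int) -> int:
    """Pick the drive that holds parity for a stripe.

    Assumed rotation: the last drive for stripe 0, then one drive to the
    left for each following stripe, wrapping around.
    """
    return (n_disks - 1 - stripe) % n_disks


# Parity over three data stripe units; any one unit can be recovered
# by XOR-ing the parity with the remaining units.
d0, d1, d2 = b"\x0f\xf0", b"\xff\x00", b"\x01\x01"
parity = xor_units([d0, d1, d2])
recovered_d0 = xor_units([parity, d1, d2])
```

Because XOR is its own inverse, the same routine serves both to compute parity and to reconstruct a missing stripe unit.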
Disk arrays may be configured to include logical drives which divide the physical drives in the disk array into logical components. Each logical drive may include a cross section of each of the physical drives and may be assigned a RAID level.
Each disk in the disk array of a RAID may store firmware where firmware may refer to software that may be burned into a memory chip, e.g., Read-Only Memory (ROM), Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM) or into the hard drive. Typically, the firmware stored on the disk in the disk array may be configured to perform functions such as sector re-mapping, monitoring the disk for failures, etc.
In order to update the firmware stored on a disk in the disk array, all activity on that disk must cease. If the disk receives a request to read from or write to that disk during the process of upgrading the firmware, the disk may become inoperative either permanently or temporarily. Consequently, the server coupled to the RAID comprising the disk with the firmware to be updated must be deactivated thereby assuring that the disk does not receive any I/O requests.
However, some servers such as mission critical servers may be required to stay active continuously. It would therefore be desirable to be able to update firmware on the disks in a RAID storage system without deactivating the server coupled to the RAID thereby allowing the server to continuously stay active.
The problems outlined above may at least in part be solved in some embodiments by selecting a disk in a disk array of the RAID storage system to have its firmware updated. The selected disk may enter a degrade mode of operation where the RAID controller coupled to the RAID may prevent requests from reaching the selected disk thereby suppressing activity on the selected disk to allow the firmware to be updated. During the updating of the firmware, any stripes updated may be tracked. Upon completion of the firmware update, the stripe units in the selected disk associated with stripes updated may be rebuilt. In this manner, firmware may be updated on a disk in a RAID storage system without deactivating the server coupled to the RAID storage system thereby allowing the server to continuously stay active.
In one embodiment of the present invention, a method for updating firmware on a disk in a RAID storage system implementing a RAID level one system without deactivating a server coupled to the RAID storage system may comprise the step of selecting a particular physical disk in the RAID storage system to update the firmware in that particular disk. The selected disk as well as the associated logical drives may enter a degrade mode of operation. In the degrade mode of operation, a RAID controller, providing an interface between the server and the RAID storage system, may suppress particular activities, e.g., recovery actions, hot spare kick-in, from occurring on the selected disk as well as prevent requests, e.g., read/write requests issued from a client, hot swap queries, from reaching the selected disk. By suppressing particular activities from occurring on the selected disk and preventing requests from reaching the selected disk, activity on the selected disk may cease thereby allowing the firmware on the selected disk to be updated.
The firmware on the selected disk may then be updated. During the updating of the firmware, the following may occur.
A determination may be made as to whether the RAID controller received any read requests for the stripe units in the selected disk. If the RAID controller received a request to read data stored in a stripe unit in the selected disk, then the RAID controller may retrieve and transmit the requested data stored in the stripe unit that mirrors the stripe unit containing the requested data.
If the RAID controller did not receive a request to read data stored in the stripe unit in the selected disk, then a determination may be made as to whether the RAID controller received any write requests for the stripe units in the selected disk. Furthermore, upon transmitting the requested data stored in the stripe unit that mirrors the stripe unit containing the requested data, a determination may be made as to whether the RAID controller received any write requests for the stripe units in the selected disk.
If the RAID controller did not receive any write requests for the stripe units in the selected disk, then a determination may be made as to whether the updating of the firmware is complete as discussed further below.
If the RAID controller received any write requests for the stripe units in the selected disk, then the RAID controller may write the updated data in the stripe unit that mirrors the stripe unit containing the data that was updated. A copy of the updated data may be stored for backup purposes.
The stripe associated with the stripe units in the selected disk whose data was changed may be tracked. In one embodiment, the stripes associated with the stripe units in the selected disk whose data was changed may be tracked in a table stored in a non-volatile memory of the RAID controller.
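The handling of read and write requests during the firmware update in a RAID level one system, as described above, may be sketched as follows. This is a simplified illustration under stated assumptions: a two-disk mirror is modeled as two in-memory dictionaries, and the class name `DegradedMirror` and its attributes are hypothetical. In particular, the `dirty` set merely stands in for the table that may be kept in non-volatile memory of the RAID controller.

```python
class DegradedMirror:
    """Two-disk mirror in which one disk is offline for a firmware update."""

    def __init__(self, disks, offline):
        self.disks = disks      # two dicts mapping stripe number -> stripe unit data
        self.offline = offline  # index (0 or 1) of the selected disk
        self.dirty = set()      # stripes changed during the update (stand-in for the NVRAM table)
        self.backup = {}        # copy of the updated data kept for backup purposes

    def _mirror(self):
        """Return the disk that mirrors the selected (offline) disk."""
        return self.disks[1 - self.offline]

    def read(self, stripe):
        """Serve a read from the mirroring stripe unit; the selected disk is never touched."""
        return self._mirror()[stripe]

    def write(self, stripe, data):
        """Write only to the mirroring stripe unit, keep a backup copy, and track the stripe."""
        self._mirror()[stripe] = data
        self.backup[stripe] = data
        self.dirty.add(stripe)
```

Requests never reach the selected disk in this sketch, so its contents go stale only for the stripes recorded in `dirty`, which are the only ones that need rebuilding afterwards.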
A determination may then be made as to whether the updating of the firmware on the selected disk is completed. If the updating of the firmware is not completed, then a determination may be made as to whether the RAID controller received any read requests for the stripe unit in the selected disk. Upon completion of the firmware being updated on the selected disk, the stripe units in the selected disk associated with any stripes to be updated may be rebuilt as described further below.
If the updating of the firmware is complete, then a determination may be made as to whether any stripes were updated. If there were no stripes updated, then there is no need to rebuild any stripe units in the selected disk.
If there were stripes updated, then a stripe unit associated with a stripe updated may be rebuilt. In one embodiment, the stripe units in the selected disk associated with stripes that have been updated may be rebuilt stripe by stripe, starting from the top stripe that was updated and proceeding to the bottom stripe, if any, that was updated. In one embodiment, the stripe unit associated with a stripe updated may be rebuilt by copying data in the stripe unit that mirrors the stripe unit to be rebuilt and storing that data in the stripe unit to be rebuilt.
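The rebuild of a RAID level one disk from its mirror may be sketched as follows. This is an illustrative example only, assuming that disks are modeled as dictionaries keyed by stripe number and that "top to bottom" corresponds to ascending stripe number; the function name `rebuild_from_mirror` is hypothetical.

```python
def rebuild_from_mirror(selected, mirror, dirty_stripes):
    """Copy each tracked stripe unit from the mirror back to the selected disk.

    Only stripes recorded as updated during the firmware update are copied,
    in ascending stripe order. Returns the stripes rebuilt, in order.
    """
    rebuilt = []
    for stripe in sorted(dirty_stripes):
        selected[stripe] = mirror[stripe]  # copy the current data over the stale unit
        rebuilt.append(stripe)             # track each stripe whose unit was rebuilt
    return rebuilt
```

Restricting the copy to tracked stripes is what keeps the rebuild short compared with a full resynchronization of the disk.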
Each stripe associated with the stripe unit rebuilt may be tracked. A determination may then be made as to whether the RAID controller received a write request for a stripe unit rebuilt in the selected disk prior to completing the rebuilding of all stripe units required to be rebuilt.
If the RAID controller did not receive a write request for a stripe unit rebuilt in the selected disk prior to completing the rebuilding of all stripe units required to be rebuilt, then a determination may be made as to whether another disk in the disk array of the RAID storage system has become inoperative during the process of rebuilding the stripe units in the selected disk required to be rebuilt as discussed further below.
If the RAID controller received a write request for a stripe unit rebuilt in the selected disk prior to completing the rebuilding of all stripe units required to be rebuilt, then the updated data may be written in both the stripe unit rebuilt in the selected disk and in the stripe unit that mirrors the stripe unit rebuilt.
A determination may then be made as to whether another disk in the disk array of the RAID storage system has become inoperative during the process of rebuilding the stripe units in the selected disk required to be rebuilt.
If another disk in the disk array of the RAID storage system does not become inoperative during the process of rebuilding the stripe units in the selected disk required to be rebuilt, then a determination may be made as to whether the rebuilding of the stripe units whose data has changed has been completed as discussed further below.
If another disk in the disk array of the RAID storage system has become inoperative during the process of rebuilding the stripe units in the selected disk required to be rebuilt, then the stripe units in the selected disk required to be rebuilt may be rebuilt using a copy of the updated data stored for backup purposes.
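The choice of rebuild source described above may be sketched as follows. This is a minimal illustration that assumes the backup copy is available as a dictionary keyed by stripe number and that the failure of the mirroring disk is reported through a simple flag; the function name `rebuild_unit` and the parameter `mirror_ok` are hypothetical.

```python
def rebuild_unit(stripe, selected, mirror, backup, mirror_ok):
    """Rebuild one stripe unit on the selected disk.

    Uses the mirroring stripe unit while the mirror is operative; otherwise
    falls back to the copy of the updated data stored for backup purposes.
    """
    source = mirror if mirror_ok else backup
    selected[stripe] = source[stripe]
    return selected[stripe]
```

In this sketch the backup copy only needs to hold the stripes written during the firmware update, since those are the only stripe units that require rebuilding.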
A determination may be made as to whether the rebuilding of the stripe units whose data has changed has been completed. If not, then the next stripe unit required to be rebuilt may be rebuilt. If the rebuilding of the stripe units in the selected disk whose data has changed has been completed, then the method may be terminated.
It is noted that even though the above method describes a method for updating firmware on a disk in the RAID storage system implementing a RAID level one system without deactivating the RAID controller, the principles described in the above method may be applicable to any redundant RAID level. It is further noted that a person of ordinary skill would be capable of applying the principles taught in the above method to any redundant RAID level.
In another embodiment of the present invention, a method for updating firmware on a disk in a RAID storage system implementing a RAID level five system without deactivating a RAID controller coupled to the RAID storage system may comprise the step of selecting a particular physical disk in the RAID storage system implementing a RAID level five system to update the firmware in that particular disk.
The selected disk as well as the associated logical drives may enter a degrade mode of operation. In the degrade mode of operation, the RAID controller may suppress particular activities, e.g., recovery actions, hot spare kick-in, from occurring on the selected disk as well as prevent requests, e.g., read/write requests issued from a client, hot swap queries, from reaching the selected disk. By suppressing particular activities from occurring on the selected disk and preventing requests from reaching the selected disk, activity on the selected disk may cease thereby allowing the firmware on the selected disk to be updated.
The firmware on the selected disk may be updated. During the updating of the firmware on the selected disk, the following may occur.
A determination may be made as to whether the RAID controller received any read requests for the stripe units in the selected disk. If the RAID controller received a request to read data stored in a stripe unit in the selected disk, then the RAID controller may perform a logical calculation on data located in other stripe units associated with the stripe unit containing the data requested. The resulting data may then be transmitted to the requesting client.
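The logical calculation used to serve a read during the firmware update in a RAID level five system may be sketched as follows, assuming the calculation is an XOR of the remaining stripe units of the stripe, including the parity stripe unit. The function name `degraded_read` and the list-based representation of a stripe are hypothetical.

```python
from functools import reduce
from operator import xor


def degraded_read(stripe_units, offline_disk):
    """Reconstruct the stripe unit of the offline disk without reading it.

    stripe_units holds one bytes object per disk for a single stripe (data
    and parity); the entry at offline_disk is ignored and may be None.
    """
    others = [u for i, u in enumerate(stripe_units) if i != offline_disk]
    # XOR the surviving units column by column to recover the missing unit.
    return bytes(reduce(xor, column) for column in zip(*others))
```

The selected disk is not accessed at any point, which is what allows its firmware update to proceed undisturbed while the read is served.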
If the RAID controller did not receive a request to read data stored in the stripe unit in the selected disk, then a determination may be made as to whether the RAID controller received any write requests for the stripe units in the selected disk. Furthermore, upon transmitting the requested data, a determination may be made as to whether the RAID controller received any write requests for the stripe units in the selected disk.
If the RAID controller did not receive any write requests for the stripe units in the selected disk, then a determination may be made as to whether the updating of the firmware is complete as discussed further below.
If the RAID controller received any write requests for the stripe units in the selected disk, then the RAID controller may generate a new parity for the stripe associated with the stripe unit containing outdated data. A new parity for the stripe associated with the stripe unit containing outdated data may be generated so that the updated data may replace the outdated data as explained further below.
In one embodiment, a new parity may be generated by performing a logical operation on the data to be written along with the data stored in the other stripe units except the stripe unit storing the parity data to be updated and the stripe unit of the selected disk whose firmware is being updated. The updated parity may then replace the older parity associated with the stripe updated. A copy of the updated data, i.e., the data requested to be written in the stripe units in the selected disk, may be stored for backup purposes.
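The parity generation described above may be sketched as follows, assuming the logical operation is XOR and that the selected disk holds a data stripe unit, not the parity stripe unit, for the stripe being written. The names `xor_all` and `new_parity` are hypothetical.

```python
from functools import reduce
from operator import xor


def xor_all(units):
    """XOR equal-length bytes objects together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*units))


def new_parity(new_data, stripe_units, offline_disk, parity_disk):
    """Compute parity as if new_data had been written to the offline disk.

    Combines the data to be written with every other data stripe unit,
    skipping the stale unit on the offline disk and the old parity unit.
    """
    others = [u for i, u in enumerate(stripe_units)
              if i not in (offline_disk, parity_disk)]
    return xor_all([new_data] + others)
```

Once the updated parity replaces the older parity, XOR-ing it with the other data stripe units yields the new data, so later reads and the eventual rebuild see the write even though the selected disk never received it.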
The stripe associated with the stripe unit in the selected disk whose data was changed may be tracked. In one embodiment, the stripes associated with the stripe units in the selected disk whose data was changed may be tracked in a table stored in a non-volatile memory of the RAID controller.
A determination may be made as to whether the updating of the firmware on the selected disk is completed. If the updating of the firmware is not completed, then a determination may be made as to whether the RAID controller received any read requests for the stripe units in the selected disk. Upon completion of the firmware being updated on the selected disk, the stripe units in the selected disk associated with any stripes to be updated may be rebuilt as described further below.
If the updating of the firmware is complete, then a determination may be made as to whether any stripes were updated. If there were no stripes updated, then there is no need to rebuild any stripe units in the selected disk.
If there were stripes updated, then a stripe unit associated with a stripe updated may be rebuilt. In one embodiment, the stripe units in the selected disk associated with stripes that have been updated may be rebuilt stripe by stripe, starting from the top stripe that was updated and proceeding to the bottom stripe, if any, that was updated. In one embodiment, the stripe unit associated with a stripe updated may be rebuilt by performing a logical calculation on data located in other stripe units of the stripe associated with the stripe unit containing the outdated data. The resulting data may then be inserted in the stripe unit to be rebuilt.
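The rebuild in a RAID level five system may be sketched as follows, assuming the logical calculation is an XOR over the other stripe units of each tracked stripe and that "top to bottom" corresponds to ascending stripe number. Disks are modeled as dictionaries keyed by stripe number, and the names `xor_all` and `rebuild_raid5` are hypothetical.

```python
from functools import reduce
from operator import xor


def xor_all(units):
    """XOR equal-length bytes objects together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*units))


def rebuild_raid5(array, selected, dirty_stripes):
    """Rebuild the tracked stripe units on the selected disk.

    array is a list of disks (dicts mapping stripe number -> bytes). For each
    tracked stripe, the units on all other disks (data and parity) are XOR-ed
    and the result is stored on the selected disk. Returns the rebuild order.
    """
    rebuilt = []
    for stripe in sorted(dirty_stripes):
        others = [disk[stripe] for i, disk in enumerate(array) if i != selected]
        array[selected][stripe] = xor_all(others)
        rebuilt.append(stripe)
    return rebuilt
```

Stripes that were not written during the firmware update are left untouched, since their stripe units on the selected disk are still consistent with the parity.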
A determination may then be made as to whether another disk in the disk array of the RAID storage system has become inoperative during the process of rebuilding the stripe units in the selected disk required to be rebuilt.
If another disk in the disk array of the RAID storage system does not become inoperative during the process of rebuilding the stripe units in the selected disk required to be rebuilt, then a determination may be made as to whether the rebuilding of the stripe units whose data has changed has been completed as discussed further below.
If another disk in the disk array of the RAID storage system has become inoperative during the process of rebuilding the stripe units in the selected disk required to be rebuilt, then the stripe units in the selected disk required to be rebuilt may be rebuilt using a copy of the updated data stored for backup purposes.
A determination may then be made as to whether the rebuilding of the stripe units whose data has changed has been completed. If not, then the next stripe unit required to be rebuilt may be rebuilt. If the rebuilding of the stripe units in the selected disk whose data has changed has been completed, then the method may be terminated.
It is noted that even though the above method describes a method for updating firmware on a disk in the RAID storage system implementing a RAID level five system without deactivating the RAID controller, the principles described in the above method may be applicable to any redundant RAID level. It is further noted that a person of ordinary skill would be capable of applying the principles taught in the above method to any redundant RAID level.
The foregoing has outlined rather broadly the features and technical advantages of one or more embodiments of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.