1. Field of the Invention
The invention relates to a high-performance disk unit suitable for a disk array and to a storage unit subsystem having a high-performance storage unit and a control unit.
2. Description of Related Art
A disk array or disk unit of the type to which the invention is directed is disclosed by D. Patterson, et al: A Case for Redundant Arrays of Inexpensive Disks (RAID), ACM SIGMOD Conference Proceedings, Chicago, Ill., Jun. 1-3, 1988, pp. 109-116. Specifically, Patterson's paper discloses technology related to the distribution of data in a disk array.
A disk array is a system for increasing the performance and reliability of a disk system. For achieving high performance in a disk array, a plurality of physically present disk units are used as a single disk unit. For achieving high reliability, on the other hand, redundant data is stored in one or more separate disk units so that, when one or more of the disk units storing data break down, the data in the broken-down disk units can be recovered.
The read/write unit of a disk unit is generally referred to as a record. Patterson's paper proposes a number of record distribution methods. In the case of using a disk array, however, the records constituting the read/write units from the viewpoint of the processor and the records actually written to the disk units are sometimes of different length. In this specification, the former will be called the logical record and the latter the physical record. Two of the record distribution methods proposed in Patterson's paper will now be explained.
In the first distribution method, the logical records, i.e. the records from the viewpoint of the processor, are divided into m physical records (m ≥ 1) and stored in the disk units. This distribution method will hereinafter be called the divided distribution method. (This distribution method is called RAID 3 in Patterson's paper.) When divided distribution is used, a single logical record is transferred to/from m disk units and, therefore, it is possible to obtain an effect equivalent to that of increasing the apparent data transfer rate by a factor of m.
The method of generating redundant data in divided distribution will now be explained. In divided distribution, n pieces (n ≥ 1) of redundant data are generated with respect to the m physical records into which the logical record is divided, and each of the n pieces is stored in a disk unit as a physical record. Hereafter, the physical record storing the data directly read and written by the processor will be called the data record and the physical record storing the redundant data will be called the parity record. The set of m data records together with the n parity records generated from them will be called a parity group. Ordinarily, if there are n parity records in a parity group, it is possible to recover the data in the parity group even if errors occur in up to n disk units.
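The division of a logical record and the recovery of a lost data record can be illustrated with a minimal sketch. It assumes a single parity record (n = 1) formed by byte-wise exclusive-OR, which is one common way of generating redundant data; the function names divide_record and make_parity are hypothetical and do not come from this specification.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive-OR of two equal-length records."""
    return bytes(x ^ y for x, y in zip(a, b))


def divide_record(logical: bytes, m: int) -> list[bytes]:
    """Split a logical record into m equal-length data records (zero-padded)."""
    size = -(-len(logical) // m)  # ceiling division
    padded = logical.ljust(size * m, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(m)]


def make_parity(records: list[bytes]) -> bytes:
    """XOR all given records together (assumed single parity record, n = 1)."""
    return reduce(xor_bytes, records)


logical = b"ABCDEFGHIJKL"
data = divide_record(logical, m=3)   # three data records of four bytes each
parity = make_parity(data)           # one parity record for the parity group

# Suppose the disk unit holding data[1] breaks down: XOR the surviving records.
recovered = make_parity([data[0], data[2], parity])
```

With one XOR parity record, any single lost record of the parity group can be rebuilt from the remaining ones; tolerating more failures (n > 1) would need a different code than plain XOR.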
In the second method, the logical record constituting a read/write unit from the viewpoint of the processor is stored in a single disk unit as a single physical record, namely as a single data record. This will hereafter be called undivided distribution. (This distribution method is called RAID 4 or RAID 5 in Patterson's paper.) In this method, the logical record is equivalent to the data record. (Since each physical record is designated to be a data record or a parity record, a physical record and a logical record are not necessarily equivalent. In other words, each logical record is a single physical record, but a physical record is not necessarily a logical record and may instead be a parity record.) The distinguishing feature of undivided distribution is that each read/write operation can be executed at a single one of the disk units constituting the disk array. (When the divided distribution method is adopted, a read/write operation must occupy a plurality of disk units.) Therefore, when undivided distribution is used, it is possible to improve the concurrency of read/write operations and thus realize enhanced performance. Undivided distribution also involves the generation and storage to disk of n parity records from m data records. However, unlike divided distribution, in which the set of data records in a parity group forms a single logical record from the viewpoint of the processor, in undivided distribution each data record is an independent logical record from the viewpoint of the processor.
Aside from the foregoing disk array technology, technology involving the use of a disk cache for increasing the speed of the write operation in ordinary disk units has also been disclosed.
Japanese Unexamined Patent Public Disclosure Sho 55-157053 teaches the use of a write-after process for speeding up execution of write requests in a control unit having a disk cache. More specifically, the control unit completes the write process at the stage of having completed writing of the write data received from the processor into the cache. The writing to the disk unit of the data received from the processor and stored in the cache is done later by the write-after operation executed by the control unit.
Japanese Unexamined Patent Public Disclosure Sho 59-135563 teaches a control unit which speeds up the write process while simultaneously ensuring high reliability.
In Japanese Unexamined Patent Public Disclosure Sho 59-135563, the control unit is provided with a nonvolatile memory in addition to the cache memory, and the write data received from the processor is stored in both the cache memory and the nonvolatile memory. For writing of the write data to the disk unit, the control unit executes a write-after operation. Because a copy of the write data survives in the nonvolatile memory until it reaches the disk unit, the reliability of the write-after operation is increased.
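The write-after operation with a nonvolatile copy can be pictured with the following minimal sketch. It is an illustration only, not the implementation of any of the cited disclosures; the class ControlUnit and its method names are hypothetical, and dictionaries stand in for the cache memory, the nonvolatile memory, and the disk unit.

```python
class ControlUnit:
    """Hypothetical control unit with a disk cache and a nonvolatile memory."""

    def __init__(self):
        self.cache = {}  # volatile cache memory: record id -> data
        self.nvm = {}    # nonvolatile memory: data not yet written to disk
        self.disk = {}   # stands in for the disk unit

    def write(self, record_id, data):
        """Store the write data in both memories and report completion at once."""
        self.cache[record_id] = data
        self.nvm[record_id] = data
        return "complete"  # reported before any disk access takes place

    def write_after(self):
        """Later, asynchronously: move pending write data to the disk unit."""
        for record_id in list(self.nvm):
            self.disk[record_id] = self.nvm.pop(record_id)

    def recover_after_power_loss(self):
        """The cache content is lost, but the pending writes survive in the NVM."""
        self.cache = dict(self.nvm)


cu = ControlUnit()
status = cu.write("r1", b"new data")
pending_before = dict(cu.nvm)   # data waiting for the write-after operation
cu.write_after()
```

The processor sees the write as complete as soon as write returns; the disk access happens only in write_after.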
Japanese Unexamined Patent Public Disclosure Sho 60-114947 teaches a control unit equipped with a disk cache, which controls a dual write disk unit.
In Japanese Unexamined Patent Public Disclosure Sho 60-114947, the control unit responds to a write request received from the processor by writing the write data received from the processor to one of the disk units and the cache memory. Then, later and asynchronously with the read/write request from the processor, the control unit writes the write data stored in the cache memory to the other disk unit. The control unit's writing of the write data stored in the cache memory to the disk unit at a later time, asynchronously with the read/write request from the processor, is called the write-after operation.
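As a rough illustration of this dual-write arrangement, the sketch below writes one disk unit synchronously and the other by a later write-after operation. The class DualWriteControlUnit and its attribute names are hypothetical, and dictionaries stand in for the two disk units and the cache memory.

```python
class DualWriteControlUnit:
    """Hypothetical control unit for a dual-write pair of disk units."""

    def __init__(self):
        self.first_disk = {}    # disk unit written while the request is served
        self.second_disk = {}   # disk unit written later by write-after
        self.cache = {}         # cache memory holding the write data
        self.pending = set()    # record ids still to be written to second_disk

    def write(self, record_id, data):
        """Serve a write request: one disk unit and the cache are updated."""
        self.first_disk[record_id] = data
        self.cache[record_id] = data
        self.pending.add(record_id)

    def write_after(self):
        """Asynchronously copy the cached write data to the other disk unit."""
        for record_id in sorted(self.pending):
            self.second_disk[record_id] = self.cache[record_id]
        self.pending.clear()


cu = DualWriteControlUnit()
cu.write("r1", b"payload")
second_before = dict(cu.second_disk)  # still empty: the second write is deferred
cu.write_after()
```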
In Japanese Unexamined Patent Public Disclosure Hei 2-37418, the control unit again has a nonvolatile memory in addition to the cache memory and stores the write data received from the processor in the cache memory and the nonvolatile memory. Writing of the write data to the two disk units is executed by the control unit by a write-after operation.
Japanese Unexamined Patent Public Disclosure Hei 3-37746, which relates to a control unit that has a disk cache and executes write-after operations, aims at enabling the write-after operations to be executed with good efficiency and teaches a management data structure for this purpose.
When a disk array is used, the change in the content of the logical record at the time a write request is received from the processor necessitates a change in the content of the parity record as well. As a result, transfer operations occur on the data transfer path between the control unit and the disk unit not only for (a) transferring the updated value of the logical record to be written but also for (b) writing the parity record and (c) providing the information required for generating the updated value of the parity record. Since these transfer operations become necessary when, and only when, a disk array is used, they can be considered to constitute transfer overhead unique to a computer system employing a disk array. The size of the increase in data transfer volume on the data transfer path between the control unit and the disk unit that a write operation entails differs depending on whether divided distribution or undivided distribution is used. This will be explained specifically for each of these two methods.
In the case of divided distribution, since the logical record received from the processor for writing corresponds to the content of all of the data records in the parity group, the parity record can be created from the updated value of the logical record received. As a result, no transfer operation is required for providing the information needed for generating the updated value of the parity record. This means that the data transfer overhead on the data transfer path between the control unit and the disk unit is limited to the write transfer volume for the transfer operation of writing the parity record.
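The size of this overhead can be worked out with a short calculation. The sketch assumes that each parity record has the same length as a data record, i.e. 1/m of the logical record, as is the case for exclusive-OR parity; the function name divided_write_volume is hypothetical.

```python
from fractions import Fraction


def divided_write_volume(m: int, n: int = 1) -> Fraction:
    """Volume sent to the disk units per write, in units of one logical record."""
    data_volume = Fraction(1)       # the m data records together form the logical record
    parity_volume = Fraction(n, m)  # assumed: each parity record is 1/m of the logical record
    return data_volume + parity_volume


volume = divided_write_volume(m=4, n=1)  # Fraction(5, 4): 25 percent overhead
```

Under this assumption the overhead shrinks as m grows, since a single parity record is shared by more data records.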
In the case of undivided distribution, on the other hand, for generating the updated value of a parity record, an operation for obtaining one of the following value sets is required as the transfer operation for providing (c) the information needed for generating the updated value of the parity record.
(1) The pre-update value of the logical record designated by the write operation (=data record) and the pre-update value of the parity record.
(2) The values of all other data records in the parity group to which the logical record designated by the write operation (=data record) belongs.
Since the process for obtaining the values indicated in (1) generally entails less overhead, the following explanation will be made assuming that the values indicated in (1) are obtained at the time of the occurrence of a write operation. When the process for obtaining the values indicated in (1) is executed as the transfer operation for providing the information needed for generating the updated value of the parity record, two transfers occur even if only one parity record exists (n=1), one for the pre-update value of the logical record designated by the write operation (=data record) and one for the pre-update value of the parity record. Since, in addition, a transfer operation (a) for the updated value of the logical record to be written and a transfer operation (b) for writing the parity record occur once each, the total number of data transfers between the control unit and the disk unit becomes four. When a disk array is not used, a write operation entails only a single data transfer operation (a) for transferring the updated value of the logical record to be written. The data transfer volume between the control unit and the disk unit when a disk array is operated by the undivided distribution method thus becomes four times the conventional volume.
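The two value sets can be compared in a minimal sketch. It assumes a single parity record formed by byte-wise exclusive-OR (n = 1) and a parity group of three data records; the sample values and variable names are illustrative only.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive-OR of two equal-length records."""
    return bytes(x ^ y for x, y in zip(a, b))


# Parity group with m = 3 data records and one XOR parity record (assumed).
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = reduce(xor_bytes, data)

new_value = b"\xaa\xbb"  # updated value of data record 1

# Value set (1): pre-update data record and pre-update parity record.
parity_from_1 = xor_bytes(xor_bytes(parity, data[1]), new_value)

# Value set (2): all other data records in the parity group.
parity_from_2 = xor_bytes(xor_bytes(data[0], data[2]), new_value)

# Transfers between control unit and disk units when value set (1) is used.
transfers = [
    "read pre-update data record",
    "read pre-update parity record",
    "write updated data record",
    "write updated parity record",
]
```

Both value sets give the same updated parity record, but value set (1) needs two reads regardless of m, whereas value set (2) needs m - 1 reads.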
From the foregoing it can be seen that the adoption of a disk array reduces, in proportion to the aforesaid transfer overhead, the share of the throughput between the control unit and the disk units that remains available for the transfer operations requested directly by the processor.
The object of this invention is to suppress to the minimum possible the transfer overhead occurring between the control unit and the disk unit as a result of parity record handling.
It will now be explained how the invention achieves its object relative to the aforesaid problems.
As a basic capability for minimizing parity record handling related overhead between the control unit and the disk unit, the present invention provides the disk unit with the capability to generate parity records. However, while simply providing the disk unit with parity record generation capability enables a reduction of transfer overhead in undivided distribution, it does not enable the transfer overhead to be reduced in divided distribution. The reason for this will now be explained taking as an example the case of only a single parity record, which is the case entailing the least transfer overhead.
As was explained earlier, in the conventional operation using undivided distribution, responding to a write request requires, in addition to writing of the data record, reading of the pre-update values of the data record and the parity record and writing of the parity record, so that the transfer volume becomes four times that in a system not using a disk array. On the other hand, if the disk unit side is provided with parity generation capability, the control unit generates an intermediate value for generating the parity record from the pre-update value and the updated value of the data record and transfers the intermediate value to the disk unit. The intermediate value is generated, for example, from the exclusive-OR of the pre-update value of the data record and the updated value of the data record. The disk unit side generates the updated value of the parity record from the intermediate value received from the control unit and the pre-update value of the parity record read from the recording medium, and writes it to the recording medium. In the foregoing operations, the transfer operations between the control unit and the disk unit consist of one each for data record reading, data record writing, and transfer of the intermediate value for generating the parity record. The transfer volume can thus be held to three times that before adoption of the disk array.
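The three-transfer write can be sketched as follows, using the exclusive-OR intermediate value mentioned above. The class DiskUnit, its methods, and the function write_record are hypothetical names introduced for illustration; a dictionary stands in for the recording medium.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive-OR of two equal-length records."""
    return bytes(x ^ y for x, y in zip(a, b))


class DiskUnit:
    """Hypothetical disk unit with parity record generation capability."""

    def __init__(self):
        self.medium = {}  # stands in for the recording medium

    def read(self, key):
        return self.medium[key]

    def write(self, key, value):
        self.medium[key] = value

    def update_parity(self, key, intermediate):
        """Combine the received intermediate value with the stored parity."""
        old_parity = self.medium[key]  # read locally, no transfer needed
        self.medium[key] = xor_bytes(old_parity, intermediate)


def write_record(data_disk, parity_disk, key, new_value):
    """Control unit side of one write; returns the number of transfers used."""
    transfers = 0
    old_value = data_disk.read(key)                 # transfer 1: pre-update data
    transfers += 1
    data_disk.write(key, new_value)                 # transfer 2: updated data
    transfers += 1
    intermediate = xor_bytes(old_value, new_value)  # computed in the control unit
    parity_disk.update_parity(key, intermediate)    # transfer 3: intermediate value
    transfers += 1
    return transfers


d0, d1, p = DiskUnit(), DiskUnit(), DiskUnit()
d0.write("r", b"\x0f\xf0")
d1.write("r", b"\x33\x55")
p.write("r", xor_bytes(d0.read("r"), d1.read("r")))

count = write_record(d0, p, "r", b"\xa5\x5a")
```

The pre-update parity record never crosses the data transfer path, which is where the saving of one transfer comes from.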
In the case of divided distribution, however, since the parity record can be generated from the logical record received from the processor for writing, it is most efficient for the control unit to generate and send the parity record. As a result, the parity generation capability on the disk unit side cannot be used effectively.
For suppressing the transfer overhead of the data transfer path between the control unit and the disk unit, the invention further uses, in combination with the parity generation capability provided in the disk unit as the basic capability for this purpose, a capability for broadcast transfer between the control unit and the disk units. In this case, the transfer overhead can be reduced in either divided distribution or undivided distribution. This will now be explained in detail.
The case of divided distribution will be discussed first. In this case, the control unit broadcasts the logical record as it is, without division, to all disk units belonging to the parity group. At this time, the disk units in the parity group receiving the logical record can be classified into disk units that are to store a part of the logical record as a data record and disk units that are to store the parity record corresponding to the logical record. If a disk unit is one which is to store the data record, it extracts from the logical record the part that is to be written therein and writes the same to the recording medium. On the other hand, if the disk is one which is to store the parity record, it generates the parity record from the logical record and writes the same to the recording medium.
In the foregoing arrangement, the control unit transfers only the logical record. It is therefore possible to prevent the occurrence of any transfer overhead on the data transfer path between the control unit and the disk unit when a disk array is adopted.
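The broadcast handling in divided distribution can be sketched as below. The sketch assumes a single exclusive-OR parity record and equal-length data records; the class ArrayDisk, its receive_broadcast method, and the helper divide_record are hypothetical names, not part of this specification.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive-OR of two equal-length records."""
    return bytes(x ^ y for x, y in zip(a, b))


def divide_record(logical: bytes, m: int) -> list[bytes]:
    """Split a logical record into m equal-length parts (zero-padded)."""
    size = -(-len(logical) // m)  # ceiling division
    padded = logical.ljust(size * m, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(m)]


class ArrayDisk:
    """Hypothetical disk unit that acts on a broadcast logical record."""

    def __init__(self, role, index=None):
        self.role = role      # "data" or "parity"
        self.index = index    # which part a data disk is to store
        self.medium = None    # the physical record held on the recording medium

    def receive_broadcast(self, logical, m):
        parts = divide_record(logical, m)
        if self.role == "data":
            self.medium = parts[self.index]        # extract own part only
        else:
            self.medium = reduce(xor_bytes, parts)  # generate the parity record


m = 3
group = [ArrayDisk("data", i) for i in range(m)] + [ArrayDisk("parity")]
logical = b"ABCDEFGHIJKL"

# One broadcast transfer: every disk unit in the parity group sees the same record.
for disk in group:
    disk.receive_broadcast(logical, m)
```

The loop models a single broadcast on the shared path; the control unit sends the logical record once and no separate parity transfer occurs.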
The case of undivided distribution will now be explained. Again, the control unit broadcasts the logical record as it is, without division, to all disk units belonging to the parity group. In this case, the disk units in the parity group that receive the logical record can be classified into disk units that are to write the logical record (=data record) to their recording media, disk units that are not required to do anything, and disk units that are to store the parity record corresponding to the logical record. The disk units that are to store the parity record first have the disk storing the logical record transfer the pre-update value of the logical record. Next they read the pre-update value of the parity record from their recording media. The updated values of the parity record are generated from the so-obtained pre-update values of the parity record and the logical record and the updated value of the logical record first received, and are then written to the recording medium.
On the other hand, the disks that are to store the logical record (=data record) first send the pre-update value of the logical record (=data record) to the disk units that are to store the parity record. Next, they write the updated value of the logical record received from the control unit to the recording medium. In the foregoing operation, the data transferred between the control unit and the disk via the data transfer path is the pre-update value and the updated value of the logical record. In this case, therefore, the transfer volume on the data transfer path between the control unit and the disk unit can be held to double that before adoption of the disk array.
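The two-transfer sequence for undivided distribution can be sketched as follows. It assumes a single exclusive-OR parity record and models each disk unit as holding one physical record; the class ArrayDisk and the function broadcast_write are hypothetical names used only for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive-OR of two equal-length records."""
    return bytes(x ^ y for x, y in zip(a, b))


class ArrayDisk:
    """Hypothetical disk unit holding one physical record."""

    def __init__(self, value: bytes):
        self.medium = value


def broadcast_write(data_disk, parity_disk, new_value):
    """One write in undivided distribution; returns the transfers on the path."""
    transfers = 0

    # Transfer 1: the control unit broadcasts the updated logical record.
    # Disk units with no role in this write simply ignore it.
    transfers += 1

    # Transfer 2: the data disk sends the pre-update value to the parity disk.
    old_value = data_disk.medium
    transfers += 1

    # The parity disk reads its own pre-update parity locally (no transfer)
    # and generates the updated parity record.
    old_parity = parity_disk.medium
    parity_disk.medium = xor_bytes(xor_bytes(old_parity, old_value), new_value)

    # The data disk writes the updated value received in the broadcast.
    data_disk.medium = new_value
    return transfers


d0 = ArrayDisk(b"\x01\x02")
d1 = ArrayDisk(b"\x10\x20")
d2 = ArrayDisk(b"\x0a\x0b")
p = ArrayDisk(xor_bytes(xor_bytes(d0.medium, d1.medium), d2.medium))

count = broadcast_write(d1, p, b"\xff\x00")
```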
The effect of the invention will now be explained. The following explanation is made assuming only a single parity record, which is the case entailing the least transfer overhead.
First, an explanation will be given regarding the effect obtained when the basic capability, i.e. the capability to generate parity records provided in the disk unit, is applied to a disk array employing undivided distribution. As was explained earlier, undivided distribution without application of the invention requires transfer of the pre-update value of the logical record (=data record) designated by the write operation and of the pre-update value of the parity record in order to provide the information needed for generating the updated value of the parity record, bringing to four the total number of data transfer operations required for a write.
On the other hand, when the disk unit side is provided with parity generation capability, the control unit side generates the intermediate value from the pre-update and updated values of the logical record (=data record) and sends the same to the disk unit. The disk unit side generates the parity record from the intermediate value for generating the parity record received from the control unit and the pre-update value of the parity record read from the recording medium and writes the same to the recording medium. The number of data transfer operations required between the control unit and the disk unit is therefore three: one each for the pre-update value of the data record, the updated value of the data record and the intermediate value of the parity record. Since the number of data transfers required without the application of this invention is four, it is possible to achieve the invention's object of reducing the transfer overhead of the data transfer path between the control unit and the disk unit.
Next, an explanation will be given regarding the effect obtained when the basic capability, i.e. the capability to generate parity records provided in the disk unit, is applied to a disk array employing divided distribution in combination with the capability for broadcast transfer between the control unit and the disk units. As was explained earlier, unless the foregoing capabilities are provided in the case of divided distribution, the transfer of the parity record becomes an overhead between the control unit and the disk unit. When these capabilities are provided, the control unit broadcasts the logical record to all disk units belonging to the parity group, as it is without division. The disk units receiving the logical record execute the following processes. If the disk unit is one which is to store the data record, it extracts from the logical record the part that is to be written therein and writes this part to the recording medium. On the other hand, if the disk unit is one which is to store the parity record, it generates the parity record from the logical record and stores the parity record on the recording medium. Therefore, since the control unit is required to transfer only the logical record and is not required to transfer the parity record, it is possible to achieve the object of reducing the transfer overhead of the data transfer path between the control unit and the disk units.
Lastly, explanation will be given regarding the effect obtained when the basic capability, i.e. the capability to generate parity records provided in the disk unit, is applied to a disk array employing undivided distribution in combination with the capability for broadcast transfer between the control unit and the disk unit.
In this case again, the control unit broadcasts the logical record to all disk units belonging to the parity group. Upon receiving the updated value of the logical record, the disk units which are to store the parity record first have the disks storing the logical record transfer the pre-update value of the logical record (=data record). Next they read the pre-update value of the parity record from the recording medium. The updated value of the parity record is generated from the so-obtained pre-update values of the parity record and the logical record and the updated value of the logical record first received, and is then stored on the recording medium.
On the other hand, the disks that are to store the logical record (=data record) first send the pre-update value of the logical record (=data record) to the disk units that are to store the parity record. Next, they write the updated value of the logical record received from the control unit to the recording medium. In the foregoing operation, the number of data transfer operations between the control unit and the disk units is two: one each for the pre-update value of the data record and the updated value of the data record. Since the number of data transfers required in undivided distribution without the application of this invention is four, it is possible to achieve the invention's object of reducing the transfer overhead of the data transfer path between the control unit and the disk unit.
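The transfer volumes discussed above can be collected in one small lookup, expressed relative to a write in a system without a disk array and for a single parity record. The figures for undivided distribution are those stated in the text; the conventional figure for divided distribution, 1 + 1/m, additionally assumes that the parity record has the same length as a data record. The function name relative_volume and the scheme labels are hypothetical.

```python
from fractions import Fraction


def relative_volume(distribution: str, scheme: str, m: int = 4) -> Fraction:
    """Write transfer volume relative to a system without a disk array (n = 1)."""
    table = {
        # Undivided distribution: figures as stated in the text.
        ("undivided", "conventional"): Fraction(4),
        ("undivided", "disk_parity"): Fraction(3),
        ("undivided", "disk_parity_broadcast"): Fraction(2),
        # Divided distribution: the parity record is assumed to be 1/m of the
        # logical record when it is sent by the control unit.
        ("divided", "conventional"): 1 + Fraction(1, m),
        ("divided", "disk_parity_broadcast"): Fraction(1),
    }
    return table[(distribution, scheme)]


saving = (relative_volume("undivided", "conventional")
          - relative_volume("undivided", "disk_parity_broadcast"))
```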