This application is based on Japanese Patent Application No. 10-127439, filed May 11, 1998, and Japanese Patent Application No. 10-281249, filed Oct. 2, 1998, the contents of which are incorporated herein by reference.
The present invention relates to a disk array apparatus having a plurality of disk drives and, more particularly, to a disk array controller which comprises a cache memory for temporarily storing transfer data between a disk drive and host apparatus, and controls access to the disk drives, and a cache control method applied to the controller.
A disk array apparatus is known as an external storage device, which comprises a plurality of disk drives, achieves high-speed access by driving the plurality of disk drives in parallel, and improves reliability by a redundant arrangement.
A disk array apparatus of this type generates parity data as data correction information for write data transferred from a host apparatus, and writes the parity data in one of the plurality of disk drives. Hence, even when a failure has occurred in one of the plurality of disk drives, data in the failed disk drive can be restored using the stored parity data and data in the remaining disk drives.
As one data redundancy scheme using parity or the like, RAID (Redundant Arrays of Inexpensive Disks) is known. RAID has various levels depending on the arrangements of disk array apparatuses; levels 3 and 5 are prevalent. RAID of level 3 is called RAID3, and is suitable for sequential access (or jobs that require such access) for large data transfers. On the other hand, RAID of level 5 is called RAID5, and is suitable for random access (or jobs that require such access), i.e., frequent read/write access of small data.
The disk array apparatus normally has a cache memory (disk cache) for temporarily storing transfer data between the disk drive and host apparatus. In such an arrangement, when target data is present in the cache memory, the target data can be accessed at high speed from the cache memory without accessing the disk drives (i.e., without mechanically accessing them) irrespective of the RAID levels (RAID3, RAID5, and the like).
In RAID3, update parity is generated by segmenting update data transferred from the host apparatus. By contrast, in RAID5, update parity is generated using update data transferred from the host apparatus, data before update, which is stored in a given area of the disk drive where the update data is to be stored, and parity before update (parity data) stored in a given area of another disk drive corresponding to the storage location of the update data.
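The two parity generation schemes can be illustrated with a minimal sketch. The helper names (`xor_blocks`, `raid3_update_parity`, `raid5_update_parity`) and the sample byte values are hypothetical and chosen only for illustration; the sketch assumes that all blocks have the same length and that the RAID3 update data divides evenly into segments.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Return the bit-by-bit EX-OR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def raid3_update_parity(update_data: bytes, data_drives: int) -> bytes:
    """RAID3: segment the update data across the data drives and EX-OR the segments."""
    if len(update_data) % data_drives:
        raise ValueError("update data must divide evenly into segments")
    size = len(update_data) // data_drives
    segments = [update_data[i * size:(i + 1) * size] for i in range(data_drives)]
    return xor_blocks(*segments)


def raid5_update_parity(update_data: bytes, old_data: bytes, old_parity: bytes) -> bytes:
    """RAID5: EX-OR the update data with the data and parity before update."""
    return xor_blocks(update_data, old_data, old_parity)


# Hypothetical stripe: three data blocks and their parity.
d0, d1, d2 = b"\x11\x22", b"\x0f\xf0", b"\xaa\x55"
old_parity = xor_blocks(d0, d1, d2)

# Update only d1; the other data blocks are never read.
new_d1 = b"\x3c\xc3"
new_parity = raid5_update_parity(new_d1, d1, old_parity)
```

In this sketch the RAID5 result equals the parity that would be obtained by EX-ORing the whole updated stripe, which is why only the data and parity before update need to be read from the disk drives.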
In a conventional disk array controller, generation of update parity for update data transferred from the host apparatus must either be implemented by a firmware program in the controller or be performed by configuring the controller itself as a dedicated hardware apparatus. However, implementation by firmware suffers from limited processing speed, and implementation by dedicated hardware suffers from a complicated circuit.
Hence, in Japanese Patent Application No. 8-234264, the present applicant has proposed a disk array controller which can attain a simple arrangement, high-speed processing, and easy control by providing the disk cache means, which has a cache memory, with a function of generating parity data using read/write data upon cache memory access.
In this disk cache controller, data is distributed and stored on the cache memory premised on RAID3, and a control circuit for the cache memory (cache control means) reads out data from the cache memory and EX-ORs the readout data, thus efficiently generating parity data.
However, when this scheme is applied to RAID5, data and parities before update are widely dispersed in the cache memory, and the cache memory area cannot be efficiently used. The reason for this will be explained below.
In RAID3, since update parity is generated based only on update data upon updating data, all data on the cache memory match those on a disk array (disk drive). For this reason, even when the update data is left on the cache memory, it can be used as read data.
By contrast, in RAID5, data and parity before update must be read out, and EX-ORed with update data upon updating data. For this reason, when the update data, and data and parity before update are allocated on the cache memory by the same method as in RAID3 that generates update parity based only on update data, only the area (⅓ area) of update data on the cache memory is used as that of cache data (read data) (i.e., the area where a copy of data in the disk drive is stored), and the areas of data and parity before update (⅔ area of the cache memory) are wasted.
In this manner, when the same cache control scheme as that for RAID3 is applied to RAID5, data and parities before update, which cannot be used as cache data, are randomly present on the cache memory, and the limited area of the cache memory cannot be efficiently used.
The present invention has been made in consideration of the above situation, and has as its object to provide a disk array controller, which can efficiently generate update parity on the basis of update data, and data and parity before update using a cache memory upon updating data, and can reduce any wasteful area on the cache memory, which cannot be used as cache data, and a cache control method applied to the controller.
It is another object of the present invention to provide a disk array controller, which can efficiently restore original data on the basis of data in the remaining normal disk drives and parity data using a function of generating update parity from update data, and data and parity before update, when a failure has occurred in one of a plurality of disk drives that form a disk array, and a cache control method applied to the controller.
It is still another object of the present invention to provide a disk array controller which can efficiently restore data without being influenced by the number of disk drives (the number of elements) that form a disk array, and a cache control method applied to the controller.
According to the present invention, a disk array controller which comprises external input/output means for controlling input/output with an external host apparatus, disk drive input/output means which allows connection to a disk drive group including N disk drives which form a disk array for storing data input from the host apparatus, disk cache means having a cache memory which temporarily stores transfer data between the disk drives and host apparatus and is managed in units of blocks, and main control means for controlling the respective means, is characterized in that in order to allow generation of parity data on the basis of data before update and parity data before update corresponding to update data upon transferring the update data from the host apparatus, in addition to a first area in which the update data is written, a second area, in which data before update and parity data before update read from one of the N disk drives are temporarily written with a predetermined positional relationship under the control of the main control means, is assured on the cache memory within a predetermined address range, and the disk cache means comprises an EX-OR circuit for EX-ORing two data bit by bit, and cache control means for, when a specific cache access command appended with a request address which indicates a storage location of the update data in the cache memory is supplied from the main control means or disk drive input/output means upon generating parity data in correspondence with the update data transferred from the host apparatus, sequentially reading out the update data at the storage location in the cache memory, which is indicated by the request address, and data before update and parity data before update at storage locations in the second area, which correspond to the storage location of the update data, and making the EX-OR circuit EX-OR the readout data, so as to generate parity data as an EX-OR of the update data, the data before update, and the parity data before update.
Assuming that 2n blocks form the second area, the blocks of the second area are managed in units of n block pairs, and blocks that form the first area are also managed in units of n block columns in correspondence with the n block pairs. An arbitrary disk drive is assigned to each set of one block column in the first area and one block pair in the second area, and update data corresponding to that drive is written in the blocks in the block column. In addition, data before update of that drive and corresponding parity data before update in another drive are written in the block pair corresponding to the block column. In this way, the write locations of the data before update and parity data before update can be easily computed from the write location of the update data. Especially, when a continuous address range is assigned to the block pair, the difference between the addresses of the write locations of the data before update and parity data before update corresponds to one block size, resulting in a very easy computation.
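The address computation described above can be sketched as follows. The block size, the number n of block columns, the number of blocks per column, the area base addresses, and the rule that the block column is the first-area block index modulo n are all assumptions made for illustration; the text fixes only that each block column corresponds to one block pair and that a contiguous pair places the parity before update one block size after the data before update.

```python
from __future__ import annotations

BLOCK_SIZE = 512          # assumed block size in bytes
N_COLUMNS = 4             # n block columns and n block pairs (assumed value)
ROWS = 8                  # assumed number of blocks per first-area column
FIRST_AREA_BASE = 0       # assumed base address of the first area
SECOND_AREA_BASE = FIRST_AREA_BASE + N_COLUMNS * ROWS * BLOCK_SIZE


def pre_update_addresses(update_addr: int) -> tuple[int, int]:
    """Map an update-data address in the first area to the addresses of the
    data before update and the parity before update in the second area."""
    block_index, offset = divmod(update_addr - FIRST_AREA_BASE, BLOCK_SIZE)
    if not 0 <= block_index < N_COLUMNS * ROWS:
        raise ValueError("address is outside the first area")
    column = block_index % N_COLUMNS               # assumed column numbering
    old_data_addr = SECOND_AREA_BASE + 2 * column * BLOCK_SIZE + offset
    old_parity_addr = old_data_addr + BLOCK_SIZE   # contiguous block pair
    return old_data_addr, old_parity_addr


def generate_update_parity(cache: bytearray, update_addr: int, length: int) -> bytes:
    """Read the three operands in sequence and EX-OR them byte by byte."""
    offset = (update_addr - FIRST_AREA_BASE) % BLOCK_SIZE
    if offset + length > BLOCK_SIZE:
        raise ValueError("read must stay within one block")
    old_data_addr, old_parity_addr = pre_update_addresses(update_addr)
    result = bytearray(cache[update_addr:update_addr + length])
    for addr in (old_data_addr, old_parity_addr):
        for i, byte in enumerate(cache[addr:addr + length]):
            result[i] ^= byte
    return bytes(result)
```

Because both second-area addresses follow from the request address alone, a caller in this sketch supplies only the location of the update data.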
In this arrangement, update parity can be efficiently generated from the update data, data before update, and parity before update using the cache memory upon updating data. In addition, since the data before update and parity data before update, which are not used as cache data, are written in the second area assured on the cache memory in addition to the first area which is used for writing update data, which can be used as cache data, any wasteful area on the cache memory that cannot be used as cache data can be reduced by fixing the second area within a given address range.
According to the present invention, a third area as an extended area of the second area, which is used together with the second area upon restoring data, is assured on the cache memory in addition to the second area. Also, the main control means or disk drive input/output means has parity generation pre-processing means for, when a failure has occurred in one of the N disk drives and data in that disk drive must be restored from data and parity data in the remaining N−1 disk drives, writing data or parity data of each of the remaining N−1 disk drives in the blocks within the first area on the cache memory, and N−2 blocks within the second and third areas, which have a predetermined positional relationship, and command issuance means for sending to the cache control means a cache access command, which is appended with a request address that has a mode designation field set with information for designating a data restoration mode of various modes including a parity generation mode for parity generation and the data restoration mode for data restoration, and indicates the storage location within the first area. In addition, the cache control means has a sequence processing function. With this function, upon receiving a cache access command appended with a request address which includes a mode designation field that designates the data restoration mode, the cache control means sequentially reads out data or parity data from that location within a block in the first area on the cache memory that is indicated by the request address, and makes the EX-OR circuit EX-OR, thereby generating restored data as an EX-OR of the readout N−2 data and one parity data.
Assuming that n×m blocks form the third area, the n×m blocks in the third area are managed in units of n block columns each including m blocks, in correspondence with the n block pairs in the second area, and a set of one block column in the first area, one block pair in the second area, and one block column in the third area are assigned as a block group for data restoration. Thus, when data or parity data in the normal N−1 disk drives of the N disk drives are distributed and written in one block in the block column in the first area, two blocks in the second area, and N−1 blocks of N−4 blocks in the block column in the third area, the write locations in other blocks can be easily calculated from the write location in the block in the first area.
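The block group for data restoration can be sketched as follows. The sketch assumes that one surviving block is placed in the first area, up to two in the block pair of the second area, and the remainder (N−4 when N is 4 or more, which is what N−2 blocks within the second and third areas implies) in the third-area block column. The constants, the column numbering, and the function names are hypothetical.

```python
from __future__ import annotations

BLOCK_SIZE = 512          # assumed block size in bytes
N_COLUMNS = 4             # n (assumed value)
ROWS = 8                  # assumed number of blocks per first-area column
M = 4                     # m blocks per third-area column (assumed value)
FIRST_AREA_BASE = 0
SECOND_AREA_BASE = FIRST_AREA_BASE + N_COLUMNS * ROWS * BLOCK_SIZE
THIRD_AREA_BASE = SECOND_AREA_BASE + 2 * N_COLUMNS * BLOCK_SIZE


def restoration_sources(first_addr: int, n_drives: int) -> list[int]:
    """List the cache addresses holding the N-1 surviving blocks of one stripe."""
    block_index, offset = divmod(first_addr - FIRST_AREA_BASE, BLOCK_SIZE)
    if not 0 <= block_index < N_COLUMNS * ROWS:
        raise ValueError("address is outside the first area")
    others = n_drives - 2                 # blocks held in the second and third areas
    if not 1 <= others <= 2 + M:
        raise ValueError("unsupported number of disk drives")
    column = block_index % N_COLUMNS      # assumed column numbering
    pair = [SECOND_AREA_BASE + (2 * column + i) * BLOCK_SIZE + offset
            for i in range(min(2, others))]
    extended = [THIRD_AREA_BASE + (M * column + i) * BLOCK_SIZE + offset
                for i in range(max(0, others - 2))]
    return [first_addr] + pair + extended


def restore(cache: bytearray, first_addr: int, n_drives: int, length: int) -> bytes:
    """EX-OR the surviving data and parity to rebuild the lost block."""
    result = bytearray(length)
    for addr in restoration_sources(first_addr, n_drives):
        for i, byte in enumerate(cache[addr:addr + length]):
            result[i] ^= byte
    return bytes(result)
```

As with parity generation, every source location in this sketch is derived from the single first-area address and the number of disk drives.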
In this arrangement, not only parity generation but also data restoration can be done on a single apparatus (randomly) by changing the value in a given field (mode designation field) of a request address appended to a cache access command which remains the same. In addition, data restoration can be efficiently done in the same procedure as in parity generation by using the function of generating update parity from the update data, data before update, and parity before update. If data used in data restoration written in the first area is replaced by restored data, since that restored data can be used as cache data, high-speed disk access to the restored data can be attained.
According to the present invention, the request address appended to the cache access command has an element number designation field set with the number of elements which represents the number of disk drives that form the disk array, in addition to the mode designation field. When the data restoration mode is designated in the mode designation field, the number N of disk drives that form the current disk array is detected with reference to the element number designation field, and the number of blocks to be read out from the third area, and the number of times of EX-ORing are determined on the basis of the detected number N.
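How the cache control means might derive its sequence from the request address can be sketched as follows. The 32-bit width, the bit positions of the fields, the numeric mode codes, and the existence of a normal mode code in this field are assumptions; the text specifies only that a mode designation field and an element number designation field exist. The EX-OR count assumes a two-input EX-OR circuit that folds each block after the first into a running result.

```python
from dataclasses import dataclass
from enum import IntEnum


class Mode(IntEnum):
    NORMAL = 0               # assumed code
    PARITY_GENERATION = 1    # assumed code
    DATA_RESTORATION = 2     # assumed code


# Assumed layout: bits 31-30 mode, bits 29-26 element number N, bits 25-0 cache address.
MODE_SHIFT, MODE_MASK = 30, 0x3
ELEM_SHIFT, ELEM_MASK = 26, 0xF
ADDR_MASK = (1 << 26) - 1


@dataclass(frozen=True)
class ReadPlan:
    mode: Mode
    cache_addr: int
    blocks_to_read: int
    third_area_blocks: int
    xor_operations: int


def plan_access(request_addr: int) -> ReadPlan:
    """Decode the request address and size the read/EX-OR sequence."""
    mode = Mode((request_addr >> MODE_SHIFT) & MODE_MASK)
    n_drives = (request_addr >> ELEM_SHIFT) & ELEM_MASK
    cache_addr = request_addr & ADDR_MASK
    if mode is Mode.DATA_RESTORATION:
        if n_drives < 3:
            raise ValueError("a parity-protected array needs at least 3 drives")
        reads = n_drives - 1                      # one block per surviving drive
        return ReadPlan(mode, cache_addr, reads, max(0, n_drives - 4), reads - 1)
    if mode is Mode.PARITY_GENERATION:
        # update data, data before update, parity before update
        return ReadPlan(mode, cache_addr, 3, 0, 2)
    return ReadPlan(mode, cache_addr, 1, 0, 0)
```

For example, under these assumptions a restoration request for a six-drive array reads five blocks, two of them from the third area, and performs four EX-OR operations.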
In this arrangement, even when a failure has occurred in disk drives in various disk arrays having different numbers of elements, high-speed data restoration can be achieved. Also, a plurality of disk arrays having different numbers of elements can be used at the same time.
According to the present invention, a disk array controller which comprises external input/output means for controlling input/output with an external host apparatus, disk drive input/output means which allows connection to a disk drive group that forms a disk array for storing data input from the host apparatus, disk cache means having a cache memory which temporarily stores transfer data between the disk drives and host apparatus and is managed in units of blocks, a standard bus for data transfer to which the external and disk drive input/output means are connected, and main control means for controlling the respective means, is characterized by comprising a plurality of register groups each including three registers which are respectively set with a block address for designating a block in the cache memory where update data transferred from the host apparatus is stored, a block address for designating a block in the cache memory where data before update read from the disk drive to generate parity data using the update data is stored, and a block address for designating a block in the cache memory where parity data before update is stored, and in that the disk cache means comprises an EX-OR circuit for EX-ORing two data bit by bit, and cache control means for, when a specific cache access command appended with a request address which includes a register designation field for designating one of the plurality of register groups, and an intra-cache address designation field indicating an address in a block of the cache memory is received from the main control means or disk drive input/output means to generate parity data corresponding to the update data transferred from the host apparatus, sequentially reading out update data, data before update, and parity data before update stored at locations designated by the intra-cache address designation field in the request address from blocks in the cache memory indicated by the contents set by the register group which is designated by the register designation field in the request address, and making the EX-OR circuit EX-OR the readout data, so as to generate parity data as an EX-OR of the readout update data, data before update, and parity data before update.
In this arrangement, upon transferring update data from the host apparatus, when that update data is stored at a location within an arbitrary block of the cache memory, the block addresses that designate blocks where data before update and parity data before update corresponding to the update data are stored are set in one of the plurality of register groups. Thus, when the main control means or disk drive input/output means sends a specific cache read command for parity generation to the disk cache means, parity data (update parity) can be generated by sequentially reading out update data, data before update, and parity data before update stored at locations (relative locations in blocks) designated by the intra-cache address designation field of the request address from blocks in the cache memory indicated by the contents set in the register group designated by the register designation field of the request address, and EX-ORing the readout data, and can be output as read data requested by the specific cache read command. When data before update or parity data before update corresponding to the update data is not stored in the cache memory, that data before update or parity data before update can be read from the disk array to an arbitrary block in the cache memory.
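The register group scheme can be sketched as follows. The class name `DiskCache`, the field widths, and the block size are hypothetical; the sketch assumes that the register designation field sits directly above the intra-cache address designation field in the request address.

```python
BLOCK_SIZE = 512      # assumed block size in bytes
OFFSET_BITS = 9       # intra-cache address designation field (log2 of BLOCK_SIZE)
GROUP_BITS = 4        # register designation field: 16 register groups (assumed)


class DiskCache:
    """Cache memory plus register groups of three block addresses each."""

    def __init__(self, n_blocks: int) -> None:
        self.memory = bytearray(n_blocks * BLOCK_SIZE)
        # [update data block, data-before-update block, parity-before-update block]
        self.register_groups = [[0, 0, 0] for _ in range(1 << GROUP_BITS)]

    def set_group(self, group: int, update_block: int,
                  old_data_block: int, old_parity_block: int) -> None:
        self.register_groups[group] = [update_block, old_data_block, old_parity_block]

    def write(self, block: int, offset: int, data: bytes) -> None:
        start = block * BLOCK_SIZE + offset
        self.memory[start:start + len(data)] = data

    def parity_read(self, request_addr: int, length: int) -> bytes:
        """Serve a parity-generation read: EX-OR the same offset of three blocks."""
        group = (request_addr >> OFFSET_BITS) & ((1 << GROUP_BITS) - 1)
        offset = request_addr & (BLOCK_SIZE - 1)
        result = bytearray(length)
        for block in self.register_groups[group]:
            start = block * BLOCK_SIZE + offset
            for i, byte in enumerate(self.memory[start:start + length]):
                result[i] ^= byte
        return bytes(result)
```

In this sketch the three blocks may lie anywhere in the cache memory, so data before update and parity before update that are already cached are used in place rather than copied to a fixed location.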
As described above, in this arrangement, the cache memory can be used in generating update parity upon transferring update data from the host apparatus, and data before update and parity data before update can be used without being copied on the cache memory by designating blocks in the cache memory using a register group. Hence, update parity can be efficiently generated without any system overhead.
Data generation by means of EX-ORing is done not only when parity data (update parity) is generated from update data, data before update, and parity before update, but also when a failure has occurred in one of the plurality of disk drives which form the disk array and data in that disk drive must be restored. In such a case, the number of data used in EX-ORing varies depending on the number of disk drives. In this case, data generated by EX-ORing N data is generally called parity data.
For this purpose, according to the present invention, in order to allow use of the cache memory in generating parity data by EX-ORing a maximum of N data (N is an integer equal to or larger than 4), the controller comprises a plurality of register groups each including N registers, and an element number designation field for designating the number of data used in parity generation is added in the request address. The cache control means generates parity data by selecting predetermined registers, the number of which is designated by the element number designation field in the request address, sequentially reading out data stored at locations designated by the intra-cache address designation field in the request address from blocks in the cache memory indicated by the contents set in the selected registers, and making the EX-OR circuit EX-OR the readout data.
In this arrangement, even when the number of data used in parity generation changes, i.e., when the number of disk drives that form the disk array changes, parity data can be simultaneously generated.
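The use of an element number designation field with register groups of N registers can be sketched as follows. The field widths, the value of N (seven registers per group here), and the rule that the first registers of the group are the ones selected are assumptions made for illustration.

```python
BLOCK_SIZE = 512           # assumed block size in bytes
OFFSET_BITS = 9            # intra-cache address designation field
ELEMENT_BITS = 3           # element number designation field (assumed width)
GROUP_BITS = 2             # register designation field (assumed width)
REGISTERS_PER_GROUP = 7    # N registers per group (assumed value)


def decode(request_addr: int) -> tuple:
    """Split a request address into (group, element count, intra-block offset)."""
    offset = request_addr & ((1 << OFFSET_BITS) - 1)
    count = (request_addr >> OFFSET_BITS) & ((1 << ELEMENT_BITS) - 1)
    group = (request_addr >> (OFFSET_BITS + ELEMENT_BITS)) & ((1 << GROUP_BITS) - 1)
    return group, count, offset


def parity_read(memory: bytearray, groups: list, request_addr: int, length: int) -> bytes:
    """EX-OR the blocks named by the first `count` registers of the chosen group."""
    group, count, offset = decode(request_addr)
    if not 2 <= count <= REGISTERS_PER_GROUP:
        raise ValueError("element count out of range")
    result = bytearray(length)
    for block in groups[group][:count]:      # assumed: the first registers are selected
        start = block * BLOCK_SIZE + offset
        for i, byte in enumerate(memory[start:start + length]):
            result[i] ^= byte
    return bytes(result)
```

With this layout the same register group can serve a three-operand parity update or a wider restoration simply by changing the count in the request address.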
However, in the above arrangement, since the element number designation field must be added in the request address, the number of bits that configure the register designation field decreases by the number of bits of the element number designation field, and the number of registers that can be designated decreases.
Hence, according to the present invention, in order to allow generation of parity data by EX-ORing a maximum of N data without using any element number designation field, a plurality of register groups each of which includes registers, the number of which ranges from 3 (inclusive) to N (inclusive: N is an integer equal to or larger than 4) and is determined by a designation address of a register group, i.e., including various numbers of building registers, are used in place of a plurality of register groups each including N registers. Upon reception of a specific cache read command appended with a request address, which includes a register designation field and intra-cache address designation field, the register group designated by the register designation field in the request address is selected, data stored at locations designated by the intra-cache address designation field in the request address are sequentially read out from blocks in the cache memory, the number of which matches the number of building registers in that register group determined by the value in the register designation field, and which are indicated by the contents set in the registers which form the register group, and the EX-OR circuit is made to EX-OR the readout data, thereby generating parity data.
In this way, since the register designation field in the request address is used not only to designate a register group but also to designate the number of building registers of the register group determined by the value in that designation field, i.e., assigned in advance to the value in that designation field, that is, the number of data used in parity generation, more register groups can be designated by a limited number of bits of the request address, and parity data can be simultaneously generated even when the number of disk drives that form the disk array is freely changed.
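The trade-off between the two schemes can be sketched as follows. The number of available address bits, the width of the element number designation field, and the particular assignment of group numbers to register counts are all hypothetical; the text states only that the number of building registers is assigned in advance to the value in the register designation field.

```python
FIELD_BITS = 6      # assumed bits available above the intra-cache address
ELEMENT_BITS = 3    # assumed width of a separate element number field

# With a separate element number field, fewer bits remain to name register groups.
groups_with_element_field = 1 << (FIELD_BITS - ELEMENT_BITS)
# Without it, every available bit names a register group.
groups_without_element_field = 1 << FIELD_BITS

# Assumed assignment of register-group numbers to register counts (3 to N = 7).
GROUP_RANGES = [
    (range(0, 32), 3),
    (range(32, 48), 4),
    (range(48, 56), 5),
    (range(56, 60), 6),
    (range(60, 64), 7),
]


def registers_in_group(group: int) -> int:
    """Return the number of building registers implied by the group number."""
    for numbers, count in GROUP_RANGES:
        if group in numbers:
            return count
    raise ValueError("unknown register group")


# Register file in which each group has exactly as many registers as it needs.
register_file = [[0] * registers_in_group(g) for g in range(groups_without_element_field)]
```

Under these assumed widths, dropping the element number designation field raises the number of addressable register groups from 8 to 64, and the number of blocks to EX-OR is read off from the group number itself.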
According to the present invention, a mode designation field for designating a normal access mode for reading out one data from the cache memory or a parity generation mode for sequentially reading out a plurality of data from the cache memory and generating parity data by EX-ORing the readout data is added in the request address appended to the cache read command. Upon reception of the cache read command, one data in the cache memory designated by the request address or parity data obtained by EX-ORing a plurality of data in the cache memory, is selectively output in accordance with the mode designated by the mode designation field in the request address.
In this arrangement, normal data and parity data obtained by EX-ORing a plurality of data can be easily switched by designation in a specific field (mode designation field) in the request address appended to the cache read command.
If the controller comprises two disk cache means, one disk cache means can read/write data from/to the cache memory while the other disk cache means is generating parity data, thus eliminating cache memory access contention upon parity generation.
When the present invention is applied to a disk array apparatus comprising a plurality of disk arrays, the disk cache means is provided in correspondence with each disk array, thereby preventing other disk arrays from being influenced by the time required for parity generation, and improving the overall system performance. If the number of disk cache means is increased/decreased in correspondence with the relationship between the number of disk arrays and cost, the relationship between system performance and cost can be optimized.
If the building registers of each register group can be written from the standard bus by the main control means, the independence of the disk cache means can be improved. In addition, since the controller is compatible with the standard bus, the present invention can also be applied to versatile systems such as personal computers and the like. In this case, an address space expressed by the address field of the standard bus is partially assigned to the registers, so that these registers can be designated by the address field of the standard bus, i.e., can be seen via the standard bus.
Similarly, if the building registers of each register group can be written from other buses (routes) independent from the standard bus, since the standard bus need not be used to set block addresses in the registers, the use efficiency of the standard bus can be improved, i.e., the system performance can be improved. In this case, the address space used by the main control means can be partially assigned to the registers.
As described in detail above, according to the present invention, since data before update and parity data before update which cannot be used as cache data are written in the second area assured on the cache memory in addition to the first area used for writing update data that can be used as cache data, any wasteful area on the cache memory that cannot be used as cache data can be reduced even when update parity is generated from the update data, data before update, and parity before update using the cache memory upon updating data.
Also, according to the present invention, since the third area as an extended area of the second area, which is used together with the second area upon restoring data, is assured on the cache memory in addition to the second area, data restoration for restoring data from data and parity data in the remaining normal disk drives when a failure has occurred in one of the plurality of disk drives that form the disk array can be efficiently done using the function of generating update parity from the update data, data before update, and parity before update.
Furthermore, according to the present invention, since the element number designation field set with the number of elements indicating the number of disk drives that form the disk array is provided to the request address appended to the cache access command, in addition to the mode designation field, and a sequence process for data restoration is done based on the value in the element number designation field when the data restoration mode is designated by the mode designation field, data restoration can be done at high speed even if a disk drive fails in any of various disk arrays having different numbers of elements.
Moreover, according to the present invention, since the block location in the cache memory where update data is stored upon updating data, and the block locations in the cache memory where data before update and parity before update corresponding to the update data are respectively stored are designated by an arbitrary register group, and parity data is generated by sequentially reading out designated data from blocks in the cache memory indicated by the contents set in the registers designated by the request address appended to a cache read command, and EX-ORing the readout data upon reception of the cache read command for designating parity generation upon updating data, update parity can be efficiently generated from the update data, data before update, and parity before update using the cache memory upon updating data without any system overhead.
In addition, according to the present invention, since the number of data used in EX-ORing (the number of reference data) can be designated by a given field in the request address, even when the number of data used in EX-ORing has changed due to, e.g., a change in the number of disk drives which build the disk array, parity data can be simultaneously generated using the cache memory in correspondence with the number of data.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.