The present invention relates generally to a file management system, and more particularly to a file management system capable of enhancing the reliability and the performance of a plurality of disk devices (RAID devices), wherein data are stored redundantly in files on the plurality of disk devices.
A redundant array of independent disks (RAID) is well known as a method of enhancing the reliability and the performance of a data storage device. RAID methods include a RAID level 1 method, in which the data are stored in duplicate, and a RAID level 5 (or RAID level 4) method, in which a plurality (N) of disk devices are used, the data are striped across [N−1] of those disk devices, and parity data are stored in the one remaining disk device. The RAID level 1 and level 5 methods are each a useful technique for enhancing the reliability and the performance of the data storage device, i.e., the disk device. The RAID level 5 method, however, may degrade performance when small quantities of data are written at random, although it has a high space efficiency (a low storage cost). The RAID level 1 method, on the other hand, exhibits a high performance when small quantities of data are written at random, but has a low space efficiency. Further, both of these methods rely on standby disk devices that normally sit unused in preparation for a fault, require much time to restore redundancy after a fault has occurred, and make it difficult to add a disk device dynamically.
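The parity arrangement and the space-efficiency trade-off described above can be illustrated with a minimal sketch. It assumes that the parity block is the byte-wise exclusive OR of the data blocks, as is usual for RAID level 5; the function names `xor_blocks` and `space_efficiency` are hypothetical and appear in none of the cited prior arts.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks (assumed parity rule)."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def space_efficiency(n_disks, level):
    """Fraction of raw capacity available for user data (illustrative only)."""
    if level == 1:
        return 0.5                      # duplex storage keeps two copies
    if level == 5:
        if n_disks < 3:
            raise ValueError("RAID level 5 needs at least three disks here")
        return (n_disks - 1) / n_disks  # one disk's worth of parity
    raise ValueError("only levels 1 and 5 are modelled")


# Data striped over N-1 = 3 disks, parity on the remaining disk.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# If the second data block is lost, XOR of the survivors and parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
```

With four disk devices, this sketch gives a space efficiency of 0.75 for the RAID level 5 method against 0.5 for the RAID level 1 method, which is the storage-cost difference noted above.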
Examples of the technique adopting the RAID methods described above are disclosed in Japanese Patent Application Laid-Open Publication No.5-197498 (Prior Art 1), U.S. Pat. No. 5,519,844 (Prior Art 2), U.S. Pat. No. 5,708,769 (Prior Art 3), Japanese Patent Application Laid-Open Publication No.8-44503 (Prior Art 4), U.S. Pat. No. 5,696,934 (Prior Art 5), Japanese Patent Application Laid-Open Publication No.8-272548 (Prior Art 6), U.S. Pat. No. 5,542,065 (Prior Art 7), and Japanese Patent Application Laid-Open Publication No.9-146717 (Prior Art 8).
The prior arts 1, 2, 3, 6 and 7 relate to a technology by which at least one physical disk device is made to appear as a plurality of logical storage devices to a host computer, wherein the segmentation into the logical storage devices is static, and the user has no alternative but to explicitly declare which logical storage device is to be used. Accordingly, from the user's side, there is no essential difference from a case where a plurality of disk devices based on different redundancy methods are connected to the host computer and used, except in the aspect of the storage cost.
According to the prior arts 4 and 5, a plurality of logical disk devices using different redundancy methods are constructed so that the host computer recognizes them as one single logical storage device, and the data are transferred between the different logical storage devices by use of information such as an access frequency, thereby automatically determining an optimum redundancy method. According to these prior arts, the system automatically selects the redundancy method. However, the data are first placed at the RAID level 1 and, after a fixed period of time has elapsed, are transferred to a region of the RAID level 5, with the result that an extra overhead occurs. Furthermore, the positions of the blocks storing one file are scattered in the process of the transfer from the RAID level 1 to the RAID level 5, and this is highly likely to invalidate the performance optimization implemented by the file system, i.e., the effect obtained by storing the same file in consecutive physical blocks as much as possible.
According to the prior art 8, in a disk device incorporating and controlling a plurality of hard disks, an unused region (partition) is prepared beforehand, whereby the redundancy is recovered by using this unused region if a disk fault happens. Herein, the reason why the unused region is prepared is that once a region is handed over to the host computer, the disk device can no longer recognize which blocks in that region are being used. This unused region must, however, be reserved in advance.
Accordingly, it is a primary object of the present invention to provide a file management system capable of storing the data with a higher usability (reliability) and a higher performance by structuring files for arranging in redundancy the data on a plurality of disk devices, and utilizing characteristics of a file management program (file system) recognizing a mutual relationship between sets of data stored in the plurality of disk devices.
To accomplish the above object, according to one aspect of the present invention, a file management system comprises a plurality of disk devices, managed in the form of a disk pool, of which at least two disk devices are dynamically selected from the disk pool, for constituting a plurality of files for storing in redundancy any one set of data of user data and meta data for managing how the user data are used, and a file system, constituting a part of an operating system of a host computer, for managing the plurality of disk devices as the disk pool and managing en bloc the files, based on the meta data.
In this construction, the file system, in the case of a file of less than one block, selects two of the plurality of disk devices in the disk pool, and stores the user data with the redundancy of the RAID level 1. Further, the file system, in the case of a file of over two blocks, selects three or more of the plurality of disk devices in the disk pool, and stores the user data with the redundancy of the RAID level 5. Moreover, the file system stores the meta data in a predetermined two of the plurality of disk devices in the disk pool with the redundancy of the RAID level 1.
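The size-based selection described above can be sketched as follows. Several points are assumptions made only for illustration: the 4096-byte block size, the treatment of any file that fits in a single block as the RAID level 1 case and any larger file as the RAID level 5 case, and the choice of the least-loaded disk devices from the pool. The names `Disk` and `choose_layout` are hypothetical.

```python
import math
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed block size in bytes; the text does not fix one


@dataclass
class Disk:
    number: int
    load: int  # hypothetical load metric; lower means less busy


def choose_layout(file_size, pool, raid5_width=3):
    """Pick a RAID level and disk numbers for one file of user data.

    Assumption: a file that fits in one block uses RAID level 1 on two
    disks; any larger file uses RAID level 5 on three or more disks.
    """
    blocks = max(1, math.ceil(file_size / BLOCK_SIZE))
    if blocks <= 1:
        level, width = 1, 2
    else:
        level, width = 5, max(3, raid5_width)
    if len(pool) < width:
        raise ValueError("disk pool is too small for the chosen redundancy")
    # Assumed policy: take the least-loaded disks, ties broken by disk number.
    chosen = sorted(pool, key=lambda d: (d.load, d.number))[:width]
    return level, [d.number for d in chosen]


pool = [Disk(0, 5), Disk(1, 1), Disk(2, 3), Disk(3, 0)]
small = choose_layout(100, pool)      # fits in one block
large = choose_layout(10_000, pool)   # spans three blocks
```

Because the disk devices are chosen per file at allocation time, two files in the same pool may end up on different disk devices and at different RAID levels.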
The meta data is stored on a file-basis with an address conversion table containing a disk number of the disk device stored with the user data, and a disk block number corresponding to an intra-disk relative block number.
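One possible shape for such an address conversion table is sketched below. It assumes that the table is keyed by a relative block number within the file and that each entry holds a disk number and a disk block number; the class and method names are hypothetical, and the actual on-disk layout of the meta data is not specified here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlockAddress:
    disk_number: int        # which disk device in the pool holds the block
    disk_block_number: int  # block position inside that disk device


@dataclass
class AddressConversionTable:
    # Assumed key: relative block number within the file.
    entries: dict = field(default_factory=dict)

    def map_block(self, relative_block, disk_number, disk_block_number):
        """Record (or update) where one relative block is stored."""
        self.entries[relative_block] = BlockAddress(disk_number, disk_block_number)

    def resolve(self, relative_block):
        """Translate a relative block number into a disk address."""
        return self.entries[relative_block]

    def blocks_on_disk(self, disk_number):
        """List the relative blocks that live on one disk device."""
        return sorted(rel for rel, addr in self.entries.items()
                      if addr.disk_number == disk_number)


table = AddressConversionTable()
table.map_block(0, disk_number=2, disk_block_number=118)
table.map_block(1, disk_number=4, disk_block_number=57)
table.map_block(2, disk_number=2, disk_block_number=119)
```

Keeping the table per file lets the file system find every block that belongs to a file, and every block of that file on a given disk device, without scanning the disk devices themselves.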
If a block fault occurs in a target disk device when the file system is accessing the file, a disk block group needed for recovering contents of the block with the fault is obtained from the address conversion table, the disk block group is read, the data of the block with the fault are recovered from the data read therefrom, the recovered data are written to a newly-allocated normal block in the same disk device, and the newly-allocated disk block number is reflected in the address conversion table.
If a fault of said disk device occurs, a disk block group needed for recovering contents of the block of said failed disk device is obtained from said address conversion table, the disk block group is read, the data of said failed disk device are recovered from the data read therefrom, the recovered data are written to a newly-allocated normal block in said other disk devices not used for the particular file, and the newly-allocated disk block numbers and the newly-allocated disk number are reflected in said address conversion table.
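The two recovery paths described above, one for a block fault and one for a disk fault, can be sketched together. The sketch assumes that a lost block equals the byte-wise exclusive OR of the surviving blocks of its group, which holds for RAID level 5 parity and also for a two-copy RAID level 1 pair. The disk devices are modelled as in-memory dictionaries, a new block is allocated simply as the next unused block number, and the function names are hypothetical.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def pick_target_disk(disks, group, failed_index, whole_disk_failed):
    """Block fault: reuse the same disk. Disk fault: use a disk outside the group."""
    if not whole_disk_failed:
        return group[failed_index][0]
    used = {disk for disk, _ in group}
    spare = sorted(d for d in disks if d not in used)
    if not spare:
        raise RuntimeError("no disk device outside this file's group is available")
    return spare[0]  # assumed policy: lowest-numbered unused disk


def recover_member(disks, group, failed_index, whole_disk_failed=False):
    """Rebuild group[failed_index] and update the group (table entry) in place.

    disks: {disk_number: {disk_block_number: bytes}}
    group: list of (disk_number, disk_block_number), data blocks plus parity
    """
    target = pick_target_disk(disks, group, failed_index, whole_disk_failed)
    survivors = [disks[d][b] for i, (d, b) in enumerate(group) if i != failed_index]
    rebuilt = xor_blocks(survivors)
    new_block = max(disks[target], default=-1) + 1  # simplistic allocator
    disks[target][new_block] = rebuilt
    group[failed_index] = (target, new_block)       # reflect the new address
    return rebuilt


d0, d1 = b"\x01\x02", b"\x10\x20"
disks = {0: {0: d0}, 1: {0: b"??"}, 2: {0: xor_blocks([d0, d1])}, 3: {}}

# Block fault: block 0 of disk 1 is unreadable, so rebuild onto disk 1.
group_a = [(0, 0), (1, 0), (2, 0)]
recover_member(disks, group_a, failed_index=1)

# Disk fault: disk 1 as a whole has failed, so rebuild onto unused disk 3.
group_b = [(0, 0), (1, 0), (2, 0)]
recover_member(disks, group_b, failed_index=1, whole_disk_failed=True)
```

In the disk-fault case the rebuilt block lands on a disk device that the file did not previously use, so no dedicated standby disk device is needed in this model.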
When restarted after the system has gone down, said file system sequentially reads the address conversion table in the meta data, recalculates the parity data with respect to each file in which an open indication flag is set, and writes back the recalculated parity data to the parity block in the file.
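A minimal sketch of this restart procedure follows. It assumes exclusive-OR parity and one parity block per file, and it models the meta data as a list of records scanned in order; the record type `FileMeta` and the function name are hypothetical. The sketch leaves the open indication flag unchanged, since the text does not state when the flag is cleared.

```python
from dataclasses import dataclass


@dataclass
class FileMeta:
    name: str
    open_flag: bool      # open indication flag from the meta data
    data_blocks: list    # [(disk_number, disk_block_number), ...]
    parity_block: tuple  # (disk_number, disk_block_number)


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def recalc_parity_on_restart(metadata, disks):
    """Recompute parity only for files that were open when the system went down."""
    repaired = []
    for meta in metadata:                  # sequential scan of the meta data
        if not meta.open_flag:
            continue                       # closed files are assumed consistent
        data = [disks[d][b] for d, b in meta.data_blocks]
        pd, pb = meta.parity_block
        disks[pd][pb] = xor_blocks(data)   # write back the recalculated parity
        repaired.append(meta.name)
    return repaired


disks = {
    0: {0: b"\x01\x02", 1: b"\xaa\xaa"},
    1: {0: b"\x10\x20", 1: b"\x55\x55"},
    2: {0: b"\x00\x00", 1: b"\xff\xff"},   # block 0 holds stale parity
}
metadata = [
    FileMeta("was_open", True, [(0, 0), (1, 0)], (2, 0)),
    FileMeta("was_closed", False, [(0, 1), (1, 1)], (2, 1)),
]
repaired = recalc_parity_on_restart(metadata, disks)
```

Restricting the recalculation to files with the flag set keeps the restart cost proportional to the number of files that were being written, rather than to the size of the disk pool.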
The file system caches the user data when writing the data in order to restrain an occurrence of recalculation of the parity data, and delays an allocation of the disk block till the file is closed or the cache becomes full.
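The delayed allocation described above can be sketched as a small write cache. The class name, the capacity counted in blocks, and the `flush` callback, which stands in for the step that allocates disk blocks and computes parity once per flush, are all hypothetical.

```python
class DelayedAllocationCache:
    """Buffer written blocks per file and flush on close or when full."""

    def __init__(self, capacity_blocks, flush):
        self.capacity = capacity_blocks
        self.flush = flush   # hypothetical callback: flush(file_name, {rel_block: data})
        self.pending = {}    # file_name -> {relative_block: data}

    def _cached_blocks(self):
        return sum(len(blocks) for blocks in self.pending.values())

    def _flush_file(self, file_name):
        blocks = self.pending.pop(file_name, None)
        if blocks:
            # Allocation and parity calculation would happen here, once.
            self.flush(file_name, blocks)

    def write(self, file_name, relative_block, data):
        """Cache the data; no disk block is allocated yet."""
        self.pending.setdefault(file_name, {})[relative_block] = data
        if self._cached_blocks() >= self.capacity:
            for name in list(self.pending):  # cache full: flush everything
                self._flush_file(name)

    def close(self, file_name):
        """Closing a file forces allocation of its cached blocks."""
        self._flush_file(file_name)


flushed = []
cache = DelayedAllocationCache(3, lambda name, blocks: flushed.append((name, dict(blocks))))

cache.write("a", 0, b"x")
cache.write("a", 1, b"y")
pending_before_close = len(flushed)  # nothing has been flushed yet
cache.close("a")

cache.write("b", 0, b"1")
cache.write("b", 1, b"2")
cache.write("b", 2, b"3")            # reaches capacity and triggers a flush
```

Because all blocks of a file reach the flush step together, the parity data can be computed once over the complete set of blocks instead of being recalculated on every small write.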
According to the present invention, even in the case of files stored in the same plurality of disk devices, the user is able to select an optimum redundant data arrangement per file in consideration of reliability, performance and storage cost. On this occasion, the user has no need to be aware of which disk device the data are stored in, and the system automatically determines this based on a load etc.
Further, according to the present invention, if not specified by the user, with a file category and a file size being keys, the system is capable of automatically selecting on the file-basis the optimum data redundant arrangement.
Still further, according to the present invention, if the file size is changed, the system is capable of automatically changing a data redundant structure.
Yet further, according to the present invention, if a fault occurs in the disk device, the redundancy can be recovered by dynamically acquiring a free area on another disk device among the plurality of disk devices. Hence, there is no necessity for preparing any standby disk devices.