1. Field of the Invention
The present invention relates to a distributed data storage system, a data distribution method, a split data management apparatus, a host terminal and a data distributing program. More particularly, the present invention relates to a distributed data storage system, a data distribution method, a split data management apparatus, a host terminal and a data distributing program for distributing data among, and storing the data in, a plurality of memory devices.
2. Description of the Related Art
Among data distribution methods that divide data into a plurality of pieces and distribute them to a plurality of memory devices, there are known methods that divide the data forming a plurality of sets of content-data and store them in a plurality of memory devices so that the content-data can be reproduced by streaming. For example, Patent Document 1 (JP-A-2002-244893) describes a data control method in which a series of significant data processed as a data stream is successively divided among, and written to, a plurality of files (each corresponding to a magnetic disk apparatus). Patent Document 2 (JP-A-09-223049) describes a disk array apparatus that controls the arrangement of the continuous file blocks of a file so as to assign them to separate physical block groups. Patent Document 3 (Japanese Patent No. 3052877) describes a disk array apparatus adapted to decide the locations for assigning split data such that, when the order in which virtual addresses are specified is known, the memory devices of the apparatus are accessed evenly and accesses by a plurality of clients to the group of memory devices are not concentrated on the same memory device.
Additionally, methods for not only dispersing data but also making data redundant have been developed, and one such method is described in Non-Patent Document 1 (John L. Hennessy, David A. Patterson, “Computer Architecture: A Quantitative Approach”, 3rd Edition, Morgan Kaufmann Pub. 2001, p. 707). Non-Patent Document 1 discloses a method of dispersing data and making data redundant at the block storage level for RAID1+0 and RAID0+1. Patent Document 4 (Japanese Patent No. 2853624) discloses a method of evenly dispersing copied data blocks among, and storing them in, other memory devices in accordance with addresses.
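The block-level placement used by RAID1+0 and RAID0+1 can be illustrated with a minimal sketch. It is not taken from Non-Patent Document 1; the function names, the disk numbering and the four-disk configuration are assumptions chosen for illustration only.

```python
# Illustrative sketch only: disk numbering and function names are hypothetical.

def raid10_devices(block: int, num_pairs: int) -> tuple[int, int]:
    """RAID1+0 (stripe across mirrored pairs).

    Disks 2k and 2k+1 form mirrored pair k. Block b is striped to pair
    b mod num_pairs and written to both disks of that pair.
    """
    pair = block % num_pairs
    return (2 * pair, 2 * pair + 1)


def raid01_devices(block: int, stripe_width: int) -> tuple[int, int]:
    """RAID0+1 (mirror of two striped sets).

    Disks 0..stripe_width-1 form the first striped set and the next
    stripe_width disks form its mirror. Block b is written to the same
    column of both sets.
    """
    column = block % stripe_width
    return (column, stripe_width + column)


# Four disks in both cases: two mirrored pairs, or two striped sets of width 2.
raid10_layout = [raid10_devices(b, 2) for b in range(4)]
raid01_layout = [raid01_devices(b, 2) for b in range(4)]
```

In both layouts every block has exactly two copies on two different disks, and the placement is a function of the block address alone, without regard to which content-data the block belongs to.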
Patent Document 5 (JP-A-11-085604) proposes a method of copying content-data and rearranging the locations of that content-data and other sets of content-data before the access load reaches the limit level.
However, while the methods described in Patent Documents 1 through 3 can improve the throughput by dividing data into a plurality of pieces and distributing and storing them among a plurality of memory devices, they give no consideration to dealing with failures. Additionally, the method described in Patent Document 2 requires that the correspondence between logical blocks and physical blocks be defined in advance, and hence it is difficult to add a hard disk once a disk array has been configured.
On the other hand, the technique of distributing data on a block-by-block basis described in Non-Patent Document 1 has the advantage of improving durability (reliability) against failures by making data redundant, but it does not necessarily improve the throughput. Because data are not recognized as content-data when they are distributed on a block-by-block basis, no consideration is given to efficiently exploiting the benefit of parallel accesses to the same content-data.
When, for example, a file system is set up on storages adapted to distribute data on a block-by-block basis, the file system does not recognize which part of a virtual block device is assigned to which actual block device. In other words, even if content-data (a file) is assigned to a plurality of virtual block devices, there is no guarantee that it is assigned to a plurality of physical block devices.
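A minimal sketch can make this lack of guarantee concrete. It assumes simple round-robin striping of virtual block addresses over four physical devices; the mapping rule, the function names and the block numbers are hypothetical and are not taken from any of the cited documents.

```python
# Illustrative sketch only: the striping rule and all values are assumptions.

def physical_device(virtual_block: int, num_devices: int) -> int:
    """Round-robin striping: virtual block b is stored on device b mod num_devices."""
    return virtual_block % num_devices


def devices_used(file_blocks: list[int], num_devices: int) -> set[int]:
    """Return the set of physical devices that hold the given file's blocks."""
    return {physical_device(b, num_devices) for b in file_blocks}


NUM_DEVICES = 4

# A file that happens to occupy consecutive virtual blocks is spread widely.
contiguous_file = [0, 1, 2, 3]
spread = devices_used(contiguous_file, NUM_DEVICES)

# A file whose blocks were allocated at a stride of four virtual blocks
# (for example, because of fragmentation) lands on a single device.
fragmented_file = [0, 4, 8, 12]
clustered = devices_used(fragmented_file, NUM_DEVICES)
```

Both files occupy four distinct virtual blocks, yet the second is served by one physical device only, so parallel access to it gains nothing. A file system that sees only virtual addresses cannot tell the two cases apart.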
While the method described in Patent Document 4 takes failures into consideration, the capacity efficiency of the memory devices is not particularly good because the memory devices used for normal operations and the memory devices used in failures are provided separately. Additionally, since the data that correspond to a memory device (and hence the same data as those stored in that memory device) are stored in some other memory device, there arises a problem that the access performance cannot be maintained during a failure and during recovery from the failure. For example, when a memory device has failed, the memory device storing the same data alone has to bear the load of accesses to the data stored in the failed memory device. When another memory device is added to replace the failed memory device, that same memory device alone also has to bear the load of copying the data to the added memory device until the copying process ends. Thus, if the copying process is executed with its throughput limited, the access performance of the memory device remains degraded for a long time because of the time the copying process takes.
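The load concentration described above can be worked through with a small sketch. It assumes that each memory device's data is duplicated on exactly one partner device; the device names, load figures, data size and copy rate are hypothetical values chosen only to make the arithmetic visible.

```python
# Illustrative sketch only: all names and numbers are assumptions.

def loads_after_failure(loads: dict[str, float],
                        mirror_of: dict[str, str],
                        failed: str) -> dict[str, float]:
    """Return per-device access load after `failed` goes down.

    All accesses that the failed device used to serve are redirected to
    the single partner device that stores the same data.
    """
    result = dict(loads)
    partner = mirror_of[failed]
    result[partner] += result.pop(failed)
    return result


def rebuild_seconds(data_bytes: int, copy_bytes_per_second: int) -> float:
    """Time needed to copy the data to a replacement device at a limited rate."""
    return data_bytes / copy_bytes_per_second


# Four devices with equal load (requests per second), paired d0-d1 and d2-d3.
loads = {"d0": 100.0, "d1": 100.0, "d2": 100.0, "d3": 100.0}
mirror_of = {"d0": "d1", "d1": "d0", "d2": "d3", "d3": "d2"}

after = loads_after_failure(loads, mirror_of, "d0")

# Copying 500 GB back at a throttled 20 MB/s.
duration = rebuild_seconds(500 * 10**9, 20 * 10**6)
```

Under these assumptions the partner device d1 carries twice its normal load while d2 and d3 are unaffected, and the throttled copy keeps d1 in that state for 25,000 seconds, or roughly seven hours.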
Furthermore, the method described in Patent Document 4 has the problem that it is difficult to add a new hard disk once a disk array has been configured, because the assignment of data (including copy data) is determined on the basis of addresses.
While the method described in Patent Document 5 can improve the throughput after a reallocation by adapting the allocation to demand, it gives no consideration to maintaining the access performance while copy data are being added and reallocated.
In short, the known data distribution methods have three problems. The first is that the demand for improved reliability and the demand for improved throughput cannot be met simultaneously without reducing the capacity efficiency. The second is that the access performance cannot be maintained in operations other than ordinary operations, such as operations during failures, operations for adding copy data and operations for reallocating data. The third is that no consideration is given to improving scalability, such as by adding new memory devices.