In recent years, as computers have developed and become widespread, various kinds of information have been converted into digital data. Storage devices such as magnetic tapes and magnetic disks are used to store such digital data. Because the amount of data to be stored grows day by day and has become enormous, high-capacity storage systems are required. Storage devices must also remain reliable while their cost is reduced, and stored data must be easy to retrieve later. There is therefore demand for a storage system that can automatically increase its storage capacity and performance, that eliminates duplicate storage to reduce storage cost, and that provides high redundancy.
In storage systems in common use today, when files are written sequentially, they are laid out in order, as far as possible, on the hard disk that actually stores the data. This reduces the number of seeks the hard disk must perform during writing and reading, which makes high-speed recording and reproduction possible. Moreover, because the same file or related data are written in order, no unused storage regions are left between the stored data, so the storage region is used efficiently.
Meanwhile, a content address storage system has recently been developed, as shown in Patent Document 1, to use storage regions more efficiently. This content address storage system divides a file into a plurality of data blocks and records them on a hard disk. The storage system determines where each data block is located on the hard disk based on the content of that block. Specifically, it generates a hash value from the content of the data block and manages the block's storing position by that hash value. Consequently, the data of a file can be retrieved in sequence by using the sequence of hash values that make up the file.
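The scheme described above can be sketched as a toy in-memory store. This is a minimal illustration under stated assumptions, not the implementation of Patent Document 1: fixed-size blocks, SHA-256 as the hash function, a dictionary standing in for the hard disk, and the names `ContentAddressStore`, `split_into_blocks`, and `BLOCK_SIZE` are all hypothetical choices made for this example.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size chosen for illustration only


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Divide a file's contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ContentAddressStore:
    """Toy content address store: each block is keyed by the hash of its content."""

    def __init__(self) -> None:
        # content address (hex digest) -> block data; stands in for the disk
        self.blocks: dict[str, bytes] = {}

    def put_block(self, block: bytes) -> str:
        # The storing position is managed by a hash of the block's content.
        address = hashlib.sha256(block).hexdigest()
        self.blocks[address] = block
        return address

    def write_file(self, data: bytes) -> list[str]:
        """Store a file and return the sequence of hash values that describes it."""
        return [self.put_block(block) for block in split_into_blocks(data)]

    def read_file(self, addresses: list[str]) -> bytes:
        """Rebuild a file by following its sequence of hash values in order."""
        return b"".join(self.blocks[a] for a in addresses)


store = ContentAddressStore()
recipe = store.write_file(b"hello world!")  # 12 bytes -> 3 blocks
assert store.read_file(recipe) == b"hello world!"
```

The list returned by `write_file` plays the role of the file's hash sequence: the file itself is never stored as a unit, only as an ordered list of content addresses.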
Because the content address storage system manages data by hash values derived from data content, as described above, data blocks with identical content need not be stored more than once, which reduces the amount of storage used. For example, if a plurality of files have similar contents, they are likely to contain data blocks with the same content. When such files are stored, the content address storage system compares the hash value of each data block to be stored with the hash values of data blocks that have already been stored. If the same hash value exists, a data block with the same content is judged to be already stored. In that case, the system refers to and manages the content address specifying the storing position of the existing data block as the content address of the block to be stored, instead of storing the block again. By referring to the content address of an already stored block in this way whenever a block with the same content is stored, duplicate storage is limited and the storage region is used efficiently.
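The duplicate check can be sketched the same way. Again this is a hypothetical toy (the class `DedupStore`, the 4-byte `BLOCK_SIZE`, and SHA-256 are assumptions, not details from the source). Like the description, it treats a matching hash value as meaning the content is identical; a real system might additionally compare the bytes to guard against hash collisions.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size, for illustration only


class DedupStore:
    """Toy content address store that skips blocks whose hash is already stored."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # content address -> block data
        self.duplicates_skipped = 0

    def write_file(self, data: bytes) -> list[str]:
        addresses = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            address = hashlib.sha256(block).hexdigest()
            if address in self.blocks:
                # Same hash already stored: reuse the existing content address
                # instead of writing a second copy of the block.
                self.duplicates_skipped += 1
            else:
                self.blocks[address] = block
            addresses.append(address)
        return addresses

    def read_file(self, addresses: list[str]) -> bytes:
        return b"".join(self.blocks[a] for a in addresses)

    def stored_bytes(self) -> int:
        """Total bytes physically held, counting each distinct block once."""
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
a = store.write_file(b"AAAABBBBCCCC")
b = store.write_file(b"AAAABBBBDDDD")  # shares its first two blocks with file a
assert store.duplicates_skipped == 2
assert store.stored_bytes() == 16  # instead of 24 without deduplication
assert store.read_file(b) == b"AAAABBBBDDDD"
```

Both files remain fully readable through their own address lists, even though the shared `AAAA` and `BBBB` blocks exist only once.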
[Patent Document 1] Japanese Unexamined Patent Application Publication No. JP-A 2005-235171
[Patent Document 2] Japanese Patent Publication No. 4146380
However, because the content address storage system described above does not store a data block that duplicates one already stored, the data blocks of a file may not be located in order. In particular, when small data blocks are generated at random positions among other files that have already been stored, the data blocks within a file may become fragmented. As a result, seeking on the hard disk takes a long time during writing and retrieval, high-speed recording and reproduction of data cannot be achieved, and performance decreases.
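The fragmentation effect can be illustrated with a deliberately simplified layout model. The sketch assumes, hypothetically, that new blocks are appended at the next free position of the disk, that positions are abstract slot numbers, and that a "seek" is any jump between consecutive blocks of a file that are not physically adjacent. `AppendOnlyDisk` and `count_seeks` are illustrative names, not part of any real system.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size, for illustration only


class AppendOnlyDisk:
    """Toy model: each new (non-duplicate) block goes to the next free position."""

    def __init__(self) -> None:
        self.position_of: dict[str, int] = {}  # content address -> disk position
        self.next_free = 0

    def write_file(self, data: bytes) -> list[int]:
        """Store a file with deduplication; return the disk position of each block."""
        positions = []
        for i in range(0, len(data), BLOCK_SIZE):
            address = hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            if address not in self.position_of:
                # New content: append it at the next free position.
                self.position_of[address] = self.next_free
                self.next_free += 1
            # Duplicate content: point back at the earlier position instead.
            positions.append(self.position_of[address])
        return positions


def count_seeks(positions: list[int]) -> int:
    """Count jumps between consecutive blocks that are not physically adjacent.

    The initial seek to the file's first block is not counted.
    """
    return sum(1 for prev, cur in zip(positions, positions[1:]) if cur != prev + 1)


disk = AppendOnlyDisk()
disk.write_file(b"XXXXYYYYZZZZ")            # occupies positions 0, 1, 2
layout = disk.write_file(b"1111YYYY2222")   # "YYYY" is a duplicate
assert layout == [3, 1, 4]
assert count_seeks(layout) == 2
```

Written without deduplication, the second file would occupy positions 3, 4, 5 and need no extra seeks. Reusing the already stored `YYYY` block saves space but sends the read path back to position 1, producing two seeks.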
Further, to improve the recording and reproduction performance described above, stored data could be rearranged periodically. For example, Patent Document 2 describes the relocation of data by a disk array device. However, executing such data relocation increases the processing load on the storage system, so system performance still decreases.
Meanwhile, SSDs (Solid State Drives) have recently come into use as storage media. An SSD requires no seek time and offers excellent random-read performance. Replacing the hard disk with an SSD could therefore solve the performance degradation described above. However, the cost per unit of storage capacity of an SSD is considerably higher than that of a hard disk, so this approach raises the cost of storage capacity.