In recent years, with the development and spread of computers, various kinds of information have been converted into digital data. Storage devices such as magnetic tapes and magnetic disks are known as devices for storing such digital data. Because the amount of data to be stored grows day by day and has become enormous, large-capacity storage systems are required. These systems must also remain reliable while keeping the cost of storage devices low, and stored data must be easy to retrieve later. Consequently, there is demand for a storage system that can automatically scale its capacity and performance, eliminates duplicated storage content to reduce storage cost, and provides high redundancy.
Under these circumstances, content address storage systems have recently been developed, as shown in Patent Document 1. Such a system distributes data across a plurality of storage devices and determines the location where the data is stored based on a unique content address derived from the content of the data. Specifically, the content address storage system divides given data into a plurality of fragments, adds a further fragment as redundant data, and stores these fragments in a plurality of storage devices, respectively.
Later, by designating a content address, the data (that is, a fragment) stored at the location specified by that content address can be read, and the original data before division can be recovered from the plurality of fragments.
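The following Python sketch illustrates dividing data into fragments, adding a redundant fragment, and recovering the original data when one fragment is unavailable. It assumes a single XOR parity fragment purely for illustration; the redundancy scheme of Patent Document 1 is not described here, and practical systems generally use stronger erasure codes. The function names `split_with_parity` and `recover` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments plus one XOR parity fragment.

    The last data fragment is zero-padded, so the caller must keep the
    original length to strip the padding on recovery (assumption).
    """
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(_xor, frags)  # XOR of all data fragments
    return frags + [parity]


def recover(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    fragments = list(fragments)
    if missing:
        # The lost fragment equals the XOR of all surviving fragments.
        present = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(_xor, present)
    data_frags = fragments[:-1]  # drop the parity fragment
    return b"".join(data_frags)[:length]  # strip zero padding
```

Because a single parity fragment can repair only one lost fragment, the sketch raises an error when more than one is missing; tolerating more failures would require a code with more redundant fragments.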
Further, a hash value of the data, generated so as to be unique to the content of the data, is used as the content address. Consequently, for duplicated data, data of the same content can be obtained by referring to the data at the same storage location. Duplicated data therefore need not be stored separately, so duplicate records are eliminated and the stored data volume is reduced.
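A minimal in-memory sketch of duplicate elimination by content address follows. It assumes SHA-256 as the hash function; the class `ContentAddressStore` and its methods are hypothetical, and the sketch omits distribution of fragments across storage devices.

```python
import hashlib
from typing import Dict


class ContentAddressStore:
    """Toy in-memory content-addressed store with duplicate elimination.

    The content address is the SHA-256 hex digest of the data (an assumed
    choice of hash); identical content maps to a single stored copy.
    """

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}
        self.bytes_written = 0  # logical bytes requested by callers

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self.bytes_written += len(data)
        # Store the data only if this content has not been seen before.
        self._blocks.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blocks[address]

    def bytes_stored(self) -> int:
        """Physical bytes actually kept after duplicate elimination."""
        return sum(len(b) for b in self._blocks.values())
```

For example, calling `put(b"backup block")` twice returns the same address both times; afterwards `bytes_written` is 24 while `bytes_stored()` is 12, since the second write only references the existing copy.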
A storage system having the above-described duplicate elimination function includes an upper-level file system and a lower-level file system, with the following characteristics:
- The upper-level file system internally divides a written file into a plurality of files.
- The divided files are each written from the upper-level file system to the lower-level file system, which synchronizes them with a stable storage device.
- The lower-level file system does not guarantee the order in which data is written. Consequently, if the system goes down while data is being written, part of the data may be lost.
FIG. 1 shows a state where a file F has been divided into two files. The upper-level file system generates a file 1 (F1) and a file 2 (F2) by dividing the file F into a plurality of units of partial data (F1_1, F2_2, etc.), and also generates an index file Idx that records the mapping between the original written file F and the files F1 and F2 generated by the division. The index file Idx holds the mapping information of each divided unit of partial data (F1_1, F2_2, etc.) as an index entry (I-1, etc.).
The mapping information in an index entry mainly includes the following:
- Information identifying the corresponding divided file.
- The offset from the head of the file before division.
- The offset from the head of the divided file.
- The data size.
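The division of FIG. 1 and the index entry fields listed above can be sketched as follows. This assumes the file F has already been cut into units of partial data and that a caller-supplied function `classify` decides which divided file each unit goes to; `IndexEntry`, `divide`, and `reassemble` are hypothetical names, and identifying divided files by integers (1 for F1, 2 for F2) is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class IndexEntry:
    file_id: int          # which divided file holds the unit (1 = F1, 2 = F2)
    original_offset: int  # offset from the head of the original file F
    divided_offset: int   # offset from the head of the divided file
    size: int             # length of this unit of partial data


def divide(units: List[bytes], classify: Callable[[bytes], int]
           ) -> Tuple[Dict[int, bytearray], List[IndexEntry]]:
    """Append each unit to the divided file chosen by `classify` and
    record one index entry per unit, in original-file order."""
    files: Dict[int, bytearray] = {}
    index: List[IndexEntry] = []
    original_offset = 0
    for unit in units:
        fid = classify(unit)
        buf = files.setdefault(fid, bytearray())
        index.append(IndexEntry(fid, original_offset, len(buf), len(unit)))
        buf.extend(unit)
        original_offset += len(unit)
    return files, index


def reassemble(files: Dict[int, bytearray], index: List[IndexEntry]) -> bytes:
    """Rebuild the original file F by following the index entries in order."""
    out = bytearray()
    for e in index:
        out += files[e.file_id][e.divided_offset:e.divided_offset + e.size]
    return bytes(out)
```

For example, with units `[b"HDR1", b"data-a", b"HDR2", b"data-b"]` and a `classify` that sends units starting with `b"HDR"` to file 2 and everything else to file 1, file 1 holds `b"data-adata-b"`, file 2 holds `b"HDR1HDR2"`, and `reassemble` returns the original byte sequence.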
Backup software is one known example of software that uses a file system which divides files as described above. At the upper level of the file system, backup data is divided into a “data part” and a “marker part” inserted by the backup software. In general, deduplication is determined by sectioning the data of a file into units of a given length (fixed or variable) and comparing those units. Consequently, if the data differ only within a span shorter than the section length, the sections containing that difference are not judged to have the same content. In other words, even when most of the content of two sectioned units is identical, a slight difference causes both units to be stored, so deduplication cannot be performed efficiently. Furthermore, backup software may insert information unique to each backup, such as the backup time, in addition to the data being backed up, and such a marker part hinders deduplication between successive full backups.
Accordingly, by dividing backup data into a “data part” and a “marker part” at the upper level of the file system as described above, the effectiveness of deduplication on the “data part” side can be improved. In particular, when full backups are kept for several generations, the duplicated portions between full backups are expected to be very large, so deduplication becomes even more effective and the required storage area can be reduced efficiently.
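The effect described in the two preceding paragraphs can be reproduced with fixed-length chunking. The sketch below assumes an 8-byte chunk length, a marker inserted before every record, and SHA-256 chunk identifiers; `chunk_hashes`, `backup_stream`, and `new_chunks` are hypothetical names. It compares how many chunks of a second full backup are new when markers are interleaved with the data versus when the data part is stored separately.

```python
import hashlib
from typing import List, Set, Tuple

CHUNK = 8  # assumed fixed chunk length used for deduplication


def chunk_hashes(stream: bytes) -> List[str]:
    """Fixed-length chunking; each chunk is identified by its SHA-256 digest."""
    return [hashlib.sha256(stream[i:i + CHUNK]).hexdigest()
            for i in range(0, len(stream), CHUNK)]


def backup_stream(records: List[bytes], marker: bytes) -> Tuple[bytes, bytes, bytes]:
    """Return (interleaved stream, data part, marker part) for one backup.

    The backup software is assumed to insert `marker` before every record.
    """
    interleaved = b"".join(marker + r for r in records)
    data_part = b"".join(records)
    marker_part = marker * len(records)
    return interleaved, data_part, marker_part


def new_chunks(previous: bytes, current: bytes) -> int:
    """Count chunks of `current` that are not already stored from `previous`."""
    seen: Set[str] = set(chunk_hashes(previous))
    return sum(1 for h in chunk_hashes(current) if h not in seen)
```

With `records = [bytes([ord("A") + i]) * 10 for i in range(6)]` backed up twice under the markers `b"T1|"` and `b"T2|"`, `new_chunks` on the two data parts is 0 because they are identical, whereas on the two interleaved streams it is nonzero because the differing timestamps fall inside several chunks.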
Patent Document 1: JP 2005-235171 A
However, in such a file system, if the system goes down while data is being written, each of the divided files may be left in an incomplete state, as shown by the portions not indicated by reference signs in FIG. 2, for example. In particular, the index file Idx, which records the mapping information of the respective files, is critical: if its content becomes incomplete, data cannot be accessed normally.
Accordingly, an object of the present invention is to provide a storage system that solves the above-described problem, namely, that data in a file system can no longer be accessed normally.
In order to achieve the above-described object, a storage system, which is an aspect of the present invention, is configured to include
a data dividing means for dividing data, to be written into a given storage device, into a plurality of units of partial data, sorting the units of the partial data into a plurality of classifications according to a predetermined criterion, and for each of the classifications, generating new divided file data by linking the units of the partial data;
an index file generation means for generating, for each of the units of the partial data, an index entry including location information in the data to be written before division of the units of the partial data and location information in the divided file data generated after the division of the units of the partial data, adding test data for error detection to the index entry, and generating index file data by linking a plurality of the index entries;
a data writing means for writing the divided file data generated by the data dividing means, and the index file data generated by the index file generation means, into the storage device; and
a recovery means for detecting an error in the index entries written in the storage device, based on the test data included in each of the index entries, wherein
the recovery means deletes an index entry in which an error is detected and all of subsequent index entries in the index file data stored in the storage device, from the index file data.
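The recovery means of this aspect can be sketched as below. The text does not fix the on-disk format or the kind of test data, so this assumes a fixed 32-byte little-endian entry layout with a CRC-32 checksum serving as the test data; `encode_entry` and `recover_index` are hypothetical names. Mirroring the claim, scanning stops at the first entry that fails its check or is truncated, and that entry and all later ones are dropped even if some of them would pass the check.

```python
import struct
import zlib
from typing import List, Tuple

# Assumed layout of one index entry (not specified in the text):
# file id (uint32), original offset, divided offset, size (uint64 each),
# followed by a CRC-32 of those 28 bytes as the "test data".
_BODY = struct.Struct("<IQQQ")
_CRC = struct.Struct("<I")
ENTRY_SIZE = _BODY.size + _CRC.size  # 32 bytes

Entry = Tuple[int, int, int, int]


def encode_entry(entry: Entry) -> bytes:
    """Serialize one index entry and append its CRC-32 test data."""
    body = _BODY.pack(*entry)
    return body + _CRC.pack(zlib.crc32(body))


def recover_index(index_file: bytes) -> Tuple[List[Entry], bytes]:
    """Return the valid entries and the repaired index file contents.

    Scanning stops at the first entry that is truncated or whose CRC does
    not match; that entry and every entry after it are deleted.
    """
    entries: List[Entry] = []
    pos = 0
    while pos + ENTRY_SIZE <= len(index_file):
        body = index_file[pos:pos + _BODY.size]
        (stored_crc,) = _CRC.unpack_from(index_file, pos + _BODY.size)
        if zlib.crc32(body) != stored_crc:
            break  # error detected: discard this entry and all later ones
        entries.append(_BODY.unpack(body))
        pos += ENTRY_SIZE
    # Any trailing partial entry is also excluded by truncating at `pos`.
    return entries, index_file[:pos]
```

For example, flipping one byte inside the second of three encoded entries makes `recover_index` return only the first entry and an index file truncated to 32 bytes; a file whose last entry was cut short by a crash likewise loses just that partial entry.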
Further, a program, which is another aspect of the present invention, is a program for causing an information processing device to realize:
a data dividing means for dividing data, to be written into a given storage device, into a plurality of units of partial data, sorting the units of the partial data into a plurality of classifications according to a predetermined criterion, and for each of the classifications, generating new divided file data by linking the units of the partial data;
an index file generation means for generating, for each of the units of the partial data, an index entry including location information in the data to be written before division of the units of the partial data and location information in the divided file data generated after the division of the units of the partial data, adding test data for error detection to the index entry, and generating index file data by linking a plurality of the index entries;
a data writing means for writing the divided file data generated by the data dividing means, and the index file data generated by the index file generation means, into the storage device; and
a recovery means for detecting an error in the index entries written in the storage device, based on the test data included in each of the index entries, wherein
the recovery means deletes an index entry in which an error is detected and all of subsequent index entries in the index file data stored in the storage device, from the index file data.
Further, an information processing method, which is another aspect of the present invention, is configured to include, in an information processing device:
dividing data, to be written into a given storage device, into a plurality of units of partial data, sorting the units of the partial data into a plurality of classifications according to a predetermined criterion, and for each of the classifications, generating new divided file data by linking the units of the partial data;
generating, for each of the units of the partial data, an index entry including location information in the data to be written before division of the units of the partial data and location information in the divided file data generated after the division of the units of the partial data, adding test data for error detection to the index entry, and generating index file data by linking a plurality of the index entries;
writing the divided file data and the index file data into the storage device; and
detecting an error in the index entries written in the storage device, based on the test data included in each of the index entries, and deleting an index entry in which an error is detected and all of subsequent index entries in the index file data stored in the storage device, from the index file data.
Because the present invention is configured as described above, subsequent data access can be performed normally even if data written to a storage device becomes incomplete because the system goes down or for a similar reason.