In recent years, as computers have developed and become widespread, many kinds of information are now stored as digital data. Storage devices such as magnetic tapes and magnetic disks are used to store such digital data. Because the volume of data to be stored grows daily and has become enormous, high-capacity storage systems are required. Such systems must also remain reliable while reducing the cost of storage devices, and stored data must be easy to retrieve later. Consequently, there is demand for a storage system that can automatically scale its capacity and performance, that eliminates duplicate storage to reduce cost, and that provides high redundancy.
Against this background, a content address storage system has been developed, as disclosed in Patent Document 1. This content address storage system distributes data across a plurality of storage devices and identifies the storage location of each piece of data by a unique content address derived from the content of that data. Specifically, the content address storage system divides given data into a plurality of fragments, adds a fragment of redundant data, and stores these fragments on respective storage devices.
Later, by specifying content addresses, the fragments stored at the locations identified by those addresses can be retrieved, and the original data can be restored from them.
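The divide, add-redundancy, and restore cycle described above can be illustrated with a minimal sketch. It assumes a single XOR parity fragment as the redundant data. Patent Document 1 may use a different erasure code, and the names `split_with_parity`, `restore`, and `xor_bytes` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(chunks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Divide data into k equal-size fragments and append one XOR parity fragment (assumed scheme)."""
    size = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(size * k, b"\0")      # pad so every fragment has the same size
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [xor_bytes(fragments)]


def restore(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one fragment (data or parity) is lost."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can recover only one lost fragment")
    fragments = list(fragments)
    if missing:
        # The XOR of all k+1 fragments is zero, so the lost one is the XOR of the rest.
        fragments[missing[0]] = xor_bytes([f for f in fragments if f is not None])
    return b"".join(fragments[:-1])[:length]   # drop the parity fragment and the padding


original = b"content address storage"
stored = split_with_parity(original, k=4)     # each fragment would go to a different device
stored[2] = None                              # simulate one failed storage device
print(restore(stored, len(original)))
```

The original length is kept alongside the fragments so that padding can be stripped. A single parity fragment tolerates the loss of only one device. Tolerating more losses would require a stronger erasure code.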
The content address is generated so as to be unique to the content of the data; for example, a hash value of the data is used. Consequently, identical data always maps to the same storage location, so duplicate data can be obtained by referring to the single copy already stored there. Duplicate data therefore need not be stored separately, which eliminates redundant recording and reduces the required storage capacity.
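Hash-based deduplication can be shown with a small toy store. It assumes SHA-256 as the hash that yields the content address. The class `ContentAddressStore` and its methods are hypothetical illustrations, not the API of the system in Patent Document 1.

```python
import hashlib
from typing import Dict


class ContentAddressStore:
    """Toy content-addressed store: a block's address is the SHA-256 hash of its bytes (assumed)."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.duplicates_skipped = 0

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        if address in self.blocks:
            # Same content -> same address: refer to the existing copy instead of storing again.
            self.duplicates_skipped += 1
        else:
            self.blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self.blocks[address]


store = ContentAddressStore()
first = store.put(b"quarterly report")
second = store.put(b"quarterly report")   # duplicate content
third = store.put(b"meeting notes")
print(first == second, len(store.blocks), store.duplicates_skipped)  # True 2 1
```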
The content address storage system also uses a tree-type file system. In this file system, a content address that refers to stored data is itself referred to by a content address at a higher level of the hierarchy, so that the content addresses form a tree structure. Target data can thus be accessed by following these references from higher levels of the hierarchy down to lower levels.
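The following sketch builds a tree of content addresses and follows it from the top down. It assumes that directory blocks are JSON-encoded maps from child names to child content addresses, and that SHA-256 produces the addresses. The functions `put`, `put_directory`, and `resolve` are hypothetical.

```python
import hashlib
import json
from typing import Dict

blocks: Dict[str, bytes] = {}


def put(data: bytes) -> str:
    """Store a block under its content address (SHA-256 of the bytes, assumed)."""
    address = hashlib.sha256(data).hexdigest()
    blocks[address] = data
    return address


def put_directory(entries: Dict[str, str]) -> str:
    """Store a directory block whose content maps child names to child content addresses."""
    return put(json.dumps(entries, sort_keys=True).encode())


def resolve(root_address: str, path: str) -> bytes:
    """Follow content addresses from the root block down to the target data block."""
    address = root_address
    for name in path.strip("/").split("/"):
        entries = json.loads(blocks[address])   # the parent block lists its children's addresses
        address = entries[name]
    return blocks[address]


file1 = put(b"contents of file1")
dir1 = put_directory({"file1": file1})
root = put_directory({"dir1": dir1})
print(resolve(root, "/dir1/file1"))
```

Each parent's content includes its children's addresses, so the parent's own address depends on everything beneath it. This dependency is the property that makes changes propagate upward.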
Data storage in the content address storage system will now be described with reference to FIG. 1. Specifically, the description covers how the tree structure (hierarchical structure) of content addresses referring to stored data changes after the data is stored.
FIG. 1 shows data stored in a data storing means of the content address storage system. The boxes labeled “ca00” and “ca01” are data blocks, and the labels “ca00” and “ca01” are content addresses indicating storage locations determined from the content of the data.
A data block can refer to another data block either by a direct address beginning with “ca” or by an indirect address such as “#1.” The address correspondence table “ca100” stores the content address to which each indirect address refers. For example, the reference to “file1” is recorded as the indirect address “#3” rather than as the direct content address “ca30.” This is because, with direct addresses, a change at a leaf (an end of the tree structure) propagates up to the root: a change to “ca30” changes “ca11,” which in turn changes “ca10” at the next higher level. With an indirect address, by contrast, only the content address associated with that indirect address in the address correspondence table needs to be changed, which prevents the change from propagating. The starting point for traversing the tree structure is stored as root information, from which the lower levels of the tree can be followed down to the stored data.
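The contrast between direct and indirect references can be checked with a small experiment. It is a sketch under stated assumptions: JSON-encoded blocks, SHA-256 addresses prefixed with "ca", and a plain dictionary standing in for the address correspondence table. The tree is a simplified three-level version of FIG. 1, and `build_direct` and `build_indirect` are hypothetical names.

```python
import hashlib
import json
from typing import Dict

blocks: Dict[str, bytes] = {}
table: Dict[str, str] = {}   # address correspondence table: indirect address -> content address


def put(obj: dict) -> str:
    data = json.dumps(obj, sort_keys=True).encode()
    address = "ca" + hashlib.sha256(data).hexdigest()
    blocks[address] = data
    return address


def build_direct(file_data: str) -> str:
    """Each parent embeds its child's direct content address; returns the root address."""
    leaf = put({"data": file_data})
    parent = put({"file1": leaf})
    return put({"dir1": parent})


def build_indirect(file_data: str) -> str:
    """The parent embeds the indirect address "#3"; only the table entry tracks the leaf."""
    table["#3"] = put({"data": file_data})
    parent = put({"file1": "#3"})
    return put({"dir1": parent})


root_a, root_b = build_direct("version 1"), build_direct("version 2")
print("direct: root changed =", root_a != root_b)        # the leaf change reaches the root

root_c = build_indirect("version 1")
leaf_c = table["#3"]
root_d = build_indirect("version 2")
print("indirect: root changed =", root_c != root_d,
      "| table entry changed =", leaf_c != table["#3"])
```

With direct addresses both the parent and the root receive new addresses. With the indirect address, the parent and root blocks are byte-for-byte identical across versions, and only the table entry moves.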
Next, an operation of storing the tree structure into the content address storage system will be shown, and a problem with it will be described. In the initial state, the root directory “/” contains the directories “/dir1” and “/dir2.” The operation of creating a file “/dir2/file3” while the directory “/dir1” contains the files “/dir1/file1” and “/dir1/file2” will be described.
First, to create the file “/dir2/file3,” the file is “opened” and an entry for “file3” is created in “dir2.” The entry “file3=#5” is stored into the data storing means, and an address “ca21-1” is obtained (1-1). This address “ca21-1” is then stored as an address referred to by “dir2” (1-2), and an address “ca20-1” of the data block “dir2, ca21-1” is obtained. This address is associated with “#2,” the indirect reference address of “dir2,” and registered in the address correspondence table (1-3). The tree structure of the address data is thus committed.
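Steps (1-1) to (1-3) can be expressed as a short sketch. It assumes SHA-256 content addresses, a dictionary as the data storing means, and another dictionary as the address correspondence table. The directory block is simplified to the single entry shown in FIG. 1. `create_file` and `put` are hypothetical names, and the hex addresses they produce stand in for labels such as “ca21-1” and “ca20-1.”

```python
import hashlib
from typing import Dict

blocks: Dict[str, bytes] = {}   # data storing means (assumed dictionary)
table: Dict[str, str] = {}      # address correspondence table


def put(data: bytes) -> str:
    address = hashlib.sha256(data).hexdigest()
    blocks[address] = data
    return address


def create_file(dir_name: str, dir_indirect: str, file_name: str, file_indirect: str) -> str:
    # (1-1) Store the entry, e.g. "file3=#5", and obtain its address ("ca21-1" in FIG. 1).
    entry_address = put(f"{file_name}={file_indirect}".encode())
    # (1-2) Store the directory block referring to that entry ("ca20-1" in FIG. 1).
    dir_address = put(f"{dir_name}, {entry_address}".encode())
    # (1-3) Register the directory block under the directory's indirect address ("#2").
    table[dir_indirect] = dir_address
    return dir_address


create_file("dir2", "#2", "file3", "#5")
print(blocks[table["#2"]])
```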
Next, suppose that a data block “data31” is stored into the data storing means (2-1). The address “ca51” of the data block is obtained and recorded in the data block list “ca50(ca50-n)” of “file3” (2-2). As a result, the address “ca50-1” of the data block list is obtained and associated with “#5,” the indirect reference address of “file3,” and “#5=ca50-1” is recorded in the address correspondence table (2-3). The state of “file3” is thus committed.
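The commit sequence of steps (2-1) to (2-3) can be sketched as one function. It assumes SHA-256 content addresses, a newline-joined list of addresses as the data block list, and a store that never deletes blocks. `write_and_commit` is a hypothetical name. Calling it a second time, as with “data32” in the following paragraph, shows that the first list version stays stored after the table stops referring to it.

```python
import hashlib
from typing import Dict, List

blocks: Dict[str, bytes] = {}   # a content address storage does not delete stored blocks
table: Dict[str, str] = {}      # address correspondence table
file3_list: List[str] = []      # data block list of "file3", held in memory between commits


def put(data: bytes) -> str:
    address = hashlib.sha256(data).hexdigest()
    blocks[address] = data
    return address


def write_and_commit(data: bytes, file_indirect: str = "#5") -> str:
    file3_list.append(put(data))                          # store the data block, record its address
    list_address = put("\n".join(file3_list).encode())    # store the new version of the list
    table[file_indirect] = list_address                   # point "#5" at the new list version
    return list_address


ca50_1 = write_and_commit(b"data31")   # steps (2-1) to (2-3)
ca50_2 = write_and_commit(b"data32")   # steps (3-1) to (3-3)
print(table["#5"] == ca50_2, ca50_1 in blocks)   # True True: the first list version is still stored
```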
Suppose further that a data block “data32” is stored into the data storing means (3-1). The content address “ca52” of the data block is obtained and recorded in the data block list “ca50(ca50-n)” of “file3” (3-2). As a result, the address of the data block list changes to “ca50-2,” which is associated with the indirect reference address “#5” of “file3,” and “#5=ca50-2” is recorded in the address correspondence table (3-3).

[Patent Document 1] Japanese Unexamined Patent Application Publication No. JP-A 2005-235171
However, in the content address storage system described above, the tree structure changes not only when a directory or file is created but also whenever a file is written, and the number of such changes may be proportional to the file size. As a result, intermediate states of content addresses (such as “ca50-1” above), which essentially need not be stored, are stored in the data storing means. Because the content address storage system does not delete stored data, in preparation for the same data being stored again in the future, these intermediate states waste its storage area. Moreover, because the content address storage system needs more time than a general file system to write data, owing to the computation of hash values, storing these intermediate states also lengthens the writing time.
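The proportionality claimed above can be made concrete with a rough count. The sketch assumes that the data block list is committed after every written block, that SHA-256 provides the content addresses, and that each block has distinct dummy content. It counts only the data block list and ignores directory blocks. `simulate_write` is a hypothetical name, and the block size of 65,536 bytes is an arbitrary choice.

```python
import hashlib
from typing import Dict, List, Set


def simulate_write(file_size: int, block_size: int) -> Dict[str, int]:
    """Write a file block by block, committing the data block list after every block (assumed)."""
    stored: Set[str] = set()
    block_list: List[str] = []
    list_versions = 0
    hash_computations = 0
    for offset in range(0, file_size, block_size):
        chunk = f"block at offset {offset}".encode()    # dummy, distinct content per block
        address = hashlib.sha256(chunk).hexdigest()
        block_list.append(address)
        list_address = hashlib.sha256("\n".join(block_list).encode()).hexdigest()
        stored.update((address, list_address))
        list_versions += 1
        hash_computations += 2                          # one for the data block, one for the list
    return {
        "data_blocks": len(block_list),
        "list_versions": list_versions,
        "obsolete_list_versions": max(list_versions - 1, 0),   # only the last one stays referenced
        "stored_blocks": len(stored),
        "hash_computations": hash_computations,
    }


for size in (1_000_000, 2_000_000):
    print(size, simulate_write(size, block_size=65_536))
```

Under these assumptions, a 1,000,000-byte file yields 16 data blocks and 15 superseded list versions. Doubling the file size to 2,000,000 bytes yields 31 data blocks and 30 superseded versions. Both the wasted versions and the extra hash computations grow in step with the file size.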