The amount of data that IT systems must handle has been increasing rapidly, and there is demand for storage apparatuses capable of coping with this increase. One technique for such storage apparatuses is a distributed storage system, which processes and stores data by distributing the data across a plurality of storage apparatuses. For systems that require high-performance analysis, for example analysis of large-scale big data, a distributed storage system capable of providing scalable capacity and performance is believed to be an effective solution.
Meanwhile, deduplication is a technique for saving storage areas of storage apparatuses in order to cope with the rapidly increasing data amount. PTL 1 discloses a technique related to a distributed storage system and deduplication. In PTL 1, when one of the distributed servers constituting a distributed network receives, from a client, a read request for data stored in another distributed server, it acquires the data from that other server and responds to the client. Moreover, a deduplication management apparatus manages a database of unique IDs, such as hash values, that are assigned to the data stored in each distributed server in order to identify the data. When the deduplication management apparatus searches the database and finds that the number of pieces of data associated with the same unique ID is equal to or greater than a threshold value, it selects a distributed server from which the data should be deleted and deletes the duplicate data stored in the selected distributed server, thereby performing deduplication.
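The threshold-based deduplication described for PTL 1 can be illustrated with a minimal Python sketch. It is not the implementation of PTL 1. The class `DedupManager`, its method names, the use of SHA-256 as the unique ID, the in-memory dictionaries standing in for the database and the distributed servers, and the policy for selecting the server to delete from are all assumptions made for illustration; the description above does not specify them. The sketch also assumes that one copy is deleted per pass and that the last remaining copy is never deleted.

```python
import hashlib
from collections import defaultdict


class DedupManager:
    """Hypothetical deduplication management apparatus (illustrative only)."""

    def __init__(self, threshold):
        self.threshold = threshold
        # Database: unique ID (hash value) -> set of servers holding that data.
        self.index = defaultdict(set)
        # Stand-in for the distributed servers: server name -> {unique ID: data}.
        self.servers = defaultdict(dict)

    @staticmethod
    def unique_id(data):
        # Assumption: SHA-256 is used as the hash value identifying the data.
        return hashlib.sha256(data).hexdigest()

    def store(self, server, data):
        uid = self.unique_id(data)
        self.servers[server][uid] = data
        self.index[uid].add(server)
        return uid

    def read(self, server, uid):
        # A server that does not hold the data acquires it from another server.
        if uid in self.servers[server]:
            return self.servers[server][uid]
        for other in sorted(self.index.get(uid, ())):
            return self.servers[other][uid]
        raise KeyError(uid)

    def deduplicate(self):
        """Delete one copy of each unique ID whose copy count reaches the threshold."""
        deleted = []
        for uid, holders in self.index.items():
            # Assumption: never delete the last remaining copy.
            if len(holders) >= self.threshold and len(holders) > 1:
                # Assumption: the selection policy is arbitrary here
                # (the lexicographically last server name).
                victim = max(holders)
                del self.servers[victim][uid]
                holders.remove(victim)
                deleted.append((victim, uid))
        return deleted
```

For example, with `threshold=3`, storing the same data on three servers and then calling `deduplicate()` removes the copy from one server, after which a read issued to that server is served from one of the remaining holders.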