1. Field of the Inventive Concept
The present inventive concept relates to a data deduplication method.
2. Background
As the performance of computer systems, including distributed storage systems, has improved, the volume of data processed by these systems has also increased, and securing storage space for that data has become a problem. In particular, expanding equipment to secure additional storage space in a distributed storage system that stores large-scale data is expensive, so it is advantageous to reduce wasted storage space by operating the available storage space efficiently. Accordingly, there is a need for more efficient management of large amounts of data that include duplicate data.
Japanese Patent Publication No. 2010-256951 discloses a method that attempts to address this problem by dividing data into segments, calculating eigenvalues for segments that appear to be similar, and comparing the eigenvalues to determine the degree of similarity between the segments.
However, such conventional methods leave room for improvement, and there remains a need for better methods of identifying and removing duplicate data.