Data de-duplication technology is widely used in the storage field. For example, in a backup system, data de-duplication technology is used to sequentially read chunks of a specific size from a file to be backed up and to search the previously backed up chunks for an identical chunk. If a previously backed up chunk is identical to a chunk to be backed up, the chunk is not backed up again; instead, the file to be backed up references the previously backed up chunk. Only a chunk for which no identical chunk is found is backed up. A file that references previously backed up chunks is called a data de-duplication file. A backed up chunk in a storage system is called a duplication chunk.
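The backup flow described above can be illustrated with a minimal sketch. It is not taken from any particular backup system: the `ChunkStore` class, the 4-byte `CHUNK_SIZE`, and the use of SHA-256 fingerprints to detect identical chunks are all assumptions made for illustration, and the text above does not specify how identical chunks are found.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical tiny chunk size, chosen only to keep the example readable


class ChunkStore:
    """Toy in-memory stand-in for the storage system of a backup system."""

    def __init__(self):
        self.chunks = []  # backed up chunks, in the order they were stored
        self.index = {}   # fingerprint -> position of the chunk in self.chunks

    def backup(self, data: bytes) -> list:
        """Back up data; return the list of chunk positions the file references."""
        references = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            # Assumption: a SHA-256 fingerprint identifies identical chunks.
            fingerprint = hashlib.sha256(chunk).digest()
            position = self.index.get(fingerprint)
            if position is None:
                # No identical chunk was found, so this chunk is backed up.
                position = len(self.chunks)
                self.chunks.append(chunk)
                self.index[fingerprint] = position
            # Otherwise the file only references the previously backed up chunk.
            references.append(position)
        return references

    def restore(self, references) -> bytes:
        """Rebuild a file from the chunk positions it references."""
        return b"".join(self.chunks[p] for p in references)


store = ChunkStore()
first = store.backup(b"AAAABBBBCCCC")   # three new chunks are stored
second = store.backup(b"AAAAXXXXCCCC")  # only the chunk b"XXXX" is new
```

In this sketch the second file references two chunks stored by the first backup, so four chunks are stored in total instead of six.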
A de-duplication file references previously backed up chunks, and the referenced chunks were backed up at different times. Therefore, the physical locations of the chunks that constitute the de-duplication file are normally non-contiguous on a disk. Accordingly, a process of reading the de-duplication file is actually a process of reading fragmented data from the disk: all chunks can be read from the disk only after several disk seeks. Because the reading speed of the disk is slow, the time overhead of the reading process increases. To resolve this problem, the prior art provides a method for reading a de-duplication file. In this method, an additional storage device is used as a cache, and all backed up duplication chunks are stored in the additional storage device. When a de-duplication file is read, the corresponding duplication chunks are read from the additional storage device whenever the chunks being read reference backed up chunks.
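The effect of non-contiguous chunk locations on the number of disk seeks can be sketched as follows. The model is a deliberate simplification and an assumption of this example: each chunk occupies exactly one disk position, and reading the position immediately after the previous one requires no new seek. The function name `count_seeks` and the sample positions are hypothetical.

```python
def count_seeks(positions):
    """Count the seeks needed to read chunks at the given disk positions in order.

    Simplified model (an assumption): each chunk occupies one position, and
    reading position p + 1 right after position p needs no new seek.
    """
    seeks = 0
    previous = None
    for position in positions:
        if previous is None or position != previous + 1:
            seeks += 1
        previous = position
    return seeks


contiguous_file = [0, 1, 2, 3]    # chunks written together in a single backup
deduplicated_file = [0, 7, 2, 9]  # chunks referenced from backups made at different times

contiguous_seeks = count_seeks(contiguous_file)      # 1 seek in this model
fragmented_seeks = count_seeks(deduplicated_file)    # 4 seeks in this model
```

Under this model the contiguous file is read after a single seek, whereas the de-duplication file requires one seek per chunk, which is the overhead that the cache in the prior-art method is intended to avoid.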
During implementation of the present invention, the inventor finds that the prior art has the following problem.
Because the number of backed up duplication chunks is large, the additional storage device must have a very large capacity. To ensure the reading speed, the performance of the additional storage device is required to be higher than that of the disk. A commonly used additional storage device is a Solid State Disk (SSD), but the price of an SSD is high. Therefore, hardware costs increase.