Data deduplication is a technique for eliminating large sections of redundant data. It is used to improve storage utilization. In a nutshell, if a data chunk is already stored in a storage device, then a later request to store the same data chunk in the storage device will result in storing a link (or other retrieval information) to the already stored data chunk instead of storing both (identical) data chunks.
Identical data chunks are detected by applying a hash calculation process to provide a hash value (called fingerprint) of a data chunk (received data chunk) that is requested to be written to the storage device and by comparing the hash value of the received data chunk to hash values of data chunks already stored in the storage device.
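The fingerprint comparison described above can be sketched as follows. This is a minimal illustration, not the method of any particular system: the class name `DedupStore`, its fields, and the use of SHA-256 are assumptions (the text does not name a hash function), and the sketch treats equal fingerprints as equal chunks, ignoring the possibility of hash collisions.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory model of a deduplicating store."""

    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes (stored once)
        self.mapping = {}  # logical address -> fingerprint (the "link")

    def write(self, logical_addr, chunk):
        """Store a chunk; return True if new data was actually stored."""
        # Assumption: SHA-256 serves as the fingerprint function.
        fingerprint = hashlib.sha256(chunk).digest()
        is_new = fingerprint not in self.chunks
        if is_new:
            self.chunks[fingerprint] = chunk
        # In both cases only a link to the stored chunk is recorded.
        self.mapping[logical_addr] = fingerprint
        return is_new

    def read(self, logical_addr):
        """Follow the link to the single stored copy."""
        return self.chunks[self.mapping[logical_addr]]
```

Writing the same 4 KiB chunk to two logical addresses stores the data once; the second write only adds a link.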
Data deduplication was traditionally associated with backup storage, which has relaxed performance requirements. Today, when using flash memories that have a limited lifespan (in terms of program and erase cycles), data deduplication is regarded as an essential process.
Furthermore, flash memories have high throughput and low latency, and thus data deduplication must be very quick in order to utilize these qualities of flash memories.
Data deduplication can be performed in a post-processing manner in which the entire received data chunk is first stored in a buffer, followed by having its hash value calculated (after it is completely stored in the buffer) and then a comparison is made (between hash values) to determine whether the received data chunk is already stored in the storage device.
FIG. 1 illustrates a prior art system 10 and process for performing post-processing data de-duplication.
The system 10 includes a front-end interconnect engine 20, a memory module such as random access memory (RAM) 80, hash engine 30 and central processing unit (CPU) 50. These elements are electrically coupled to each other by bus (or any link or network) 40. System 10 is coupled to initiator 90 and back-end flash interface engine 70, wherein the latter is coupled to flash memory 60.
RAM 80 stores a buffer 81, a completion queue 83 for storing indicators about the completion of writing received data chunks to buffer 81, a received data chunk hash value buffer 84, and a hash lookup table 82 that stores hash values of data chunks that are already stored in flash memory 60.
The back-end flash interface engine 70 provides an interface between the flash memory 60 and system 10.
The hash engine 30 may be included in (implemented by) CPU 50 or may be separate from CPU 50.
Front-end interconnect engine 20 receives packets from initiator 90 which can be (or be connected to) any computer that wishes to write to flash memory 60. Flash memory 60 is a storage device that is connected to back-end flash interface engine 70.
A typical write process is illustrated by various dashed arrows in FIG. 1. It may include:
Receiving, by the front-end interconnect engine 20, a write command (a request to write a received data chunk to flash memory 60) from an initiator 90.
Allocating buffer 81 in RAM 80.
Sending data from an initiator memory to buffer 81.
Once the entire received data chunk is stored in buffer 81, updating a completion flag in completion queue 83.
Accessing, by hash engine 30, RAM 80 to read the received data chunk from buffer 81 and calculating the hash value of the received data chunk. The accessing might include multiple access iterations.
Storing the hash value of the received data chunk in received data chunk hash value buffer 84.
Reading, by CPU 50, the hash value of the received data chunk and trying to find a matching hash value in the hash lookup table 82.
If a match is not found, sending the received data chunk to the back-end flash interface engine 70 and from there to flash memory 60. In addition, storing the hash value of the received data chunk in the hash lookup table 82.
If a match is found, storing a mapping from a received data chunk logical address to the physical address of the already stored matching data chunk.
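The write process above can be modeled in software as a rough sketch. Everything here is illustrative: the class `PostProcessDedup`, its field names, the 512-byte read size, and the choice of SHA-256 are assumptions, and the hardware elements of FIG. 1 are reduced to plain Python containers. The sketch also records a logical-to-physical mapping for newly stored chunks, which the steps above do not state explicitly.

```python
import hashlib
from collections import deque


class PostProcessDedup:
    """Hypothetical software model of the post-processing flow of FIG. 1."""

    READ_SIZE = 512  # assumed size of one hash-engine access to RAM

    def __init__(self):
        self.completion_queue = deque()  # models completion queue 83
        self.hash_lookup = {}            # models hash lookup table 82
        self.flash = []                  # models flash memory 60
        self.logical_to_physical = {}    # logical address -> physical address

    def receive(self, logical_addr, chunk):
        # The whole chunk is buffered first (buffer 81), then flagged complete.
        buffer = bytearray(chunk)
        self.completion_queue.append((logical_addr, buffer))

    def process_one(self):
        logical_addr, buffer = self.completion_queue.popleft()
        # The hash engine re-reads the buffer in several access iterations.
        hasher = hashlib.sha256()
        for offset in range(0, len(buffer), self.READ_SIZE):
            hasher.update(buffer[offset:offset + self.READ_SIZE])
        fingerprint = hasher.digest()
        # The CPU looks the fingerprint up in the hash lookup table.
        physical_addr = self.hash_lookup.get(fingerprint)
        if physical_addr is None:
            # No match: send the chunk to flash and remember its fingerprint.
            physical_addr = len(self.flash)
            self.flash.append(bytes(buffer))
            self.hash_lookup[fingerprint] = physical_addr
        # Match or not, map the logical address to the stored copy.
        self.logical_to_physical[logical_addr] = physical_addr
        return physical_addr
```

Note that hashing starts only after the chunk is fully buffered, which is what makes this a post-processing scheme.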
Reading the received data chunk from buffer 81 by hash engine 30 usually involves multiple access iterations to RAM 80. These accesses add extra load on RAM 80, increase the latency of RAM 80, and thereby reduce the throughput of other entities (such as CPU 50) that may request to access RAM 80. For example, the lookup of hash fingerprints and mappings (done by CPU 50 with highly random accesses to memory) can be dramatically slowed down.
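A back-of-the-envelope model makes the extra RAM load concrete. The function name `ram_traffic`, the 64-byte access size, and the assumption that a non-duplicate chunk is read from buffer 81 once more when it is sent toward flash memory 60 are all hypothetical; the point is only that each chunk crosses the RAM interface more than once.

```python
def ram_traffic(chunk_bytes, access_bytes, duplicate):
    """Count RAM accesses for one chunk in the post-processing flow.

    Assumes every access moves `access_bytes` bytes and that a unique
    chunk is read from the buffer again when it is sent to flash.
    """
    accesses_per_pass = -(-chunk_bytes // access_bytes)  # ceiling division
    buffer_writes = accesses_per_pass   # initiator data written to the buffer
    hash_reads = accesses_per_pass      # hash engine reads the buffer back
    flash_reads = 0 if duplicate else accesses_per_pass
    return {
        "buffer_writes": buffer_writes,
        "hash_reads": hash_reads,
        "flash_reads": flash_reads,
        "total": buffer_writes + hash_reads + flash_reads,
    }
```

Under these assumptions, a unique 4096-byte chunk with 64-byte accesses costs 192 RAM accesses, of which 64 exist only to feed the hash engine.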