Modern corporate enterprises have large volumes of critical data, such as work documents, emails, and financial records, that require backup and recovery to prevent data loss. During a backup procedure, data stored on client workstations and servers is sent to backup storage. During a recovery procedure, backup data is retrieved from the backup storage and reconstructed on the client workstations and servers. The amount of data that requires backup can be very large (even for a medium-sized company it can be measured in hundreds of terabytes), so the backup process can be very resource intensive and time consuming. Furthermore, given that data backup has to be performed frequently (e.g., daily or semi-weekly), the backup process can place a considerable load on the corporate network.
For efficiency, various backup systems may employ data deduplication. Data deduplication refers to a data compression technique that reduces the amount of data by eliminating duplicate copies of repeating data (e.g., the same bit sequence or byte sequence). For deduplication, unique blocks of data are identified and stored. As each new block arrives, the backup system determines whether the new block matches a stored block. If the new block does match a stored block, a reference (e.g., a pointer) is stored for the new block, indicating that the data is available at the stored block. If the new block does not match a stored block, the new block is stored so that further blocks may be compared against it.
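By way of illustration only, the block-matching behavior described above may be sketched as follows. The sketch assumes fixed-size blocks and uses a SHA-256 digest as the stored reference; the names BLOCK_SIZE, DedupStore, backup, and restore are hypothetical and are not taken from any particular backup system, and an actual system may divide and identify blocks differently.

```python
import hashlib
from typing import Dict, List

# Hypothetical block size, kept tiny so the example is easy to follow.
BLOCK_SIZE = 4


class DedupStore:
    """Illustrative in-memory store that keeps one copy of each unique block."""

    def __init__(self) -> None:
        # Maps a block's digest to the block contents.
        self.blocks: Dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        """Store the block if it is new; return a reference to it either way."""
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.blocks:
            # No match: keep the block so later blocks can be compared against it.
            self.blocks[digest] = block
        # Match or not, the caller records only the reference.
        return digest


def backup(data: bytes, store: DedupStore) -> List[str]:
    """Split data into fixed-size blocks and return one reference per block."""
    return [
        store.put(data[i:i + BLOCK_SIZE])
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def restore(refs: List[str], store: DedupStore) -> bytes:
    """Rebuild the original data by following each reference."""
    return b"".join(store.blocks[ref] for ref in refs)
```

In this sketch, backing up the 16 bytes b"AAAABBBBAAAACCCC" yields four references but stores only three unique blocks, because the first and third blocks are identical and share one stored copy.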
In some aspects, a block may be associated with an identifier, and identifiers may be indexed. Different approaches to indexing may be employed to determine whether a new block matches a stored block. However, the process of determining whether a new block matches a stored block incurs overhead (e.g., time, system resources, etc.), in particular when a search is performed on disk instead of in volatile memory.
Therefore, there exists a need to reduce the overhead (e.g., time, system resources, etc.) associated with searching for existing blocks and adding new blocks.