1. Field of the Invention
The present invention relates generally to maintaining fingerprint indexes, and in particular to a method and system for scaling the size of a deduplication-based fingerprint index to fit in a memory cache to enable efficient index lookups.
2. Description of the Related Art
Block-level deduplication storage systems often face a fingerprint index scaling problem as the number of data segments and fingerprints stored in the system increases. Deduplication storage systems maintain a fingerprint index to implement the deduplication process. The deduplication storage system receives data segments, or receives data and partitions it into segments, and then generates a fingerprint for each data segment. The system searches the fingerprint index for a match with the newly generated fingerprint. If it finds a match, it discards the new data segment rather than storing it again, and stores a reference to the identical data segment (corresponding to the matching fingerprint) already in the deduplication storage system.
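By way of illustration only, the lookup-and-insert flow described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular deduplication storage system: the names `DedupStore`, `fingerprint`, and `partition` are hypothetical, SHA-256 is assumed as the fingerprint function, fixed-size partitioning is assumed, and a Python dictionary stands in for the fingerprint index.

```python
import hashlib
from typing import Dict, List


def fingerprint(segment: bytes) -> str:
    # Assumption: SHA-256 serves as the fingerprint function.
    return hashlib.sha256(segment).hexdigest()


def partition(data: bytes, size: int = 4) -> List[bytes]:
    # Assumption: fixed-size segments; the last segment may be shorter.
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Hypothetical in-memory store with a fingerprint index."""

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}   # fingerprint -> segment id
        self.segments: List[bytes] = []   # unique segments actually stored

    def write(self, segment: bytes) -> int:
        """Return a reference (segment id) for the given segment."""
        fp = fingerprint(segment)
        seg_id = self.index.get(fp)       # index lookup
        if seg_id is None:
            # No match: store the segment and insert its fingerprint.
            seg_id = len(self.segments)
            self.segments.append(segment)
            self.index[fp] = seg_id
        # Match: the duplicate segment is not stored again; only the
        # reference to the existing segment is returned.
        return seg_id


store = DedupStore()
refs = [store.write(s) for s in partition(b"abcdabcdwxyz")]
```

In this sketch, the three segments `abcd`, `abcd`, and `wxyz` yield only two stored segments, and the second reference points back to the first segment. Every call to `write` performs one index lookup, which is why the speed of the device holding the index matters.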
Deduplication storage systems often perform hundreds of thousands of lookup and insert operations every second on a fingerprint index. Ideally, the fingerprint index will be stored in a fast memory device so that these lookup and insert operations may execute quickly. However, typical fast memory devices, such as volatile random access memory (RAM) devices, are limited in the amount of data they can store. Slower memory devices, such as non-volatile disk storage devices, can store more data but have slower access times. If the fingerprint index is stored in a slow storage device, the lookup and insert operations may create a bottleneck for backup and restore operations and degrade system performance. As a result, the backup window for a client may increase, so that the backup takes much longer to complete. The duration of a restore operation may also increase, such that a client may have to wait for an unacceptably long period of time to receive a requested data item after making the restoration request.
This performance degradation is not an issue when the fingerprint index can fit into a fast storage device, such as an in-memory cache. However, as the number of fingerprints grows, the fingerprint index may exceed the available storage space in the in-memory cache. One possible solution is to increase the size of the cache, such as by adding another media server or content router to the deduplication storage system. However, this is an expensive solution to the problem, and each time the size of the fingerprint index exceeds the capacity of the in-memory cache, another media server or content router may need to be added to the deduplication storage system.
Another possible solution is to scale the size of the fingerprint index to fit in the cache. However, some current scaling solutions are not able to dynamically adjust to changes in the size of the index. Other current scaling solutions sacrifice deduplication efficiency by keeping only a small portion of the fingerprint index in cache memory. Therefore, what is needed in the art is a way to dynamically adjust the size of the fingerprint index to fit in the cache and to maximize deduplication efficiency by keeping in the cache those fingerprints that are likely to be accessed.
In view of the above, improved methods and mechanisms for efficiently managing fingerprint indexes within a deduplication storage system are desired.