The growing complexity of storage system infrastructure calls for solutions that use and manage resources efficiently. Storage virtualization is commonly used in storage systems to obtain greater flexibility and more efficient use of the storage system. A virtualized storage system presents the user with a logical space for data storage, while the storage system itself handles the mapping of that logical space to the actual physical location. For example, block-based virtualization is based on the idea of logical addresses and allows data to be stored and retrieved in terms of logical block addresses (LBAs), which are independent of the actual physical addresses at which the data is stored.
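The logical-to-physical mapping described above can be illustrated with a minimal sketch. The class name `VirtualVolume`, its methods, the default block size, and the zero-fill behavior for unmapped addresses are all assumptions made for illustration and do not describe any particular storage system.

```python
class VirtualVolume:
    """Toy logical-to-physical block map (all names are hypothetical)."""

    def __init__(self, block_size=512):
        self.block_size = block_size
        self._map = {}        # LBA -> index of the physical block
        self._physical = []   # stand-in for the physical medium

    def write(self, lba, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        if lba in self._map:
            # LBA already mapped: overwrite its physical block in place.
            self._physical[self._map[lba]] = bytes(data)
        else:
            # New LBA: allocate the next free physical block.
            self._map[lba] = len(self._physical)
            self._physical.append(bytes(data))

    def read(self, lba):
        # Assumption: an LBA that was never written reads back as zeros.
        if lba not in self._map:
            return bytes(self.block_size)
        return self._physical[self._map[lba]]
```

In this sketch the host only ever names an LBA; the physical index is chosen by the volume and is unrelated to the numeric value of the LBA.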
Certain virtualized storage systems implement data de-duplication, a technique for optimizing the utilization of available storage space in a storage system. In the de-duplication process, a single copy of a data unit is stored in the physical storage, while duplicates of identical data units are eliminated and only a virtual representation of these units is maintained. By storing a single copy of each data unit, de-duplication reduces the storage space required in the physical storage.
Storage systems also commonly include a cache memory, which is used to buffer write requests issued by hosts connected to the storage system and to store read data so that future read requests can be served faster.
US Patent Application No. US20070864756 discloses a data de-duplication application which uses this principle for de-duplication of redundant data on the primary storage read/write pathway of a virtualized server environment. The de-duplication application identifies redundant data in memory (e.g., RAM, cache memory), storage, or both, and replaces the redundant data with one or more pointers pointing to a single copy of the data. According to US20070864756, the same de-duplication method is applied both to the main storage devices and to the cache memory. The method examines the contents of data portions (by generating a single value, such as a hash value) and identifies identical data portions.
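Content-based de-duplication of the general kind described above can be sketched as follows. This is an illustrative sketch only and not the method of the cited application: the class `DedupStore`, its methods, and the choice of SHA-256 as the single value computed per data portion are assumptions.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (all names are hypothetical)."""

    def __init__(self):
        self._blocks = {}    # digest -> the single stored copy
        self._pointers = {}  # logical address -> digest (the "pointer")

    def write(self, address, data):
        # Assumption: SHA-256 serves as the single value per data portion.
        digest = hashlib.sha256(data).hexdigest()
        # Keep the data only if no identical portion is already stored.
        self._blocks.setdefault(digest, bytes(data))
        self._pointers[address] = digest

    def read(self, address):
        # Follow the pointer to the single stored copy.
        return self._blocks[self._pointers[address]]

    def stored_copies(self):
        return len(self._blocks)
```

The sketch relies on digest equality alone and never reclaims copies that are no longer referenced; a practical design would likely need to address both points.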
However, US20070864756 ignores the substantial difference in the rate of data change between physical storage devices and cache memory. Data in the cache memory changes so much faster than data in the physical storage device that it hinders a practical implementation of the suggested method for finding duplicates in the cache memory.
Publications considered to be relevant as background to the presently disclosed subject matter are listed below. Acknowledgement of the references herein is not to be inferred as meaning that these are in any way relevant to the patentability of the presently disclosed subject matter.
US Patent Application Publication No. US2010070715 discloses an apparatus, system, and method for de-duplicating storage cache data. A storage cache partition table has at least one entry associating a specified storage address range with one or more specified storage partitions. A de-duplication module creates an entry in the storage cache partition table in which the specified storage partitions contain identical data to one another within the specified storage address range, so that only one copy of the identical data needs to be cached in a storage cache. A read module accepts a storage address within a storage partition of a storage subsystem, locates an entry whose specified storage address range contains the storage address and, if such an entry is found, determines whether the storage partition is among the one or more specified storage partitions.
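The table lookup described above can be illustrated with a short sketch. The names `TableEntry` and `cache_key`, the inclusive address range, and the use of a shared versus private cache key as the result of the lookup are assumptions made for illustration; the cited publication is not stated to work this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEntry:
    start: int             # first address of the range (inclusive)
    end: int               # last address of the range (inclusive)
    partitions: frozenset  # partitions holding identical data in the range


def cache_key(table, partition, address):
    """Return the key under which (partition, address) would be cached."""
    for index, entry in enumerate(table):
        in_range = entry.start <= address <= entry.end
        if in_range and partition in entry.partitions:
            # Identical data across these partitions: one shared cache copy.
            return ("shared", index, address)
    # No matching entry: the data is cached per partition.
    return ("private", partition, address)
```

With a table holding one entry for addresses 0 to 99 shared by partitions "A" and "B", reads of address 10 from either partition resolve to the same key, while a read from a third partition or from an address outside the range resolves to a private key.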