Storage devices commonly use deduplication when storing data. Deduplication is a known technique that reduces the storage capacity needed by keeping only one copy of identical content. An in-line storage deduplication system is a storage system that deduplicates data as it arrives. That is, whenever a block is written with content identical to a block already stored, a new copy of the same content is not made. Instead, a reference is made to the existing copy.
To do this, the system may use a “logical address to physical address” table and a “block hash to physical address” table. The logical address to physical address table maps each logical address to which a block is written to the physical address in the data store where the contents of that block are actually stored. In deduplication, multiple logical addresses may be mapped to the same physical address. For efficiency, the logical address to physical address table often also includes a hash of the block pointed to. Some systems may use multiple logical address to physical address tables, dividing up the logical address space among them.
The block hash to physical address table enables determining if contents of a block with a given hash are already stored, and if so, where that block is. This table often includes additional information, such as reference counts for the physical address pointed to so as to enable “garbage collection” (i.e., removal of contents no longer being used or no longer pointed to). This table can be sizable (e.g., about 60 GB for 8 TB of storage) and may be held in fast, often completely or partially volatile memory.
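The interplay of the two tables can be illustrated with a minimal in-memory sketch. It is not taken from any particular system: the class name `DedupStore`, its fields, the use of SHA-256 as the block hash, and the sequential allocation of physical addresses are all assumptions made for illustration, and hash collisions are ignored.

```python
import hashlib


class DedupStore:
    """Illustrative in-line deduplicating store (hypothetical, in-memory only)."""

    def __init__(self):
        # logical address -> (physical address, block hash)
        self.logical_to_physical = {}
        # block hash -> [physical address, reference count]
        self.hash_to_physical = {}
        # physical address -> block contents (stands in for the data store)
        self.data_store = {}
        self._next_physical = 0

    def _release(self, block_hash):
        # Drop one reference; reclaim the block when nothing points to it.
        entry = self.hash_to_physical[block_hash]
        entry[1] -= 1
        if entry[1] == 0:
            del self.data_store[entry[0]]
            del self.hash_to_physical[block_hash]

    def write(self, logical_addr, block):
        block_hash = hashlib.sha256(block).digest()
        old = self.logical_to_physical.get(logical_addr)
        if old is not None and old[1] == block_hash:
            return old[0]  # same content rewritten in place; nothing changes

        entry = self.hash_to_physical.get(block_hash)
        if entry is not None:
            entry[1] += 1  # content already stored: add a reference only
        else:
            physical = self._next_physical  # assumed allocation policy
            self._next_physical += 1
            self.data_store[physical] = block
            entry = [physical, 1]
            self.hash_to_physical[block_hash] = entry

        self.logical_to_physical[logical_addr] = (entry[0], block_hash)
        if old is not None:
            self._release(old[1])  # the overwritten content loses a reference
        return entry[0]

    def read(self, logical_addr):
        physical, _ = self.logical_to_physical[logical_addr]
        return self.data_store[physical]
```

In this sketch, writing the same contents to two logical addresses stores one physical block with a reference count of two, and overwriting both addresses with other contents drops the count to zero so that the block is reclaimed. A real system would typically also verify block contents on a hash match and persist both tables.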
Unfortunately, after a restart or crash, it may take a substantial amount of time to reload the block hash to physical address table (e.g., reloading a 60 GB table at 100 MB/s takes about 10 minutes), or to rebuild this table (potentially hours). Forcing the user(s) to wait while the table is reloaded or rebuilt is undesirable.
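The reload estimate follows from dividing the table size by the read throughput. The short calculation below reproduces it, assuming decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes); the function name `reload_seconds` is purely illustrative.

```python
GB = 10**9  # decimal gigabyte, assumed
MB = 10**6  # decimal megabyte, assumed


def reload_seconds(table_bytes, bytes_per_second):
    """Time to stream a table of the given size at a sustained read rate."""
    return table_bytes / bytes_per_second


seconds = reload_seconds(60 * GB, 100 * MB)
minutes = seconds / 60
print(f"{seconds:.0f} s = {minutes:.0f} min")
```

With the example figures, this gives 600 seconds, i.e., the 10 minutes stated above.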