This disclosure relates generally to data storage systems and, in particular, to deduplicated data backup services.
As computers, smart phones, tablets, laptops, servers, and other electronic devices increase in performance year to year, the data they generate also increases. Although the cost of storage falls every year, the storage needs of many organizations are growing fast enough that rising demand negates these savings. Deduplication of data offers one possible solution to the problem. Deduplication allows duplicate data (including both files and sub-file structures) to be stored only once but accessed by multiple clients. When a deduplicated backup system receives a file that it has already stored, it does not store that file again; instead, it merely stores a reference to the file in the client's backup directory. When that client requires the backed-up file, the deduplicated backup system uses the reference to locate the raw file data, which is then provided to the client. Deduplication can also be performed for sub-file structures, so that even if an entire file is not identical to previously stored data, the portions of it that are identical can be stored as references to the previously stored data rather than as duplicate copies. Deduplication can significantly reduce the storage requirements of an enterprise or individual. However, deduplication requires storing not only the raw file data but also deduplication entries that track the relationship between files and the deduplicated data.
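To make the distinction between raw data and deduplication entries concrete, the following is a minimal Python sketch of sub-file deduplication, assuming fixed-size chunking and SHA-256 fingerprints. The class `DedupStore`, the constant `CHUNK_SIZE`, and the in-memory dictionaries are hypothetical and illustrative only; many deduplication systems instead use variable-size, content-defined chunking and persistent indexes.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the example is easy to follow;
# practical systems use chunks measured in kilobytes.
CHUNK_SIZE = 4


class DedupStore:
    """Hypothetical in-memory deduplicated backup store (illustrative sketch)."""

    def __init__(self):
        # Raw chunk data, stored once per unique fingerprint.
        self.chunks = {}
        # Deduplication entries: (client, path) -> ordered list of chunk fingerprints.
        self.entries = {}

    def backup(self, client, path, data):
        refs = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            # Store the raw bytes only if this chunk has not been seen before.
            self.chunks.setdefault(fp, chunk)
            # Always record a reference, even for duplicate chunks.
            refs.append(fp)
        self.entries[(client, path)] = refs

    def restore(self, client, path):
        # Follow the references to reassemble the original bytes.
        return b"".join(self.chunks[fp] for fp in self.entries[(client, path)])

    def stored_bytes(self):
        # Total raw chunk data actually kept in the store.
        return sum(len(c) for c in self.chunks.values())


store = DedupStore()
store.backup("alice", "report.txt", b"AAAABBBBCCCC")
store.backup("bob", "report.txt", b"AAAABBBBCCCC")  # identical file: no new chunk data
store.backup("bob", "draft.txt", b"AAAABBBBDDDD")   # partial match: only b"DDDD" is new
print(store.stored_bytes())                  # 16
print(store.restore("bob", "draft.txt"))     # b'AAAABBBBDDDD'
```

Three 12-byte files occupy only 16 bytes of chunk data, but every file still adds a list of references to `entries`; those references are the deduplication entries whose own storage cost grows with the volume of backed-up data.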
As the volume of stored data increases, so too does the number of deduplication entries required. The storage needed for the deduplication entries alone can grow to the point where it is no longer practical to keep all of them in fast storage. As a result, some deduplication entries must be stored in more plentiful and cheaper slow storage. Storing deduplication entries in slow storage degrades performance, because the access time for entries in slow storage is much longer than for entries in fast storage.
With current technology, fast storage is usually implemented using RAM (Random Access Memory), while slow storage is implemented using hard disk drives. Access times for RAM and hard disks differ by several orders of magnitude. Storage management systems that perform deduplication cannot economically fit all deduplication entries in RAM once the stored data grows to the terabyte range. Storage technologies may change in the future, but the same challenge is likely to remain: storage needs will also continue to grow, and it will remain desirable to keep the deduplication entries in the fastest storage available, whatever specific technologies are used for fast and slow storage.
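A back-of-the-envelope estimate shows why the entries outgrow RAM at terabyte scale. The Python sketch below assumes 10 TiB of unique data, an 8 KiB average chunk, and a 40-byte entry (a 20-byte SHA-1 fingerprint plus 20 bytes of location metadata); all of these figures are illustrative assumptions, not parameters of any particular system.

```python
# Illustrative assumptions only, not measurements from a real system.
TIB = 2 ** 40
GIB = 2 ** 30

stored_data = 10 * TIB      # 10 TiB of unique data
avg_chunk_size = 8 * 1024   # 8 KiB average chunk
entry_size = 20 + 20        # 20-byte SHA-1 fingerprint + 20 bytes of location metadata

# One deduplication entry per stored chunk.
num_entries = stored_data // avg_chunk_size
index_bytes = num_entries * entry_size

print(num_entries)          # 1342177280
print(index_bytes / GIB)    # 50.0
```

Under these assumptions the entries alone need about 50 GiB. The figure scales linearly with the amount of stored data and the entry size, and inversely with the average chunk size, so smaller chunks (which find more duplicates) make the problem worse.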