This invention relates generally to data storage systems and methods that rely upon data compression techniques to reduce the amount of physical storage required to store user data, and more particularly to systems and methods that minimize and optimize system processing for retrieving data.
Storage systems have widely relied on data compression to reduce the data footprint, so that the same amount of physical storage space can host more user data. At a high level, a standard data compressor takes uncompressed data of original length X and outputs compressed data of a smaller length Y; the difference X−Y is the physical storage space saved by compression. To recover the original data, a standard decompressor decompresses every byte of the compressed data of length Y and outputs the original data of length X. In many cases, however, only a fraction of the compressed data is actually needed, and decompressing all of it wastes CPU cycles on data that is never used. Because both compression and decompression are CPU-intensive tasks, decompressing unneeded data is inefficient. In large storage systems that handle numerous concurrent read requests, it also slows data access. In addition, existing decompressors often require two separate buffers, one for the compressed input and one for the decompressed output. This requires allocating memory for decompression that would otherwise be unneeded, and it results in inefficient memory utilization.
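To make this inefficiency concrete, the following is a minimal Python sketch of the conventional read path described above. It uses zlib (DEFLATE) as a stand-in for a "standard" compressor. The function name `read_range_full_decompress` and the sample data are hypothetical and are not part of the described system. The sketch shows that retrieving even a 100-byte range forces all X original bytes to be produced in a separate output buffer.

```python
import zlib
from typing import Tuple


def read_range_full_decompress(compressed: bytes, offset: int, length: int) -> Tuple[bytes, int]:
    """Hypothetical conventional read: return the requested range and the
    number of bytes that had to be decompressed to obtain it."""
    # The standard decompressor consumes all Y compressed bytes and writes all
    # X original bytes into a separate output buffer, whatever range is wanted.
    original = zlib.decompress(compressed)
    wanted = original[offset:offset + length]
    return wanted, len(original)


# Illustrative data only (assumption): 16-byte pattern repeated to X = 65,536 bytes.
original = b"user data block " * 4096
compressed = zlib.compress(original)      # Y bytes, with Y < X for this input
saved = len(original) - len(compressed)   # X - Y bytes of physical space saved

chunk, work = read_range_full_decompress(compressed, offset=1000, length=100)
print(f"X={len(original)} Y={len(compressed)} saved={saved}")
print(f"requested {len(chunk)} bytes, decompressed {work} bytes")
```

Here `work` equals X regardless of how small the requested range is. That is the wasted processing the preceding paragraph describes.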
Deduplication storage systems, such as those commonly used for backup, are another example of processing and memory inefficiencies in decompressing data. In these systems, duplicate copies of data are eliminated, and the backup data is packed and compressed into regions that are placed on disks. High-generation (high-gen) data backups are deduplicated against low-generation (low-gen) data backups, so retrieving a high-gen backup requires data from regions produced by low-gen backups. Again, often only a fraction of the low-gen data in those regions is needed to retrieve the high-gen backup.
Garbage collection is another example of processing and memory inefficiencies. It reclaims storage space by eliminating dead (unneeded) chunks of data, and in doing so it moves and reorders data. Typically, garbage collection only needs to copy forward the live data chunks interspersed among dead chunks. Although live and dead chunks are compressed together, garbage collection is interested only in the live chunks, so decompressing the dead chunks is unnecessary. However, the decompression process usually must decompress all of the data chunks, live and dead. Only then are the live chunks copied to a new location and the dead chunks discarded.
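The conventional copy-forward step can be sketched the same way. This is a hypothetical illustration only. It assumes that a region is simply the zlib-compressed concatenation of its chunks, and that chunk sizes and the set of live chunk indices (`chunk_sizes`, `live`) are available as metadata. Neither this layout nor these names come from the source.

```python
import zlib
from typing import List, Set


def copy_forward_conventional(region: bytes, chunk_sizes: List[int], live: Set[int]) -> bytes:
    """Hypothetical conventional garbage-collection copy-forward.

    Assumes `region` is the zlib-compressed concatenation of chunks whose
    uncompressed sizes are listed in order in `chunk_sizes`.
    """
    # Step 1: decompress the whole region, including dead chunks that will be discarded.
    data = zlib.decompress(region)
    survivors = []
    pos = 0
    for index, size in enumerate(chunk_sizes):
        chunk = data[pos:pos + size]
        # Step 2: keep live chunks; dead chunks were decompressed only to be dropped.
        if index in live:
            survivors.append(chunk)
        pos += size
    # Step 3: recompress the surviving live chunks into a new region.
    return zlib.compress(b"".join(survivors))


# Illustrative region (assumption): four 4 KiB chunks, of which chunks 0 and 2 are live.
chunks = [b"A" * 4096, b"B" * 4096, b"C" * 4096, b"D" * 4096]
region = zlib.compress(b"".join(chunks))
new_region = copy_forward_conventional(region, [len(c) for c in chunks], live={0, 2})
```

All four chunks are decompressed in step 1 even though only two of them survive. Avoiding this work is what the background identifies as desirable.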
It is desirable to provide data decompression systems and processes that avoid the foregoing and other problems associated with processing compressed data. Such systems would eliminate wasted CPU processing cycles and inefficient memory usage, thereby reducing the CPU processing burden and improving data access. It is particularly desirable to provide systems and processes that can decompress a specified range or portion of data within a region of compressed data, so that no processing cycles are wasted decompressing unwanted data. It is to these ends that the present invention is directed.