Today, there exist storage systems that use data compression processes to reduce the size of data files being stored on the system and thereby increase the effective data capacity of the storage system. Such existing storage systems can reduce the physical capacity required to store data within primary, secondary, and/or archive storage. Moreover, these systems may compress files, virtual local disks, logical units and other storage entities. In each case, the storage systems achieve this data size reduction by use of a compression process.
A compression process is typically computer code that, when applied to a storage object to be stored on a storage medium, results in the storage object having a smaller, or compressed, size. A compression process is typically a computationally intense process that analyzes and alters the raw data of the storage object. Compression processes may be “lossy”, meaning that some information that is considered superfluous is cut out during compression, or “lossless”, meaning that all information in the object is retained while still resulting in a smaller size. Lossy compression is most often used for image, video and audio file compression, and basic lossy compression methodologies include: the removal of inaudible frequency ranges from audio recordings, the reduction of the color spectrum of images by averaging out color transitions, and the comparison of frame transitions in video, where only changes in pixel blocks between frames are saved.
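As a rough illustration of why lossy compression cannot be reversed, the following sketch reduces the number of distinct intensity values in a short run of 8-bit samples. It uses simple uniform quantization, which is a stand-in chosen for brevity and not the averaging of color transitions mentioned above; the function name `quantize` and the sample values are hypothetical.

```python
def quantize(samples, levels=16):
    """Map 8-bit values (0-255) onto `levels` evenly spaced values.

    Illustrative assumption: uniform quantization, not any specific
    codec's method. Distinct inputs collapse to the same output, so
    the original values cannot be recovered afterwards.
    """
    step = 256 // levels
    # Snap each sample to the midpoint of the bucket it falls into.
    return [(s // step) * step + step // 2 for s in samples]


original = [0, 3, 7, 100, 101, 102, 250, 255]
reduced = quantize(original)

print("distinct values before:", len(set(original)))
print("distinct values after: ", len(set(reduced)))
```

Fewer distinct values can be encoded in fewer bits, but the differences between neighboring samples are discarded for good, which is the trade-off that distinguishes lossy from lossless processes.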
Lossless compression processes are commonly used for data in which missing parts would noticeably degrade the dataset or render it indecipherable, such as text files. Methodologies for lossless text file compression include statistical modeling algorithms that track repeated occurrences of data and refer back to a single copy of the repeated data, rather than saving multiple copies of the same data during compression.
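The lossless behavior described above can be sketched with Python's standard `zlib` module, whose DEFLATE format replaces repeated byte sequences with back-references to an earlier copy. This is only one example of a lossless process; the helper name `lossless_round_trip` and the sample text are assumptions made for illustration.

```python
import zlib


def lossless_round_trip(data: bytes):
    """Compress and decompress `data`, reporting sizes and fidelity."""
    compressed = zlib.compress(data)
    restored = zlib.decompress(compressed)
    # For a lossless process the restored bytes must equal the input.
    return len(data), len(compressed), restored == data


# Highly repetitive text: most of it can be encoded as references
# back to the first occurrence of the sentence.
text = b"the quick brown fox jumps over the lazy dog. " * 200
original_size, compressed_size, identical = lossless_round_trip(text)

print("original bytes:  ", original_size)
print("compressed bytes:", compressed_size)
print("identical after decompression:", identical)
```

Data with little repetition would shrink far less, which is one reason the achieved compression varies from file to file.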
A large number of compression processes exist, and different compression processes provide different degrees and speeds of compression. Compression processes can be compared using metrics, with the most common comparison made using a compression ratio (CR), which is an estimate of the expected ratio of uncompressed to compressed file size achieved for a typical file. Although compression ratios can be inexact predictors of the compression that will be achieved for a particular file, they generally show that achieving higher compression ratios requires compression processes that consume more computational resources, where computational resources may include processing time or memory. As a result, compression processes are also evaluated based on the data compression and decompression speeds they can achieve for a given compression ratio. Other metrics may include the memory demands of a given compression process, that is, the amount of random access memory (RAM) required to process the data while the compression process is running. Again, a compression process with a higher compression ratio may generally require more RAM and CPU power to process the data.
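The metrics discussed above can be measured directly. The sketch below computes the compression ratio (uncompressed size divided by compressed size) together with compression and decompression times for three processes from the Python standard library. The choice of processes, the helper name `measure`, and the sample data are assumptions for illustration; absolute numbers depend on the data and the machine, so none are quoted here.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data):
    """Return compression ratio and timings for one compression process."""
    start = time.perf_counter()
    compressed = compress(data)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(compressed)
    decompress_seconds = time.perf_counter() - start

    if restored != data:
        raise ValueError(f"{name}: round trip did not restore the input")

    return {
        "name": name,
        "ratio": len(data) / len(compressed),  # CR = uncompressed / compressed
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
    }


sample = b"storage systems compress files, virtual disks and logical units. " * 2000

candidates = [
    ("zlib level 1", lambda d: zlib.compress(d, 1), zlib.decompress),
    ("zlib level 9", lambda d: zlib.compress(d, 9), zlib.decompress),
    ("lzma", lzma.compress, lzma.decompress),
]

results = [measure(n, c, d, sample) for n, c, d in candidates]
for r in results:
    print(f"{r['name']:<14} CR={r['ratio']:8.1f} "
          f"compress={r['compress_seconds']:.4f}s "
          f"decompress={r['decompress_seconds']:.4f}s")
```

Running such a comparison on representative files, rather than relying on a published ratio alone, shows how ratio, speed and resource use trade off for a particular workload.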
Typically, existing storage systems use a single compression process, and the process chosen depends on the decompression data rate required to meet the fastest decompression speeds for acceptable data retrieval. Thus, these existing systems are built around a worst-case use model, and this shapes the overall system compression performance. Some other existing systems, such as the systems disclosed in US Patent Application 2010/0058002 A1 entitled System and Method for File System Level Compression Using Compression Group Descriptors and assigned to the assignee hereof, apply different compression techniques based on the type of data file being compressed. As described in this publication, the storage system will recognize the file type, typically by examining the file name extension, and will apply to that file a compression process that has been pre-defined by the storage system engineers as the appropriate compression process for that file type. Although this type of storage system can achieve greater file compression, the rate of data compression can be costly to system performance and can delay read and write operations.
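Selection of a compression process by file type, as outlined above, might look like the following minimal sketch. The extension table, the process labels, and the function name `select_process` are hypothetical and are not taken from the cited publication; they only illustrate a fixed, pre-defined mapping from file name extension to compression process.

```python
import lzma
import os
import zlib

# Hypothetical, engineer-defined table: extension -> (label, compress function).
PROCESS_BY_EXTENSION = {
    ".txt": ("lzma", lzma.compress),                    # text: favor high ratio
    ".log": ("zlib-9", lambda d: zlib.compress(d, 9)),
    ".jpg": ("store", lambda d: d),                     # already compressed: store as-is
}

# Fallback used when the extension is not in the table (an assumption).
DEFAULT_PROCESS = ("zlib-6", lambda d: zlib.compress(d, 6))


def select_process(filename):
    """Pick a compression process from the file name extension alone."""
    extension = os.path.splitext(filename)[1].lower()
    return PROCESS_BY_EXTENSION.get(extension, DEFAULT_PROCESS)


for name in ("report.TXT", "photo.jpg", "unknown.bin"):
    label, _compress = select_process(name)
    print(f"{name:<12} -> {label}")
```

Because the mapping is fixed in advance, it takes no account of how often a file is read or how much processing capacity is available at the time, which is the performance cost noted above.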
Thus, although existing data storage systems that use data compression can work well, there exists a need for a more efficient method of using data compression to improve available storage capacity.