Compressing data is a useful way to increase the effective capacity of data storage. However, not all compression algorithms are equal. Some compression algorithms work better with certain types of data than with others and, in general, compression algorithms trade compression ratio against processing time and effort.
If a storage device naively attempts to compress all data it receives and stores, it will incur unnecessary latency, degraded throughput, and wasted processor cycles, because not all incoming data is readily amenable to compression. Some data is already compressed on the host, either by a separate, explicit compression feature or inherently, as is the case with many multimedia file types. Attempting to compress such data again consumes time and processing resources while yielding little or no reduction in size. The resulting performance degradation may be acceptable if the average compression ratio is good (i.e., there is a substantial reduction in space usage). However, if the average compression ratio is poor, the purpose of the data compression will not be achieved and the only result will be performance loss.
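The effect described above can be illustrated with a minimal sketch, assuming Python's standard `zlib` module as a stand-in for whatever compressor a drive might use. The helper name `compression_ratio` and the sample data are hypothetical and chosen only for illustration.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by compressed size (higher is better)."""
    if not data:
        return 1.0
    return len(data) / len(zlib.compress(data, level))


# Illustrative, compressible input: decimal digits separated by spaces.
sample = b" ".join(str(i * i).encode() for i in range(5000))

# First pass: the input shrinks noticeably.
first_pass = zlib.compress(sample)
ratio_first = compression_ratio(sample)

# Second pass over the already-compressed bytes: little or no further gain,
# but the processing cost is paid again.
ratio_second = compression_ratio(first_pass)

# Random bytes stand in for other data with no exploitable redundancy.
ratio_random = compression_ratio(os.urandom(10_000))

print(f"first pass:  {ratio_first:.2f}")
print(f"second pass: {ratio_second:.2f}")
print(f"random data: {ratio_random:.2f}")
```

A ratio near or below 1.0 means that the compression attempt consumed processing time without saving space.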
Conventional block storage systems, which store files in logical blocks of the same size, are capable of compressing individual logical blocks to increase storage capacity. However, the storage structure of block systems makes it difficult to improve compression of the stored data based on the file type and format, because it is difficult to locate individual files of a particular type or format in and among the blocks. Each logical block in a block storage system may contain several different files (or parts of several files), or conversely, a single data file may be dispersed among several logical blocks, making it difficult to isolate particular file types for individual and targeted compression with algorithms best suited to compress that particular type of data. In addition, logical blocks do not necessarily contain information regarding the start and end points of each individual data file or information regarding input data format.
Similarly, important metadata about the identity of a file occupying one or more blocks may be absent at the block level. Thus, traditional block systems would need to sample the data of a block and perform complex mathematical computations to estimate the potential compressibility of that block, which causes additional performance losses. In addition, compressing information on a block-based drive requires substantial background activity to track the location of each block of a target file, compress those blocks and assemble them into new blocks, and then update mapping tables to reflect the new locations of the compressed and stored information, which may have changed substantially during the compression process.
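As one hedged illustration of what sampling a block to estimate compressibility might involve, the sketch below computes the Shannon entropy of a strided sample of bytes. This is an assumed, commonly used heuristic and is not taken from any particular block storage system; the function name and the `sample_size` parameter are hypothetical.

```python
import math
from collections import Counter


def sampled_entropy_bits_per_byte(block: bytes, sample_size: int = 1024) -> float:
    """Estimate Shannon entropy (bits per byte) from a strided sample of a block.

    Values near 8.0 suggest the block is unlikely to compress well;
    values near 0.0 suggest high redundancy.
    """
    if not block:
        return 0.0
    # Take roughly sample_size bytes spread evenly across the block.
    step = max(1, len(block) // sample_size)
    sample = block[::step]
    total = len(sample)
    counts = Counter(sample)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A block of identical bytes has zero entropy.
low = sampled_entropy_bits_per_byte(b"a" * 4096)

# A block in which every byte value appears equally often has maximal entropy.
high = sampled_entropy_bits_per_byte(bytes(range(256)))
```

Even this simple estimate requires reading part of the block and computing a histogram and logarithms for every block examined, which is work a drive could avoid if the data format were known in advance.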
Key value storage systems differ from conventional block storage systems in that they do not contain logical blocks of the same size, but instead store data as values of variable size, where each value represents a particular piece of data or file. Key value storage uses keys that point to specific values stored within the system. A key may contain useful information regarding the data stored, including a logical address, a hash value derived from the data, the data format, etc.
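The organization described above can be sketched as follows. This is a minimal in-memory illustration, not an actual drive interface; the `Key` fields mirror the examples named in the text (logical address, hash value, data format), while the names `Key`, `put`, and `get` and the use of SHA-256 are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Key:
    """Hypothetical key carrying metadata about the value it points to."""
    logical_address: int
    content_hash: str   # hash value derived from the data
    data_format: str    # e.g. "txt", "jpg"


# Values are variable-size byte strings, not fixed-size logical blocks.
store: Dict[Key, bytes] = {}


def put(logical_address: int, value: bytes, data_format: str) -> Key:
    """Store a value of any size and return the key that points to it."""
    key = Key(logical_address, hashlib.sha256(value).hexdigest(), data_format)
    store[key] = value
    return key


def get(key: Key) -> bytes:
    """Return the whole value for a key, with no block reassembly needed."""
    return store[key]


small = put(0, b"hello", "txt")
large = put(1, b"\x00" * 5000, "raw")
```

Because each value is stored and retrieved as a unit, the start and end of the data are known, and the format can be read from the key without inspecting the data itself.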
An improved compression drive is needed that takes advantage of the organization of key value storage systems in order to allow for variable compression that improves overall storage compression. For example, a drive is needed that is capable of determining if a good compression ratio is expected prior to compression, avoiding useless compression and performance loss. In addition, a drive is needed that is capable of not just performing compression on host data, but of also determining when and how to best compress this data (if at all) based on the nature of the data, the drive capabilities, and/or end user Quality of Service (QoS) requirements.
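One way such a decision could be structured is sketched below, purely as an illustration under stated assumptions: the format hint is assumed to be available from the key, the list of formats treated as already compressed is an example rather than an exhaustive set, and a single `latency_sensitive` flag stands in for QoS requirements. The function names are hypothetical, and `zlib` is used only as a convenient stand-in compressor.

```python
import zlib
from typing import Optional, Tuple

# Example formats that are typically compressed already (assumed list).
PRECOMPRESSED_FORMATS = {"jpg", "jpeg", "png", "mp3", "mp4", "zip", "gz"}


def choose_level(data_format: str, latency_sensitive: bool) -> Optional[int]:
    """Return a zlib level, or None when compression is expected to be useless."""
    if data_format.lower() in PRECOMPRESSED_FORMATS:
        return None
    # Trade compression ratio for speed when latency matters.
    return 1 if latency_sensitive else 9


def store_value(value: bytes, data_format: str,
                latency_sensitive: bool = False) -> Tuple[bytes, bool]:
    """Return (payload, was_compressed) for a value about to be stored."""
    level = choose_level(data_format, latency_sensitive)
    if level is None:
        return value, False
    compressed = zlib.compress(value, level)
    # Keep the original if compression did not actually save space.
    if len(compressed) >= len(value):
        return value, False
    return compressed, True


payload, was_compressed = store_value(b"abc" * 1000, "txt")
skipped, skipped_flag = store_value(b"abc" * 1000, "mp4")
```

In this sketch the decision to skip compression is made from metadata alone, before any processing effort is spent on the data.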
The above information disclosed in this Background section is only for enhancement of understanding of the background of the disclosure and therefore it may contain information that does not constitute prior art.