High-performance, non-volatile storage class memory subsystems are generally composed of relatively expensive components. As such, it is highly desirable to maximize data storage in such systems using data reduction techniques. Data reduction refers to the use of data self-compression and data deduplication to reduce the total amount of information that is written to or read from a backend storage system. Data reduction transforms user (input) data into a more compact representation that can be stored. Its advantages include improved storage utilization, increased life (in the context of an all-flash storage system), and application acceleration, among others.
Data compression refers to the process of looking for redundancy within a single data block and then encoding the repeated sequences so as to reduce the overall size of the data. Data deduplication refers to the process of comparing data sequences across multiple blocks to find matches, even when an individual block contains incompressible data. However, conventional systems perform compression and data deduplication as separate steps within the data reduction process. Because they do not combine the two into a single step, these systems pay latency and bandwidth penalties.
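The distinction between the two techniques, and the conventional practice of running them as separate steps, can be illustrated with a minimal sketch. It is not taken from any particular system: the 4096-byte block size, the use of zlib for compression, the SHA-256 fingerprint for matching whole blocks, and the names `compress_block` and `deduplicate` are all assumptions made for illustration.

```python
import hashlib
import os
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def compress_block(block: bytes) -> bytes:
    """Compression: remove redundancy found inside one block."""
    return zlib.compress(block)


def deduplicate(blocks):
    """Deduplication: find identical blocks across the whole set.

    Returns a store holding one copy of each unique block, keyed by a
    SHA-256 fingerprint, plus the per-block list of fingerprints.
    """
    store = {}
    refs = []
    for block in blocks:
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # keep the first copy only
        refs.append(fingerprint)
    return store, refs


# One repetitive block, plus a high-entropy block that appears twice.
repetitive = b"abcd" * (BLOCK_SIZE // 4)
random_block = os.urandom(BLOCK_SIZE)
blocks = [repetitive, random_block, random_block]

# Step 1 (separate pass): per-block compression.
compressed_sizes = [len(compress_block(b)) for b in blocks]

# Step 2 (separate pass): cross-block deduplication.
store, refs = deduplicate(blocks)
```

In this sketch the repetitive block shrinks under compression, while the high-entropy block does not; deduplication nevertheless stores that block only once. Each pass reads every block independently, which is the kind of duplicated work the separate-step approach incurs.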
Furthermore, conventional data reduction solutions consume many processing cycles and considerable power to perform the compression functions. In any given application data flow, there is a high probability that some set of data blocks will not exhibit self-compression properties. Typically, conventional solutions check only at the end of the compression stage that the result is not larger than the original block. This check comes too late, because the resources have already been spent trying to compress the data.
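The late check described above can be sketched as follows. This is a hypothetical illustration rather than any specific product's code path: zlib stands in for the compressor, and `store_block` and its `"raw"`/`"compressed"` tags are names assumed for the example.

```python
import os
import zlib


def store_block(block: bytes):
    """Compress first, then check the size only afterwards."""
    candidate = zlib.compress(block)  # cycles and power are spent here
    if len(candidate) > len(block):
        # The result is larger than the original, so the compression
        # work is discarded and the block is stored uncompressed.
        return ("raw", block)
    return ("compressed", candidate)


compressible = b"a" * 4096
incompressible = os.urandom(4096)  # high-entropy data

kind_a, payload_a = store_block(compressible)
kind_b, payload_b = store_block(incompressible)
```

For the high-entropy block, the full compression pass runs before the size comparison rejects its output, so the effort is wasted by the time the decision is made.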