Non-volatile data storage devices, such as universal serial bus (USB) flash memory devices or removable storage cards, have allowed for increased portability of data and software applications. To make efficient use of the memory capacity of a non-volatile data storage device, data may be compressed prior to storage.
One technique to compress data is to create a data representation from which redundant portions have been removed. For example, a data set may include a plurality of data elements in a string, and a portion of one or more of the data elements may be identical to a portion found elsewhere in the data set. A data representation of the data set may be formed by eliminating the redundant (identical) data portions of the data set. The representation of the data set can be stored and, upon request, such as a read request, the data set can be reconstituted to its original form by restoring the redundant portions that were removed to form the representation.
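The idea of removing identical portions and later restoring them can be sketched as follows. This is a minimal illustration only, assuming that redundancy is detected between whole data elements; the function names `build_representation` and `reconstitute` are hypothetical and do not come from the text above.

```python
from typing import Dict, List, Tuple


def build_representation(elements: List[bytes]) -> Tuple[List[bytes], List[int]]:
    """Store each distinct element once and record, for every position in
    the original data set, the index of the stored copy to use."""
    unique: List[bytes] = []        # distinct elements, in first-seen order
    index_of: Dict[bytes, int] = {} # element -> its index in `unique`
    refs: List[int] = []            # one reference per original element
    for element in elements:
        if element not in index_of:
            index_of[element] = len(unique)
            unique.append(element)
        refs.append(index_of[element])
    return unique, refs


def reconstitute(unique: List[bytes], refs: List[int]) -> List[bytes]:
    """Rebuild the original data set by following each reference."""
    return [unique[i] for i in refs]


# Usage: three of the four elements are identical, so only two are stored.
data_set = [b"abc", b"xyz", b"abc", b"abc"]
unique, refs = build_representation(data_set)
restored = reconstitute(unique, refs)
```

In this sketch the stored representation holds two elements plus four small references instead of four full elements, and `reconstitute` returns the data set in its original order.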
Compression of a large data set can be a time-intensive process that consumes significant computational resources. In some cases, compression may not be warranted, such as when the data set has only a small amount of redundancy. In such cases, compression would not significantly reduce the overall size that the representation of the data set would occupy in storage, as compared with the uncompressed data set. It would be helpful to be able to predict, in a time-efficient and computationally efficient manner prior to performing the compression, whether compression of a data set would significantly reduce the memory space needed for storage.
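To make the notion of such a prediction concrete, the following sketch estimates redundancy from a sample of fixed-size chunks and compares it with a threshold. It is offered only as a naive illustration under stated assumptions, not as the approach of this disclosure: the chunk size, the sampling stride, the threshold value, and the names `estimated_savings` and `worth_compressing` are all hypothetical.

```python
def estimated_savings(data: bytes, chunk_size: int = 4, sample_every: int = 1) -> float:
    """Estimate the fraction of sampled chunks that duplicate another
    sampled chunk. 0.0 means no redundancy was observed in the sample."""
    # Split the data into fixed-size chunks (the last one may be shorter).
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Look at only every `sample_every`-th chunk to keep the estimate cheap.
    # Duplicates that fall on skipped chunks are missed, so a larger stride
    # gives a cruder estimate.
    sampled = chunks[::sample_every]
    if not sampled:
        return 0.0
    return 1.0 - len(set(sampled)) / len(sampled)


def worth_compressing(data: bytes, threshold: float = 0.2, **kwargs) -> bool:
    """Return True when the estimated redundancy meets the assumed threshold."""
    return estimated_savings(data, **kwargs) >= threshold


# Usage: a highly repetitive input versus one with no repeated chunks.
repetitive = b"abcdabcdabcdabcd"
varied = b"abcdefghijklmnop"
print(worth_compressing(repetitive), worth_compressing(varied))
```

The estimate touches each sampled chunk once, so its cost is small relative to running the compression itself; the trade-off is that it only sees redundancy aligned to the assumed chunk boundaries.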