Storage and transmission of data often incur significant costs. Compression can reduce these costs by allowing one to reproduce the same or nearly the same data using more succinct descriptions. A compression/decompression system typically includes an upstream encoder and a downstream decoder, whose internal algorithms are generally carefully designed to maximize compression efficiency. To operate, the encoder and/or the decoder typically use knowledge about the special characteristics of the data.
In particular, the compression of data is predicated on the existence of certain predictable or typical statistical characteristics in the data, which must be identified by an expert or inferred by an algorithm. This information is commonly called the “source model” because it models the statistical source from which the data is presumed to be drawn. The efficiency with which compression can be performed on data is described by Shannon's source coding theory, in which the predictability of various sources is characterized by the “entropy” of the source model. Entropy forms the ultimate lower bound on the size to which data can be compressed while remaining faithful in reproduction; for reproduction with some preset loss as measured by a distortion metric, the corresponding bound is given by the rate-distortion function. The source model is generally the key piece of information in known compression systems. Various systems have been designed to exploit the source model of the data either explicitly or implicitly.
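The entropy bound described above can be illustrated with a minimal sketch. It assumes the simplest possible source model, namely independent symbols whose probabilities are estimated from the observed frequencies; the function name `empirical_entropy` is hypothetical and is used only for illustration.

```python
import math
from collections import Counter

def empirical_entropy(symbols):
    """Entropy in bits per symbol, assuming an i.i.d. source whose
    probabilities equal the observed symbol frequencies (an assumption)."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Two equally likely symbols need one bit per symbol on average.
print(empirical_entropy("aabb"))  # 1.0

# A skewed source is more predictable, so its entropy is lower.
print(empirical_entropy("aaab") < empirical_entropy("aabb"))  # True
```

Under this assumed model, no lossless code can use fewer bits per symbol on average than the value returned. A richer source model (e.g., one that captures dependencies between symbols) can yield a lower entropy for the same data, which is why the choice of source model matters.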
In practical systems (e.g., image compression), a compression input is often deconstructed by the upstream encoder according to its assumption of a source model (e.g., a model for natural images), such as by transforming the data into a different domain (e.g., a frequency domain, for images) in which the data is in a “simpler” form for subsequent coding. Such a process often becomes the basis of a compression standard for the type of data in question, defining the parameters within which the encoder operates, the format of the compressed output from the encoder, and the decoder that must parse this output to reconstitute the data in the reverse process. Once a standard is widely deployed and legacy compressed data has been generated, the process of encoding becomes ossified to a large extent, even as understanding of the source model improves or better coding methods emerge. This is one reason for the limited uptake of JPEG 2000, a technologically superior successor to JPEG image compression.
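As an illustration of how a transform can put data into a “simpler” form, the following sketch applies an orthonormal discrete cosine transform (DCT-II) to a short, smooth sequence and measures how much of the signal energy lands in the first few coefficients. The input values and the function names `dct2_orthonormal` and `energy_fraction` are assumptions made for this example; the sketch is not the transform of any particular standard.

```python
import math

def dct2_orthonormal(x):
    """Orthonormal DCT-II of a real sequence, evaluated directly in O(N^2)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        coeffs.append(scale * total)
    return coeffs

def energy_fraction(coeffs, keep):
    """Fraction of the total energy held by the first `keep` coefficients."""
    energy = [c * c for c in coeffs]
    return sum(energy[:keep]) / sum(energy)

# Hypothetical smooth input: one row of a gentle brightness ramp.
row = [0, 1, 2, 3, 4, 5, 6, 7]
coeffs = dct2_orthonormal(row)
print(energy_fraction(coeffs, 2))
```

For this ramp, the first two of the eight coefficients carry more than 99% of the energy, so the remaining coefficients can be coded very cheaply. The benefit depends on the assumed source model: an input that is not smooth would not concentrate its energy in this way.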
Typically, noise is introduced into the information/data sequence (also called a data message) when it is transmitted through a medium, e.g., air. The noise can cause symbols (e.g., bits) of the data sequence to change, resulting in the reception of a data sequence that may contain erroneous symbols. Error correction codes can address this problem. In general, error correction codes introduce redundancy, e.g., parity bits, prior to the transmission of a data sequence. To this end, during encoding, some error correction codes employ a so-called “generator” matrix. In some systems, a joint source-channel coding model is employed, where the channel coding may provide for error correction by introducing and/or exploiting redundancy in a sequence of data to be transmitted. Such joint models, however, are limited to only a few kinds of source models.
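Encoding with a generator matrix can be sketched as follows. The example assumes a systematic (7,4) Hamming code over GF(2), chosen here only because it is small; the matrix `G` and the function name `encode` are illustrative and are not taken from any particular system.

```python
# Assumed generator matrix G = [I | P] of a systematic (7,4) Hamming code.
# The first four columns copy the data bits; the last three add parity bits.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(message, generator=G):
    """Return the codeword message * G, with arithmetic modulo 2."""
    if len(message) != len(generator):
        raise ValueError("message length must equal the number of rows of G")
    n = len(generator[0])
    return [
        sum(m * row[j] for m, row in zip(message, generator)) % 2
        for j in range(n)
    ]

codeword = encode([1, 0, 1, 1])
print(codeword)  # [1, 0, 1, 1, 0, 1, 0]
```

Four data bits become seven transmitted bits: the redundancy is the three parity bits appended to the unchanged data bits.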
When the data sequence is received, one or more of the informational symbols (e.g., data bits) and one or more of the redundant symbols (e.g., parity bits) that were added during encoding can be erroneous. By analyzing the entire received data sequence, a decoder can detect, and may even correct, errors in one or more of the informational symbols. To this end, during decoding, some error correction codes employ a so-called “parity matrix” (also known as a parity-check matrix), which is related to the generator matrix. Applying the parity matrix to an error-free received data sequence yields a zero scalar or vector value; a nonzero result indicates that the received data sequence contains one or more errors.
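The role of the parity matrix in decoding can be sketched as follows, again assuming a systematic (7,4) Hamming code over GF(2). The matrix `H`, the sample codeword, and the function names `syndrome` and `correct_single_error` are assumptions made for illustration. The correction step relies on a property of this particular code: a single flipped bit produces a result equal to the corresponding column of `H`.

```python
# Assumed parity matrix H = [P^T | I] of a systematic (7,4) Hamming code.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def syndrome(received, parity=H):
    """Apply the parity matrix to the received sequence, modulo 2."""
    return [sum(h * r for h, r in zip(row, received)) % 2 for row in parity]

def correct_single_error(received, parity=H):
    """Flip the one bit whose column of H equals the nonzero syndrome."""
    s = syndrome(received, parity)
    if not any(s):
        return list(received)  # zero syndrome: nothing to correct
    columns = [list(col) for col in zip(*parity)]
    position = columns.index(s)
    fixed = list(received)
    fixed[position] ^= 1
    return fixed

sent = [1, 0, 1, 1, 0, 1, 0]   # a valid codeword of the assumed code
received = list(sent)
received[2] ^= 1               # simulate one bit flipped by channel noise

print(syndrome(sent))                          # [0, 0, 0]
print(syndrome(received))                      # [0, 1, 1]
print(correct_single_error(received) == sent)  # True
```

This sketch corrects at most one erroneous bit per seven-bit block; two or more errors in a block would be miscorrected by this particular code.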
Error correction codes typically add new symbols to the data sequence to be transmitted and/or stored and, thus, generally operate in a manner opposite to that of compression/decompression systems. Specifically, instead of decreasing the number of symbols to be transmitted and/or stored, error correction codes generally increase the total number of symbols to be transmitted and/or stored.
As described above, the encoders in the known compression/decompression systems typically use knowledge about the special characteristics of the data. However, wherever specialized knowledge is used in the system, the downstream system is generally dependent on the particular choice, which usually reduces flexibility in system design and network transport, and may prevent future improvements without a significant overhaul of the entire system and a loss of compatibility with already compressed data.