Data compression is a process by which data is reduced in size. As the name implies, compression processes an input source and converts it into another, smaller digital representation of the original. The compressed data must then be reconstructed before it is used. If the source and the reconstructed contents match exactly, the method is said to be lossless; otherwise it is considered lossy. Lossy compression reconstructs only an approximation of the original data and usually obtains higher compression ratios than lossless schemes, at the expense of quality.
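By way of illustration only, the distinction between lossless and lossy reconstruction can be sketched as follows. The sketch assumes Python's standard zlib module for the lossless case and a simple integer quantizer for the lossy case; the function names and the step size of 16 are hypothetical and are not taken from any particular compression format.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress with zlib (DEFLATE) and reconstruct; the result equals the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_round_trip(samples: list[int], step: int = 16) -> list[int]:
    """Quantize each sample to a multiple of `step`, then reconstruct.

    Detail smaller than `step` is discarded and cannot be recovered,
    so the output is only an approximation of the input.
    """
    quantized = [s // step for s in samples]   # smaller values need fewer bits
    return [q * step for q in quantized]       # approximate reconstruction


source = bytes(range(256)) * 4
restored = lossless_round_trip(source)         # identical to `source`

samples = [3, 18, 37, 200, 255]
approx = lossy_round_trip(samples)             # [0, 16, 32, 192, 240]
```

In the lossless case the reconstructed bytes match the source exactly, whereas in the lossy case each reconstructed sample differs from the original by less than the step size.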
Lossy compression is used mainly for video, audio, and images. For these types of files, lossless compression is often not efficient, because the data is frequently already in a space-saving format when it is output from the original device (e.g., a camera). Lossy compression is therefore used on these types of files if a user wants further space savings. In video compression, for example, high-quality lossy compression works by discarding color information that a viewer is unlikely to perceive. However, as the video file is compressed further, the bit rate falls and the quality of the image worsens because more information has been discarded. Another example of a format that uses lossy compression is JPEG, one of the most commonly used image formats. When a user saves an image in JPEG format, the user may choose the quality of the output. The higher the quality, the more space the file will consume. Another example of a lossy compression file format is the MP3 format for audio files. When a user saves or exports this type of file, he or she chooses a bit rate for the file, analogous to image quality with JPEG files. The bit rate determines how precisely the sound is represented. The higher the bit rate, the more storage space the file will require. Even when a high bit rate is chosen, lossy compression will still discard some detail of the original signal.
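As a rough illustration of how the chosen quality setting of a lossy audio file drives its storage requirement, the following sketch estimates the size of a stream from its bit rate and duration. It assumes a constant bit rate and ignores container and metadata overhead; the function name and the example rates are hypothetical.

```python
def encoded_size_bytes(bit_rate_kbps: int, duration_seconds: float) -> int:
    """Approximate size of a constant-bit-rate stream, ignoring overhead.

    bit_rate_kbps is in kilobits per second (1 kbps = 1000 bits per second).
    """
    return int(bit_rate_kbps * 1000 * duration_seconds / 8)


# A three-minute (180 second) track at three example bit rates.
sizes = {rate: encoded_size_bytes(rate, 180) for rate in (64, 128, 320)}
# sizes == {64: 1440000, 128: 2880000, 320: 7200000}
```

Under these assumptions, the storage required grows in direct proportion to the bit rate selected by the user.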
The principal limitation of lossy compression is therefore the tradeoff between size and quality. In the case of video, the tradeoff is between the speed at which the video can be streamed and the video's quality.
An example of a lossless data compression format is the zip format. In this format, a user chooses a file he or she wishes to compress, and the file is compressed using a compression algorithm. The zip format uses an algorithm rather than a compression key to shrink the file to a smaller, compressed size.
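For illustration only, a lossless round trip through a zip archive can be sketched with Python's standard zipfile module. The sketch assumes the DEFLATE method (zipfile.ZIP_DEFLATED) and an in-memory archive; the helper names, the member name "example.txt", and the payload are hypothetical.

```python
import io
import zipfile


def zip_bytes(name: str, payload: bytes) -> bytes:
    """Store `payload` under `name` in an in-memory zip archive using DEFLATE."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, payload)
    return buffer.getvalue()


def unzip_bytes(archive_bytes: bytes, name: str) -> bytes:
    """Read the member `name` back out of an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name)


payload = b"the quick brown fox " * 500          # highly repetitive input
archive_bytes = zip_bytes("example.txt", payload)
restored = unzip_bytes(archive_bytes, "example.txt")
```

Because the method is lossless, the restored bytes are identical to the payload, and for repetitive input such as this the archive is considerably smaller than the original.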
Current methods for data compression require knowledge of the information contained in a data file in order to remove redundancy in the data file. Compression schemes such as the zip format create codecs which work on the application layer, not the physical layer (pure binary). For instance, a certain pixel or character (an 8-bit character, such as the letter “A”) can be given a code in arithmetic coding schemes such that the file can be compressed by substituting codes for certain characters or file parts. This is why there are various compression schemes for differing file types: current compression systems require knowledge of the file's components in order to determine what is redundant in the file (and can therefore be eliminated so that a compressed file can be generated). As a result, these systems can be very complex.
For instance, incoming binary code for an image is translated such that it is seen as an image with characteristics, such as pixels. Only after this interpretation do current compression schemes operate on the file or folder. Previous compression systems will work, for example, by removing certain pixels, which adds noise to the image or reduces its overall quality. The processed image is then assigned a new string of binary code, which is considered the compressed file. Compression of images or videos by these schemes can require complex algorithms and systems for determining which pixels may be redundant and therefore “eliminated” in a compressed file. As another example, Huffman coding can be used to compress a text file, among other file types. The premise of this coding is that characters which occur most often (for instance, a space) are assigned a much shorter code of bits, so that when translated for transmission, the whole file is much shorter. In short, these systems require knowledge of what the raw data stands for in order to decide what can be removed from a file for the purpose of compression.
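The premise of Huffman coding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular product: it treats each character of a text as a symbol, represents codes as strings of "0" and "1" for readability rather than packing them into bytes, and uses hypothetical function names and sample text.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix code in which frequent characters get shorter bit strings."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:                       # only one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(weight, index, {symbol: ""})
            for index, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)      # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text: str, codes: dict[str, str]) -> str:
    """Replace every character with its code."""
    return "".join(codes[ch] for ch in text)


def decode(bits: str, codes: dict[str, str]) -> str:
    """Walk the bit string, emitting a symbol whenever a full code is matched."""
    inverse = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


text = "this is a test of a short text file"
codes = huffman_codes(text)
bits = encode(text, codes)
```

In this sample the space is the most frequent character and therefore receives a code no longer than that of any other character, and the encoded bit string is shorter than the 8 bits per character of the original text. Note that the code table must accompany the encoded data so that the receiver can decode it.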
Knowledge of data content for compression is seen in image compression, for example, when data is analyzed for statistical redundancy. For example, an image may have areas of color that do not change over several pixels; instead of coding each individual repeated pixel, the data may be encoded as “X number of red pixels.” Because a pixel may be represented by multiple bits (16 bits, for example), the analysis of the redundant pixels must occur on the application layer, so to speak. Similarly, in the case of video, current compression schemes will see a video file and what it stands for. These compression systems may take into account such things as image quality and video size. Regarding the determination of statistical redundancy above, probability tables are sometimes used to analyze the probability that bit sequences further along a binary string stand for something, for instance a pixel or a certain character. There are several examples, but one is the use of probability schemes to determine whether there is a high chance that a certain upcoming byte (an 8-bit pattern) will stand for something redundant; if so, the character (a byte) may be removed (or, in the case of an image pixel, a 16-bit section may be removed). The analyzed data may then be compressed so that redundant bytes are removed (or, when an image pixel is removed, a two-byte string is removed), and a new binary string is assigned to represent the original file. This new binary string is the compressed file which is transmitted. The probability schemes aim to ensure that the compressed file may be decompressed such that it is a close approximation of the original.
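The “X number of red pixels” encoding mentioned above is commonly known as run-length encoding, and can be sketched as follows. The sketch assumes that pixels have already been interpreted as whole values (here, color names for readability) rather than as raw bits; the function names and the sample row are hypothetical.

```python
from itertools import groupby


def rle_encode(pixels: list) -> list[tuple]:
    """Collapse each run of identical pixels into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def rle_decode(runs: list[tuple]) -> list:
    """Expand (value, run_length) pairs back into the original pixel sequence."""
    return [value for value, count in runs for _ in range(count)]


row = ["red"] * 5 + ["blue"] * 2 + ["red"] * 3
runs = rle_encode(row)      # [("red", 5), ("blue", 2), ("red", 3)]
```

Ten pixels are represented here by three pairs. The grouping operates on whole pixel values, which is why the pixel boundaries must be known before the redundancy can be found.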
In the above example, the statistical apparatus of these systems is based solely on the type of data to be compressed, e.g., video or audio. Current compression systems do not operate on raw binary to compress simply a string of zeroes and ones. Instead, as alluded to above, these systems require an analysis of what the binary represents (the “data”) in order to compress the raw binary behind the data (that is, they operate from the standpoint of the application layer). The JPEG scheme, for example, does not simply take out zeroes and ones from the image's original binary code irrespective of what those bits might stand for.
Because different data types have different properties in terms of what will be statistically redundant, current compression schemes differ from one data type to another. Images, text, audio, and video each have different properties. For instance, while spaces may be most prevalent in a given text, certain colors might be most prevalent in an image. Those redundancies are therefore handled differently based on the data type. Also, what is removed from a given original file is removed only after intelligent, and often complex, analysis of redundancy in the original data.
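To illustrate how the statistics of different data types can differ, the following sketch builds a simple probability table of byte values and computes the Shannon entropy in bits per symbol. It assumes each byte is one symbol (one character of text, or one 8-bit grayscale pixel); the function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def symbol_probabilities(data: bytes) -> dict[int, float]:
    """Return the relative frequency of each byte value in `data`."""
    counts = Counter(data)
    total = len(data)
    return {symbol: count / total for symbol, count in counts.items()}


def entropy_bits_per_symbol(data: bytes) -> float:
    """Shannon entropy: a lower bound on average bits per symbol for this table."""
    return -sum(p * math.log2(p) for p in symbol_probabilities(data).values())


text = b"the cat sat on the mat and the dog sat on the log"
image_row = bytes([200] * 40 + [35] * 10)       # mostly one gray level

most_common_in_text = Counter(text).most_common(1)[0][0]        # ord(" ")
most_common_in_image = Counter(image_row).most_common(1)[0][0]  # 200
```

In the sample text the most prevalent symbol is the space, whereas in the sample image row it is a single gray level, and the image row has the lower entropy because it contains only two distinct values.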
The background description provided herein is for the purpose of generally presenting the context of the disclosure. To the extent that work of the inventor hereof is described in this background section, as well as aspects of the invention that may not otherwise qualify as prior art at the time of filing, they are neither expressly nor impliedly admitted as prior art against the present disclosure.