Conventional image and video compression methods, such as JPEG, MPEG, and H.264, partition an image into square blocks and process each block independently, exploiting some intra-frame dependencies. Video methods also exploit inter-frame dependencies.
JPEG
JPEG is an image compression standard. It applies a discrete cosine transform (DCT) based encoder, which includes quantization and entropy encoding, within image (macro) blocks, often 8×8 or 16×16 pixels in size.
Due to the high correlation among the three color components, the first step is usually to change from an RGB color space to a YCbCr color space. Generally, human visual perception is more sensitive to illumination (luminance) and less sensitive to saturation (chrominance). Therefore, such a color transform helps to reduce the bit rate by keeping more illumination data and less saturation data. After the transformation into the YCbCr color space, the spatial resolution of the Cb and Cr components is reduced by down-sampling, also called chroma subsampling.
The ratios at which the down-sampling is usually done for JPEG images are 4:4:4 (no down-sampling), 4:2:2 (reduction by a factor of two in the horizontal direction), or (most commonly) 4:2:0 (reduction by a factor of two in both the horizontal and vertical directions). During the rest of the compression process, the Y, Cb and Cr channels are processed separately, and in a very similar manner.
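The color transform and the 4:2:0 case can be sketched as follows. This is a minimal illustration, not the method of any particular codec: the function names are hypothetical, the conversion weights are the full-range BT.601 values used by JFIF, and averaging each 2×2 group is only one of several possible down-sampling filters.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF / BT.601 weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging each 2x2 group.

    Assumption: the plane is a list of rows with even height and width.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]
```

A neutral gray or white pixel maps to Cb = Cr = 128, which is why the chroma planes of most natural images carry little energy and tolerate the reduced resolution.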
After the color transformation, the image is partitioned into non-overlapping blocks. The color values of pixels are shifted from unsigned integers to signed integers. Then, a 2D DCT is applied.
For an 8-bit image, the intensity of each pixel is in the range [0, 255]. The mid-point of the range is subtracted from each entry to produce a data range that is centered around zero, so that the modified range is [−128, +127]. This step reduces the dynamic range requirements in the DCT processing stage that follows. This step is equivalent to subtracting 1024 from the DC coefficient after performing the transform, which is faster on some architectures because it involves performing only one subtraction rather than 64.
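The equivalence between the level shift and the subtraction of 1024 can be checked numerically. The sketch below assumes the JPEG scaling of the 8×8 DCT, under which the DC coefficient equals the sum of the 64 samples divided by 8; the helper names are hypothetical.

```python
def level_shift(block, bits=8):
    """Subtract the mid-point (128 for 8-bit data) from every sample."""
    offset = 1 << (bits - 1)
    return [[value - offset for value in row] for row in block]


def dc_coefficient(block):
    """DC term of the 8x8 DCT under JPEG scaling: sum of the samples / 8."""
    return sum(sum(row) for row in block) / 8.0


# Illustrative 8x8 block of 8-bit samples (values are arbitrary).
block = [[(17 * r + 5 * c) % 256 for c in range(8)] for r in range(8)]

dc_shift_first = dc_coefficient(level_shift(block))   # 64 subtractions
dc_shift_after = dc_coefficient(block) - 1024         # 1 subtraction
```

Both values agree because 64 samples × 128 / 8 = 1024, and the AC coefficients are unaffected since a constant offset only contributes to the zero-frequency term.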
Each 8×8 block of the image is effectively a 64-point discrete signal, which is a function of the two spatial dimensions x and y. The DCT takes such a signal and decomposes it into 64 unique, two-dimensional spatial frequencies, which comprise the input signal spectrum. The output of the DCT is the set of 64 basis-signal amplitudes, i.e., the DCT coefficients, whose values are the relative amounts of the 2D spatial frequencies contained in the 64-point discrete signal.
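A direct, unoptimized sketch of the forward 8×8 DCT is given below for illustration. It evaluates the DCT-II definition with the JPEG scaling factors term by term; practical encoders use fast factorizations instead, and the function name is hypothetical.

```python
import math

N = 8  # block size


def dct_2d_8x8(block):
    """Forward 8x8 DCT-II with JPEG scaling, computed from the definition."""
    def scale(k):
        # The zero-frequency basis function gets an extra 1/sqrt(2) factor.
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for r in range(N):
                for c in range(N):
                    total += (block[r][c]
                              * math.cos((2 * r + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * c + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = 0.25 * scale(u) * scale(v) * total
    return coeffs
```

For a perfectly flat block, only coeffs[0][0] is non-zero: a block whose samples all equal 10 yields a DC coefficient of 80 and AC coefficients that vanish up to floating-point error.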
The DCT coefficients are partitioned into a DC coefficient and AC coefficients. The DC coefficient corresponds to the coefficient with zero frequency in both spatial dimensions, and the AC coefficients are the remaining coefficients with non-zero frequencies. For most blocks, the energy of the DCT coefficients is concentrated in the lower spatial frequencies. In other words, most of the spatial frequencies have near-zero amplitudes, which do not need to be encoded.
To achieve compression, each of the 64 DCT coefficients is uniformly quantized in conjunction with a 64-element quantization table, which is specified by the application. The purpose of quantization is to discard information that is not visually significant. Because quantization is a many-to-one mapping, it is fundamentally a lossy transform. Moreover, it is the principal source of compression in a DCT-based encoder. Quantization is defined as division of each DCT coefficient by its corresponding quantization step size, followed by rounding to the nearest integer. Each quantization step size is ideally selected as the perceptual threshold, so as to compress the image as much as possible without generating any visible artifacts. The step size is also a function of the image and display characteristics.
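Quantization as defined above can be sketched in a few lines. The step sizes and coefficient values used in the example are illustrative only and are not taken from any standard table; the function names are hypothetical, and ties are rounded upward here as a simplifying assumption.

```python
import math


def quantize(coeffs, table):
    """Divide each coefficient by its step size and round to the nearest integer."""
    return [
        [int(math.floor(c / q + 0.5)) for c, q in zip(coeff_row, step_row)]
        for coeff_row, step_row in zip(coeffs, table)
    ]


def dequantize(levels, table):
    """Decoder side: multiply each quantized level by its step size."""
    return [
        [level * q for level, q in zip(level_row, step_row)]
        for level_row, step_row in zip(levels, table)
    ]


# Illustrative 2x2 corner of a coefficient block and of a step-size table.
coeffs = [[-415.0, 30.0], [5.0, -3.0]]
steps = [[16, 11], [12, 14]]

levels = quantize(coeffs, steps)      # [[-26, 3], [0, 0]]
restored = dequantize(levels, steps)  # [[-416, 33], [0, 0]]
```

The restored values differ from the originals, which shows the many-to-one nature of the mapping: small high-frequency coefficients collapse to zero and cannot be recovered.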
Several processing steps are then applied to the quantized coefficients. The DC coefficient is treated separately from the 63 AC coefficients. Because there is usually a strong correlation between the DC coefficients of adjacent blocks, the quantized DC coefficient is encoded as the difference from the DC term of the previous block in the encoding order, a technique called differential pulse code modulation (DPCM). DPCM can usually achieve further compression due to the smaller range of the coefficient values. The remaining AC coefficients are ordered into a zigzag sequence, which facilitates entropy coding by placing low-frequency coefficients before high-frequency coefficients. Then, the outputs of DPCM and zigzag scanning are encoded by an entropy coding method, such as Huffman coding or arithmetic coding.
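The zigzag ordering and the DPCM of the DC terms can be illustrated with the following sketch. The function names are hypothetical, and the DC predictor is assumed to start at zero for the first block.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag order.

    Positions are grouped by anti-diagonal (row + col); the traversal
    direction alternates from one diagonal to the next.
    """
    def key(pos):
        row, col = pos
        diagonal = row + col
        return (diagonal, row if diagonal % 2 else col)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients into zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def dpcm_encode(dc_values):
    """Replace each DC value by its difference from the previous block's DC."""
    previous = 0  # assumed initial predictor
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def dpcm_decode(diffs):
    """Invert dpcm_encode by accumulating the differences."""
    previous = 0
    values = []
    for diff in diffs:
        previous += diff
        values.append(previous)
    return values
```

For example, the DC sequence -26, -24, -25, -25 becomes -26, 2, -1, 0 after DPCM, so every term after the first has a much smaller magnitude. In a zigzag-scanned block, element 0 is the DC coefficient and elements 1 to 63 are the AC coefficients.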
Entropy coding can be considered as a two-step process. The first step converts the zigzag sequence of quantized coefficients into an intermediate sequence of symbols. The second step converts the symbols to a data stream in which the symbols no longer have externally identifiable boundaries. The form and definition of the intermediate symbols depend on both the DCT-based mode of operation and the entropy coding method.
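The first of the two steps can be illustrated with a simplified sketch for the AC coefficients. It is loosely modeled on the run-length symbols of Huffman-coded baseline JPEG, but it is an assumption-laden simplification: each symbol here is a (zero run, value) pair, whereas the actual format pairs the run with a size category and appends the amplitude bits separately. The function name is hypothetical.

```python
END_OF_BLOCK = (0, 0)      # all remaining coefficients are zero
ZERO_RUN_LENGTH = (15, 0)  # a run of 16 zeros with no terminating value


def ac_run_length_symbols(ac_coeffs):
    """Convert zigzag-ordered AC coefficients into (zero_run, value) symbols."""
    # Find the position just past the last non-zero coefficient.
    last = len(ac_coeffs)
    while last > 0 and ac_coeffs[last - 1] == 0:
        last -= 1

    symbols = []
    run = 0
    for value in ac_coeffs[:last]:
        if value == 0:
            run += 1
            if run == 16:
                symbols.append(ZERO_RUN_LENGTH)
                run = 0
        else:
            symbols.append((run, value))
            run = 0

    # Trailing zeros, if any, are summarized by a single end-of-block symbol.
    if last < len(ac_coeffs):
        symbols.append(END_OF_BLOCK)
    return symbols
```

With this scheme, a block whose 63 AC coefficients are 3, 0, 0, -2 followed by zeros is reduced to three symbols: (0, 3), (2, -2) and the end-of-block marker. The second step would then map these symbols to variable-length codes.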
In general, JPEG is not suitable for graphs, charts and illustrations, especially at low resolutions. A very high compression ratio severely degrades the quality of the image, although the overall colors and image form remain recognizable. However, the precision of colors suffers less (for the human eye) than the precision of contours (based on luminance).
Conventional image and video compression schemes mainly aim at optimizing pixel-wise fidelity measures, such as peak signal-to-noise ratio (PSNR), for a given bit rate. It has been noticed that PSNR is not always a good metric for the visual quality of reconstructed images, although visual quality is regarded as the ultimate objective of compression schemes. There have been several attempts to design compression methods that target visual quality, in which image analysis tools such as segmentation and texture modeling are utilized to remove perceptual redundancy. The basic idea is to remove some image regions at the encoder, and to restore them at the decoder by inpainting or synthesis methods.
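For reference, PSNR is computed from the mean squared error between the original and the reconstructed image. The sketch below assumes single-channel images stored as equally sized lists of rows and a peak value of 255 for 8-bit data; the function name is hypothetical.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (a - b) ** 2
        for row_a, row_b in zip(original, reconstructed)
        for a, b in zip(row_a, row_b)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Because the measure depends only on the average squared pixel difference, two reconstructions with the same PSNR can look very different: an error spread as faint noise and an error concentrated along a contour contribute equally.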
The block-size limitation drastically hinders the performance of existing compression algorithms, especially when the underlying texture exhibits other spatial structures.