Data compression is used in a number of contexts. It is very commonly used in communications and computer networking to store, transmit, and reproduce information efficiently. It finds particular application in the encoding of images, audio and video. Video presents a significant challenge to data compression because of the large amount of data required for each video frame and the speed with which encoding and decoding often need to occur. The current state of the art for video encoding is the ITU-T H.265/HEVC video coding standard. It defines a number of different profiles for different applications, including the Main profile, Main Still Picture profile and others.
There are a number of standards for encoding/decoding images and videos, including H.265, that use block-based coding processes. In these processes, the image or frame is divided into blocks, with sizes typically ranging from 4×4 to 64×64, although non-square blocks may be used in some cases, and the blocks are spectrally transformed into coefficients, quantized, and entropy encoded. In many cases, the data being transformed is not the actual pixel data, but is residual data following a prediction operation. Predictions can be intra-frame, i.e. block-to-block within the frame/image, or inter-frame, i.e. between frames (also called motion prediction).
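The block division and residual formation described above can be illustrated with a minimal sketch. It is not taken from any standard: the helper names `split_into_blocks`, `flat_prediction` and `compute_residual` are hypothetical, and the flat (mean-value) predictor is only a placeholder for a real intra-frame or inter-frame prediction, which would be formed from previously reconstructed data.

```python
def split_into_blocks(frame, size):
    """Yield (row, col, block) for each size x size block of a 2-D list.

    Assumes the frame dimensions are multiples of `size`.
    """
    height, width = len(frame), len(frame[0])
    for r in range(0, height, size):
        for c in range(0, width, size):
            yield r, c, [row[c:c + size] for row in frame[r:r + size]]


def flat_prediction(block):
    """Placeholder predictor (an assumption): every sample is the block mean."""
    total = sum(sum(row) for row in block)
    mean = round(total / (len(block) * len(block[0])))
    return [[mean] * len(block[0]) for _ in block]


def compute_residual(block, prediction):
    """Residual = original samples minus predicted samples."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, prediction)]


# An 8x8 frame with a simple diagonal gradient, split into four 4x4 blocks.
frame = [[x + y for x in range(8)] for y in range(8)]
blocks = list(split_into_blocks(frame, 4))
first_block = blocks[0][2]
first_residual = compute_residual(first_block, flat_prediction(first_block))
```

The residual, rather than the pixel data itself, is what would then be passed to the transform stage.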
When spectrally transforming residual data, many of these standards prescribe the use of a discrete cosine transform (DCT) or some variant thereof. The resulting DCT coefficients are then quantized to produce quantized transform domain coefficients. The blocks of quantized coefficients are then entropy encoded and packaged with side information, like motion vectors and other data, to produce a bitstream of encoded video.
At the decoder, the bitstream is entropy decoded to reconstruct the quantized coefficients. The decoder then inverse quantizes and inverse transforms the reconstructed quantized coefficients to reconstruct the pixel domain residual. The pixel data is then reconstructed by combining this residual with a prediction formed using the same prediction operation as was used at the encoder.
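The transform, quantization, inverse quantization and inverse transform steps can be sketched as a round trip on a single residual block. The sketch below makes several assumptions: it uses a floating-point orthonormal DCT-II and a uniform quantizer with one step size, whereas practical codecs typically use integer approximations of the transform and more elaborate quantization. Entropy coding is omitted, and all function names are illustrative.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def forward_dct(block):
    """2-D transform of a square block: D * X * D^T."""
    d = dct_matrix(len(block))
    return matmul(matmul(d, block), transpose(d))


def inverse_dct(coeffs):
    """Inverse of forward_dct for an orthonormal D: D^T * Y * D."""
    d = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(d), coeffs), d)


def quantize(coeffs, step):
    """Uniform quantizer (an assumption): nearest integer multiple of step."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    return [[level * step for level in row] for row in levels]


# Encoder side: transform and quantize a 4x4 residual block.
residual = [[4, 3, 2, 1], [3, 2, 1, 0], [2, 1, 0, -1], [1, 0, -1, -2]]
step = 2.0
levels = quantize(forward_dct(residual), step)

# Decoder side: inverse quantize and inverse transform.
reconstructed = inverse_dct(dequantize(levels, step))
max_error = max(abs(r - o)
                for rec_row, org_row in zip(reconstructed, residual)
                for r, o in zip(rec_row, org_row))
```

Without quantization the round trip is lossless up to floating-point error; the quantizer is the step that introduces distortion, and `max_error` measures it for this block.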
Pixel data is generally separated into a luma component and chroma components (usually two), and each component is encoded using a similar process. Because of human perception limitations regarding spatial location of colour data, chroma is typically subsampled such that, for each chroma component, only a single chroma sample is sent for every two or four luma samples. This has worked well for natural scene video, but has posed problems for computer-generated content, such as text and graphics, which requires sharp colour edge definition to avoid blurriness and other artefacts.
The unsuitability of chroma subsampling for some types of content becomes more problematic in the case of mixed content containing both natural scenes and computer-generated graphics.
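The effect of chroma subsampling on a sharp colour edge can be seen in a small sketch. It assumes one chroma sample per four luma samples (commonly called 4:2:0), a simple 2×2 averaging filter for downsampling, and nearest-neighbour replication for upsampling; actual systems may use other filters and sample positions, and the function names are illustrative.

```python
def subsample_420(plane):
    """Average each 2x2 neighbourhood into a single chroma sample.

    Assumes the plane has even width and height.
    """
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, len(plane[0]), 2)
        ]
        for r in range(0, len(plane), 2)
    ]


def upsample_nearest(plane):
    """Replicate every sample into a 2x2 neighbourhood."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# A chroma plane with a sharp vertical edge between columns 2 and 3,
# which does not line up with the 2x2 subsampling grid.
chroma = [[0, 0, 0, 200, 200, 200, 200, 200] for _ in range(4)]
subsampled = subsample_420(chroma)
restored = upsample_nearest(subsampled)
```

In this example the original row steps directly from 0 to 200, while the restored row passes through an intermediate value of 100 across two columns. That smeared edge is the kind of blurriness that affects text and graphics.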
Similar reference numerals may have been used in different figures to denote similar components.