Data compression is used in communications and computer networking to store, transmit, and reproduce information efficiently. It finds particular application in the encoding of images, audio and video. Common image compression formats include JPEG, TIFF, and PNG. An image compression standard proposed by Google™ is WebP. A popular video coding standard has been the ITU-T H.264/AVC video coding standard. It defines a number of different profiles for different applications, including the Main profile, Baseline profile and others. A newly-developed video coding standard is the ITU-T H.265/HEVC standard. Other standards include VP-8, VP-9, AVS, and AVS-2.
All of these image and video coding standards operate by partitioning and image/picture into blocks (in some cases, a hierarchy of blocks and subblocks, like the coding tree blocks (CTB) of HEVC). A block is predicted and the difference (residual) between the prediction and the actual pixel data of the block is then transformed, quantized and entropy encoded. The quantization of transform domain coefficients introduces distortion that reduces the reconstructed picture quality at the decoder. Many of these processes use some type of rate-distortion optimization routine to select coding parameters based upon trade-offs between transmission rate and distortion.
The human visual system does not have the same sensitivity to all distortion. For example, humans are more sensitive to distortion in lower frequency components than to distortion in higher frequency components. The measure of distortion most commonly used is peak signal-to-noise ratio (PSNR), which measures the mean squared error (MSE) between spatial domain pixels in the reconstructed picture versus the original picture. However, this is not necessarily an accurate representation of human sensitivity to distortion.
Work on human perception of image and video distortion has led to the development of various measurements of “structural similarity” (SSIM) between an original picture and its reconstruction, which may be a better representation of human perception of error than PSNR. A structural similarity metric may take into account the mean values of the two pictures (or a window or block of pixels), the variance within each of those pictures/blocks and the covariance of those two pictures/blocks. SSIM may, therefore, be useful in making coding decisions. However, actual structural similarity metrics are complex to calculate. In U.S. patent application Ser. No. 14/552,590, filed Nov. 25, 2014, a distortion metric entitled weighted mean square error (WMSE) was described for perceptual image and video coding. The contents of U.S. patent application Ser. No. 14/552,590 are hereby incorporated by reference.
The subjective quality of coded images or video may still be improved.
Similar reference numerals may have been used in different figures to denote similar components.