It is known that the Human Visual System (hereinafter, “HVS”) is more sensitive to spatial detail in the luminance domain than in the color domain. Precise measurements of this effect have shown that under normal daytime viewing conditions, the HVS has approximately 2.2 times more spatial acuity for brightness details than for color details. Efficient transmission and processing of images and video (such as in JPEG (see ISO/IEC 10918-1:1994) and most video compression systems, such as MPEG and its derivatives (see ISO/IEC 13818, ISO/IEC 14496, and ISO/IEC 23008)) leverages this effect by separating the brightness (Y, or luma) channel from the color (CbCr, or chroma) channels and then reducing the resolution of the chroma channels by one octave by decimation before encoding and transmission. This is commonly known as 4:2:0 chroma subsampling. The presence of these decimated chroma channels is signaled from the encoder to the decoder by means of a bitstream flag. Upon reception and decoding, the bitstream flag directs the decoder to upscale the chroma channels, usually by a simple upsampling process such as bicubic interpolation or bilinear interpolation.
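The encoder-side decimation and decoder-side upsampling described above can be illustrated with a minimal sketch. This is not the method of any particular standard; it assumes NumPy, and the function names (`subsample_420`, `upsample_bilinear`) are hypothetical labels for illustration only.

```python
import numpy as np

def subsample_420(cb, cr):
    """Decimate the chroma planes by one octave (2x in each
    dimension) by averaging each 2x2 block, as in 4:2:0
    chroma subsampling."""
    def decimate(c):
        h, w = c.shape
        return (c[:h - h % 2, :w - w % 2]
                .reshape(h // 2, 2, w // 2, 2)
                .mean(axis=(1, 3)))
    return decimate(cb), decimate(cr)

def upsample_bilinear(c, out_h, out_w):
    """Restore a decimated chroma plane to full resolution by
    bilinear interpolation, a simple decoder-side upscaler."""
    h, w = c.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    fy = (ys - y0)[:, None]
    fx = (xs - x0)[None, :]
    top = c[y0][:, x0] * (1 - fx) + c[y0][:, x1] * fx
    bot = c[y1][:, x0] * (1 - fx) + c[y1][:, x1] * fx
    return top * (1 - fy) + bot * fy
```

In practice, the decimation step runs at the encoder before compression, and the interpolation step runs at the decoder after the bitstream flag signals that the chroma planes were transmitted at reduced resolution.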
Decimation of the chroma channels by more than one octave can significantly reduce the amount of information to transmit, but the decoded and upscaled chroma channels exhibit a variety of artifacts, including reduced saturation of small objects and color bleed, that is, the perception that a color from a small object smears into neighboring background objects. Primarily for this reason, decimation of the chroma channels by more than one octave is not supported in any current image or video compression standard.
There has been some recent work on more advanced, image-guided upscaling methods, such as the joint bilateral upsampling taught in Kopf, Johannes, et al., “Joint bilateral upsampling,” ACM Transactions on Graphics (TOG), Vol. 26, No. 3, ACM, 2007, which is incorporated herein by reference. That work restores a decimated depth map for 3D images using a full-resolution guide image, and the same approach may be applied to the restoration of chroma channels that have been decimated by more than one octave. However, implementing these methods requires close coupling with, and changes to, the encoding and decoding standards for images and video, since the bitstream syntax does not support signaling channels decimated by more than one octave.
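A simplified sketch of guided upscaling in the spirit of Kopf et al. follows, using the full-resolution luma plane as the guide for restoring a heavily decimated chroma plane. This is an illustrative reading of the technique, not the authors' implementation; the function name and parameter defaults (`sigma_s`, `sigma_r`, `radius`) are assumptions, and NumPy is assumed.

```python
import numpy as np

def joint_bilateral_upsample(chroma_lo, luma_hi,
                             sigma_s=1.0, sigma_r=0.1, radius=2):
    """Upscale a decimated chroma plane using the full-resolution
    luma plane as a guide.  Each full-resolution pixel averages
    nearby low-resolution chroma samples with weights combining
    spatial distance and luma similarity, so chroma edges snap to
    luma edges instead of bleeding across them."""
    H, W = luma_hi.shape
    h, w = chroma_lo.shape
    scale_y, scale_x = H / h, W / w
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            # position of this pixel in low-resolution coordinates
            yl, xl = y / scale_y, x / scale_x
            yc = min(int(round(yl)), h - 1)
            xc = min(int(round(xl)), w - 1)
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = yc + dy, xc + dx
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue
                    # spatial weight in low-resolution coordinates
                    ws = np.exp(-((yy - yl) ** 2 + (xx - xl) ** 2)
                                / (2 * sigma_s ** 2))
                    # range weight from the high-resolution luma guide
                    ly = luma_hi[min(int(yy * scale_y), H - 1),
                                 min(int(xx * scale_x), W - 1)]
                    wr = np.exp(-(luma_hi[y, x] - ly) ** 2
                                / (2 * sigma_r ** 2))
                    num += ws * wr * chroma_lo[yy, xx]
                    den += ws * wr
            out[y, x] = num / den if den > 0 else chroma_lo[yc, xc]
    return out
```

Because the range weight suppresses chroma samples whose co-located luma differs from the target pixel's luma, a chroma transition that coincides with a luma edge is reconstructed sharply rather than smeared, which is the artifact ordinary bilinear upscaling would produce at more than one octave of decimation.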
Additionally, simply downscaling the chroma channels by more than one octave and passing the decimated chroma channels commingled with the full-resolution luma channel confuses the motion estimation of most video encoders, since they rely on the spatial correspondence between the luma and chroma channels to achieve accurate motion estimation, thereby reducing their efficiency. Further, in low-bitrate, high-quantization scenarios for both image encoders and video encoders, commingled chroma channels may leave a visible imprint upon the luma channel, causing an artifact recognizable as a shadow of the chroma channel upon the luma channel.