1. Field of the Invention
The present invention relates to video coding. In particular, the present invention relates to a system and a method for encoding 4:2:0 Red-Green-Blue (RGB) video data.
2. Description of the Related Art
Residual Color Transform (RCT) is a coding tool for the H.264 High 4:4:4 profile that is intended to be used for efficient coding of Red-Green-Blue-format (RGB-format) video sequences. FIGS. 1 and 2 illustrate the difference between a conventional video coding system that does not use the RCT coding tool and a conventional video coding system that uses the RCT coding tool. Details regarding the encoding and decoding, and the prediction and compensation loops are not shown in either of FIGS. 1 or 2.
FIG. 1, in particular, depicts a high-level functional block diagram of a conventional video coding system 100 that does not use the RCT coding tool. Conventional video coding system 100 captures Red-Green-Blue (RGB) data in a well-known manner at 101. At 102, the RGB data is converted into a YCbCr (or YCoCg) format. At 103, intra/inter prediction is performed on the YCbCr-formatted (or YCoCg-formatted) data. A spatial transform is performed at 104 and quantization is performed at 105. Entropy encoding is performed at 106. The encoded data is transmitted and/or stored, as depicted by channel/storage 107. At 108, the encoded data is entropy decoded. At 109, the entropy-decoded data is dequantized. An inverse-spatial transform is performed at 110, and intra/inter compensation is performed at 111. At 112, the resulting YCbCr-formatted (or YCoCg-formatted) data is transformed to RGB-based data and displayed at 113.
FIG. 2 depicts a high-level functional block diagram of a conventional video coding system 200 that uses the RCT coding tool for the H.264 High 4:4:4 profile. Video coding system 200 captures RGB data in a well-known manner at 201. At 202, intra/inter prediction is performed on the RGB data. At 203, the intra/inter-predicted data is converted into a YCbCr (or YCoCg) format. A spatial transform is performed at 204 and quantization is performed at 205. Entropy encoding is performed at 206. The encoded data is transmitted and/or stored, as depicted by channel/storage 207. At 208, the encoded data is entropy decoded. At 209, the entropy-decoded data is dequantized. An inverse-spatial transform is performed on YCbCr-formatted (or YCoCg-formatted) data at 210. At 211, the YCbCr-based data is transformed to RGB-based data. At 212, intra/inter compensation is performed, and RGB-based data is displayed at 213.
The difference between conventional video coding system 100 (FIG. 1) and conventional video coding system 200 (FIG. 2) is that the RCT coding tool of system 200 enables compression and decompression directly to and from the RGB space. To illustrate this, compression directly in the RGB space is depicted in FIG. 2 by the sequence of functional blocks 202-204. In particular, intra/inter prediction is performed on the RGB data at 202. The intra/inter-predicted data is converted into a YCbCr (or YCoCg) format at 203. A spatial transform is performed at 204. Decompression directly from the RGB space is depicted in FIG. 2 by the sequence of functional blocks 210-212. At 210, an inverse-spatial transform is performed on YCbCr-formatted (or YCoCg-formatted) data at 210. At 211, the YCbCr-formatted data is transformed to RGB-based data. At 212, intra/inter compensation is performed at 111.
In contrast, the corresponding compression process in conventional video coding system 100 is depicted by functional blocks 102-104. At 102, RGB data is converted into a YCbCr (or YCoCg) format. Intra/inter prediction is performed on the YCbCr-formatted (or YCoCg-formatted) data at 103, and a spatial transform is performed at 104. The corresponding decompression process is depicted by functional blocks 110-112 in which an inverse spatial transform is performed at 110. Intra/inter compensation is performed at 111. Lastly, the YCbCr (or YCoCg) data is transformed to RGB-based data at 112.
The color conversion in RCT at 203 in FIG. 2, as an RGB-based format to a YCoCg-based format, is inside the typical coding loop, and can be considered as an extension of a conventional transform coding from a 2D spatial transform to a 3D transform (2D spatial+1D color), but, with the same purpose of all transform coding, that is, data decorrelation and energy compaction and, consequently, easier compression. Significant improvements in rate distortion performance over a conventional coding scheme have been achieved for all three RGB color components, as demonstrated in an updated version of the RCT algorithm.
The main challenge for RCT, as RCT is applied in practice, does not related to compression, but relates to video capture and display. Moreover, most video-capture and video-display devices currently do not support the RGB format not because extra hardware and software resources are needed internally to convert data from RGB-based data but based on the bandwidth requirements for the 4:4:4 RGB format.
For a single-chip-color-sensor digital video camera, each pixel actually has only one color component. FIG. 3 depicts a typical Bayer mosaic filter, which is used for most of the popular primary-color-mosaic sensors. As depicted in FIG. 3, the Green (G) sensors, or filters, cover 50% of the pixels, while the Blue (B) and Red (R) sensors, or filters, each cover 25% of the pixels. Because each pixel position has only one color component, the other two color components must be interpolated, or generated, based on existing samples. The interpolation process is a simple 1:3 data expansion.
FIG. 4 depicts a high-level functional block diagram of a conventional video coding system 400 that provides a lossy color conversion, such as from RGB to YCbCr. At 401, RGB sensors perform RGB capture. At 402, a 1:3 interpolation process is performed for generating missing color components. At 403, color conversion and a lossy 2:1 subsampling is performed for generating 4:2:0 YCbCr data. Thus, the overall process up to functional block 404 results in a lossy 1:1.5 data expansion before the 4:2:0 YCbCr data is compressed, i.e., video coded. The encoded data is transmitted and/or stored, as depicted by channel/storage 405. The video coded data is then decoded at 406. Color up-sampling and color conversion occurs at 407, and the resulting data is RGB displayed at 408.
Consequently, what is needed is a residual color transformation (RCT) coding tool for a 4:2:0 RGB format in which compression is performed directly on 4:2:0 RGB without data loss prior to compression.