The disclosed embodiments of the present invention relate to video coding, and more particularly, to a video coding method using at least evaluated visual quality determined by one or more visual quality metrics and a related video coding apparatus.
Conventional video coding standards generally adopt a block based (or coding unit based) coding technique to exploit spatial redundancy. For example, the basic approach is to divide the whole source frame into a plurality of blocks (coding units), perform prediction on each block (coding unit), transform the residues of each block (coding unit) using a discrete cosine transform (DCT), and perform quantization and entropy encoding. In addition, a reconstructed frame is generated in a coding loop to provide reference pixel data used for coding subsequent blocks (coding units). For certain video coding standards, one or more in-loop filters may be used to enhance the image quality of the reconstructed frame. For example, a de-blocking filter is included in an H.264 coding loop, and a de-blocking filter and a sample adaptive offset (SAO) filter are included in an HEVC (High Efficiency Video Coding) coding loop.
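The block based flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular standard: the 4x4 block size, the left-neighbor DC predictor, the uniform quantization step `qstep`, and all function names are hypothetical, frame dimensions are assumed to be multiples of the block size, and entropy encoding and in-loop filtering are omitted.

```python
import math

N = 4  # block size (assumption; real codecs support several sizes)

def dct_matrix(n):
    """Return the orthonormal DCT-II basis as an n x n matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

C = dct_matrix(N)
CT = transpose(C)

def encode_block(block, pred, qstep):
    """Predict, transform the residues, and quantize one block."""
    residual = [[block[y][x] - pred for x in range(N)] for y in range(N)]
    coeffs = matmul(matmul(C, residual), CT)        # 2-D DCT
    return [[round(c / qstep) for c in row] for row in coeffs]

def reconstruct_block(levels, pred, qstep):
    """Dequantize, inverse transform, and add the prediction back."""
    coeffs = [[level * qstep for level in row] for row in levels]
    residual = matmul(matmul(CT, coeffs), C)        # inverse 2-D DCT
    return [[min(255, max(0, round(pred + r))) for r in row]
            for row in residual]

def code_frame(frame, qstep):
    """Code a frame block by block, keeping a reconstructed frame in the loop."""
    h, w = len(frame), len(frame[0])
    recon = [[0] * w for _ in range(h)]
    all_levels = []
    for by in range(0, h, N):
        for bx in range(0, w, N):
            block = [row[bx:bx + N] for row in frame[by:by + N]]
            if bx > 0:
                # Predict from reconstructed pixels of the left neighbor,
                # i.e. reference data produced inside the coding loop.
                pred = round(sum(recon[by + y][bx - 1] for y in range(N)) / N)
            else:
                pred = 128  # no neighbor available: mid-gray (assumption)
            levels = encode_block(block, pred, qstep)
            rec = reconstruct_block(levels, pred, qstep)
            for y in range(N):
                recon[by + y][bx:bx + N] = rec[y]
            all_levels.append(levels)
    return all_levels, recon

flat_frame = [[100] * 8 for _ in range(8)]
flat_levels, flat_recon = code_frame(flat_frame, qstep=1)
```

In this sketch the predictor reads from `recon` rather than from the source frame, which mirrors the reason the reconstructed frame is generated inside the coding loop: the decoder only has reconstructed pixels, so the encoder must predict from the same data.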
Generally speaking, the coding loop is composed of a plurality of processing stages, including transform, quantization, intra/inter prediction, etc. Based on the conventional video coding standards, a processing stage selects a video coding mode based on a pixel-based distortion value derived from a source frame (i.e., an input frame to be encoded) and a reference frame (i.e., a reconstructed frame generated during the coding procedure). For example, the pixel-based distortion value may be a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), or a sum of squared differences (SSD). However, the pixel-based distortion value merely considers pixel value differences between pixels of the source frame and the reference frame, and sometimes is not correlated with the actual visual quality of a reconstructed frame generated from decoding an encoded frame. Specifically, based on experimental results, different processed images, each derived from an original image and having the same pixel-based distortion (e.g., the same mean square error (MSE)) with respect to the original image, may present different visual quality to a viewer. That is, a smaller pixel-based distortion does not necessarily mean better visual quality as perceived by the human visual system. Hence, when each video coding mode used to generate an encoded frame is selected because it yields the smallest pixel-based distortion value, there is no guarantee that a reconstructed frame generated from decoding the encoded frame would have the best visual quality.
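The pixel-based distortion values named above can be sketched in Python as follows. This is an illustrative sketch only: the function names, the 4x4 block size, the unnormalized Hadamard-based SATD, and the candidate labels `"uniform_shift"` and `"single_spike"` are assumptions chosen for the example. The two candidate reconstructions are built to have exactly the same SSD (and hence the same MSE) with respect to the source block, even though their errors are distributed very differently.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# 4x4 Hadamard matrix used to transform the difference block.
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]

def satd4x4(a, b):
    """Sum of absolute transformed differences (no normalization applied)."""
    d = [[a[y][x] - b[y][x] for x in range(4)] for y in range(4)]
    hd = [[sum(H4[i][k] * d[k][j] for k in range(4)) for j in range(4)]
          for i in range(4)]
    # Multiply by the transpose of H4 on the right.
    t = [[sum(hd[i][k] * H4[j][k] for k in range(4)) for j in range(4)]
         for i in range(4)]
    return sum(abs(v) for row in t for v in row)

src = [[100] * 4 for _ in range(4)]

# Candidate A: every pixel is off by 2 (a uniform brightness shift).
uniform_shift = [[102] * 4 for _ in range(4)]

# Candidate B: a single pixel is off by 8, all others are exact.
single_spike = [row[:] for row in src]
single_spike[0][0] = 108

candidates = {"uniform_shift": uniform_shift, "single_spike": single_spike}

# Same SSD, so the same MSE (64 / 16 = 4.0) for both candidates.
mse = {name: ssd(src, rec) / 16 for name, rec in candidates.items()}

# A conventional mode decision simply keeps the smallest distortion value.
best_by_sad = min(candidates, key=lambda name: sad(src, candidates[name]))
best_by_satd = min(candidates, key=lambda name: satd4x4(src, candidates[name]))
```

Here SSD cannot distinguish the two candidates at all, while SAD and SATD rank them in opposite orders. None of the three values says which candidate a viewer would actually prefer, which is the limitation of selecting modes purely on pixel-based distortion.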