Systems and methods of video encoding are known that employ quantization techniques for reducing the number of bits output by video encoders. Such video encoders frequently employ quantization with varying step-sizes that can have substantial effects on the output bit rates of the video encoders, and on the perceptual quality (also referred to herein as a/the “quality of experience” or “QoE”) of encoded video information, communications, entertainment, and other video content (also referred to herein as an/the “encoded video”). For example, such video encoders can employ larger quantization step-sizes to perform coarser quantization, which can decrease the output bit rates of the video encoders and diminish the QoE for the encoded video. Further, such video encoders can employ smaller quantization step-sizes to perform finer quantization, which can increase the output bit rates of the video encoders and enhance the QoE for the encoded video. Accordingly, in such video encoders, it is generally considered desirable to employ quantization step-sizes that are large enough to constrain the output bit rates of the video encoders within a given bit budget, while at the same time providing the best possible QoE for the encoded video delivered to an end user.
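The trade-off between quantization step-size, output bits, and distortion described above can be illustrated with a minimal sketch. It assumes uniform scalar quantization of a hypothetical list of transform coefficients, and it uses the count of non-zero quantized levels as a crude stand-in for bit cost and the mean absolute reconstruction error as a crude stand-in for distortion. The function names and coefficient values are illustrative assumptions, not part of any particular encoder.

```python
def quantize(coeffs, step):
    """Map each coefficient to an integer level (uniform scalar quantization)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Reconstruct approximate coefficient values from integer levels."""
    return [q * step for q in levels]


def nonzero_count(levels):
    """Crude stand-in for bit cost: zero levels are cheap to entropy-code."""
    return sum(1 for q in levels if q != 0)


def mean_abs_error(original, reconstructed):
    """Crude stand-in for distortion introduced by quantization."""
    return sum(abs(a - b) for a, b in zip(original, reconstructed)) / len(original)


# Hypothetical transform coefficients for one block (assumed values).
coeffs = [52.0, -31.0, 17.0, -9.0, 6.0, -4.0, 3.0, -1.0]

fine_levels = quantize(coeffs, step=4)      # smaller step-size: finer quantization
coarse_levels = quantize(coeffs, step=16)   # larger step-size: coarser quantization

fine_error = mean_abs_error(coeffs, dequantize(fine_levels, 4))
coarse_error = mean_abs_error(coeffs, dequantize(coarse_levels, 16))

print("fine:  ", fine_levels, nonzero_count(fine_levels), fine_error)
print("coarse:", coarse_levels, nonzero_count(coarse_levels), coarse_error)
```

In this sketch the larger step-size leaves fewer non-zero levels to code but produces a larger reconstruction error, which mirrors the bit rate versus QoE tension that a bit budget forces an encoder to balance.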
In such video encoders, it is also generally considered desirable to maintain a consistent QoE throughout a sequence of video frames (also referred to herein as a/the “video sequence”), without having the QoE for the video sequence vary widely from video frame to video frame. Because the end user is typically a human in most video applications, prior attempts to maintain a consistent QoE throughout a video sequence have exploited the perceptual insensitivity characteristics of the human visual system (also referred to herein as a/the “HVS”). For example, a conventional approach employed in such video encoders for maintaining a consistent QoE in a video sequence, known as “perceptual quantization,” typically includes performing coarser quantization in areas of video frames where distortion is generally less perceptible to the HVS, and performing finer quantization in areas of the video frames where distortion may be more perceptible to the HVS. Using a conventional perceptual quantization approach, such video encoders typically classify each macroblock (also referred to herein as a/the “MB”) of each video frame in a video sequence as one of a plurality of distortion imperceptibility levels, based on a spatial variation for the MB (typically represented as the sum of the mean-removed absolute differences of pixel values in the MB), a temporal variation for the MB (typically represented as the magnitude of motion vector(s) for the MB), and a brightness of the MB (typically represented as the mean of pixel values in the MB). Having classified each MB of a video frame as one of the plurality of distortion imperceptibility levels, such video encoders typically employ a lower quantization parameter (also referred to herein as a/the “QP”) or a higher QP for the MB, based on the distortion imperceptibility level of the respective MB.
For example, such video encoders typically employ a lower QP for MBs having lower spatial and temporal variations and lower brightness, where the HVS can generally perceive distortion more easily. Further, such video encoders typically employ a higher QP for MBs having higher spatial and temporal variations and higher brightness, where the HVS is less likely to perceive such distortion.
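The conventional classification described above can be sketched as follows. This is a minimal illustration only: the thresholds, the rule of counting how many measures exceed their thresholds, and the QP offsets per level are all assumed values chosen for the example, not values taken from any encoder. The clamp to the range 0 to 51 reflects the QP range of the H.264 coding format.

```python
import math


def spatial_variation(mb):
    """Sum of mean-removed absolute differences of the pixel values in the MB."""
    pixels = [p for row in mb for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels)


def brightness(mb):
    """Mean of the pixel values in the MB."""
    pixels = [p for row in mb for p in row]
    return sum(pixels) / len(pixels)


def temporal_variation(mv):
    """Magnitude of the MB's motion vector (mvx, mvy)."""
    return math.hypot(mv[0], mv[1])


# Hypothetical thresholds and QP offsets (assumptions for illustration only).
THRESHOLDS = {"spatial": 1000.0, "temporal": 4.0, "brightness": 128.0}
QP_OFFSET_BY_LEVEL = [-2, 0, 2, 4]


def imperceptibility_level(mb, mv):
    """Level 0..3: how many of the three measures exceed their thresholds."""
    return (
        int(spatial_variation(mb) > THRESHOLDS["spatial"])
        + int(temporal_variation(mv) > THRESHOLDS["temporal"])
        + int(brightness(mb) > THRESHOLDS["brightness"])
    )


def qp_for_mb(base_qp, level):
    """Higher level (distortion less perceptible) gets a higher QP."""
    return max(0, min(51, base_qp + QP_OFFSET_BY_LEVEL[level]))


# A flat, dark, static 16x16 MB versus a textured, bright, fast-moving one.
flat_mb = [[20] * 16 for _ in range(16)]
busy_mb = [[100 if (r + c) % 2 == 0 else 250 for c in range(16)] for r in range(16)]

flat_level = imperceptibility_level(flat_mb, (0, 0))
busy_level = imperceptibility_level(busy_mb, (6, 8))
print(flat_level, qp_for_mb(30, flat_level))
print(busy_level, qp_for_mb(30, busy_level))
```

With these assumed settings, the flat MB falls in the lowest level and receives a QP below the base value, while the textured MB falls in the highest level and receives a QP above it.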
Video transcoders have been increasingly employed due to the proliferation and diversity of multimedia applications, video coding standards, networks, displays, etc. In such video transcoders, video streams can be translated or “transcoded” into signal streams compressed according to coding formats supported by the networks that carry the video streams, and/or endpoint devices that receive the video streams, such as traditional personal computers (PCs), mobile PCs, personal digital assistants (PDAs), video-enabled mobile phones, mobile televisions (TVs), third and fourth generation (3G and 4G) phone sets, or any other suitable multimedia systems or devices. Such coding formats include the H.264 coding format (also referred to herein as the “MPEG-4 Advanced Video Coding (AVC) standard”), which has generally improved the efficiency of video encoding, resulting in an increased need for video transcoders that can support the H.264 coding format in real-time video communications applications, video streaming applications, etc.
As in video encoders, it is generally considered desirable in video transcoders to provide the best possible QoE for transcoded video information, communications, entertainment, and other video content (also referred to herein as a/the “transcoded video”) delivered to the end user. However, perceptual quantization approaches typically employed by video encoders for maintaining a consistent QoE in video sequences have drawbacks when employed in conjunction with video transcoders. For example, a video encoder typically receives a relatively high quality video input, whereas a video transcoder receives a video input that has already been encoded by an external video encoder, and may therefore be of lower quality. Further, the characteristics of the external video encoder that encoded the video input may be unknown to the video transcoder. Because the video input of a video transcoder has typically already been encoded by an external video encoder with possibly unknown characteristics, the video transcoder employing a conventional perceptual quantization approach may classify the MBs of video frames from the video input based on inaccurate spatial variation, temporal variation, and/or brightness information for the respective MBs, resulting in the use of QPs that may not provide the best possible QoE for the transcoded video delivered to the end user. For example, such problems with the classification of MBs of video frames within the video transcoder may be more likely to occur if the external video encoder has employed higher QPs for MBs that have higher spatial variation, higher temporal variation, and/or higher brightness. Such use of higher QPs can significantly reduce the spatial variation measured in the decoded MBs, and can cause a video transcoder employing a conventional perceptual quantization approach to use a lower QP when encoding such MBs, resulting in a waste of bits in areas where the HVS is less likely to perceive distortion.
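The loss of spatial variation caused by a higher QP in the external video encoder can be illustrated with a minimal one-dimensional sketch. It assumes an 8-sample row of hypothetical pixel values, an orthonormal DCT-II as the transform, and uniform quantization of the coefficients; real encoders operate on two-dimensional blocks with integer transforms, so this is a simplified stand-in rather than the H.264 process.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1-D list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)))
    return out


def idct(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(
            math.sqrt((1 if k == 0 else 2) / n) * coeffs[k]
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(n)))
    return out


def encode_decode(x, step):
    """Transform, quantize with the given step-size, and reconstruct."""
    levels = [round(c / step) for c in dct(x)]
    return idct([q * step for q in levels])


def spatial_variation(x):
    """Sum of mean-removed absolute differences of the samples."""
    mean = sum(x) / len(x)
    return sum(abs(p - mean) for p in x)


# Hypothetical textured row of pixel values (assumed for illustration).
row = [100, 108, 96, 112, 94, 110, 98, 106]

original_sv = spatial_variation(row)
fine_sv = spatial_variation(encode_decode(row, step=2))     # low-QP external encoder
coarse_sv = spatial_variation(encode_decode(row, step=40))  # high-QP external encoder

print(original_sv, fine_sv, coarse_sv)
```

In this sketch the coarse step-size quantizes every non-DC coefficient to zero, so the decoded row is flat and its measured spatial variation collapses to approximately zero. A transcoder that measures spatial variation on such decoded content would see a smooth area and could assign it a lower QP than the original texture warranted.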
Moreover, the video input received at a video transcoder, and/or the output of a video decoder within the video transcoder, can carry information that may be useful in the classification of the MBs of video frames from the video input. For example, if an external video encoder from which the video transcoder received the video input employed perceptual quantization in the encoding of the video, then such information may be useful in classifying the MBs of the video frames within the video transcoder. However, a video transcoder employing a conventional perceptual quantization approach may not take into account such information when classifying the MBs. Information about the QPs employed for MBs encoded by the external video encoder using a skip coding mode may also be useful in the classification of the MBs within the video transcoder. However, video frames compressed according to the H.264 coding format do not generally carry such QP information for MBs encoded using the skip coding mode. Further, a video transcoder employing a conventional perceptual quantization approach may not have the capability of estimating the QPs for such MBs of video frames compressed according to the H.264 coding format when the skip coding mode is employed.
It would therefore be desirable to have systems and methods of video transcoding that employ perceptual processing techniques which avoid at least some of the drawbacks of the conventional perceptual quantization approaches described above.