Field of Disclosure
This disclosure relates in general to the field of digital video and in particular to the objective measurement of digital video quality and the implementation of the measurement in a real-time video codec.
Description of the Related Art
Raw digital video streams often consume immense digital storage space, and are prohibitively massive for transmission. To reduce storage space or transmission bandwidth requirements, a raw video stream is encoded to reduce its size. Typical video encoding involves the process of subtracting a reference frame from an original frame of video to obtain a residual frame containing less information than the original frame. Residual frames are then further processed to obtain a compressed digital video stream for transmission or storage. A typical video decoder receives the compressed digital video stream and decompresses the compressed digital video into the original video stream or a downgraded version of the digital video stream.
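By way of illustration only, the subtraction of a reference frame from an original frame may be sketched as follows. This is a minimal sketch that assumes frames are represented as equally sized two-dimensional lists of integer luma samples; the function names `residual_frame` and `reconstruct_frame` and the sample values are hypothetical and are not drawn from any particular codec.

```python
def residual_frame(original, reference):
    """Subtract a reference frame from an original frame, sample by sample."""
    return [
        [o - r for o, r in zip(orig_row, ref_row)]
        for orig_row, ref_row in zip(original, reference)
    ]


def reconstruct_frame(residual, reference):
    """Invert the subtraction: add the reference frame back to the residual."""
    return [
        [d + r for d, r in zip(res_row, ref_row)]
        for res_row, ref_row in zip(residual, reference)
    ]


# Hypothetical 2x3 frames; the reference closely predicts the original.
reference = [[100, 102, 104], [101, 103, 105]]
original = [[101, 102, 103], [101, 104, 105]]

residual = residual_frame(original, reference)
reconstructed = reconstruct_frame(residual, reference)
```

When the reference frame is a good prediction of the original frame, the residual samples cluster near zero, which is why the residual frame can be represented with less information than the original frame. In this sketch the residual is not quantized, so adding the reference frame back recovers the original frame exactly; a practical encoder would typically transform and quantize the residual, which introduces loss.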
In the field of video coding, the real-time estimation of the visual quality of previously encoded video frames in a compressed video stream is an important consideration in encoding subsequent video frames in the same video stream. The visual quality of encoded video is an objective measure of how the encoded video appears to a viewer. Poor video visual quality is characterized by an image display that appears unnatural to human perception. Examples of poor video visual quality include compression artifacts (e.g., blocking, contouring, mosquito noise, and “digitized” video appearance), discoloration, inconsistent contrast, and inconsistent display resolution. If the video encoder detects poor visual quality in previously encoded video frames, the visual quality of the encoded video can be improved by altering the encoding and compression process. Determining the visual quality of encoded video frames in real time may therefore help an encoder improve the visual quality of encoded video.
One common method of measuring encoded video frame visual quality is to determine the peak signal-to-noise ratio (PSNR) of the frame. PSNR is a non-ideal solution because, although PSNR can be computed in real time, it correlates poorly with human perception. Other measures, such as the structural similarity index metric (SSIM), visual information fidelity (VIF), and the multi-scale structural similarity index metric (MSSIM), cannot practicably be computed in real time due to their computational complexity, limiting their usefulness in distortion control feedback applications of video encoders. In addition, SSIM, VIF, and MSSIM can only be computed in the pixel domain, further limiting their utility.
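By way of illustration only, the peak signal-to-noise ratio of a decoded frame relative to its original may be sketched as follows, using the conventional definition of ten times the base-10 logarithm of the squared peak sample value divided by the mean squared error. This is a minimal sketch that assumes 8-bit samples (a peak value of 255) held in equally sized two-dimensional lists; the function name `psnr` and the sample values are hypothetical.

```python
import math


def psnr(original, decoded, peak=255):
    """Return the PSNR in decibels between two equally sized frames."""
    squared_error = 0
    sample_count = 0
    for orig_row, dec_row in zip(original, decoded):
        for o, d in zip(orig_row, dec_row):
            squared_error += (o - d) ** 2
            sample_count += 1
    mse = squared_error / sample_count
    if mse == 0:
        # Identical frames: the ratio is unbounded.
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical 2x2 frames differing in two samples.
original = [[52, 55], [61, 59]]
decoded = [[54, 55], [60, 59]]
quality_db = psnr(original, decoded)
```

The computation is a single pass of per-sample arithmetic, which is consistent with PSNR being computable in real time. It also treats every sample error alike regardless of where the error falls in the image, which is one reason such a measure may correlate poorly with human perception.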