1. Field of the Invention
Embodiments of the present invention generally relate to detection of degraded quality of a video transmission, and, in particular, to a system and method for using face detection to detect degraded video quality.
2. Description of Related Art
Improving and maintaining high video quality during adverse network conditions is important for wide deployments of video over IP networks that inherently lack end-to-end quality of service (“QoS”) guarantees. Application-layer quality assurance is typically achieved by monitoring video quality in real-time, detecting degradation, and taking appropriate action when quality drops. A key step in the process, detection of video quality degradation in real-time, requires light-weight video quality metrics that can be computed with low computational overheads and communicated to the sending side with small transmission overheads.
While some video quality metrics are known in the background art, a standard metric that accurately reflects user opinion with a level of overhead that is appropriate for real-time monitoring and QoS assurance is not known.
Video quality measurement techniques known in the background art fall under three main areas: full-reference, reduced-reference, and no-reference techniques. In full-reference techniques, the original video sequence is compared to the received distorted video sequence using image processing techniques. Hence, full-reference techniques require access to both the original transmitted and the received video sequences. The measurements are taken at the media layer and are typically computationally intensive. As a result, these techniques are not suitable for real-time (i.e., in-service) video quality monitoring. Peak Signal to Noise Ratio (“PSNR”) is one of the earliest full-reference metrics. It focuses on the strength of the video signal with respect to noise injected during lossy compression. Other full-reference techniques include Perceptual Evaluation of Video Quality (“PEVQ”) and the Structural Similarity Index.
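By way of illustration only, the PSNR computation may be sketched as follows. The sketch uses the conventional definition of PSNR as ten times the base-10 logarithm of the squared peak pixel value divided by the mean squared error between the two frames. The function name `psnr`, the representation of a frame as a flat sequence of 8-bit pixel values, and the parameter `max_value` are assumptions made for this example and are not drawn from any particular implementation.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Return the PSNR in dB between two frames.

    Each frame is assumed (for this sketch) to be a flat sequence of
    pixel values in the range 0..max_value, and both frames must have
    the same number of pixels.
    """
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal size")

    # Mean squared error between corresponding pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)

    if mse == 0:
        # Identical frames: no noise, so PSNR is unbounded.
        return math.inf

    return 10 * math.log10(max_value ** 2 / mse)
```

The sketch requires both the original frame and the received frame, which is the property that makes full-reference metrics difficult to use for in-service monitoring.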
Reduced-reference techniques extract various features from both the original and the distorted video sequences and compare the extracted features of the original and the distorted images to each other. Measurements are taken at the media layer. While the comparison of only the extracted features reduces the computational overhead, it may still be computationally intensive to extract the features from the source video. Additionally, the extracted features of the original sequence need to be sent across the network and synchronized to the received frame for in-service monitoring. As such, transmitting the reduced features typically incurs notable transmission overheads for real-time operations. Video Quality Metric (“VQM”) is a reduced-reference algorithm developed by the Institute for Telecommunication Sciences (“ITS”). Part of VQM is incorporated into ITU-T J.144. Transmitting the extracted VQM features incurs significant overhead for in-service monitoring.
No-reference techniques use only the received distorted image. These techniques can be pixel-based or bitstream-based and are more suitable for both in-service monitoring and off-line network assessment of video quality. Pixel-based techniques involve media layer measurements. Using image processing techniques, the pixel-based techniques look for known distortions in the images to assess quality. However, the pixel-based techniques cannot handle video sequences with unanticipated distortions. In addition, the pixel-based techniques cannot distinguish between impairments due to the network and impairments already present in the original video sequence.
Bitstream-based no-reference techniques are computationally lighter since they do not require decoding. Measurements are taken at the bitstream layer. These techniques rely on a Mean Opinion Score (“MOS”) function that maps parameters from the bitstream to video quality. Once the MOS function is known, assessment of video quality is computationally simple, since the measurements are taken at the bitstream layer and the mapping to video quality is light-weight. However, an accurate MOS function that covers all or a majority of possible distortions and conditions must be determined upfront. Furthermore, any such MOS function needs to account for error concealment capabilities of the decoder. Hence, bitstream-based techniques are often tied to a specific decoder. VQmon is an example of a video quality metric that inspects the bitstream to monitor application performance in real-time.
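For illustration only, the general shape of such a MOS function may be sketched as follows. The sketch is hypothetical throughout: the function name `estimate_mos`, the choice of packet loss rate and bitrate as the bitstream parameters, the exponential form of the mapping, and the constants `loss_sensitivity` and `bitrate_knee_kbps` are all assumptions made for this example. They do not represent VQmon or any other known metric, and in practice the constants would have to be fitted for a specific decoder and its error concealment behavior.

```python
import math


def estimate_mos(packet_loss_rate, bitrate_kbps,
                 loss_sensitivity=30.0, bitrate_knee_kbps=500.0):
    """Map two bitstream-layer parameters to an estimated MOS in [1, 5].

    Hypothetical model: the constants are placeholders, not fitted values.
    A lower loss_sensitivity would model a decoder with better error
    concealment.
    """
    if not 0.0 <= packet_loss_rate <= 1.0:
        raise ValueError("packet_loss_rate must be between 0 and 1")
    if bitrate_kbps < 0:
        raise ValueError("bitrate_kbps must be non-negative")

    # Coding quality rises toward 1 as the bitrate increases.
    coding_quality = 1.0 - math.exp(-bitrate_kbps / bitrate_knee_kbps)

    # Packet loss scales the quality down; the rate of decay stands in
    # for how well the decoder conceals errors.
    loss_factor = math.exp(-loss_sensitivity * packet_loss_rate)

    # Scale onto the conventional 1 (bad) to 5 (excellent) MOS range.
    return 1.0 + 4.0 * coding_quality * loss_factor
```

Evaluating such a function requires only a few arithmetic operations per measurement interval and no decoding, which is why the approach is light-weight. The dependence of `loss_sensitivity` on the decoder is the reason a mapping of this kind is tied to a specific decoder.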
Video quality depends at least in part on the error concealment capabilities of a video decoder. A uniform level of packet loss presented to various video decoders may result in varying levels of quality among the video decoders. Hence, video quality metrics based on packet level measurements are specific to the decoder used. Other video quality metrics such as PSNR and VQM, in which measurements are taken at the media layer, are decoder-agnostic, i.e., the metrics are relatively independent of the decoder used. However, PSNR and VQM are not suitable for real-time (i.e., in-service) operations due to computational and transmission overheads.
Therefore, a need exists for a computationally light-weight video quality evaluation tool, operable over a variety of video decoders, in order to provide detection of video impairments and, ultimately, improved customer satisfaction.