1. Field of the Invention
Embodiments of the present invention generally relate to detection of degraded quality of a video transmission, and, in particular, to a system and method for using face detection to detect and correct for excessive end-to-end video frame delays in order to improve video quality.
2. Description of Related Art
Often during a live interview on TV, an interviewer in a studio may be talking with an interviewee at a remote location. There may be an appreciable delay for the video and audio signals going back and forth. This tends to create ambiguous verbal cues as to whether one person has stopped talking and is expecting the other person to start talking. As a result, the interviewer and interviewee may begin talking over one another, and then both stop and wait for the other person to continue talking, and so forth. This scenario is one manifestation of excessive end-to-end video frame delays. In another manifestation, there may be a relatively high differential delay between the audio and video portions of an interview, such that there is a noticeable and annoying timing mismatch between the spoken words that are heard and the video of the speaker speaking those words.
Improving and maintaining high video quality during adverse network conditions is important for wide deployments of video over IP networks that inherently lack end-to-end quality of service (“QoS”) guarantees. Application-layer quality assurance is typically enhanced by monitoring video frame delay in real time, detecting degradation, and taking appropriate action when the video frame delay increases unacceptably. A key step in the process, detection of high video frame delay in real time, requires lightweight video metrics that can be computed with low computational overhead and communicated to the sending side with small transmission overhead.
End-to-end frame delay is an important metric impacting video Quality of Experience (“QoE”). End-to-end frame delay is defined as the difference between the time of capture of a frame at the source and the time the frame is displayed at the destination. High frame delays can render a video conversation useless and can contribute to lip-synching problems. As the audio and video streams in video conferencing and video phones typically take different paths, understanding the end-to-end frame delay is important for QoE monitoring and, potentially, for debugging.
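The definition above can be illustrated with a minimal sketch. The function name and the timestamp values are hypothetical and are not taken from the embodiments; the sketch assumes that both timestamps are expressed in seconds on a common, synchronized clock.

```python
def end_to_end_frame_delay(capture_time_s: float, display_time_s: float) -> float:
    """Return the end-to-end delay of one frame, in seconds.

    Assumes both timestamps come from synchronized clocks (an assumption
    of this sketch): capture_time_s is taken at the source and
    display_time_s is taken at the destination.
    """
    return display_time_s - capture_time_s


# Hypothetical example: a frame captured at t = 10.000 s and
# displayed at t = 10.250 s on the shared clock.
delay_s = end_to_end_frame_delay(10.000, 10.250)
print(f"end-to-end frame delay: {delay_s * 1000:.0f} ms")
```

If the two clocks are not synchronized, the computed value also contains the clock offset, so the result would not reflect the true frame delay.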
When a video system is operational, frame delays can be computed by inserting a watermark in parts of the image not visible to the user. Watermarking involves embedding timing information into the images of a video stream such that the embedded timing information can be used to identify matching frames between the sent and received streams. Frames with the same watermark values on the sent and received sides of a video stream are identified, and their timing information is compared to compute the end-to-end frame delay. The clocks of the machines computing the metric need to be synchronized. A disadvantage is that watermarks may become distorted or obliterated during transcoding operations.
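The matching step of the watermark approach might be sketched as follows. This is an illustration only: the function name, the use of integer watermark values as dictionary keys, and the sample timestamps are assumptions, and the sketch presumes that the sender and receiver clocks are synchronized and that the watermark extraction itself has already been performed.

```python
from typing import Dict


def delays_from_watermarks(
    sent: Dict[int, float], received: Dict[int, float]
) -> Dict[int, float]:
    """Compute per-frame delay for frames whose watermark appears on both sides.

    `sent` maps a watermark value to the capture time at the source;
    `received` maps a watermark value to the display time at the destination.
    Frames whose watermark was lost or obliterated (for example, during
    transcoding) are absent from `received` and are skipped.
    """
    return {wm: received[wm] - sent[wm] for wm in sent if wm in received}


# Hypothetical timestamps in seconds on a shared clock.
sent_times = {101: 0.0, 102: 0.5, 103: 1.0}
received_times = {101: 0.25, 103: 1.5}  # watermark 102 did not survive

frame_delays = delays_from_watermarks(sent_times, received_times)
for watermark, delay in frame_delays.items():
    print(f"watermark {watermark}: delay {delay:.2f} s")
```

In this sketch, a frame whose watermark is missing at the receiver yields no measurement, which reflects the stated disadvantage of the approach.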
Alternatively, frame delay may be computed by synchronizing sent and received frames, and then using the timing information of the synchronized frames to compute the frame delay. Frame synchronization typically relies on image-processing-based techniques and is time-consuming, especially in the presence of frame losses, transcoding, and changes in frame rate and resolution. Hence, computing frame delay by relying on frame synchronization is not suitable for real-time operation.
End-to-end frame delay, while important for QoE, is typically not measured or reported to users during a video conference or a video call. Therefore, a need exists for a process that measures the delay between a sent and a received video stream, in order to provide end-to-end frame delay measurements and, ultimately, improved customer satisfaction.