1. Field of the Invention
This invention relates to video conferencing, and in particular to audio-video synchronization and audio latency reduction in video conferencing.
2. Description of the Related Art
In videoconferencing, video and audio signals from one site are digitally compressed and then sent over conventional communication channels (such as ISDN, IP, etc.) to either a bridge, which will then send the signal on to multiple sites, or to a second site directly. It is a fact of current technology that video is often delayed by significant amounts, from 50 milliseconds to over 2 seconds, in order to perform the necessary processing, which can be quite complex. This delay may be referred to as video latency. Common practice is to delay the audio to match this video latency in order to maintain synchronization, also called lip-sync. But it is important to note that audio does not have the same inherent delay as video, in part because the algorithms are designed to minimize delay, so this delay is only inserted to synchronize the audio with the video. The inherent delay in audio processing is usually much less. This inherent delay in audio processing is herein referred to as minimum delay.
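The relationship between video latency, minimum delay, and the audio delay inserted for lip-sync can be illustrated with a short sketch. The function name and the numeric figures below are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
def lip_sync_padding_ms(video_latency_ms: float, audio_min_delay_ms: float) -> float:
    """Extra delay inserted into the audio path so that it matches the video.

    Hypothetical helper: audio is only ever held back, never advanced, so the
    padding is zero whenever the audio path is already as slow as the video.
    """
    return max(0.0, video_latency_ms - audio_min_delay_ms)


# Illustrative figures only (assumed): 400 ms video latency and
# 30 ms minimum (inherent) audio delay.
video_latency_ms = 400.0
audio_min_delay_ms = 30.0

padding_ms = lip_sync_padding_ms(video_latency_ms, audio_min_delay_ms)
total_audio_delay_ms = audio_min_delay_ms + padding_ms

print(f"inserted audio delay: {padding_ms} ms")
print(f"total audio delay with lip-sync: {total_audio_delay_ms} ms")
```

With these assumed figures, most of the audio delay heard by the far end is inserted padding rather than delay inherent in the audio processing.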
A problem arises when a video connection exists as part of a two-way (or multi-way) conversation rather than a one-way connection such as a television broadcast or a college lecture. In such cases, one important element is the natural back-and-forth conversational flow. Studies have shown that as little as 200 ms of added round-trip delay can severely degrade the feeling of “being there” that both ends must experience in order to have a normal conversation. Clearly, when several hundred milliseconds or more are added to synchronize the audio to the slower video, much of the efficiency of the conversation is lost. People will talk over each other, then stop, then talk over each other again. They may deliberately wait, knowing that this delay exists, and so produce a perception of arrogance where none actually exists.
With multimedia bridging now being more common in business conferencing, the need to address audio latency is becoming even more urgent, because these delays are often doubled when going through such a bridge.
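The effect of a bridge on round-trip audio delay can be sketched with simple arithmetic. The model below is an assumption made for illustration: each leg of the path (site to bridge, bridge to site) is taken to incur the same one-way delay, so a bridged path has two legs per direction. The function name and the delay figures are hypothetical.

```python
def round_trip_audio_delay_ms(one_way_delay_ms: float, via_bridge: bool = False) -> float:
    """Round-trip audio delay under a simple, assumed model.

    A direct call has one leg per direction. A bridged call is assumed to
    have two legs per direction, each incurring the full one-way delay.
    """
    legs_per_direction = 2 if via_bridge else 1
    return 2 * legs_per_direction * one_way_delay_ms


# Illustrative one-way audio delays (assumed): 400 ms when audio is padded
# to match the video, 30 ms when only the minimum delay is incurred.
for label, one_way_ms in (("lip-synced", 400.0), ("minimum delay", 30.0)):
    direct = round_trip_audio_delay_ms(one_way_ms)
    bridged = round_trip_audio_delay_ms(one_way_ms, via_bridge=True)
    print(f"{label}: direct {direct} ms, bridged {bridged} ms")
```

Under these assumptions the lip-synced audio has a round-trip delay of 800 ms on a direct call and 1600 ms through a bridge, while audio at minimum delay has 60 ms and 120 ms respectively.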
There are thus two conflicting requirements in videoconferencing: minimal audio delay, and video-audio synchronization (lip-sync). The traditional approach is to insert additional audio delay in order to achieve lip-sync. As discussed above, these long audio delays cause the resultant conversations to be difficult and stilted. Particularly when talkers are interacting, such as in an argument or a spirited discussion, low audio latency becomes more important than lip-sync. It is desirable to have a method or system that can reconcile these two conflicting goals in videoconferencing, or at least make the conflict less discernible.