Use of video conferencing equipment is commonplace, but improvements in its ease of use are still desirable.
Background prior art can be found in U.S. Pat. No. 4,560,883; U.S. Pat. No. 7,432,951; the IEEE paper “A media synchronization survey: reference model, specification, and case studies” by Blakowski and Steinmetz (1996); Report IS-191 of the Advanced Television Systems Committee; “Voice and Video Conferencing Fundamentals” from Cisco Press; EP1,324,608A; and WO03/034692.
We will describe techniques which improve on the prior art and facilitate automatically establishing a video conference call or similar digital shared connection.
Lip-Sync
Any system that transmits and receives both video and audio is potentially vulnerable to the ‘lip-sync’ problem. This is the common name for the situation where users perceive a difference between the time they hear a sound and the time they see the image associated with that sound. A study by Blakowski and Steinmetz showed that humans are very sensitive to this time difference, particularly when audio arrives ahead of video. As a result, the Advanced Television Systems Committee recommends in report IS-191 that broadcast television systems should not permit a difference of more than 45 ms if audio leads video, or 75 ms if video leads audio.
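To make the asymmetry in these limits concrete, here is a minimal Python sketch that checks a perceived audio/video offset against the two figures quoted above. The function name `lip_sync_ok`, its inputs, and the sign convention (positive skew means the sound is heard before the matching image is seen) are illustrative assumptions, not part of IS-191 or any existing API.

```python
def lip_sync_ok(audio_time_ms: float, video_time_ms: float,
                max_audio_lead_ms: float = 45.0,
                max_video_lead_ms: float = 75.0) -> bool:
    """Check a perceived audio/video offset against asymmetric limits.

    audio_time_ms: when the sound is heard; video_time_ms: when the matching
    image is seen. Both are hypothetical measurements on the same clock.
    Defaults reuse the 45 ms / 75 ms figures quoted in the text.
    """
    skew_ms = video_time_ms - audio_time_ms  # > 0 means audio leads video
    if skew_ms >= 0:
        return skew_ms <= max_audio_lead_ms
    return -skew_ms <= max_video_lead_ms  # video leads audio


print(lip_sync_ok(audio_time_ms=0, video_time_ms=40))  # True: audio leads by 40 ms
print(lip_sync_ok(audio_time_ms=0, video_time_ms=50))  # False: audio leads by 50 ms
print(lip_sync_ok(audio_time_ms=70, video_time_ms=0))  # True: video leads by 70 ms
```

The same 50 ms offset that fails when audio leads would pass if video led, which reflects the greater sensitivity to early audio described above.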
Differences in time occur because the audio and video take different paths from sender to receiver. If the delay in each stage of each path is fixed, correction is relatively simple: a further fixed delay is applied to the faster path to bring the two into synchronisation. Many modern home cinema amplifiers provide this form of correction, applying a user-controlled fixed delay to correct poor lip-sync between video displayed on a television and audio played back through the amplifier.
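As a minimal sketch of fixed-delay correction, assuming every stage delay on each path is a known constant in milliseconds, the compensating delay is simply the difference between the two path totals, applied to the faster path. The names `fixed_lip_sync_correction` and `Correction` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Correction:
    """Extra fixed delay (ms) to insert in each path."""
    delay_audio_ms: float
    delay_video_ms: float


def fixed_lip_sync_correction(audio_stage_delays_ms: Sequence[float],
                              video_stage_delays_ms: Sequence[float]) -> Correction:
    """Delay the faster path by the difference in total path delay.

    Assumes every stage delay is a known, constant value in milliseconds.
    """
    audio_total = sum(audio_stage_delays_ms)
    video_total = sum(video_stage_delays_ms)
    if video_total > audio_total:
        # Video path is slower: hold the audio back.
        return Correction(delay_audio_ms=video_total - audio_total, delay_video_ms=0.0)
    # Audio path is slower (or equal): hold the video back.
    return Correction(delay_audio_ms=0.0, delay_video_ms=audio_total - video_total)


correction = fixed_lip_sync_correction([5, 10], [20, 40, 15])
```

With audio stages of 5 ms and 10 ms and video stages of 20, 40 and 15 ms, the video path is 60 ms slower, so the sketch asks for a 60 ms audio delay. This is the kind of setting a user would dial into a home cinema amplifier. The approach only works because the totals never change, which is exactly what breaks down in the next case.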
However, in video conferencing systems the problem is more complex because many of the delays are not fixed but vary over time. A good explanation of the prior art solution to this problem is given in ‘Voice and Video Conferencing Fundamentals’, published by Cisco Press. In summary, in prior art video conferencing systems the transmitter attaches a timestamp to each frame of audio samples and another to each video frame. The receiver uses these timestamps to calculate the time difference between the two channels and then corrects it on a frame-by-frame basis.
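One common way to apply such a per-frame correction at the receiver is to treat audio playout as the master clock. For each queued video frame, the receiver compares that frame's transmitter timestamp with the timestamp of the audio now playing, then decides whether to show, hold or drop the frame. The Python sketch below rests on several assumptions. Both timestamps are taken to come from a shared transmitter clock. The 45 ms / 75 ms limits quoted earlier are reused as default tolerances. The names `VideoFrame` and `pick_video_frame` are hypothetical. The sketch illustrates the general idea and does not reproduce any particular product's implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass
class VideoFrame:
    capture_ts_ms: float  # timestamp attached by the transmitter
    payload: bytes = b""


def pick_video_frame(queue: Deque[VideoFrame], audio_ts_ms: float,
                     max_video_lead_ms: float = 75.0,
                     max_audio_lead_ms: float = 45.0
                     ) -> Tuple[Optional[VideoFrame], List[VideoFrame]]:
    """Choose the video frame to display with the audio currently playing.

    audio_ts_ms is the transmitter timestamp of the audio frame now being
    played out, so audio acts as the master clock. Returns the frame to
    show (or None to keep showing the previous one) and any frames dropped.
    """
    dropped: List[VideoFrame] = []
    while queue:
        skew_ms = queue[0].capture_ts_ms - audio_ts_ms  # > 0: frame is early
        if skew_ms > max_video_lead_ms:
            # Showing it now would make video lead audio too much: hold it.
            return None, dropped
        if skew_ms < -max_audio_lead_ms and len(queue) > 1:
            # Frame is stale and a newer one is waiting: drop it.
            dropped.append(queue.popleft())
            continue
        # Within tolerance, or the newest frame available: show it.
        return queue.popleft(), dropped
    return None, dropped


frames = deque(VideoFrame(ts) for ts in (0.0, 33.0, 66.0, 200.0))
shown, dropped = pick_video_frame(frames, audio_ts_ms=100.0)
```

Consider the example queue with the audio at 100 ms. The frames stamped 0 ms and 33 ms are dropped as stale, and the 66 ms frame is shown. The 200 ms frame stays queued until the audio catches up. Slaving video to audio is a common design choice, on the reasoning that a repeated or skipped video frame is usually less disruptive than a gap in the audio.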
The prior art approach works well, but it can only be used if timestamps can be added to both the audio and video channels. In our new video conferencing system (see also our co-pending patent application filed on the same day as this application), however, video is added to a pre-existing ordinary telephone call. This new approach provides a number of benefits over previous approaches. The audio part of the call retains the simplicity, familiarity and reliability of an ordinary phone call. The video is an enhancement, but it does not get in the way of placing or conducting a phone call. Users do not have to replace existing conference phones or use a different microphone for video conferences than for audio conferences. For conference calls, it becomes very easy to mix participants who have audio only with others who have audio and video. Moreover, if the IP network used by the video channel is heavily congested and video quality suffers, communication can continue uninterrupted as audio-only.
Unfortunately, using the conventional telephone network for the audio channel of a video conference makes lip-sync much harder to achieve. It is no longer practical to use the prior art technique of adding timestamps to both channels, because the POTS (Plain Old Telephone Service) network is extremely bandwidth-limited and will only reliably carry signals within the range of human hearing. Any timestamp added to the audio channel would therefore itself be audible, and would substantially interfere with the audio portion of the teleconference, which is undesirable. One approach to this problem is described in US2006/0291478.
We will describe techniques, suitable for use with embodiments of our new video conferencing system, which in embodiments enable synchronisation of audio with a digital data stream (for example a video data stream) without interfering with or modifying the telephone audio in any way.