In a video conference environment, in which an electronic conference is conducted over a communication network, a video conference system is provided at each location where participants of the conference gather, and a plurality of such video conference systems communicate with one another via the communication network. Each video conference system collects image information and audio information at the location in which it is provided. The image information and the audio information are synthesized, and the synthesized information is distributed to the respective video conference systems. In each video conference system, the image information is displayed on a display device of that system and the audio information is output through a loudspeaker of that system.
For a video conference to be conducted successfully, the audio information must be calibrated with the video information so that the lip movement of each participant is synchronized with the associated audio feed. A variety of mechanisms exist for calibrating the audio and video components in a video conference environment; however, these mechanisms are expensive and tedious.
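By way of illustration only, the synchronization requirement described above can be expressed as finding the time shift at which an audio signal best lines up with the corresponding lip movement. The following is a minimal sketch and not a description of any existing calibration mechanism. The function name `estimate_av_offset`, the per-frame inputs `video_activity` (for example, a mouth-opening measure) and `audio_energy`, and the search bound `max_lag` are hypothetical names introduced for this example. Both inputs are assumed to be sampled at the same rate, and the sketch uses a plain, unnormalized cross-correlation.

```python
from typing import Sequence


def estimate_av_offset(video_activity: Sequence[float],
                       audio_energy: Sequence[float],
                       max_lag: int) -> int:
    """Return the lag, in samples, at which the audio best matches the video.

    A positive result means the audio lags the video by that many samples;
    a negative result means the audio leads the video.
    All names here are hypothetical and used for illustration only.
    """
    n = min(len(video_activity), len(audio_energy))
    best_lag, best_score = 0, float("-inf")
    # Try every candidate shift within the allowed window.
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag  # audio sample paired with video sample i
            if 0 <= j < n:
                score += video_activity[i] * audio_energy[j]
        # Keep the shift with the highest (unnormalized) correlation.
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    # Lip movement peaks at sample 2; the matching sound peaks at sample 4.
    video = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    audio = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    print(estimate_av_offset(video, audio, max_lag=3))
```

Under these assumptions, the returned lag could be used to delay whichever stream arrives earlier. A practical system would likely also need normalization and robustness to noise, which this sketch omits.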