Video conferences with three or more participants generally use a multipoint control unit (MCU) to mix the audio streams and the video streams. The MCU, also referred to as a conference bridge, typically consists of a multipoint controller and one or more multipoint processors, which may be located on different network devices. The multipoint controller handles the call connection process, connecting streams from the endpoints to the multipoint processors. The multipoint processors perform the actual audio and video mixing. Each multipoint processor typically includes an audio mixer and a video mixer.
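The division of labor described above can be illustrated with a minimal sketch. The class names, the `connect` and `mix_audio` methods, and the sample values are hypothetical and chosen only for illustration; the naive sum-and-clip mix over 16-bit PCM samples is a simplifying assumption, not a description of any particular MCU.

```python
from dataclasses import dataclass, field


@dataclass
class MultipointProcessor:
    """Hypothetical processor: holds connected streams and mixes them."""
    audio_streams: dict = field(default_factory=dict)  # endpoint -> PCM samples

    def mix_audio(self) -> list:
        # Naive mix (assumption): sum sample-by-sample, clipped to 16-bit PCM range.
        frames = zip(*self.audio_streams.values())
        return [max(-32768, min(32767, sum(samples))) for samples in frames]


@dataclass
class MultipointController:
    """Hypothetical controller: call setup only, no media processing."""
    processors: list

    def connect(self, endpoint: str, samples: list, processor_index: int = 0) -> None:
        # Route the endpoint's stream to the chosen multipoint processor.
        self.processors[processor_index].audio_streams[endpoint] = samples


mp = MultipointProcessor()
mc = MultipointController(processors=[mp])
mc.connect("A", [100, 200, 30000])
mc.connect("B", [-50, 50, 10000])
mixed = mp.mix_audio()  # last sample is clipped at 32767
```

The controller never touches media samples; it only decides which processor receives each stream, and the processor alone performs the mixing.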
Each endpoint typically sends both its audio stream and its video stream to the same multipoint processor, and the multipoint processor typically sends one mixed audio stream and one mixed video stream back to each endpoint. The audio mixer and the video mixer use the same time base when generating timestamps for the mixed audio and mixed video streams, so that each endpoint can achieve lip synchronization between the two. In a conventional multipoint processor, the audio and video mixers run as processes within the same multipoint processor on a single network device and use a common time base provided by that network device. However, there is no mechanism to provide lip synchronization when the audio and video mixers are located on separate devices and/or are geographically apart and therefore operate from different time bases.
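The effect of the time base on lip synchronization can be sketched as follows. This is an illustrative model only: the `Mixer` class, the `time_base_offset_s` parameter, and the clock rates (8 kHz for audio, 90 kHz for video) are assumptions introduced for the example and are not taken from the description above.

```python
from dataclasses import dataclass

AUDIO_CLOCK_HZ = 8_000    # assumed audio media clock rate
VIDEO_CLOCK_HZ = 90_000   # assumed video media clock rate


@dataclass
class Mixer:
    """Hypothetical mixer that stamps its output using its own time base."""
    clock_hz: int
    time_base_offset_s: float = 0.0  # error of this mixer's time base vs. true time

    def timestamp(self, true_time_s: float) -> int:
        # The mixer only sees its local time base, not true time.
        local_time_s = true_time_s + self.time_base_offset_s
        return round(local_time_s * self.clock_hz)


def perceived_skew_ms(audio: Mixer, video: Mixer, true_time_s: float) -> float:
    """Skew an endpoint would infer for audio and video mixed at the same instant."""
    audio_s = audio.timestamp(true_time_s) / audio.clock_hz
    video_s = video.timestamp(true_time_s) / video.clock_hz
    return (audio_s - video_s) * 1000.0


# Mixers on one device share a time base: no skew.
same_device = perceived_skew_ms(Mixer(AUDIO_CLOCK_HZ), Mixer(VIDEO_CLOCK_HZ), 10.0)

# Mixers on separate devices: the video mixer's time base is 120 ms ahead,
# so the endpoint sees a skew of roughly -120 ms.
separate = perceived_skew_ms(
    Mixer(AUDIO_CLOCK_HZ), Mixer(VIDEO_CLOCK_HZ, time_base_offset_s=0.120), 10.0
)
```

In this model the endpoint has no way to tell a time base error from a genuine difference in capture time, which is why unrelated time bases on the two mixers translate directly into lip synchronization error.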