In certain video teleconferencing environments, each of a plurality of individuals has a camera, a microphone, and a display, the combination of which is referred to herein as a teleconference endpoint. The video and audio from each endpoint are streamed to a central location where a video processing device, e.g., a Multi-point Control Unit (MCU), takes the video (and audio) from the various endpoints and redistributes the video to the other endpoints involved in a conference session.
In some forms, the MCU acts as a video compositor and reformats the video by combining several video images into a single screen image, thereby forming a “composite” image. Combining various video feeds onto a single screen requires the reception of one whole frame from each video source in order to create the output frame. When the sources are asynchronous, each source uses a frame buffer. The average latency of these frame buffers is one-half of a frame period, or approximately 16.7 milliseconds (ms) at a standard frame rate of 30 frames per second (fps), while the maximum latency is a full frame period, or approximately 33.3 ms. Such latency may cause undesirable video and audio effects for those participating in the video teleconference.
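By way of illustration only, the frame-buffer latency figures can be reproduced with the short sketch below. It assumes that a source frame completes at a phase uniformly distributed within the compositor's output frame period; the function names (`frame_period_ms`, `buffer_latency_ms`, `sampled_average_ms`) are hypothetical and are not part of any MCU implementation.

```python
def frame_period_ms(fps: float) -> float:
    """Duration of one frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def buffer_latency_ms(fps: float) -> tuple[float, float]:
    """Return (average, maximum) frame-buffer latency in milliseconds.

    Assumption: the wait between a source frame completing and the next
    output frame being composed is uniform over one frame period, so the
    average is half a period and the maximum is a full period.
    """
    period = frame_period_ms(fps)
    return period / 2.0, period


def sampled_average_ms(fps: float, samples: int = 1000) -> float:
    """Check the average by sampling evenly spaced arrival phases."""
    period = frame_period_ms(fps)
    # Midpoint phases (k + 0.5) / samples cover the period evenly.
    waits = [period * (k + 0.5) / samples for k in range(samples)]
    return sum(waits) / samples


average, maximum = buffer_latency_ms(30.0)
print(f"average: {average:.1f} ms, maximum: {maximum:.1f} ms")
# prints "average: 16.7 ms, maximum: 33.3 ms"
```

The sampled average agrees with the closed-form half-period value, which is why the latency scales inversely with frame rate under this assumption.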