1. Field of the Invention
This invention relates generally to multimedia communications systems, and more specifically to video processing techniques for use in conjunction with such systems.
2. Description of the Prior Art
Video composition is a technique which simultaneously processes a plurality of video sequences to form a single video sequence. Each frame of the single video sequence is organized into a plurality of windows. Each of the windows includes frames corresponding to a specific one of the plurality of video sequences. Video composition techniques have broad application to the field of multimedia communications, especially where multipoint communications are involved, as in multipoint, multimedia conferencing systems.
In a multipoint multimedia conference, a "bridge" or "multipoint control unit" (MCU) is often used to establish multipoint connections and multi-party conference calls among a group of endpoints. Generally speaking, the MCU is a computer-controlled device which includes a multiplicity of communication ports which may be selectively interconnected in any of a plurality of configurations to provide communication among a group of endpoint devices. Typical MCUs are equipped to process and route video, audio, and (in some cases) data to and from each of the endpoint devices.
MCUs may be categorized as having either a "switched presence" or a "continuous presence", based upon the video processing capabilities of the MCU. In a "switched presence" MCU, the video signal selected by a specially-designated endpoint device considered to be under the control of a "conference chairman" is broadcast to all endpoint devices participating in the conference. Alternatively, a "switched presence" MCU may select the particular video signal to be sent to all of the endpoint devices participating in the conference by examining the respective levels of audio signals received from each of the endpoint devices. However, note that the "switched presence" MCU includes no video processing capabilities. Rather, the MCU functions in a more limited sense, providing only video switching capabilities. Therefore, at a given moment, each of the endpoint devices participating in a given conference will display a video image from the specially-designated endpoint device used by the "conference chairman" or, alternatively, each of the endpoint devices will display a video image from the endpoint device used by a participant who is currently speaking.
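The two selection rules of a "switched presence" MCU described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular MCU: the function name `select_broadcast_source`, the endpoint identifiers, and the representation of each audio signal as a single numeric level are all hypothetical.

```python
def select_broadcast_source(audio_levels, chairman_choice=None):
    """Return the endpoint whose video is switched to all participants.

    audio_levels: dict mapping a hypothetical endpoint id to its current
        audio level (a plain number; the units are an assumption).
    chairman_choice: endpoint id selected by the conference chairman, or
        None to fall back to audio-level selection.
    """
    if chairman_choice is not None:
        # Chairman control: the designated endpoint picks the source.
        return chairman_choice
    if not audio_levels:
        raise ValueError("no endpoints in the conference")
    # Audio-level selection: pick the endpoint with the highest level.
    # The video itself is only switched, never processed.
    return max(audio_levels, key=audio_levels.get)


levels = {"A": 0.1, "B": 0.7, "C": 0.3}
loudest = select_broadcast_source(levels)
chaired = select_broadcast_source(levels, chairman_choice="C")
```

In either mode a single source is chosen and forwarded unchanged, which is why every participant sees the same one image at a given moment.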
Since the existing MCU is only equipped to switch video signals, and cannot implement functions in addition to switching, each of the endpoint devices is required to use the same video transfer rate in order to be able to communicate with other endpoint devices. The state-of-the-art MCU is described in ITU Document H.243, "Procedures for Establishing Communication Between Three or More Audiovisual Terminals Using Digital Channels up to 2 Mbps", March 1993, and in ITU Document H.231, "Multipoint Control Units for Audiovisual Systems Using Digital Channels up to 2 Mbps", March 1993.
In a "continuous presence" MCU, video composition techniques are employed by the MCU. These video composition techniques provide for the selection, processing, and combining of a plurality of video streams, wherein each video stream originates from a corresponding endpoint device. In this manner, video information from multiple conference participants is combined into a single video stream. The combined video stream is then broadcast to all endpoint devices participating in the conference. Such conferences are termed "continuous presence" conferences because each of the conference participants can be simultaneously viewed by all other conference participants. At the present time, study groups organized by the ITU are working on the standardization of "continuous presence" MCUs.
Several techniques have been developed to provide video composition features for "continuous presence " MCUs. The most straightforward technique is termed the transcoding method, which involves the decoding of a plurality of input video bit streams. These bit streams are decoded into the pixel domain, and then the video frames from the plurality of video bit streams are combined in the pixel domain to form an integrated video frame. The integrated video frames are then re-encoded for distribution.
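The pixel-domain combining step of the transcoding method can be illustrated with a short sketch. It assumes, purely for illustration, that four input streams have already been decoded into QCIF luminance frames (176x144) and are tiled into one CIF frame (352x288) in a 2x2 layout; the text above does not prescribe a layout or picture format. The names `compose_quadrants` and `solid_frame` are hypothetical, and the decoding and re-encoding stages are omitted.

```python
QCIF_W, QCIF_H = 176, 144  # QCIF luminance resolution
CIF_W, CIF_H = 352, 288    # CIF luminance resolution (2 x QCIF per axis)


def compose_quadrants(frames):
    """Tile four decoded QCIF frames into one CIF frame in the pixel domain.

    frames: list of four frames, each a list of QCIF_H rows holding QCIF_W
        pixel values, ordered top-left, top-right, bottom-left, bottom-right.
    """
    if len(frames) != 4:
        raise ValueError("expected exactly four input frames")
    for frame in frames:
        if len(frame) != QCIF_H or any(len(row) != QCIF_W for row in frame):
            raise ValueError("each input frame must be 176x144")
    # Concatenate rows side by side, then stack the two halves vertically.
    top = [frames[0][y] + frames[1][y] for y in range(QCIF_H)]
    bottom = [frames[2][y] + frames[3][y] for y in range(QCIF_H)]
    return top + bottom


def solid_frame(value):
    """Build a hypothetical test frame filled with one pixel value."""
    return [[value] * QCIF_W for _ in range(QCIF_H)]


inputs = [solid_frame(v) for v in (10, 20, 30, 40)]
combined = compose_quadrants(inputs)
```

The combining itself is simple; the cost of the transcoding method lies in the full decode before this step and the full re-encode after it.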
Another technique for providing video composition features has been developed by Bellcore. This technique, which may be referred to as bit stream domain mixing, is useful only in the context of systems conforming to the ITU H.261 standard. Bit stream domain mixing operates on image representations, and exploits a process known as quadrant segmentation. The problem with this approach is that it is not compatible with existing terminal equipment, since it requires asymmetric operation of the endpoint devices. Moreover, since the bit stream mixer in the MCU is passive, the combined bit stream may violate the hypothetical reference decoder (HRD) requirement specified in the H.261 standard.
One state-of-the-art approach to video composition uses specially-equipped video terminals. Each video terminal is equipped to divide the video channel into two to four sub channels, while transmitting an outgoing video bit stream on only one of the sub channels. All of the sub channels use the same bit rate, the same picture format, and the same maximum frame rate. The MCU must provide circuitry for de-multiplexing the sub channels it receives from each terminal, circuitry for routing the sub channels appropriately, and circuitry for re-multiplexing the sub channels prior to transmission to each terminal. Each terminal includes a video receiver which receives up to four sub channels for decoding and display. The advantage of this approach is that it provides minimal insertion delay, but this advantage is more than offset by the requirement for elaborate modifications to existing video terminals.
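The routing role of the MCU in the sub channel approach can be sketched as a simple mapping. The routing policy is not specified above, so this sketch assumes that each terminal receives the streams of the other terminals in list order, truncated to four sub channels. The function name `route_sub_channels` and the terminal identifiers are hypothetical.

```python
MAX_SUB_CHANNELS = 4  # each terminal receives up to four sub channels


def route_sub_channels(terminals):
    """Map each receiving terminal to the sources on its sub channels.

    Assumption: a terminal is sent the streams of the other terminals in
    list order, limited to MAX_SUB_CHANNELS. A real MCU may use another
    policy; this only illustrates the demultiplex, route, remultiplex idea.
    """
    routes = {}
    for receiver in terminals:
        sources = [t for t in terminals if t != receiver]
        routes[receiver] = sources[:MAX_SUB_CHANNELS]
    return routes


routes = route_sub_channels(["T1", "T2", "T3"])
large = route_sub_channels(["T1", "T2", "T3", "T4", "T5", "T6"])
```

Because the streams are forwarded without decoding, the MCU adds little delay, while the decoding of several sub channels is pushed onto each terminal.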