1. Field of the Invention
The present invention relates to video communication and more particularly to a method and an apparatus for mixing bit streams of compressed video from more than one video source.
2. Description of the Prior Art
Video communication between more than two video terminals often requires a Multipoint Control Unit (MCU). An MCU is a conference controlling entity, typically a piece of equipment located in a node of a network or in a terminal, that receives several channels from access ports and, according to certain criteria, processes audiovisual signals and distributes them to a set of connected channels. An example of an MCU is the MGC-100, which is available from Polycom Networks Systems Group. A terminal (which may be referred to as an endpoint) is an entity on the network capable of providing real-time, two-way audio and/or visual communication with other terminals or with the MCU.
The MCU may include a bank of decoders, encoders, and bridges. The MCU may use a large amount of processing power to handle video communications between a variable number of participants using, for example, a variety of communication and compression standards and a variety of bit streams. The MCU may need to compose these bit streams into at least one output stream that is compatible with the requirements of at least one conference participant to which the output stream is being sent.
A conference may have one or more video output streams. Each output stream is associated with a layout. A layout defines the appearance of a conference on a screen (display) of conferees that receive the stream. A layout may be divided into one or more segments. Each segment may be associated with the video that is sent by a certain conferee. The association between the segment and the conferee may be dynamically changed during a conference.
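By way of illustration only, the relationship between a layout, its segments, and the conferees may be sketched as a small data structure. The class names `Segment` and `Layout`, the method `assign`, and the particular segment geometry are hypothetical and are not taken from any of the cited references.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """A rectangular region of the layout, in pixels."""
    x: int
    y: int
    width: int
    height: int
    conferee: Optional[str] = None  # conferee whose video fills the segment


@dataclass
class Layout:
    """The appearance of a conference on a receiving conferee's display."""
    width: int
    height: int
    segments: List[Segment] = field(default_factory=list)

    def assign(self, index: int, conferee: str) -> None:
        # The association may be changed at any time during the conference.
        self.segments[index].conferee = conferee


# Hypothetical layout: one large segment above two smaller ones.
layout = Layout(352, 288, [
    Segment(0, 0, 352, 192),
    Segment(0, 192, 176, 96),
    Segment(176, 192, 176, 96),
])
layout.assign(0, "conferee_A")
layout.assign(0, "conferee_B")  # dynamic re-association of the same segment
```

In this sketch, re-associating a segment changes only the `conferee` field; the geometry of the layout is left untouched.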
Each output stream may be constructed of several input streams. Such a conference may be called “continuous presence” (CP). In a CP conference, a user at a remote terminal can simultaneously observe several other participants in the conference. Each participant may be displayed in a segment of the layout. The segments may be the same size or different sizes. The choice of the participants that are associated with the segments of the layout may vary among different conferees. In this situation, the number of bits allocated to each segment can also vary and may depend on the video activity in the segment, on the size of the segment, or on some other criterion.
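As one possible illustration of such an allocation, the following sketch divides a per-frame bit budget among segments in proportion to the product of segment area and an activity measure. The function name `allocate_bits` and the weighting rule itself are assumptions made for this example; the text above states only that the allocation may depend on activity, size, or another criterion.

```python
def allocate_bits(frame_budget, segments):
    """Split frame_budget (bits) among segments.

    segments is a list of (area_in_pixels, activity) pairs. The weight
    area * activity is an assumed rule, chosen only for illustration.
    """
    weights = [area * activity for area, activity in segments]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one segment must have a positive weight")
    shares = [int(frame_budget * w / total) for w in weights]
    # Give any bits lost to rounding to the most heavily weighted segment.
    shares[weights.index(max(weights))] += frame_budget - sum(shares)
    return shares
```

Under this assumed rule, a segment that is three times larger than another, with equal activity, would receive three times as many bits.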
Following are a few examples of conference layouts. A layout that a current speaker receives may include (in the segment that is associated with the speaker) video of the previous speaker instead of the video of the current speaker (i.e., himself), while the other conferees receive the video of the current speaker. In some conferences, two or more conferees may have different layouts. Therefore, a video stream that arrives from a certain conferee may be displayed in different segments (locations and/or sizes) in the layouts that are sent to different conferees.
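The speaker example above may be expressed as a simple per-viewer selection rule. The function name `speaker_segment_source` is hypothetical, and the fallback used when no previous speaker exists is an assumption of this sketch.

```python
def speaker_segment_source(viewer, current_speaker, previous_speaker=None):
    """Return the conferee whose video fills the speaker segment
    in the layout that is sent to `viewer`."""
    if viewer == current_speaker and previous_speaker is not None:
        # The current speaker sees the previous speaker instead of himself.
        return previous_speaker
    # All other conferees see the current speaker. Assumed fallback: with
    # no previous speaker, the current speaker is shown to everyone.
    return current_speaker
```

Because the result depends on the viewer, the same input stream may be routed to different segments in the layouts of different conferees.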
Thus, an MCU may need to decode each input stream into uncompressed video of a full frame; manage the plurality of uncompressed video streams that are associated with the conferences; and manage a plurality of output streams, in which each output stream may be associated with a conferee or a certain layout. The output stream may be generated by a video port. A video port may have a layout builder and an encoder. The layout builder may scale each of the uncompressed video frames to its final size and place it into its segment in the layout. The composed video frame is then encoded by the encoder.
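The scale-and-place operation of a layout builder may be sketched as follows. Frames are modeled as lists of rows of sample values, nearest-neighbor scaling is an assumed choice, and the names `scale_nearest` and `compose` are hypothetical. Decoding of the input streams and encoding of the composed frame are outside the scope of this sketch.

```python
def scale_nearest(frame, out_w, out_h):
    """Scale a frame (list of rows) to out_w x out_h by nearest neighbor."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def compose(canvas_w, canvas_h, placements):
    """Build one composed frame.

    placements is a list of (frame, (x, y, w, h)) pairs, where the tuple
    gives the segment that the frame is scaled into.
    """
    canvas = [[0] * canvas_w for _ in range(canvas_h)]
    for frame, (x, y, w, h) in placements:
        scaled = scale_nearest(frame, w, h)
        for r in range(h):
            canvas[y + r][x:x + w] = scaled[r]
    return canvas


# Two decoded 4x4 frames placed side by side in a 4x2 layout.
frame_a = [[1] * 4 for _ in range(4)]
frame_b = [[2] * 4 for _ in range(4)]
composed = compose(4, 2, [(frame_a, (0, 0, 2, 2)), (frame_b, (2, 0, 2, 2))])
```

Every sample of every input frame is touched once per output stream in this approach, which suggests why the computational load grows with the number of conferees and layouts.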
Consequently, processing and managing a plurality of videoconferences requires heavy and expensive computational resources. Therefore, an MCU is typically an expensive and rather complex product. Common MCUs are disclosed in several patents and patent applications, for example, U.S. Pat. Nos. 6,300,973, 6,496,216, 5,600,646, or 5,838,664, the contents of which are incorporated herein by reference. Those patents disclose the operation of a video unit in an MCU that may be used to generate the video for a CP conference.
In recent years, videoconferencing and other forms of multimedia communications have become more commonplace. The advent of personal computers having videoconferencing capabilities creates a demand for MCUs having the capability of multimedia communication between devices. This trend raises the need for low cost MCUs, such as Software MCUs, which use a software program to compose compressed video streams into a compressed video of a CP conference without actually decoding and encoding the streams. However, low cost MCUs may handle only limited multipoint communication (e.g., a limited number of compression standards, a limited number of conferees, and a limited number of layouts).
For example, U.S. Pat. No. 5,675,393, which is incorporated herein by reference, discloses an image processing apparatus for composing a plurality of Quarter Common Intermediate Format (QCIF) coded images into one CIF image without decoding the plurality of coded images when the images are transmitted using the H.261 standard. QCIF is a videoconferencing format that specifies a video frame containing 144 lines and 176 pixels per line, which is one-fourth the resolution of Common Intermediate Format (CIF). QCIF support is required by some of the International Telecommunication Union (ITU) videoconferencing standards.
U.S. patent application Ser. No. 09/768,219, published as U.S. Pub. No. 2001/0019354A1 and entitled “Method and an Apparatus for Video Mixing of Bit Streams,” which is incorporated herein by reference, discloses a method and apparatus for mixing as many as four QCIF H.263 compressed video bit streams into a composite CIF image.
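The size relationship between the two formats may be checked with simple arithmetic. The sketch below uses the QCIF dimensions given above, the standard CIF dimensions of 352 pixels by 288 lines, and the 16 by 16 pixel macroblock used by H.261. The helper names are hypothetical, and the sketch illustrates only the geometry, not the composing method of the cited patent.

```python
QCIF = (176, 144)  # width, height in pixels
CIF = (352, 288)
MACROBLOCK = 16    # macroblock edge length in pixels


def tile_origins(tile, frame):
    """Top-left corners of the tiles that exactly cover the frame."""
    tile_w, tile_h = tile
    frame_w, frame_h = frame
    return [(x, y)
            for y in range(0, frame_h, tile_h)
            for x in range(0, frame_w, tile_w)]


def macroblock_count(size):
    """Number of whole macroblocks in a picture of the given size."""
    w, h = size
    return (w // MACROBLOCK) * (h // MACROBLOCK)


origins = tile_origins(QCIF, CIF)
# origins == [(0, 0), (176, 0), (0, 144), (176, 144)]
```

Four QCIF pictures of 99 macroblocks each account for exactly the 396 macroblocks of one CIF picture, and each QCIF picture starts on a macroblock boundary of the CIF picture.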
Moreover, U.S. patent application Ser. No. 10/310,728, entitled “Method and an Apparatus for Mixing Compressed Video,” which is incorporated herein by reference, discloses a method and apparatus for mixing QCIF H.263, Annex K compressed video bit streams into a composite CIF image or 4CIF image.
However, those methods and apparatus offer limited functionality. For example, the segment that each conferee occupies in the layout is the same size as that conferee's input stream. When QCIF images are mixed into a CIF image, the layout of the output frame is limited to at most four conferees, and the frame portion that is associated with each of them is a quarter of the output frame.
Furthermore, those methods require that the input streams and the output streams be compressed using the same compression algorithm. Therefore, there is a need for a method and apparatus that can offer flexible layouts, can display a flexible number of conferees simultaneously, and can handle different input and output video compression algorithms and/or different bit rates, while reducing the cost of an MCU.