As the traffic over Internet Protocol (IP) networks continues its rapid growth, and as the variety of video conferencing equipment used over IP networks continues to increase, more and more people are using video conferencing over IP networks as their communication tool of choice. A common multipoint conference between three or more participants uses a multipoint control unit (MCU). An MCU is a conference-controlling entity that typically is located in a node of a network or in a terminal which receives several channels from endpoints and, according to certain criteria, processes audiovisual signals and distributes them to a set of connected channels. Examples of MCUs include the MGC-100 and the RMX 2000, which are available from Polycom Inc. A terminal (which may also be referred to as an endpoint) is an entity on the network, capable of providing real-time, two-way audio and/or visual communication with other terminals or with the MCU. More thorough definitions of terminal and MCU can be found in the International Telecommunication Union (“ITU”) standards, for example, the H.320, H.324, and H.323 standards.
A common MCU may include a plurality of decoders, encoders, and bridges. The MCU may use a large amount of processing power to handle video communications among a variable number of participants, using a variety of communication and compression standards for the various input bit streams received from the different endpoints. The MCU may need to compose these input streams into at least one output stream that is compatible with the requirements of the conferee or conferees to which that stream is sent.
A conference may have one or more video output streams, each associated with a layout. A layout defines the appearance of a conference on the screen (display) of the one or more conferees that receive the stream. A layout may be divided into one or more segments, and each segment may be associated with the video input stream sent by a certain conferee. Each output stream may thus be composed of several input streams. Such a conference may be called a “continuous presence,” or CP, conference. In a CP conference, a user at a remote terminal can simultaneously observe several other participants in the conference, each displayed in a different segment of the layout. The segments may be of the same size or of different sizes, and the choice of participants associated with the segments of the layout may vary from one conferee to another.
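The relationship between a layout, its segments, and the per-conferee choice of who appears where can be pictured as a small data structure. The following is a minimal Python sketch under stated assumptions: the 1+5 arrangement, the names `Segment`, `one_plus_five`, and `assign_segments`, and the rule that a conferee does not see their own image are all hypothetical, chosen only to show that segments may differ in size and that each receiving conferee may get a different assignment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Segment:
    """A rectangular region of the output frame, in pixels."""
    x: int
    y: int
    width: int
    height: int


def one_plus_five(width: int, height: int) -> List[Segment]:
    """Hypothetical layout on a 3x3 grid: one large segment (2x2 cells) plus five small ones.

    Integer division may leave a thin unused border when the frame size is not
    a multiple of 3 (e.g. 352 // 3 == 117).
    """
    cw, ch = width // 3, height // 3
    large = Segment(0, 0, 2 * cw, 2 * ch)
    small = [
        Segment(2 * cw, 0, cw, ch),       # right column, top
        Segment(2 * cw, ch, cw, ch),      # right column, middle
        Segment(0, 2 * ch, cw, ch),       # bottom row, left
        Segment(cw, 2 * ch, cw, ch),      # bottom row, middle
        Segment(2 * cw, 2 * ch, cw, ch),  # bottom row, right
    ]
    return [large] + small


def assign_segments(segments: List[Segment], ranked: List[str],
                    viewer: str) -> Dict[int, Optional[str]]:
    """Map segment index -> conferee shown there, for one receiving conferee.

    Assumed policy: conferees in `ranked` order (e.g. most active speaker
    first) fill the segments, excluding the viewer's own image; leftover
    segments stay empty (None).
    """
    others = [c for c in ranked if c != viewer]
    return {i: (others[i] if i < len(others) else None)
            for i in range(len(segments))}


if __name__ == "__main__":
    layout = one_plus_five(352, 288)  # CIF-sized output frame
    print(assign_segments(layout, ["A", "B", "C"], viewer="A"))
    print(assign_segments(layout, ["A", "B", "C"], viewer="B"))
```

With the same layout, conferee A sees B in the large segment while conferee B sees A there, which is the per-conferee variation the paragraph describes.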
A common MCU may: decode each input stream into uncompressed full-frame video; manage the plurality of uncompressed video streams associated with the conferences; and compose and/or manage a plurality of output streams, each of which may be associated with a conferee or a certain layout. The output stream may be generated by a video output port associated with the MCU. An exemplary video output port may comprise a layout builder and an encoder. The layout builder may collect the uncompressed video frames from selected conferees, scale them to their final size, and place each one into its segment in the layout. The composed video frame is then encoded by the encoder. Consequently, processing and managing a plurality of videoconferences requires heavy and expensive computational resources, and an MCU is therefore typically an expensive and rather complex product. Common MCUs are disclosed in several patents and patent applications, for example, U.S. Pat. Nos. 6,300,973, 6,496,216, 5,600,646, and 5,838,664, the contents of which are incorporated herein by reference. These patents disclose the operation of a video unit in an MCU that may be used to generate the video output stream for a CP conference.
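The layout builder's collect, scale, and place steps can be sketched as follows. This is a minimal illustration, not the video unit described in the cited patents: frames are assumed to be single planes of 8-bit samples held in Python lists, scaling uses nearest-neighbour sampling, and `scale_nearest` and `build_layout` are hypothetical names. The encoding step that follows composition is left out.

```python
from typing import List, Sequence, Tuple

# One plane of 8-bit samples (e.g. luma), stored as rows; chroma is omitted here.
Frame = List[List[int]]


def scale_nearest(frame: Frame, out_w: int, out_h: int) -> Frame:
    """Resize a frame with nearest-neighbour sampling (a simplification;
    real scalers normally filter to avoid aliasing)."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def build_layout(canvas_w: int, canvas_h: int,
                 placements: Sequence[Tuple[Frame, Tuple[int, int, int, int]]],
                 background: int = 0) -> Frame:
    """Hypothetical layout builder: scale each selected conferee's decoded
    frame to its segment size and paste it at the segment position.

    `placements` holds (frame, (x, y, width, height)) pairs; segments are
    assumed to lie inside the canvas. The composed frame would then be
    passed to the encoder of the output port (not shown).
    """
    canvas = [[background] * canvas_w for _ in range(canvas_h)]
    for frame, (x, y, w, h) in placements:
        scaled = scale_nearest(frame, w, h)
        for row in range(h):
            # Slice assignment replaces exactly w samples, keeping the row length.
            canvas[y + row][x:x + w] = scaled[row]
    return canvas


if __name__ == "__main__":
    a = [[10, 20], [30, 40]]   # tiny 2x2 "frame" from conferee A
    b = [[99]]                 # 1x1 "frame" from conferee B
    out = build_layout(8, 4, [(a, (0, 0, 4, 4)), (b, (4, 0, 4, 4))])
    for row in out:
        print(row)
```

Every input must be fully decoded before it can be resampled and placed, and the composed frame must be re-encoded afterwards; repeating this per output stream is what makes the approach computationally heavy.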
The growing trend of using video conferencing over IP networks raises the need for low cost MCUs that will be able to conduct a plurality of conferencing sessions as well as compose CP video images. However, low cost MCUs may only be able to handle a limited number of multipoint conferences (e.g., a limited number of conferees, a limited number of layouts, a limited number of communication standards, etc.).
Existing techniques can compose several compressed video streams into a single compressed stream of CP video images using fewer resources than a common MCU. Some of them use an image processing apparatus that composes a plurality of Quarter Common Intermediate Format (QCIF) coded images into one Common Intermediate Format (CIF) image without decoding the coded images, when the images are transmitted using the H.261 standard. QCIF is a videoconferencing format that specifies a video frame containing 144 rows of 176 pixels each, one-fourth the resolution of CIF (288 rows of 352 pixels). QCIF support is required by some of the ITU videoconferencing standards. However, such prior-art methods cannot be applied to sessions that use modern compression standards such as H.264.
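The frame arithmetic of the QCIF-to-CIF case can be checked with a short sketch. It assumes, as an illustration rather than a description of any particular apparatus, that the composition works by renumbering H.261 groups of blocks (GOBs): in H.261 a GOB covers 176 × 48 pixels, a QCIF picture carries three GOBs (numbered 1, 3, 5), and a CIF picture carries twelve, numbered row by row with odd numbers in the left column. The helper names are hypothetical, and the rest of the bitstream rewriting (for example, picture header changes) is not shown.

```python
QCIF_W, QCIF_H = 176, 144
CIF_W, CIF_H = 2 * QCIF_W, 2 * QCIF_H   # 352 x 288: four QCIF images fit exactly

GOB_W, GOB_H = 176, 48                  # H.261 GOB: 11 x 3 macroblocks of 16 x 16 pixels
QCIF_GOBS = (1, 3, 5)                   # H.261 GOB numbers used in a QCIF picture


def quadrant_origin(index: int) -> tuple:
    """Top-left pixel (x, y) of QCIF image `index` (0..3, row-major) in the CIF frame."""
    if not 0 <= index < 4:
        raise ValueError("a CIF frame holds at most four QCIF images")
    return (index % 2) * QCIF_W, (index // 2) * QCIF_H


def cif_gob_number(x: int, y: int) -> int:
    """H.261 CIF GOB number (1..12) containing pixel (x, y):
    odd numbers in the left column, even numbers in the right column."""
    return 2 * (y // GOB_H) + (x // GOB_W) + 1


def remap_qcif_gobs(index: int) -> dict:
    """Hypothetical helper: new CIF GOB number for each QCIF GOB of image `index`."""
    ox, oy = quadrant_origin(index)
    return {gn: cif_gob_number(ox, oy + k * GOB_H) for k, gn in enumerate(QCIF_GOBS)}


if __name__ == "__main__":
    for i in range(4):
        print(i, quadrant_origin(i), remap_qcif_gobs(i))
```

Because each image keeps its QCIF size and a CIF frame has only four quadrants, this kind of composition fixes both the segment size and the maximum number of conferees.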
Other techniques to overcome the size and layout limitations listed above use what is known as a sub-encoding method. An exemplary sub-encoding method is disclosed in U.S. Pat. No. 7,139,015, the contents of which are incorporated herein by reference. However, sub-encoding systems require the use of resources such as video decoders and encoders.
Thus, existing methods and apparatuses offer limited functionality. For example, the segment allocated to each conferee in the layout must be the same size as that conferee's input stream. When QCIF images are mixed into a CIF image, the output layout is limited to a maximum of four conferees, each occupying one quarter of the output frame.
Therefore, there is a need for a method and apparatus that can offer a wide variety of layouts and support a large number of conferees. To keep pace with the increasing use of video conferencing over IP networks, such a method and apparatus must also be able to handle the video compression standards that are popular in IP video communication.