As traffic over Internet Protocol (IP) networks continues its rapid growth, and as the variety of video conferencing equipment used over IP networks grows with it, more and more people use video conferencing as their communication tool. A common multipoint conference between three or more participants requires a Multipoint Control Unit (MCU). An MCU is a conference controlling entity that is typically located in a node of a network or in a terminal and that receives several channels from endpoints. According to certain criteria, the MCU processes audio and visual signals and distributes them to a set of connected channels. Examples of MCUs include the MGC-100 and the RMX 2000, both available from Polycom, Inc. A terminal (which may be referred to as an endpoint) is an entity on the network, capable of providing real-time, two-way audio and/or audiovisual communication with other terminals or with the MCU. A more thorough definition of an endpoint and an MCU can be found in the International Telecommunication Union (“ITU”) standards, such as but not limited to the H.320, H.324, and H.323 standards, which can be found at the ITU website: www.itu.int.
A common MCU may include a plurality of audio and video decoders, encoders, and bridges. The MCU may use a large amount of processing power to handle audio and video communications between a variable number of participants (endpoints). The communication can be based on a variety of communication protocols and compression standards and may be received from different endpoints. The MCU may need to compose a plurality of input audio or video streams into at least one single output stream of audio or video (respectively) that is compatible with the properties of at least one conferee (endpoint) to which the output stream is being sent. The compressed audio streams are decoded and can be analyzed to determine which audio streams will be selected for mixing into the single audio stream of the conference.
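The audio selection and mixing step described above can be illustrated with a minimal sketch. The selection criterion is not specified in the text; ranking streams by mean squared amplitude is an assumption made here for illustration, and the names `frame_energy`, `select_for_mixing`, `mix`, and `max_speakers` are hypothetical. Decoded audio is assumed to be 16-bit PCM samples held in plain Python lists.

```python
from typing import Dict, List, Sequence


def frame_energy(samples: Sequence[int]) -> float:
    """Mean squared amplitude of one decoded PCM frame (assumed criterion)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_for_mixing(frames: Dict[str, Sequence[int]],
                      max_speakers: int = 3) -> List[str]:
    """Return the endpoints whose decoded frames carry the most energy."""
    ranked = sorted(frames, key=lambda ep: frame_energy(frames[ep]), reverse=True)
    return ranked[:max_speakers]


def mix(frames: Dict[str, Sequence[int]], selected: List[str]) -> List[int]:
    """Sum the selected frames sample by sample, clipping to the 16-bit range."""
    if not selected:
        return []
    length = min(len(frames[ep]) for ep in selected)
    mixed = []
    for i in range(length):
        total = sum(frames[ep][i] for ep in selected)
        mixed.append(max(-32768, min(32767, total)))
    return mixed


if __name__ == "__main__":
    decoded = {"a": [1000, -1000], "b": [10, -10], "c": [500, 500]}
    speakers = select_for_mixing(decoded, max_speakers=2)
    print(speakers, mix(decoded, speakers))
```

A real MCU would typically also encode the mixed stream per recipient; that step is omitted from this sketch.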
A conference may have one or more video output streams where each output stream is associated with a layout. A layout defines the appearance of a conference on a display of one or more conferees that receive the stream. A layout may be divided into one or more segments where each segment may be associated with a video input stream that is sent by a certain conferee (endpoint). Each output stream may be constructed of several input streams, resulting in a continuous presence (CP) conference. In a CP conference, a user at a remote terminal can observe, simultaneously, several other participants in the conference. Each participant may be displayed in a segment of the layout, where each segment may be the same size or a different size. The choice of the participants displayed and associated with the segments of the layout may vary among different conferees that participate in the same session.
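The notion of a layout divided into segments, each associated with one conferee's input stream, can be sketched as a small data structure. This is an illustrative model only: the class names `Segment` and `Layout`, the helper `two_by_two_layout`, the fixed 2x2 arrangement, and the rule that a receiver does not see its own video are all assumptions introduced here, not details taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    x: int        # left edge within the composed frame, in pixels
    y: int        # top edge within the composed frame, in pixels
    width: int
    height: int
    source: str   # conferee (endpoint) whose video fills this segment


@dataclass
class Layout:
    width: int
    height: int
    segments: List[Segment]


def two_by_two_layout(receiver: str, conferees: List[str],
                      width: int = 352, height: int = 288) -> Layout:
    """Build a 2x2 layout for one receiver from up to four other conferees."""
    # Assumption: the receiver's own video is left out of its layout.
    others = [c for c in conferees if c != receiver][:4]
    w, h = width // 2, height // 2
    segments = [
        Segment((i % 2) * w, (i // 2) * h, w, h, src)
        for i, src in enumerate(others)
    ]
    return Layout(width, height, segments)


if __name__ == "__main__":
    for seg in two_by_two_layout("A", ["A", "B", "C", "D", "E"]).segments:
        print(seg)
```

Because the layout is built per receiver, two conferees in the same session can be given different participant selections, as the paragraph above describes.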
A common MCU may need to decode each input video stream into uncompressed video of a full frame; manage the plurality of uncompressed video streams that are associated with the conferences; and compose and/or manage a plurality of output streams in which each output stream may be associated with a conferee or a certain layout. The output stream may be generated by a video output port associated with the MCU. An exemplary video output port may comprise a layout builder and an encoder. The layout builder may collect the uncompressed video frames from selected conferees, scale each frame to its final size, and place it into its segment in the layout. Thereafter, the composed video frame is encoded by the encoder and sent to the appropriate endpoints. Consequently, processing and managing a plurality of videoconferences require heavy and expensive computational resources, and therefore an MCU is typically an expensive and rather complex product. Common MCUs are disclosed in several patents and patent applications, for example, U.S. Pat. Nos. 6,300,973, 6,496,216, 5,600,646, or 5,838,664, the contents of which are incorporated herein by reference. These patents disclose the operation of a video unit in an MCU that may be used to generate the video output stream for a CP conference.
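The work of a layout builder, as described above, can be sketched in a few lines. This is a simplified illustration and not the method of the cited patents: frames are modeled as lists of rows of single-channel pixel values, scaling uses nearest-neighbor sampling (an assumption chosen for brevity), and the names `scale_nearest` and `build_layout` are hypothetical. The encoding step that would follow is not shown.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]              # rows of single-channel pixel values
Rect = Tuple[int, int, int, int]     # (x, y, width, height) of a segment


def scale_nearest(frame: Frame, out_w: int, out_h: int) -> Frame:
    """Scale an uncompressed frame to out_w x out_h by nearest-neighbor sampling."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def build_layout(frames: Dict[str, Frame], segments: Dict[str, Rect],
                 width: int, height: int, background: int = 0) -> Frame:
    """Scale each selected conferee's frame and place it into its segment."""
    canvas = [[background] * width for _ in range(height)]
    for source, (x, y, w, h) in segments.items():
        scaled = scale_nearest(frames[source], w, h)
        for r in range(h):
            canvas[y + r][x:x + w] = scaled[r]
    return canvas


if __name__ == "__main__":
    decoded = {"A": [[1] * 4 for _ in range(4)], "B": [[2, 2], [2, 2]]}
    placement = {"A": (0, 0, 2, 2), "B": (2, 0, 2, 2)}
    for row in build_layout(decoded, placement, width=4, height=2):
        print(row)
```

Even in this toy form, every output pixel is touched once per composed frame and every input must first be fully decoded, which hints at why the full decode, scale, compose, and encode path is computationally expensive.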
The growing trend of using video conferencing raises the need for low cost MCUs that will enable one to conduct a plurality of conferencing sessions having composed CP video images.
There are existing techniques for composing compressed video streams into a CP video image with fewer resources than a common MCU. Some techniques disclose the use of an image processing apparatus for composing a plurality of Quarter Common Intermediate Format (QCIF) coded images into one CIF image. These techniques do not require the decoding of a plurality of coded images when the images are transmitted using the H.261 standard. QCIF is a videoconferencing format that specifies a video frame containing 144 lines and 176 pixels per line, which is one-fourth of the pixel count of Common Intermediate Format (CIF), whose frame contains 288 lines and 352 pixels per line. QCIF support is required by some of the International Telecommunication Union (ITU) videoconferencing standards.
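The size relationship between QCIF and CIF can be checked with a short sketch, which also shows where each of four QCIF images would sit inside one CIF image. The text does not describe the mechanism that avoids decoding; the group-of-blocks (GOB) renumbering shown here is one way such composition is commonly explained for H.261, where a QCIF picture carries GOBs numbered 1, 3, and 5 and a CIF picture carries GOBs 1 through 12 in two columns. It is included as an assumption for illustration, and the function names are hypothetical.

```python
QCIF_W, QCIF_H = 176, 144                    # pixels per line, lines
CIF_W, CIF_H = 2 * QCIF_W, 2 * QCIF_H        # 352 x 288
QCIF_GOBS = (1, 3, 5)                        # GOB numbers used by an H.261 QCIF picture


def quadrant_origin(quadrant: int) -> tuple:
    """Top-left pixel of a quadrant (0..3, row-major) inside the CIF frame."""
    if quadrant not in range(4):
        raise ValueError("quadrant must be 0, 1, 2, or 3")
    return ((quadrant % 2) * QCIF_W, (quadrant // 2) * QCIF_H)


def cif_gob_number(quadrant: int, qcif_gob: int) -> int:
    """Map a QCIF GOB number to the CIF GOB number for the given quadrant.

    Assumes H.261 numbering: CIF GOBs run 1..12 with odd numbers in the
    left column and even numbers in the right column, six rows in total.
    """
    if quadrant not in range(4) or qcif_gob not in QCIF_GOBS:
        raise ValueError("invalid quadrant or QCIF GOB number")
    column, row = quadrant % 2, quadrant // 2
    return qcif_gob + column + 6 * row


if __name__ == "__main__":
    print("CIF size:", (CIF_W, CIF_H))
    for q in range(4):
        print(q, quadrant_origin(q), [cif_gob_number(q, g) for g in QCIF_GOBS])
```

Under these assumptions the four QCIF inputs together cover all twelve CIF GOBs exactly once, which is consistent with the fixed four-way arrangement that this approach produces.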
Other techniques overcome the size and layout limitations of the QCIF-based approach by using a sub-encoding method. An exemplary sub-encoding method is disclosed in U.S. Pat. No. 7,139,015, the content of which is incorporated herein by reference.
Therefore, there is a need for a cost-efficient method and apparatus to implement a plurality of various layouts with a large number of conferees in a plurality of video conference sessions.