As traffic over Internet Protocol (IP) networks continues its rapid growth, and as the variety of video conferencing equipment grows with it, more and more people use video conferencing as their communication tool. A multipoint conference between three or more participants requires a Multipoint Control Unit (MCU). An MCU is a conference controlling entity that is typically located in a node of a network or in a terminal, and that receives several channels from endpoints. According to certain criteria, the MCU processes audio and visual signals and distributes them to a set of connected channels. Examples of MCUs include the MGC-100 and the RMX® 2000, both of which are available from Polycom, Inc. (RMX 2000 is a registered trademark of Polycom, Inc.). A terminal (which may be referred to as an endpoint) is an entity on the network capable of providing real-time, two-way audio and/or audiovisual communication with other terminals or with the MCU. A more thorough definition of an endpoint and an MCU can be found in the International Telecommunication Union (“ITU”) standards, such as the H.320, H.324, and H.323 standards, which can be found at the ITU website: www.itu.int.
An MCU may include a plurality of audio and video decoders, encoders, and bridges. The MCU may use a large amount of processing power to handle audio and video communications between a variable number of participants (endpoints). The communication can be based on a variety of communication protocols and compression standards and may be received from different endpoints. The MCU may need to compose a plurality of input audio or video streams into at least one single output stream of audio or video (respectively) that is compatible with the properties of at least one conferee (endpoint) to which the output stream is being sent. The compressed audio streams received from the endpoints are decoded and can be analyzed to determine which audio streams will be selected for mixing into the single audio stream of the conference. For purposes of the present disclosure, the terms decode and decompress can be used interchangeably.
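The selection and mixing step described above can be sketched as follows. This is a minimal illustration, not a description of any particular MCU: it assumes that the selection criterion is frame energy and that the decoded audio is 16-bit PCM, neither of which is stated in the text. The names `frame_energy`, `select_and_mix`, and `max_speakers` are hypothetical.

```python
from typing import Dict, List


def frame_energy(samples: List[int]) -> float:
    """Mean squared amplitude of one decoded PCM frame (assumed criterion)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_and_mix(decoded: Dict[str, List[int]], max_speakers: int = 3) -> List[int]:
    """Pick the loudest decoded streams and sum them into one mixed frame."""
    # Rank endpoints by the energy of their current decoded frame.
    ranked = sorted(decoded, key=lambda ep: frame_energy(decoded[ep]), reverse=True)
    selected = ranked[:max_speakers]
    if not selected:
        return []
    # Mix only over the samples that every selected stream provides.
    length = min(len(decoded[ep]) for ep in selected)
    mixed = []
    for i in range(length):
        total = sum(decoded[ep][i] for ep in selected)
        # Clip to the 16-bit PCM range (assumption: 16-bit samples).
        mixed.append(max(-32768, min(32767, total)))
    return mixed
```

The mixed frame would then be encoded once per output format before being sent to the endpoints; that encoding step is outside this sketch.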
A conference may have one or more video output streams, where each output stream is associated with a layout. A layout defines the appearance of a conference on the display of one or more conferees that receive the stream. A layout may be divided into one or more segments, where each segment may be associated with a video input stream that is sent by a conferee (endpoint). Each output stream may be constructed of several input streams, resulting in a continuous presence (CP) conference. In a CP conference, a user at a remote terminal can observe, simultaneously, several other participants in the conference. Each participant may be displayed in a segment of the layout, where each segment may be the same size or a different size. The choice of the participants displayed and associated with the segments of the layout may vary among different conferees that participate in the same session.
An MCU may need to decode each input video stream into uncompressed video of a full frame, manage the plurality of uncompressed video streams that are associated with the conferences, and compose and/or manage a plurality of output streams in which each output video stream may be associated with a conferee or a certain layout. The output stream may be generated by a video output port of the MCU. A video output port may comprise a layout builder and an encoder. The layout builder may collect the different uncompressed video frames from selected conferees, scale them to their final size, and place them into their segments in the layout. Thereafter, the composed video frame is encoded by the encoder and sent to the appropriate endpoints. Consequently, processing and managing a plurality of videoconferences require heavy and expensive computational resources, and therefore an MCU is typically an expensive and rather complex product. MCUs are disclosed in several patents and patent applications, for example, U.S. Pat. Nos. 6,300,973, 6,496,216, 5,600,646, or 5,838,664, the contents of which are incorporated herein by reference. These patents disclose the operation of a video unit in an MCU that may be used to generate the video output stream for a CP conference.
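The layout builder's scale-and-place step can be sketched as follows. This is an illustrative sketch only: it assumes that frames are plain rows of luma samples and that scaling is nearest-neighbor, which is a simplification chosen for brevity. The names `scale_nearest` and `build_layout` and the segment tuple format are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of luma samples (assumed representation)
# One segment: (decoded frame, x offset, y offset, width, height) in the layout.
Segment = Tuple[Frame, int, int, int, int]


def scale_nearest(frame: Frame, out_w: int, out_h: int) -> Frame:
    """Scale a decoded frame to out_w x out_h with nearest-neighbor sampling."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def build_layout(canvas_w: int, canvas_h: int, segments: List[Segment]) -> Frame:
    """Compose one uncompressed CP frame from the selected conferees' frames."""
    canvas = [[0] * canvas_w for _ in range(canvas_h)]
    for frame, x0, y0, w, h in segments:
        scaled = scale_nearest(frame, w, h)
        for dy in range(h):
            # Copy one scaled row into the segment's position in the layout.
            canvas[y0 + dy][x0:x0 + w] = scaled[dy]
    return canvas
```

Every input must be fully decoded before it reaches this step and the composed frame must be re-encoded afterwards, which is where the computational cost described above comes from.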
The growing trend of using video conferencing raises the need for low cost MCUs that will enable conducting a plurality of conferencing sessions having composed CP video images.
There are existing techniques for composing compressed video streams into a CP video image with fewer resources than a conventional MCU. Some techniques disclose the use of an image processing apparatus for composing a plurality of Quarter Common Intermediate Format (QCIF) coded images into one CIF image. These techniques do not require the decoding of a plurality of coded images when the images are compressed using the H.261 standard. QCIF is a videoconferencing format that specifies a video frame containing 144 lines and 176 pixels per line, which is one-fourth of the resolution of Common Intermediate Format (CIF). QCIF support is required by some of the International Telecommunication Union (ITU) videoconferencing standards.
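The arithmetic behind the QCIF-to-CIF composition can be checked with a short sketch. The CIF dimensions (352 pixels by 288 lines) and the H.261 group-of-blocks (GOB) numbering used here are not given in the text and are stated as assumptions: QCIF pictures are taken to carry GOBs 1, 3, and 5, and CIF pictures GOBs 1 through 12 arranged in two columns. Under that numbering, a QCIF image can be placed in a CIF quadrant by renumbering its GOBs rather than decoding it. The function names are hypothetical.

```python
QCIF_W, QCIF_H = 176, 144   # from the text: 176 pixels per line, 144 lines
CIF_W, CIF_H = 352, 288     # assumption: standard CIF dimensions
MB = 16                     # assumption: 16x16 luma macroblocks, as in H.261


def quadrant_origin(col: int, row: int) -> tuple:
    """Top-left pixel of a QCIF image placed in a 2x2 grid inside a CIF frame."""
    return col * QCIF_W, row * QCIF_H


def cif_gob_number(qcif_gob: int, col: int, row: int) -> int:
    """Map a QCIF GOB number (1, 3, or 5) to a CIF GOB number for a quadrant.

    Assumes CIF GOBs are numbered 1..12 in two columns (odd left, even right).
    """
    if qcif_gob not in (1, 3, 5):
        raise ValueError("QCIF GOB numbers are assumed to be 1, 3, or 5")
    if col not in (0, 1) or row not in (0, 1):
        raise ValueError("quadrant must be in a 2x2 grid")
    return qcif_gob + col + 6 * row


# Four QCIF images cover exactly one CIF image.
assert 4 * QCIF_W * QCIF_H == CIF_W * CIF_H
```

This also shows the limitation of the approach: the layout is fixed to a 2x2 grid of equally sized segments.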
Other techniques to overcome the QCIF limitation of size and layouts use a sub-encoding method. One such sub-encoding method is disclosed in U.S. Pat. No. 7,139,015, which is incorporated herein by reference in its entirety for all purposes.
Other video conferencing systems use Media Relay Conferencing (MRC). In MRC a Media Relay MCU (MRM) receives one or more streams from each participating Media Relay Endpoint (MRE), which may be referred to herein as relay RTP compressed video streams or relay streams. The MRM relays to each participating endpoint a set of multiple video streams received from other endpoints in the conference, which may be referred to herein as relayed RTP compressed video streams or relayed streams. Each receiving endpoint uses the multiple streams to generate the CP video image according to a layout. The CP video image is presented to the MRE's user. An MRE can be a terminal of a conferee in the session which has the ability to receive relayed media from an MRM and deliver compressed media according to instructions from an MRM. MRMs are described in more detail in U.S. Patent Publication No. 2010/0194847, which is incorporated herein by reference in its entirety for all purposes. For purposes of the present disclosure, the terms endpoint and MRE may be used interchangeably.
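The relay behavior described above can be sketched as a simple forwarding table. This is a hedged illustration that treats each relay stream as opaque bytes and assumes each receiving MRE has a list of the senders it wants in its layout. The function `relay_streams` and its parameters are hypothetical and do not correspond to any MRM interface.

```python
from typing import Dict, List


def relay_streams(
    received: Dict[str, bytes],
    wanted: Dict[str, List[str]],
) -> Dict[str, Dict[str, bytes]]:
    """Forward compressed streams to each receiver without decoding them.

    received: sender id -> compressed payload the MRM got from that MRE.
    wanted:   receiver id -> sender ids that appear in the receiver's layout.
    """
    relayed: Dict[str, Dict[str, bytes]] = {}
    for receiver, sources in wanted.items():
        relayed[receiver] = {
            src: received[src]
            for src in sources
            # An endpoint does not get its own stream back, and senders
            # that delivered nothing are skipped.
            if src != receiver and src in received
        }
    return relayed
```

The MRM in this sketch never decodes or composes video. Each receiving MRE decodes the relayed streams and builds the CP image itself.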
In some MRC systems, a transmitting MRE sends its video image in two or more streams, each stream associated with a different quality level. Such a system can use the plurality of streams to provide different window sizes in the layouts, different resolutions used by each receiving endpoint, etc. Furthermore, the plurality of streams can be used for overcoming packet loss. The quality levels may differ in frame rate, resolution, and/or signal-to-noise ratio (SNR), etc.
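One way to use multiple quality levels for different window sizes is sketched below. The selection rule (take the smallest stream that still covers the segment, otherwise the largest available) is an assumption made for illustration; the text does not specify a rule. `StreamQuality` and `pick_stream` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StreamQuality:
    name: str
    width: int
    height: int
    fps: int


def pick_stream(available: List[StreamQuality], seg_w: int, seg_h: int) -> StreamQuality:
    """Choose the quality level to relay for a segment of seg_w x seg_h pixels."""
    if not available:
        raise ValueError("the transmitting endpoint offers no streams")
    by_size = sorted(available, key=lambda q: q.width * q.height)
    for quality in by_size:
        # Smallest stream that covers the segment avoids upscaling and
        # does not waste bandwidth on pixels that would be discarded.
        if quality.width >= seg_w and quality.height >= seg_h:
            return quality
    # No stream is large enough: fall back to the best one offered.
    return by_size[-1]
```

For example, with QCIF, CIF, and 720p streams on offer, a 320x240 segment would be served by the CIF stream under this rule.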
Video streaming is becoming more and more popular. Furthermore, more and more sources of video streaming, as well as video conferencing systems, deliver a plurality of streams in parallel, where the streams differ from each other in the quality of the compressed video. The quality can be expressed in a number of domains, such as the time domain (frames per second, for example), the spatial domain (high definition (HD) or CIF, for example), and/or in quality (sharpness, for example). Video compression standards that are used for video streaming and multi-quality streams include H.264 AVC, H.264 Annex G, MPEG-4, etc. More information on compression standards such as H.264 can be found at the ITU website www.itu.int, or at www.mpeg.org.
From time to time during a conference session, a receiving MRE needs an Intra frame from one of the transmitting MREs. The Intra frame can be requested due to missing packets, changes in the layout presented in the receiving MRE, a participant joining an ongoing videoconferencing session, etc. In some cases the Intra frame is requested by only one of the receiving MREs, and not by other MREs that participate in the session and obtain the same quality level stream. An Intra frame is a video frame that was compressed relative to information that is contained only within the same frame and not relative to any other frame in the video sequence. An Inter frame is a video frame that was compressed relative to information that is contained within the same frame, and also relative to one or more other frames (reference frames) in the video sequence. An Inter frame can be a predictive frame (a P frame) or a bidirectional predictive frame (a B frame). In video conferencing, B frames are not typically used because of the latency they introduce. In the following description the term P frame is used as a representative term for an Inter frame.
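The difference between Intra and Inter frames, and the reason a newly joined or packet-losing receiver needs an Intra frame, can be illustrated with a toy model. This sketch is a deliberate simplification: an "I" frame carries raw sample values and a "P" frame carries only differences from the previous frame, whereas real codecs use motion-compensated prediction and transform coding. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[int]
Coded = Tuple[str, List[int]]  # ("I", samples) or ("P", differences)


def encode(frames: List[Frame], intra_at: Tuple[int, ...] = (0,)) -> List[Coded]:
    """Encode frames; indices in intra_at (and the first frame) become Intra."""
    coded: List[Coded] = []
    prev: Optional[Frame] = None
    for i, frame in enumerate(frames):
        if prev is None or i in intra_at:
            coded.append(("I", list(frame)))  # self-contained
        else:
            coded.append(("P", [c - p for c, p in zip(frame, prev)]))
        prev = frame
    return coded


def decode(coded: List[Coded]) -> List[Optional[Frame]]:
    """Decode; a P frame with no reference yields None (undecodable)."""
    out: List[Optional[Frame]] = []
    ref: Optional[Frame] = None
    for kind, data in coded:
        if kind == "I":
            ref = list(data)
        elif ref is not None:
            ref = [r + d for r, d in zip(ref, data)]
        # A P frame that arrives before any Intra frame leaves ref as None.
        out.append(None if ref is None else list(ref))
    return out
```

A receiver that starts decoding in the middle of the stream gets nothing usable until the next Intra frame arrives, which is why it must request one.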
Video streaming may involve lost packets, jumping forward while playing the video, or switching between streams of different qualities. To support those capabilities, video compression standards offer special frame types that are periodically placed along the streams. The first type of special frame is a Switching P frame (SP). An SP frame is similar to a P frame (it uses similar macroblock modes and motion-compensated prediction). However, SP frames allow identical frames to be reconstructed even when they are predicted using different reference frames. The second special frame type is referred to as a secondary SP frame (SSP). The SSP frame uses special encoding. Regardless of which reference frames, macroblocks, or motion vectors were used for encoding the SSP frame, the decoding will always reconstruct the same picture. The third type of special frame is a Switching Intra frame (SI). An SI frame can be seen as an Intra frame that identically reconstructs an SP frame. In the present disclosure, the terms encode and compress are used interchangeably.
SP, SSP, and SI frames and their use for switching between streams or recovering from packet loss are well known in the art of video streaming and will not be further discussed. A reader who wishes to learn more about those frames and their use is invited to read the H.264 AVC standard as well as “Advanced Bitstream Switching for Wireless Video Streaming,” a Diploma Thesis of Michael Walter (Nov. 26, 2004). Another article is “The SP- and SI-Frames Design for H.264/AVC,” written by Marta Karczewicz et al. and published in IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7 (July 2003).
In response to an Intra request received from a requesting MRE and targeted to a relevant MRE, an MRM relays the request to the relevant MRE. In response, the relevant MRE may send an Intra frame toward the MRM, which relays the Intra frame to each MRE that is currently receiving the video stream from that relevant MRE, including those MREs that do not need and did not ask for an Intra frame. Intra frame coding efficiency is lower than Inter frame coding efficiency, so an Intra frame requires higher bandwidth for the same quality. Additionally, encoding and decoding an Intra frame takes longer and requires more computing power than encoding and decoding an Inter frame. Thus, sending the Intra frame to all MREs creates unnecessary load over the communication links and also increases the computing load in the receiving MREs as well as in the transmitting MRE. To stay within the bandwidth constraints, the Intra frame may therefore be encoded in lower quality. Alternatively, the frame rate may be temporarily reduced during the transition period. Thus, in general, Intra frames degrade the conferees' experience during that period of time.
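The cost of this fan-out can be made concrete with a small sketch. It assumes the MRM keeps a table of which MREs receive each sender's stream. The frame sizes in the usage note are made-up illustrative numbers, not measurements. `intra_fanout` and `unneeded_intra_bits` are hypothetical names.

```python
from typing import Dict, List, Set


def intra_fanout(subscriptions: Dict[str, Set[str]], sender: str) -> List[str]:
    """MREs that will receive the Intra frame once `sender` produces one.

    subscriptions: sender id -> ids of the MREs currently receiving its stream.
    Every subscriber gets the Intra frame, whether or not it asked for one.
    """
    return sorted(subscriptions.get(sender, set()))


def unneeded_intra_bits(
    fanout: List[str],
    requester: str,
    intra_bits: int,
    inter_bits: int,
) -> int:
    """Extra bits sent to receivers that did not ask for the Intra frame.

    Each such receiver would otherwise have received an Inter frame of
    inter_bits, so the overhead per receiver is the size difference.
    """
    others = [mre for mre in fanout if mre != requester]
    return (intra_bits - inter_bits) * len(others)
```

For instance, if MREs B, C, and D all receive the stream from A and only B requests an Intra frame, then with an assumed 40,000-bit Intra frame and 5,000-bit Inter frame the two other receivers together absorb 70,000 extra bits for a frame they did not need.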