As traffic over Internet Protocol (IP) networks continues its rapid growth, and as the variety of multimedia conferencing equipment grows with it, more and more people use multimedia conferencing as their communication tool. Today, multimedia conferencing communication can be carried over two types of communication methods: the legacy multimedia communication method and the newer technique of media relayed communication. As used herein, the terms multimedia conference, video conference, and audio conference may be understood as interchangeable, and the term video conference can be used as a representative term for them.
A legacy multipoint conference between three or more participants requires a Multipoint Control Unit (MCU). An MCU is a conference controlling entity that is typically located in a node of a network or in a terminal that receives several channels from endpoints. According to certain criteria, the MCU processes audio and visual signals and distributes them to a set of connected channels. Examples of MCUs include the MGC-100 and RMX® 2000, which are available from Polycom, Inc. (RMX is a registered trademark of Polycom, Inc.) A terminal, which may be referred to as a legacy endpoint (LEP), is an entity on the network, capable of providing real-time, two-way audio and/or audio-visual communication with another LEP or with the MCU. A more thorough definition of an LEP and an MCU can be found in the International Telecommunication Union (“ITU”) standards, including the H.320, H.324, and H.323 standards, which can be found at the ITU website www.itu.int.
A common MCU, referred to also as a legacy MCU, may include a plurality of audio and video decoders, encoders, and media combiners (audio mixers and/or video image builders). As used herein, the terms common MCU and legacy MCU can be considered interchangeable. The MCU may use a large amount of processing power to handle audio and video communication between a variable number of participants (LEPs). The communication can be based on a variety of communication protocols and compression standards and may involve different types of LEPs. The MCU may need to combine a plurality of input audio or video streams into at least one single output stream of audio or video, respectively, that is compatible with the properties of at least one conferee's LEP to which the output stream is being sent. The compressed audio streams received from the endpoints are decoded and can be analyzed to determine which audio streams will be selected for mixing into the single audio stream of the conference. As used herein, the terms decode and decompress should be understood as interchangeable.
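By way of non-limiting illustration only, the selection and mixing of audio streams described above can be sketched as follows. The sketch assumes that each endpoint's audio has already been decoded into one frame of 16-bit PCM samples, that the selection criterion is mean squared energy, and that at most three streams are mixed. The names energy, select_and_mix, and max_speakers are hypothetical and are not taken from any particular MCU.

```python
from typing import Dict, List


def energy(samples: List[int]) -> float:
    """Mean squared amplitude of one decoded PCM frame (assumed criterion)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def select_and_mix(decoded: Dict[str, List[int]], max_speakers: int = 3) -> List[int]:
    """Pick the loudest streams and sum them into a single output frame."""
    loudest = sorted(decoded, key=lambda ep: energy(decoded[ep]), reverse=True)
    loudest = loudest[:max_speakers]
    if not loudest:
        return []
    length = min(len(decoded[ep]) for ep in loudest)
    mixed = []
    for i in range(length):
        total = sum(decoded[ep][i] for ep in loudest)
        # Clip to the 16-bit signed range so the sum cannot overflow.
        mixed.append(max(-32768, min(32767, total)))
    return mixed
```

In a legacy MCU the resulting frame would then be encoded according to the audio compression standard of the destination LEP; that encoding step is outside the scope of this sketch.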
A conference may have one or more video output streams wherein each output stream is associated with a layout. A layout defines the appearance of a conference on a display of one or more conferees that receive the stream. A layout may be divided into one or more segments where each segment may be associated with a video input stream that is sent by a certain conferee (endpoint). Each output stream may be constructed of several input streams, resulting in a continuous presence (CP) conference. In a CP conference, a user at a remote terminal can observe, simultaneously, several other participants in the conference. Each participant may be displayed in a segment of the layout, where each segment may be the same size or a different size. The choice of the participants displayed and associated with the segments of the layout may vary among different conferees that participate in the same session.
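For illustration only, a layout and its segments can be represented as a simple data structure. The sketch below assumes an equal-size grid layout and assumes that a conferee is not shown his or her own image, which is one way the displayed participants may vary among conferees. The names Segment and grid_layout, and the default frame size of 1280 by 720, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    x: int        # left edge of the segment in the composed frame
    y: int        # top edge of the segment in the composed frame
    width: int
    height: int
    source: str   # conferee whose video input stream fills this segment


def grid_layout(viewer: str, participants: List[str],
                frame_w: int = 1280, frame_h: int = 720,
                cols: int = 2, rows: int = 2) -> List[Segment]:
    """Build a cols x rows layout for one viewer, excluding the viewer."""
    others = [p for p in participants if p != viewer][: cols * rows]
    seg_w, seg_h = frame_w // cols, frame_h // rows
    return [
        Segment((i % cols) * seg_w, (i // cols) * seg_h, seg_w, seg_h, p)
        for i, p in enumerate(others)
    ]
```

Calling grid_layout once per conferee yields a different segment assignment for each viewer of the same session.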
A common MCU may need to decode each input video stream into uncompressed video of a full frame; manage the plurality of uncompressed video streams that are associated with the conferences; and compose and/or manage a plurality of output streams in which each output stream may be associated with a conferee or a certain layout. The output stream may be generated by a video output port associated with the MCU. A video output port may comprise a layout builder and an encoder. The layout builder may collect and scale the different uncompressed video frames from selected conferees into their final size and place them into their segment in the layout. Thereafter, the composed video frame is encoded by the encoder and sent to the appropriate endpoints. Consequently, processing and managing a plurality of videoconferences requires heavy and expensive computational resources, and therefore an MCU is typically an expensive and rather complex product. Common MCUs are described in several patents and patent applications, for example, U.S. Pat. Nos. 6,300,973, 6,496,216, 5,600,646, or 5,838,664, the contents of which are incorporated herein by reference. These patents disclose the operation of a video unit in an MCU that may be used to generate the video output stream for a CP conference.
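As a minimal sketch of the layout builder described above, and not the implementation of any of the referenced patents, the scale-and-place operation can be illustrated on toy frames. The sketch assumes that an uncompressed frame is a list of rows of pixel values and that scaling is nearest-neighbor; the names scale_nearest and compose are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]             # rows of pixel values
Placement = Tuple[int, int, int, int]  # x, y, width, height of a segment


def scale_nearest(frame: Frame, out_w: int, out_h: int) -> Frame:
    """Resize a frame with nearest-neighbor sampling (assumed method)."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def compose(inputs: List[Tuple[Frame, Placement]],
            canvas_w: int, canvas_h: int, background: int = 0) -> Frame:
    """Scale each input frame to its segment size and place it in the layout."""
    canvas = [[background] * canvas_w for _ in range(canvas_h)]
    for frame, (x0, y0, w, h) in inputs:
        scaled = scale_nearest(frame, w, h)
        for dy in range(h):
            canvas[y0 + dy][x0:x0 + w] = scaled[dy]
    return canvas
```

The composed frame returned by compose corresponds to the frame that the encoder of the video output port would subsequently compress.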
The growing trend of using video conferencing creates a need for low cost MCUs that will enable one to conduct a plurality of conferencing sessions having composed CP video images. This need leads to the new technique of Media Relay Conferencing (MRC).
In MRC, a Media Relay MCU (MRM) receives one or more streams from each participating Media Relay Endpoint (MRE). The MRM relays to each participating endpoint a set of multiple media streams received from other endpoints in the conference. Each receiving endpoint uses the multiple streams to generate the video CP image, according to a layout, as well as mixed audio of the conference. The CP video image and the mixed audio are played to the MRE's user. An MRE can be a terminal of a conferee in the session that has the ability to receive relayed media from an MRM and deliver compressed media according to instructions from an MRM. As used herein, the term endpoint may represent either an MRE or an LEP.
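For illustration only, the relaying behavior of an MRM can be sketched as a mapping from each receiving endpoint to the streams of the other endpoints. The sketch assumes, for simplicity, that every stream of every other endpoint is relayed unchanged, whereas an actual MRM may relay only a selected set. The name relay_plan and the stream identifiers are hypothetical.

```python
from typing import Dict, List


def relay_plan(uplink: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each receiver, list the streams relayed to it from the other endpoints.

    No decoding, mixing, or composing is done here; composing the CP image
    and mixing the audio are left to each receiving endpoint.
    """
    return {
        receiver: [
            stream
            for sender, streams in uplink.items()
            if sender != receiver
            for stream in streams
        ]
        for receiver in uplink
    }
```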
In some MRC systems, a transmitting MRE sends its video image in two or more streams, each of which can be associated with a different quality level. Such a system can use the plurality of streams to provide different segment sizes in the layouts, different resolutions used by each receiving endpoint, etc. Further, the plurality of streams can be used for overcoming packet loss. The quality levels may differ in frame rate, resolution, and/or signal to noise ratio (SNR), etc.
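By way of example only, the use of multiple quality levels to serve different segment sizes can be sketched as a selection among the streams offered by a transmitting MRE. The selection rule shown, namely the smallest stream that covers the segment and otherwise the largest available stream, is an assumption made for this sketch. The names StreamOption and pick_stream are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StreamOption:
    stream_id: str
    width: int
    height: int


def pick_stream(options: List[StreamOption], seg_w: int, seg_h: int) -> StreamOption:
    """Choose the stream whose resolution best fits a layout segment."""
    covering = [o for o in options if o.width >= seg_w and o.height >= seg_h]
    if covering:
        # Smallest stream that still fills the segment without upscaling.
        return min(covering, key=lambda o: o.width * o.height)
    # Nothing is large enough, so fall back to the best quality offered.
    return max(options, key=lambda o: o.width * o.height)
```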
Today, MRC is becoming more and more popular. Further, more and more sources of video conferencing systems deliver a plurality of streams in parallel wherein the streams differ from each other by the quality of the compressed video. The quality level can be expressed in a number of domains, such as temporal domain (frames per second, for example), spatial domain (HD versus CIF, for example), and/or in quality (sharpness, for example). Video compression standards that can be used for multi-quality streams are H.264 AVC, H.264 annex G (SVC), MPEG-4, etc. More information on compression standards such as H.264 can be found at the ITU website www.itu.int, or at www.mpeg.org.
Common video compression methods involve using Intra and Inter frames. An Intra frame is a video frame that is compressed relative to information that is contained only within the same frame and not relative to any other frame in the video sequence. An Inter frame is a video frame that is compressed relative to information that is contained within the same frame, and also relative to one or more other frames in the video sequence.
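The dependency between Intra and Inter frames can be illustrated with a toy coder. This is a simplified sketch and not an implementation of H.264 or any other standard: it assumes that a frame is a flat list of pixel values, that an Intra frame stores the pixels as is, that an Inter frame stores only the difference from the preceding frame, and that an Intra frame is inserted every third frame. The names encode, decode, and intra_period are hypothetical.

```python
from typing import List, Tuple

Coded = Tuple[str, List[int]]  # ("I" or "P", payload)


def encode(frames: List[List[int]], intra_period: int = 3) -> List[Coded]:
    """Code each frame as Intra ("I") or as a difference from the previous frame ("P")."""
    coded, prev = [], None
    for i, frame in enumerate(frames):
        if prev is None or i % intra_period == 0:
            coded.append(("I", list(frame)))
        else:
            coded.append(("P", [c - p for c, p in zip(frame, prev)]))
        prev = frame
    return coded


def decode(coded: List[Coded]) -> List[List[int]]:
    """Rebuild the frames; an Inter frame cannot be decoded without its reference."""
    frames, prev = [], None
    for kind, payload in coded:
        if kind == "I":
            frame = list(payload)
        else:
            if prev is None:
                raise ValueError("Inter frame received without a reference frame")
            frame = [p + d for p, d in zip(prev, payload)]
        frames.append(frame)
        prev = frame
    return frames
```

The sketch shows why a decoder that joins a stream in the middle must wait for an Intra frame before it can reconstruct any image.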
Some of the multimedia multipoint conferencing sessions may involve some conferees having LEPs and some conferees having MREs. Such conferencing sessions need a gateway, an MCU, and an MRM.
A gateway can be adapted to control multipoint multimedia conferences involving one or more LEPs and one or more MREs. The gateway can be installed in an intermediate node between an MRM and one or more LEPs. In alternate embodiments, the gateway can be embedded within the MRM. In yet other embodiments, the gateway can be added to an LEP or to a common MCU that controls the LEP.
In the direction from the MRM to the LEP, a gateway can handle the plurality of audio streams that were relayed from the MRM, arrange them, and decode and mix the audio streams. The mixed audio stream can be encoded according to the audio compression standard used by the destination LEP and be sent to the LEP. In a similar manner, the received one or more compressed video streams can be arranged by the gateway, decoded, and composed into a CP image. The CP image can be encoded according to the video compression standard used by the destination LEP and be sent to the LEP.
In the other direction, from the LEP to the MRM, a gateway can be adapted to decode the video stream, scale the video stream (if needed) to one or more sizes, and compress each one of the scaled video images according to the compression standard used by the MREs that participate in the session. The compressed video stream that complies with the needs of the MREs is sent toward the MRM. The compressed audio stream that is received from an LEP can be decoded and its energy level can be determined. The decoded audio can then be compressed according to the compression standard used by the MREs, and an indication of the audio energy can be added. The compressed audio that complies with the requirements of the MRM can be sent toward the MRM.
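For illustration only, the determination of the audio energy and its attachment to the compressed audio can be sketched as follows. The sketch assumes 16-bit PCM samples after decoding and assumes that the energy is expressed as an RMS level in decibels relative to full scale; the text above does not prescribe a particular measure. The encoder is supplied by the caller, and the names audio_energy_db, to_relay_audio, and the record fields are hypothetical.

```python
import math
from typing import Callable, Dict, List

FULL_SCALE = 32768.0  # magnitude of the largest 16-bit PCM sample (assumed format)


def audio_energy_db(samples: List[int]) -> float:
    """RMS level of one decoded frame in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / FULL_SCALE)


def to_relay_audio(samples: List[int],
                   encode: Callable[[List[int]], bytes]) -> Dict[str, object]:
    """Re-encode decoded LEP audio and attach the energy indication."""
    return {"payload": encode(samples), "energy_db": audio_energy_db(samples)}
```

With such an indication attached, the energy of a stream can be read without decoding the payload again.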
Control and signaling information received from the MRM, such as the one or more IDs assigned to an LEP, the layout assigned to the LEP, the selected audio streams to be mixed, or the presented streams and their segments, can be processed and used by the gateway. Other signaling and control can be translated and be sent to the LEP, for example call setup instructions. A reader who wishes to learn more about MRMs, MREs, and gateways between MRC and legacy conferencing systems is invited to read U.S. Pat. No. 8,228,363, the entire content of which is incorporated herein by reference.
Using three intermediate nodes (an MRM, a gateway, and an MCU) between an MRE and an LEP that participate in the same multimedia multipoint conferencing session has a negative impact on the experience of the conferees using either type of endpoint, MRE or LEP. Using a gateway in between adds latency to the traffic and reduces the quality of the media. The gateway typically adds further decoding, scaling, and encoding operations. These operations add delay, and each additional cycle of decoding and re-encoding reduces the quality of the media because of the lossy nature of the compression standards.
Further, in a common MCU, decoders and encoders are typically allocated to a conferee as long as the conferee is connected to a session, independent of whether the conferee video image or voice is selected to be presented or heard by the other conferees. This allocation and the use of three intermediate devices for handling a conference session between an MRE and an LEP consume expensive resources.
The above-described deficiencies of the current situation are not intended to limit the scope of the inventive concepts of the present disclosure in any manner. The deficiencies are presented for illustration only.