As everyday applications and services migrate to Internet Protocol (IP) networks at a remarkable rate, and the variety of multimedia conferencing equipment continues to grow, more and more people rely on multimedia conferencing as an important communication method. Today, multimedia conferencing communication can be implemented using a plurality of conferencing techniques. A few examples of conferencing techniques include a legacy multimedia conferencing method, a media relay conferencing method, and a mesh conferencing method. In this disclosure, the terms multimedia conference, video conference (with or without content) and audio conference may be used interchangeably and the term video conference is used as a representative term of video, audio, and multimedia conferences.
A legacy multipoint conference between three or more participants requires a Multipoint Control Unit (MCU). Such an MCU is a conference controlling entity that is typically located in a node of a network or in a terminal that receives several channels from a plurality of endpoints. According to certain criteria, the legacy MCU processes audio and visual signals and distributes them to each of the participating endpoints via a set of connected channels. Examples of legacy MCUs include the RMX 2000®, which is available from Polycom, Inc. (RMX 2000 is a registered trademark of Polycom, Inc.) A terminal in the legacy-communication method, which may be referred to as a legacy endpoint (LEP), is an entity on the network, capable of providing real-time, two-way audio and/or audio visual communication with another LEP or with the MCU. A more thorough definition of an LEP and an MCU can be found in the International Telecommunication Union (“ITU”) standards, such as, but not limited to, the H.320, H.324, and H.323 standards, which can be found at the ITU Website, www.itu.int.
A common MCU, referred to also as a legacy MCU, may include a plurality of audio and video decoders, encoders, and media combiners (audio mixers and/or video image builders). The MCU may use a large amount of processing power to handle audio and video communication between a variable number of participants (LEPs). The communication can be based on a variety of communication protocols and compression standards and may involve different types of LEPs. The MCU may need to combine a plurality of input audio or video streams into at least one single output stream of audio or video, respectively, that is compatible with the properties of at least one conferee's LEP to which the output stream is being sent. The compressed audio streams received from the endpoints are decoded and can be analyzed to determine which audio streams will be selected for mixing into the single audio stream of the conference. The terms decode and decompress are used herein interchangeably.
A conference may have one or more video output streams wherein each output stream is associated with a layout. A layout defines the appearance of a conference on a display of one or more conferees that receive the stream. A layout may be divided into one or more segments where each segment may be associated with a video input stream that is sent by a certain conferee via the LEP. Each output stream may be constructed of several input streams, resulting in a continuous presence (CP) image. In a CP conference, a user at a remote terminal can simultaneously observe several other participants in the conference. Each participant may be displayed in a segment of the layout, and each segment may be the same size or a different size. The choice of the participants displayed and associated with the segments of the layout may vary among different conferees that participate in the same session.
The second type of communication method is Media Relay Conferencing (MRC). In MRC, a Media Relay MCU (MRM) receives one or more streams from each participating Media Relay Endpoint (MRE). The MRM relays to each participating endpoint a set of multiple media streams received from other endpoints in the conference. Each receiving endpoint uses the multiple streams to generate the video CP image according to a layout, as well as mixed audio of the conference. The CP video image and the mixed audio are played to the MRE's user. An MRE can be a terminal of a conferee in the session that has the ability to receive relayed media from an MRM and deliver compressed media according to instructions from an MRM. A reader who wishes to learn more about an MRC, MRM, or an MRE is invited to read U.S. Pat. Nos. 8,228,363 and 8,760,492, both of which are incorporated herein by reference in their entirety. As used herein, the term endpoint may represent either an LEP or an MRE.
In some MRC systems, a transmitting MRE sends its video image in two or more streams; each stream can be associated with a different quality level. The qualities may differ in frame rate, resolution, and/or signal-to-noise ratio (SNR), etc. In a similar way, each transmitting MRE may send its audio in two or more streams that may differ from each other in the compression bit rate, for example. Such a system can use the plurality of streams to provide different segment sizes in the layouts, different resolutions used by each receiving endpoint, etc. Further, the plurality of streams can be used for overcoming packet loss.
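As an illustration of how several quality levels could serve different segment sizes, the following minimal sketch picks, for one layout segment, the smallest available stream that still covers the segment. The `VideoStream` type, the `pick_stream` function, and the selection rule are assumptions made for this example only; they are not taken from any particular MRC system.

```python
from typing import NamedTuple, Sequence


class VideoStream(NamedTuple):
    """One quality level sent by a transmitting endpoint (hypothetical type)."""
    stream_id: str
    width: int
    height: int
    fps: int


def pick_stream(streams: Sequence[VideoStream], seg_w: int, seg_h: int) -> VideoStream:
    """Return the smallest stream that covers a seg_w x seg_h segment.

    If no stream is large enough, fall back to the largest one available.
    """
    if not streams:
        raise ValueError("the endpoint sent no streams")
    # Order the quality levels from the smallest picture area to the largest.
    by_area = sorted(streams, key=lambda s: s.width * s.height)
    for stream in by_area:
        if stream.width >= seg_w and stream.height >= seg_h:
            return stream
    return by_area[-1]


# Example: three quality levels from one endpoint, used for different segments.
streams = [
    VideoStream("hd", 1280, 720, 30),
    VideoStream("cif", 352, 288, 15),
    VideoStream("vga", 640, 480, 30),
]
small_segment = pick_stream(streams, 320, 240)
large_segment = pick_stream(streams, 1920, 1080)
```

Choosing the smallest sufficient stream keeps the relayed bit rate low for small segments; a real system might also weigh frame rate or available bandwidth.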
MRC is becoming increasingly popular today. Many videoconferencing systems deliver a plurality of quality levels in parallel within one or more streams. For video, for example, the quality can be expressed in a number of domains, such as the temporal domain (frames per second, for example), the spatial domain (HD versus CIF, for example), and/or in quality (sharpness, for example). Video compression standards that can be used for multi-quality streams include, for example, H.264 AVC, H.264 Annex G (SVC), MPEG-4, etc. More information on compression standards such as H.264 can be found at the ITU Website www.itu.int, or at www.mpeg.org.
In the first two types of communication methods, the legacy MCU and the MRC, a central entity is needed to handle the signaling and the audio and video media streams (an MCU or an MRM, respectively). Each endpoint sends its media streams to an MCU or an MRM. The MCU or the MRM processes the media stream according to the type of the communication method being applied and transfers the relevant streams to receiving endpoints. The term MCU is used herein as a representative term for an MRM and a legacy MCU.
A third type of communication method can be used. The third method can be referred to as a mesh conferencing system (MCS). In an MCS, there is no central entity for handling the media streams. Instead, in an MCS a Roster List Server (RLS) can be used as the central signaling entity and clients can send and receive the media directly from each other. In an example of an MCS, a client can use a WebRTC application program interface (API). WebRTC was drafted by the World Wide Web Consortium (W3C) for facilitating browser-to-browser (P2P) real-time communication of audio, video, and data sharing. A common WebRTC implementation may use a VP8 codec for video and an Opus codec for audio. VP8 is a video compression format owned by Google Inc. Opus is a lossy audio codec developed by the Internet Engineering Task Force (IETF). WebRTC and the compression format VP8 are currently supported by browser applications such as, but not limited to, Google CHROME® (CHROME is a registered trademark of Google Inc.); Mozilla FIREFOX® (FIREFOX is a registered trademark of Mozilla Foundation); and OPERA® (OPERA is a registered trademark of Opera Software ASA). Some browsers may need a plug-in in order to use WebRTC and the VP8 codec. Other MCSs may use other Web clients and other compression standards in order to deliver mesh conferencing services.
A common RLS can hold a directory of a plurality of virtual meeting rooms (VMR). Each VMR can represent a videoconferencing session and it may be associated with a VMR identification (VMRID) with or without a password. In some MCS, each VMR may have a different uniform resource locator (URL) or uniform resource identifier (URI). Further, a VMR may comprise a list of endpoints that are already connected to the VMR. In the list, each endpoint is associated with one or more URLs that allow other participants to contact the VMR. Each URL can be associated with a media type or signaling. In a common MCS, the RLS list, which is also referred to as an RLS state table, is created in real time starting from the first conferee that calls the VMR and is updated each time a new conferee joins the VMR or a current conferee leaves the VMR. A non-limiting example of an RLS is www.Vline.com.
When a user wishes to participate in a mesh videoconferencing session, the user may use a Web client to contact a virtual meeting room in an RLS, using a browser application and clicking on the URL provided in the meeting invitation. The RLS may start an authentication process and, upon completion, an HTML5 file can be downloaded to the browser application. The browser application can parse the HTML5 file and download a list of URLs of the users that are already associated with that virtual meeting room. In addition, a JavaScript RLS Web client (RLSWC) can be deployed from the RLS to the requesting browser application such as Google Chrome, Mozilla Firefox, or Opera Mobile, for example. The RLSWC can comprise a logical module that is needed for establishing the real-time session. When the RLSWC is employed by a processor that runs the browser application, the processor can establish the signaling and control connections with the other browser applications and conduct the mesh videoconferencing.
In some cases, the RLS may also transfer a VMR state table to the new conferee. The VMR state table can include information on the peers that are already participating in the video session. Then, the new conferee needs to establish a videoconferencing session with each other conferee's endpoint by establishing a signaling and control connection. The system can be based on Session Initiation Protocol (SIP) or H.323, for example. Then each endpoint needs to establish one or more SRTP/IP and SRTCP/IP connections for sending its video image and audio stream to each of the other participating endpoints and for receiving a video image and audio stream from each of the other participating endpoints. SRTP stands for Secure Real-Time Transport Protocol and SRTCP stands for Secure Real-Time Transport Control Protocol. Each time a conferee leaves the session, the RLS can update the VMR state table accordingly. An updated copy of the VMR state table can be supplied to each of the currently connected conferees, informing them of the latest change. In some cases, the entire updated VMR state table is sent. In other cases, only the changes are sent. Thus, in an MCS, no central entity receives or transmits media streams to and from the participating endpoints.
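The bookkeeping described above can be sketched as a small data structure. This is a minimal illustration under stated assumptions: the `VmrStateTable` class, its method names, the shape of the snapshot and delta messages, and the `example.invalid` URLs are all hypothetical and do not reflect the interface of any actual RLS. The sketch covers only the signaling-side table; media still flows directly between endpoints.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# endpoint id -> {"signaling" or a media type: URL}
PeerUrls = Dict[str, Dict[str, str]]


@dataclass
class VmrStateTable:
    """State table of one virtual meeting room (hypothetical structure)."""
    vmr_id: str
    peers: PeerUrls = field(default_factory=dict)

    def join(self, endpoint_id: str, urls: Dict[str, str]) -> Tuple[PeerUrls, dict]:
        """Register a new conferee.

        Returns the table as it stood before the join (sent to the newcomer
        so it can contact every existing peer) and a delta describing the
        change (sent to the conferees that were already connected).
        """
        snapshot = {peer: dict(u) for peer, u in self.peers.items()}
        self.peers[endpoint_id] = dict(urls)
        delta = {"added": {endpoint_id: dict(urls)}, "removed": []}
        return snapshot, delta

    def leave(self, endpoint_id: str) -> dict:
        """Remove a conferee and return the delta for the remaining peers."""
        self.peers.pop(endpoint_id, None)
        removed: List[str] = [endpoint_id]
        return {"added": {}, "removed": removed}


# Example: two conferees join, then the first one leaves.
table = VmrStateTable("room-1")
snap_a, delta_a = table.join("a", {"signaling": "https://a.example.invalid/sig"})
snap_b, delta_b = table.join("b", {"signaling": "https://b.example.invalid/sig"})
delta_leave = table.leave("a")
```

Returning a delta corresponds to the case in which only the changes are sent; sending a copy of `table.peers` instead would correspond to sending the entire updated table.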
A reader who wishes to learn more about videoconferencing standards and protocols is invited to visit the ITU Website, www.itu.int, or the Internet Engineering Task Force (IETF) Website, www.ietf.org. Legacy multipoint conference systems, MRCs, MCSs, MCUs, RLSs, LEPs, MREs, Web conferencing clients, and VMRs are well known to a person with ordinary skill in the art and have been described in many patents, patent applications, and technical books. As such, these will not be further described. The following are examples of patents and patent publications that describe videoconferencing systems: U.S. Pat. Nos. 6,496,216, 6,757,005, 7,174,365, 7,085,243, 8,411,595, 7,830,824, 7,542,068, 8,340,271, and 8,228,363; and U.S. Pat. Pub. No. 20140028788, and others.
A conventional MCS suffers from certain limitations. One such limitation centers on bandwidth. When three endpoints participate in a mesh videoconferencing session, each endpoint transmits two audio streams and two video streams, one set to each of the other endpoints, and receives two audio streams and two video streams, one set from each of the other endpoints. In general, a full mesh conference with N participants requires N(N−1)/2 peer sessions per medium type. This N² growth quickly renders a full mesh conference impractical for anything exceeding a modest N. The bandwidth required for such an amount of real-time data may be close to the limit of the available bandwidth of an endpoint. Any additional conferee may exceed the available bandwidth. Thus, any additional request to join the session can be denied, or the quality of the compression can be reduced, for example by reducing the number of frames per second, the sharpness, etc. This could result in a seriously reduced quality of experience for such conferees.
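The arithmetic behind this limitation can be checked with a short sketch. The function name `mesh_load` and the default of two medium types (audio and video) are assumptions made for illustration; the counts follow directly from the N(N−1)/2 relation and from each endpoint exchanging one set of streams with each of its N−1 peers.

```python
def mesh_load(n: int, media_types: int = 2) -> dict:
    """Count sessions and streams in a full mesh of n endpoints.

    media_types defaults to 2 (one audio and one video stream per peer).
    """
    if n < 2:
        raise ValueError("a conference needs at least two endpoints")
    peers = n - 1  # every endpoint connects to each of the others
    return {
        # Pairwise sessions per medium type: N(N-1)/2.
        "peer_sessions_per_medium": n * peers // 2,
        # Streams one endpoint sends (and, symmetrically, receives).
        "streams_sent_per_endpoint": peers * media_types,
        "streams_received_per_endpoint": peers * media_types,
        # One-way streams across the whole conference.
        "total_one_way_streams": n * peers * media_types,
    }


three = mesh_load(3)   # the three-endpoint case described above
ten = mesh_load(10)    # a larger conference for comparison
```

For three endpoints this gives 3 peer sessions per medium type and 4 streams sent and 4 received per endpoint, matching the two audio and two video streams in each direction described above. For ten endpoints it gives 45 peer sessions per medium type and 18 streams in each direction per endpoint.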
Another limitation of a conventional MCS can be the endpoint capabilities. For example, a conventional MCS requires that all the endpoints be able to use the same compression format such as, but not limited to, VP8. If one of the endpoints cannot satisfy this requirement, that endpoint cannot send or receive video data directly from the other endpoints.
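The capability constraint can be illustrated with a minimal sketch that finds the codecs shared by every endpoint and lists the endpoints that lack a required one. The function names, the endpoint identifiers, and the codec sets below are hypothetical examples, not data from any real session.

```python
from typing import Dict, List, Set


def common_codecs(endpoint_codecs: Dict[str, Set[str]]) -> Set[str]:
    """Return the codecs that every endpoint in the session supports."""
    codec_sets = [set(codecs) for codecs in endpoint_codecs.values()]
    if not codec_sets:
        return set()
    return set.intersection(*codec_sets)


def incompatible_endpoints(endpoint_codecs: Dict[str, Set[str]],
                           required: str = "VP8") -> List[str]:
    """Return, in sorted order, the endpoints that lack the required codec."""
    return sorted(ep for ep, codecs in endpoint_codecs.items()
                  if required not in codecs)


# Example session: endpoint "c" does not support VP8.
session = {
    "a": {"VP8", "H.264"},
    "b": {"VP8"},
    "c": {"H.264"},
}
shared = common_codecs(session)
left_out = incompatible_endpoints(session)
```

In this example no single codec is shared by all three endpoints, and endpoint "c" is the one that cannot exchange VP8 video directly with the others.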