Videoconferencing enables individuals located remotely from each other to conduct a face-to-face meeting. Videoconferencing may be executed by using audio and video telecommunications. A videoconference may be between as few as two sites (point-to-point), or between several sites (multi-point). A conference site may include a single participant (user, conferee) or several participants (users, conferees). Videoconferencing may also be used to share documents, presentations, information, and the like.
Participants may take part in a videoconference via a videoconferencing endpoint (EP), for example. An endpoint may be a terminal on a network, capable of providing real-time, two-way audio/visual/data communication with other terminals and/or with a multipoint control unit (MCU). An endpoint may provide information/data in different forms, including audio; audio and video; data, audio, and video; etc. The terms “terminal,” “site,” and “endpoint” may be used interchangeably. In the present disclosure, the term endpoint may be used as a representative term for the above group.
An endpoint may comprise a display unit (screen), upon which video images from one or more remote sites may be displayed. Example endpoints include POLYCOM® VSX® and HDX® series endpoints, each available from Polycom, Inc. (POLYCOM, VSX, and HDX are registered trademarks of Polycom, Inc.) A videoconferencing endpoint may send audio, video, and/or data from a local site to one or more remote sites, and display video and/or data received from the remote site(s) on its screen (display unit).
Video images displayed on a screen at an endpoint may be displayed in an arranged layout. A layout may include one or more segments for displaying video images. A segment may be a predefined portion of a screen of a receiving endpoint that may be allocated to a video image received from one of the sites participating in the videoconferencing session. In a videoconference between two participants, a segment may cover the entire display area of the screens of the endpoints. In each site, the segment may display the video image received from the other site.
An example of a video display mode in a videoconference between a local site and multiple remote sites may be a switching mode. A switching mode may be such that video/data from only one of the remote sites is displayed on the local site's screen at a time. The displayed video may be switched to video received from another site depending on the dynamics of the conference.
In contrast to the switching mode, in a continuous presence (CP) conference, a conferee (participant) at a local endpoint may simultaneously observe several other conferees from different endpoints participating in the videoconference. Each site may be displayed in a different segment of the layout, which is displayed on the local screen. The segments may be the same size or of different sizes. The combinations of the sites displayed on a screen and their association with the segments of the layout may vary among the different sites that participate in the same session. Furthermore, in a continuous presence layout, a received video image from a site may be scaled, up or down, and/or cropped in order to fit its allocated segment size. It should be noted that the terms “conferee,” “user,” and “participant” may be used interchangeably. In the present disclosure, the term conferee may be used as a representative term for the above group.
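The scaling and cropping of a received image to fit its allocated segment can be illustrated with a minimal sketch. The function name `fit_to_segment` and the scale-then-center-crop policy are assumptions made for illustration only; the disclosure does not prescribe a particular fitting policy.

```python
def fit_to_segment(src_w, src_h, seg_w, seg_h):
    """Return (scaled_w, scaled_h, crop_x, crop_y) for a scale-then-center-crop fit.

    Hypothetical helper: the image is scaled uniformly until it covers the
    segment, then the overflow is cropped symmetrically.
    """
    # Use the larger ratio so the scaled image covers the whole segment.
    scale = max(seg_w / src_w, seg_h / src_h)
    # Guard against rounding leaving the image one pixel short of the segment.
    scaled_w = max(seg_w, round(src_w * scale))
    scaled_h = max(seg_h, round(src_h * scale))
    # Offsets of the cropped window inside the scaled image.
    crop_x = (scaled_w - seg_w) // 2
    crop_y = (scaled_h - seg_h) // 2
    return scaled_w, scaled_h, crop_x, crop_y
```

For example, a 1280x720 image placed in a 640x480 segment would be scaled down to cover the segment height and then cropped on the left and right.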
An MCU may be used to manage a videoconference. An MCU is a conference controlling entity that is typically located in a node of a network or in a terminal. The MCU receives several channels from endpoints and, according to certain criteria, processes audio and/or visual signals and distributes them to a set of connected channels.
Example MCUs include the MGC-100 and RMX 2000®, available from Polycom, Inc. (RMX 2000 is a registered trademark of Polycom, Inc.). Some MCUs may be composed of two logical units: a media controller (MC) and a media processor (MP). A more thorough definition of an endpoint and an MCU may be found in the International Telecommunication Union (“ITU”) standards, including the H.320, H.324, and H.323 standards. Additional information regarding video conferencing standards and protocols such as ITU standards or Session Initiation Protocol (SIP) may be found at the ITU website www.itu.int or at the Internet Engineering Task Force (IETF) website www.ietf.org, respectively.
Other video conferencing systems may use a Media Relay Conferencing (MRC) system. In MRC, a Media Relay MCU (MRM) receives one or more streams from each participating Media Relay Endpoint (MRE). The MRM relays to each participating endpoint a set of multiple video streams received from other endpoints in the conference. Each receiving endpoint uses the multiple streams to generate the CP video image according to a layout. The CP video image is presented to the MRE's user. An MRE can be a terminal of a conferee in the session that has the ability to receive relayed media from an MRM and deliver compressed media according to instructions from an MRM. MRMs are described in more detail in U.S. Patent Publication No. 2010/0194847, which is incorporated herein by reference in its entirety for all purposes. For purposes of the present disclosure, the terms endpoint and MRE may be used interchangeably.
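The relay behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `relay_plan` and the dictionary data model are hypothetical, and the sketch forwards every stream from every other endpoint, whereas an actual MRM may select a subset.

```python
def relay_plan(streams_by_endpoint):
    """Map each receiving endpoint to the (sender, stream) pairs relayed to it.

    Hypothetical model: `streams_by_endpoint` maps an endpoint name to the
    list of stream identifiers that endpoint sends to the relay.
    """
    plan = {}
    for receiver in streams_by_endpoint:
        # A receiver is never sent its own streams back.
        plan[receiver] = [
            (sender, stream)
            for sender, streams in streams_by_endpoint.items()
            if sender != receiver
            for stream in streams
        ]
    return plan
```

Each receiving endpoint would then decode the relayed streams and compose the CP image locally according to its layout.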
In some MRC systems, a transmitting MRE sends its video image in two or more layers (levels) of quality. In some systems, the two or more layers are carried over a single stream. In other MRC systems, each layer is associated with a different stream. Those systems can provide different window sizes in the layouts, different resolutions used by each receiving endpoint, different frame rates, etc. Furthermore, the plurality of layers can be used for overcoming packet loss. The qualities may differ in frame rate, resolution, and/or signal-to-noise ratio (SNR), etc.
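As an illustration of how multiple quality layers can serve different window sizes, the following minimal sketch picks a layer for a given layout segment. The `Layer` type, the `pick_layer` function, and the selection rule (lowest resolution that still covers the segment) are assumptions for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One quality layer of a transmitted video image (hypothetical model)."""
    width: int
    height: int
    fps: int


def pick_layer(layers, seg_w, seg_h):
    """Pick the lowest-resolution layer that covers the segment.

    Falls back to the highest-resolution layer when none is large enough.
    """
    ordered = sorted(layers, key=lambda layer: layer.width * layer.height)
    for layer in ordered:
        if layer.width >= seg_w and layer.height >= seg_h:
            return layer
    return ordered[-1]
```

A receiver showing a sender in a small segment would thus be served a low-resolution layer, while a receiver showing the same sender in a large segment would be served a higher one.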
Throughout this disclosure, the term video streaming represents any transmission of compressed media (e.g., audio and/or video) in multimedia conferencing sessions, media streaming, or any application using transfer of compressed multimedia streams, where the media has been compressed by a scalable-coding encoder. Further, the transmitted compressed media may contain a plurality of layers, with the layers differing from each other in quality of the media. The different layers may be handled differently by disclosed embodiments. Also, the term Scalable-Coding (SC) as used herein represents an example of multi-layer media coding.
Video streaming is becoming increasingly popular. Further, more and more sources of video streaming, as well as video conferencing systems, deliver a plurality of layers, wherein the layers differ from each other by the quality of the compressed video. The quality can be expressed in a number of domains, such as the time domain (frames per second, for example), the spatial domain (high definition (HD) or common intermediate format (CIF), for example), and/or in quality (sharpness, for example). Video compression standards that are used for video streaming and multi-quality layers include H.264 AVC, H.264 Annex G, MPEG-4, etc. Those compression standards can be referred to as SC standards. More information on compression standards such as H.264 can be found at the ITU website www.itu.int, or at www.mpeg.org.
Some video compression techniques use two types of frames, an Intra frame and an Inter frame. An Intra frame is a video frame that was compressed relative to information that is contained only within the same frame and not relative to any other frame in the video sequence. An Inter frame is a video frame that was compressed relative to information that is contained within the same frame, and also relative to one or more other frames (reference frames) in the video sequence. An Inter frame can include a predictive frame (a P frame), and/or a bidirectional predictive frame (a B frame). In video conferencing, B frames are not typically used because of the latency they introduce. In the following description, an Inter frame is used as a representative term for the term P frame.
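The dependency of Inter frames on reference frames can be illustrated with a small sketch. It assumes, for simplicity, that each P frame references only the immediately preceding frame; the function name `decodable_frames` is hypothetical.

```python
def decodable_frames(frame_types, lost):
    """Return the indices of frames a decoder could reconstruct.

    frame_types: sequence of "I" (Intra) or "P" (Inter) in decode order.
    lost: set of frame indices that never arrived.
    Assumption: every P frame references the immediately preceding frame.
    """
    decodable = []
    have_reference = False
    for index, kind in enumerate(frame_types):
        if index in lost:
            # The prediction chain is broken until the next Intra frame.
            have_reference = False
            continue
        if kind == "I":
            # An Intra frame depends on no other frame.
            have_reference = True
            decodable.append(index)
        elif kind == "P" and have_reference:
            decodable.append(index)
    return decodable
```

Under this assumption, losing one Inter frame makes every following Inter frame undecodable until the next Intra frame arrives.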
The media (e.g., audio and video) of a common video conferencing session that is carried over an Internet Protocol (IP) network uses Real-time Transport Protocol (RTP) as the transport protocol of the media packets. RTP is used in conjunction with the RTP Control Protocol (RTCP). RTCP is used to monitor transmission statistics and quality of service (QoS), and it aids synchronization of multiple streams. In addition, RTP packets are carried over UDP/IP. It is well known in the art that UDP/IP is a connectionless protocol, which is not reliable and suffers from packet loss. As one of the measures for identifying packet loss, a common RTP processor, at a source of a video conferencing stream, adds a sequence number to each one of the media packets before transmitting them toward their destination.
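The sequence numbering performed at the source can be sketched with the 12-byte fixed RTP header layout defined in RFC 3550. This is a minimal illustration, not a full RTP stack: the function names, the dynamic payload type 96, and the absence of padding, extensions, and CSRC entries are assumptions.

```python
import struct

RTP_VERSION = 2


def build_rtp_packet(seq, timestamp, ssrc, payload, payload_type=96, marker=False):
    """Prepend a fixed 12-byte RTP header to a payload (hypothetical helper)."""
    byte0 = RTP_VERSION << 6  # version 2, no padding, no extension, zero CSRCs
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",  # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes
        byte0,
        byte1,
        seq & 0xFFFF,  # the sequence number is a 16-bit field and wraps
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def packetize(payloads, ssrc, first_seq=0, timestamp=0):
    """Assign consecutive sequence numbers to a list of payloads."""
    return [
        build_rtp_packet(first_seq + i, timestamp, ssrc, payload)
        for i, payload in enumerate(payloads)
    ]


def read_seq(packet):
    """Extract the 16-bit sequence number from bytes 2-3 of the header."""
    return struct.unpack("!H", packet[2:4])[0]
```

Because the field is 16 bits wide, a stream that starts at 65534 continues with 65535 and then wraps to 0, which a receiver has to account for when ordering packets.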
At the destination of the compressed video stream, an RTP processor sorts the received packets according to their sequence numbers and delivers the compressed media toward a relevant decoder. In order to overcome packet loss, the RTP processors at both ends of the connection may use different forward error correction techniques. Further, the video encoder/decoder at both ends of the connection may use different recovery methods to overcome packet loss. Yet another recovery method can comprise a retransmission request for one or more missing packets, which is sent toward the source of the stream.
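The receiver-side sorting and gap detection described above can be sketched as follows. The function name `reorder` is hypothetical, and the sketch deliberately ignores 16-bit sequence number wraparound and jitter-buffer timing, both of which a real receiver would have to handle.

```python
def reorder(packets):
    """Sort received packets by sequence number and report gaps.

    packets: iterable of (sequence_number, payload) pairs in arrival order.
    Returns (ordered_payloads, missing_sequence_numbers).
    Assumption: sequence numbers do not wrap within the given batch.
    """
    by_seq = dict(packets)  # duplicate sequence numbers collapse to one entry
    if not by_seq:
        return [], []
    ordered = [by_seq[seq] for seq in sorted(by_seq)]
    first, last = min(by_seq), max(by_seq)
    # Any number between the first and last seen that never arrived is a gap.
    missing = [seq for seq in range(first, last + 1) if seq not in by_seq]
    return ordered, missing
```

The list of missing sequence numbers is the kind of information a receiver could place in a retransmission request sent toward the source of the stream.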