Videoconferencing enables individuals located remotely from one another to hold face-to-face meetings on short notice using audio and video telecommunications. A videoconference may involve as few as two sites (point-to-point) or several sites (multi-point). A conferencing site may have a single participant or several participants, for example in a conference room. Videoconferencing may also be used to share documents, information, and the like.
Participants in a videoconference interact with participants at other sites via a videoconferencing endpoint (EP). An endpoint is a terminal on a network, capable of providing real-time, two-way audio/visual/data communication with other terminals or with a multipoint control unit (MCU, discussed in more detail below). An endpoint may provide speech only, speech and video, or speech, data and video communications, etc. A videoconferencing endpoint typically comprises a display unit on which video images from one or more remote sites may be displayed. Example endpoints include POLYCOM® VSX® and HDX® series, each available from Polycom, Inc. (POLYCOM, VSX, and HDX are registered trademarks of Polycom, Inc.). The videoconferencing endpoint sends audio, video, and/or data from a local site to the remote site(s) and displays video and/or data received from the remote site(s) on a screen.
Video images displayed on a screen at a videoconferencing endpoint may be arranged in a layout. The layout may include one or more segments for displaying video images. A segment is a portion of the screen of a receiving endpoint that is allocated to a video image received from one of the sites participating in the session. For example, in a videoconference between two participants, a segment may cover the entire display area of the screen of the local endpoint. Another example is a videoconference between a local site and multiple remote sites that is conducted in switching mode, such that video from only one remote site is displayed at the local site at any one time and the displayed remote site may be switched, depending on the dynamics of the conference. In contrast, in a continuous presence (CP) conference, a conferee at a terminal may simultaneously observe several other participants' sites in the conference. Each site may be displayed in a different segment of the layout, where each segment may be the same size or a different size. The choice of the sites displayed and associated with the segments of the layout may vary among different conferees that participate in the same session. In a CP layout, a received video image from a site may be scaled or cropped in order to fit a segment size.
An MCU may be used to manage a videoconference. Some MCUs are composed of two logical units: a media controller (MC) and a media processor (MP). A more thorough definition of an endpoint and an MCU may be found in the International Telecommunication Union (“ITU”) standards, including the H.320, H.324, and H.323 standards. Additional information regarding the ITU standards may be found at the ITU website www.itu.int.
To present a video image within a segment of a screen layout of a receiving endpoint, the entire received video image may be manipulated by the MCU, including scaling or cropping the video image. An MCU may crop lines or columns from one or more edges of a received conferee video image in order to fit it to the area of a segment in the layout of the videoconferencing image. Another cropping technique may crop the edges of the received image according to a region of interest in the image, as disclosed in U.S. patent application Ser. No. 11/751,558, the entire contents of which are incorporated herein by reference.
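The edge-cropping step described above can be illustrated with a minimal Python sketch. It assumes the crop is centered and that the goal is to match the aspect ratio of the segment before scaling; the function name `center_crop_box` and its parameters are hypothetical and are not taken from any MCU implementation. Region-of-interest cropping, as in the cited application, is not shown.

```python
def center_crop_box(src_w, src_h, seg_w, seg_h):
    """Return (left, top, width, height) of the largest centered region of the
    source image whose aspect ratio matches that of the segment.

    Assumption: columns or lines are removed equally from opposite edges.
    """
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if src_w * seg_h > src_h * seg_w:
        # Source is wider than the segment: crop columns from left and right.
        new_w = (src_h * seg_w) // seg_h
        return ((src_w - new_w) // 2, 0, new_w, src_h)
    # Source is taller than (or matches) the segment: crop lines from top and bottom.
    new_h = (src_w * seg_h) // seg_w
    return (0, (src_h - new_h) // 2, src_w, new_h)


if __name__ == "__main__":
    # A 16:9 image fitted to a 4:3 segment loses columns on both sides.
    print(center_crop_box(1280, 720, 640, 480))
```

After cropping, the remaining region has the same proportions as the segment, so a uniform scale fits it without distortion.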
In a videoconferencing session, the size of a segment in a layout may be defined according to a layout selected for the session. For example, in a 2×2 layout each segment may be substantially a quarter of the display. In a 2×2 layout, if five sites are taking part in a session, conferees at each site typically may see the other four sites.
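As a rough illustration of the 2×2 example, the following Python sketch computes segment rectangles for a regular grid and lists which sites a given conferee could be shown. The helper names `grid_segments` and `sites_shown_to`, the screen dimensions, and the simple "first N other sites" selection rule are all assumptions made for this sketch.

```python
def grid_segments(screen_w, screen_h, rows, cols):
    """Return one (left, top, width, height) rectangle per segment, row by row."""
    seg_w, seg_h = screen_w // cols, screen_h // rows
    return [(c * seg_w, r * seg_h, seg_w, seg_h)
            for r in range(rows) for c in range(cols)]


def sites_shown_to(viewer, sites, n_segments):
    """Pick the sites displayed to one conferee: every site except the viewer's
    own, limited to the number of segments (a simplifying assumption)."""
    others = [s for s in sites if s != viewer]
    return others[:n_segments]


if __name__ == "__main__":
    segments = grid_segments(1920, 1080, 2, 2)
    print(segments)
    print(sites_shown_to("A", ["A", "B", "C", "D", "E"], len(segments)))
```

With five sites and four segments, each conferee's layout holds exactly the four other sites, which matches the example in the text.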
In a CP videoconferencing session, the association between sites and segments may be dynamically changed according to the activity taking place in the conference. In some layouts, one of the segments may be allocated to a current speaker, and the other segments may be allocated to other sites that were selected as presented conferees. The current speaker is typically selected according to certain criteria, such as having the highest audio signal strength during a certain percentage of a monitoring period. The other sites (in the other segments) may include the image of the conferee that was the previous speaker, sites with audio energy above a certain threshold, certain conferees required by management decisions to be visible, etc.
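One possible reading of the speaker-selection criterion is sketched below in Python. It assumes the monitoring period is divided into sampling intervals, each providing one audio energy value per site, and that a site becomes the current speaker when it is the loudest in at least a given fraction of those intervals. The function name `select_speaker`, the `min_fraction` parameter, and its default of 0.6 are hypothetical.

```python
from collections import Counter


def select_speaker(energy_samples, current_speaker, min_fraction=0.6):
    """Return the site chosen as current speaker for one monitoring period.

    energy_samples: list of dicts, one per sampling interval, mapping a site
    name to its audio energy in that interval (assumed input format).
    """
    # Count how often each site was the loudest in an interval.
    loudest = Counter(max(sample, key=sample.get)
                      for sample in energy_samples if sample)
    if not loudest:
        return current_speaker
    site, wins = loudest.most_common(1)[0]
    # Switch only if the site dominated a large enough share of the period.
    if wins / len(energy_samples) >= min_fraction:
        return site
    return current_speaker


if __name__ == "__main__":
    period = [{"A": 5, "B": 9}, {"A": 4, "B": 8}, {"A": 7, "B": 2},
              {"A": 1, "B": 6}, {"A": 2, "B": 9}]
    print(select_speaker(period, current_speaker="A"))
```

Keeping the previous speaker when no site reaches the threshold avoids rapid switching between segments on short bursts of noise.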
In some cases a plurality of sites may receive a similar layout from an MCU. Sites that are not presented may receive one of the layouts that are sent toward one of the presented conferees, for example. In a conventional CP conference, each layout is associated with an output port of an MCU, for example.
A typical output port may comprise a CP image builder and an encoder. A typical CP image builder may obtain decoded video images from each one of the presented sites. The CP image builder may scale and/or crop each decoded video image to the size of the segment in which that image will be presented. The CP image builder may further write the scaled image into a CP frame memory at a location that is associated with the location of the segment in the layout. When the CP frame memory holds all of the presented images, each located in its associated segment, the CP image may be read from the CP frame memory by the encoder.
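The builder steps above (scale each decoded image, then write it into the frame memory at the position of its segment) can be sketched in Python as follows. Images are modeled as lists of rows of pixel values, and nearest-neighbor scaling is assumed purely for brevity; the names `scale_nearest` and `build_cp_frame` are hypothetical, and a real MCU would use hardware or optimized scalers.

```python
def scale_nearest(image, out_w, out_h):
    """Scale a 2-D image (list of rows) to out_w x out_h by nearest neighbor."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def build_cp_frame(frame_w, frame_h, placements, background=0):
    """Compose a CP frame.

    placements: list of (image, (left, top, seg_w, seg_h)) pairs, one per
    presented site (assumed input format).
    """
    frame = [[background] * frame_w for _ in range(frame_h)]
    for image, (left, top, seg_w, seg_h) in placements:
        scaled = scale_nearest(image, seg_w, seg_h)
        # Write each scaled row into the region of the frame memory that
        # corresponds to the segment's location in the layout.
        for row in range(seg_h):
            frame[top + row][left:left + seg_w] = scaled[row]
    return frame


if __name__ == "__main__":
    site_a = [[1, 1], [1, 1]]   # already segment-sized
    site_b = [[2]]              # scaled up to fill its segment
    print(build_cp_frame(4, 2, [(site_a, (0, 0, 2, 2)), (site_b, (2, 0, 2, 2))]))
```

Once every placement has been written, the completed frame is what the encoder would read.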
The encoder may encode the CP image. The encoded and/or compressed CP video image may be sent toward the endpoint of the relevant conferee. A frame memory module may employ two or more frame memories, for example, a currently encoded frame memory and a next frame memory. The frame memory module may alternately store and output video of consecutive frames. Output ports of an MCU are well known in the art and are described in numerous patents and patent applications, including U.S. Pat. No. 6,300,973, the content of which is incorporated herein by reference in its entirety for all purposes.
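The alternating use of a currently encoded frame memory and a next frame memory resembles a conventional double-buffering scheme, sketched below in Python. This is an illustrative assumption about how such a module might behave, not a description of the cited patent; the class `FrameMemoryModule` and its method names are hypothetical.

```python
class FrameMemoryModule:
    """Two frame memories: one being filled by the CP image builder ("next")
    and one being read by the encoder ("current")."""

    def __init__(self):
        self._buffers = [None, None]
        self._write_index = 0  # index of the "next" frame memory

    def write_next(self, frame):
        """Store a completed CP frame in the next frame memory."""
        self._buffers[self._write_index] = frame

    def swap(self):
        """Exchange roles: the frame just written becomes the current one."""
        self._write_index ^= 1

    def read_current(self):
        """Return the frame the encoder should encode."""
        return self._buffers[self._write_index ^ 1]


if __name__ == "__main__":
    memory = FrameMemoryModule()
    memory.write_next("frame 0")
    memory.swap()
    memory.write_next("frame 1")   # builder fills the other memory
    print(memory.read_current())   # encoder still sees the previous frame
```

Separating the two roles lets the builder compose the following frame while the encoder is still reading the previous one.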
An output port typically consumes substantial computational resources, especially when the output port is associated with a high definition (HD) endpoint that displays high-resolution video images at a high frame rate. In typical MCUs, the resources needed for the output ports may limit the capacity of the MCU and have a significant influence on the cost of a typical MCU.
In order to solve the capacity/cost issue, some conventional MCUs offer a conference on port (COP) option, in which a single output port is allocated to a CP conference. In a conference on port MCU, all of the sites that participate in the session receive the same CP video image.