1. Field
The disclosure relates to videoconferencing and more specifically to processing a data stream from a multipoint videoconferencing endpoint. A method of associating a tag with an endpoint or video source in a multipoint videoconference and processing data from the tagged endpoint or video source in accordance with the tag is disclosed.
2. Description of the Related Art
Two viewing modes are commonly utilized for multipoint videoconferencing: (1) continuous presence (CP) mode (a.k.a. “Hollywood Squares”), wherein images of multiple conference endpoints are displayed simultaneously, with each image occupying a separate region of the display area; and (2) full screen mode, wherein the image of a single endpoint or video source, usually corresponding to the currently or most recently speaking participant, is displayed using all or most of the display area at the receiving site, resulting in a relatively larger and more detailed picture in which each person at the far-end site can be seen more clearly. FIG. 1 illustrates an example of a CP mode layout wherein four conferees can be simultaneously presented in sub-regions 101-104. Videoconferencing modes are described in detail in ITU-T Recommendation H.243.
Each of these modes has inherent advantages and limitations. For example, CP mode works well when each site has only one or a few (e.g., up to three) people present, because the smaller images in each sub-region of the CP display are still large and detailed enough to adequately represent the facial expressions, body language, etc. of the participants. However, when one or more sites have many people (e.g., six or more people around a conference table), or when a site is presenting written or graphical data, such as a marker board, chart, graph, computer display, etc., CP mode is less effective because the small sub-regions of the layout do not allow sufficient size and detail for good viewing of these sites. For example, the image of each person is so small and/or so lacking in detail that facial expressions, body language, etc. are difficult or impossible to distinguish. Such sites are better displayed in full screen mode so that the people and/or data at the site can be seen in greater detail.
One disadvantage of full screen mode is that the other participating sites cannot be seen simultaneously, so there is no opportunity to observe the reaction of the other participants to what is being seen.
Existing multipoint videoconferencing solutions are typically “modal,” i.e., a conference can be conducted either in CP mode or in full screen mode, but not generally in a mixture of both. The art does include examples of switching between full screen mode and CP mode in the same conference depending on the dynamics of the conference, for example, using CP mode if there is discussion involving multiple endpoints but using full screen mode if only one endpoint is active. One such solution is described in U.S. Pat. No. 6,744,460, the entire contents of which are hereby incorporated herein by reference.
Commonly assigned U.S. Pat. Nos. 6,704,769 and 7,139,807, the entire contents of which are hereby incorporated by reference, describe labeling media streams in a videoconference with a role that describes the function or purpose of the stream, such as “people” or “content.” A policy manager is provided for managing roles, so that the media streams may be more effectively presented to participants based on the role of the stream.
While processing a media stream based on its role is an improvement over the typical modal presentation available for videoconferencing, a further improvement would be provided by defining an optimal or preferred display mode for individual endpoints and by automatically and dynamically switching to the appropriate mode depending on which endpoint(s) are to be displayed. For example, two different streams having the same role may be best displayed in different modes.
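By way of illustration only, the following Python sketch shows one way a per-endpoint preferred display mode might drive the choice of layout. The names DisplayMode, Endpoint, preferred_mode, and select_mode are hypothetical and do not come from the disclosure or from the patents cited above. The rule that the layout follows the preference of the currently speaking endpoint, and the fallback to CP mode when no endpoint is speaking, are likewise assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DisplayMode(Enum):
    CONTINUOUS_PRESENCE = "cp"
    FULL_SCREEN = "full_screen"


@dataclass(frozen=True)
class Endpoint:
    name: str
    role: str                    # e.g. "people" or "content"
    preferred_mode: DisplayMode  # hypothetical per-endpoint preference


def select_mode(active_speaker: Optional[Endpoint]) -> DisplayMode:
    """Pick the layout from the preference of the endpoint to be displayed.

    Assumption: with no active speaker, fall back to continuous presence.
    """
    if active_speaker is None:
        return DisplayMode.CONTINUOUS_PRESENCE
    return active_speaker.preferred_mode


# Two endpoints with the same role but different preferred modes.
small_room = Endpoint("site_a", "people", DisplayMode.CONTINUOUS_PRESENCE)
large_room = Endpoint("site_b", "people", DisplayMode.FULL_SCREEN)
```

In this sketch both endpoints carry the role “people,” yet select_mode returns a different layout for each, which is the case a role-only policy cannot distinguish.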