There are many methods available for groups of individuals to engage in conferencing. One common method, videoconferencing, involves the use of video equipment, such as cameras, microphones, displays and speakers. The equipment may be used to divide a conference site into segments, each associated with one or more conference participants. In particular, a camera and a microphone may generate a video signal and an audio signal, respectively, associated with a particular segment. When a remote location has more segments than the local location has displays, not every segment can be shown at once, and individuals at the local location may have an artificial and unrealistic experience during the videoconference. Similarly, if the aggregate number of segments at several remote locations exceeds the number of displays at the local location, continuous switching among the signals from the cameras may diminish the experience of individuals at the local location.
To address such problems, certain systems employ various metrics to determine how or where a video image is displayed. For example, some videoconference systems present only one video signal at a time on a monitor. In these systems, the video signal that is presented is usually determined by voice activity (e.g., the last person to talk is the one who is presented). Other videoconference systems support multiple monitors, and the displayed video signals may be switched according to site. More particularly, each of the local monitors may display a video signal associated with the site of the current active speaker. Thus, if the active speaker is in New York, the local monitors will display video signals associated with the segments at the New York site. Site switching may be inefficient and visually unappealing because each remote site will typically have only a single active speaker, so many of the displayed segments may show participants who are not speaking. Moreover, continuous switching between displayed sites as the active speaker moves from site to site may detract from a user's conferencing experience. In still other systems, the displayed video signals may be switched by segment. In these systems, each segment may be separately analyzed to determine whether it contains an active speaker, and the set of displayed video signals may include the video signals of the active segments irrespective of the site at which each segment is located. Switching by segment may also detract from a local user's conferencing experience, as the user may lose track of the site associated with a particular speaker.
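The difference between switching by site and switching by segment can be illustrated with a minimal Python sketch. The `Segment` type, its `last_voice_time` field, and the functions `site_switching` and `segment_switching` are hypothetical names chosen for illustration; the sketch assumes each segment has already been tagged with the time of its most recent detected speech and is not drawn from any particular videoconference system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    site: str               # remote site the segment belongs to (hypothetical field)
    segment_id: int         # position of the segment within its site
    last_voice_time: float  # assumed: time of most recent detected speech, 0.0 if none


def site_switching(segments, num_monitors):
    """Show segments from the site of the most recent speaker, in segment order."""
    active = max(segments, key=lambda s: s.last_voice_time)
    same_site = sorted(
        (s for s in segments if s.site == active.site),
        key=lambda s: s.segment_id,
    )
    return same_site[:num_monitors]


def segment_switching(segments, num_monitors):
    """Show the most recently active segments, regardless of their site."""
    ranked = sorted(segments, key=lambda s: s.last_voice_time, reverse=True)
    return ranked[:num_monitors]


def labels(selected):
    return [f"{s.site}/{s.segment_id}" for s in selected]


if __name__ == "__main__":
    segments = [
        Segment("New York", 1, 10.0),
        Segment("New York", 2, 3.0),
        Segment("New York", 3, 0.0),
        Segment("London", 1, 12.0),
        Segment("London", 2, 8.0),
    ]
    # Site switching fills the monitors only with the current speaker's site.
    print(labels(site_switching(segments, 3)))     # ['London/1', 'London/2']
    # Segment switching mixes sites, which can obscure where each speaker is.
    print(labels(segment_switching(segments, 3)))  # ['London/1', 'New York/1', 'London/2']
```

With three local monitors, site switching leaves a monitor unused because the London site has only two segments, while segment switching fills every monitor but interleaves London and New York. The single-monitor, voice-activated case described above corresponds to `segment_switching` with `num_monitors=1`.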