Video conferences can be configured in a variety of ways. The video may be switched or transcoded, and the view of the conference presented to a participant may show a single participant or provide continuous presence.
In a switched video scenario, a multi-participant video conference can be conducted by switching the video of a primary participant to all other participants. The designated primary participant may change during the course of the conference. Typically, the primary participant is the active speaker, as determined by analysis of the audio contributed by the participants. The primary participant may also be determined in ways other than selecting the participant with the maximum audio level, such as by a fixed conference role or by token passing, in which the primary participant passes a token to another participant, who then becomes the primary participant.
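The two selection policies described above can be sketched as follows. This is a minimal, hypothetical illustration, not a specified algorithm: participant IDs are plain strings, `levels` holds one audio level per participant (the measure and its units are assumptions), and the optional `margin` is an assumed hysteresis that keeps the current primary unless another participant is clearly louder. The names `select_primary_by_audio` and `TokenPassing` are illustrative only.

```python
from typing import Dict, Optional


def select_primary_by_audio(levels: Dict[str, float],
                            current: Optional[str] = None,
                            margin: float = 0.0) -> Optional[str]:
    """Return the participant with the highest audio level.

    Assumption: `levels` maps participant IDs to a comparable audio level
    (e.g. short-term energy in dB). The current primary is retained unless
    the loudest participant exceeds it by more than `margin`.
    """
    if not levels:
        return current  # no audio analysis available; keep the current primary
    loudest = max(levels, key=levels.get)
    if current in levels and levels[loudest] - levels[current] <= margin:
        return current
    return loudest


class TokenPassing:
    """Hypothetical token-passing policy: the token holder is the primary."""

    def __init__(self, holder: str) -> None:
        self.holder = holder

    def pass_token(self, from_id: str, to_id: str) -> str:
        # Only the current primary may hand the token to another participant.
        if from_id != self.holder:
            raise ValueError(f"{from_id} does not hold the token")
        self.holder = to_id
        return self.holder
```

With `margin=0.0` the function simply picks the loudest participant; a positive margin trades responsiveness for fewer switches, which is one possible design choice rather than a requirement of the text.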
Viewing oneself while speaking may be distracting and may expose the latency of communication between participants. To avoid these effects, self-view suppression is desirable: in the switched video scenario, the primary participant receives switched video from one of the other participants rather than their own video.
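The routing implied by self-view suppression can be sketched as a mapping from each receiver to the single source it is sent. This is a hypothetical illustration: the function name is invented, and sending the primary the previous primary's video is an assumption, since the text only requires video from "one of the other participants".

```python
from typing import Dict, List, Optional


def route_switched_video(participants: List[str], primary: str,
                         previous_primary: Optional[str] = None) -> Dict[str, str]:
    """Map each receiving participant to the participant whose video it receives.

    Secondary participants receive the primary's video. The primary receives
    another participant's video (self-view suppression); preferring the
    previous primary is an assumed policy, otherwise the first other participant.
    """
    if primary not in participants:
        raise ValueError("primary must be a conference participant")
    others = [p for p in participants if p != primary]
    routes = {p: primary for p in others}
    if others:
        routes[primary] = previous_primary if previous_primary in others else others[0]
    return routes
```

For example, `route_switched_video(["a", "b", "c"], "b", previous_primary="c")` sends `b`'s video to `a` and `c`, and `c`'s video to `b`, so no receiver is ever routed its own stream.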
In a switched video scenario with a single video stream, the primary participant is the only conference participant visible to others, and the conference lacks a sense of group presence, or even a visual indication of who else is in the conference. A more satisfactory conference experience is achieved with a continuous presence configuration, in which a conference view is composed for each secondary participant, showing the primary participant and other participants but excluding the viewer. The continuous presence experience may be composed locally at an endpoint that receives multiple video streams (one stream per displayed participant), but this requires the receiving endpoint to be capable of decoding multiple video streams and composing the decoded video.

Alternatively, a transcoding multipoint control unit (MCU) may decode the individual streams from the participants and compose the resulting video into a single view of the conference suitable for display to a specific conference participant (not showing that participant), repeating this for each conference participant. Each view is then encoded uniquely for its participant alone, providing a dedicated view of the conference. This approach, based on the “transcoding” of compressed video streams, may require more image processing and video encoding resources than the switched video scenario described above, but it completely decouples each participant's conference experience from that of all others. It also allows simple endpoints that handle only a single video stream to receive a complex composed view of the video conference, concentrating processing resources in the conference center.
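The per-participant composition step performed by a transcoding MCU can be sketched at the layout level. This is a hypothetical illustration under stated assumptions: the near-square grid, the 1280x720 canvas, the `max_tiles` cap, and the ordering (primary first, then others in conference order) are all choices made for the example, not details from the text. Decoding, scaling, and per-view encoding are omitted; the sketch only decides which participants appear in each view and where.

```python
import math
from typing import Dict, List, Tuple

# (participant, x, y, width, height) in pixels of the composed view
Tile = Tuple[str, int, int, int, int]


def grid_layout(sources: List[str], width: int, height: int) -> List[Tile]:
    """Place sources on a near-square grid covering a width x height canvas."""
    n = len(sources)
    if n == 0:
        return []
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    tile_w, tile_h = width // cols, height // rows
    return [(src, (i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
            for i, src in enumerate(sources)]


def compose_views(participants: List[str], primary: str, max_tiles: int = 4,
                  width: int = 1280, height: int = 720) -> Dict[str, List[Tile]]:
    """Build one layout per receiving participant, never including the receiver.

    Assumed ordering: the primary participant first, then the remaining
    participants in conference order, truncated to max_tiles.
    """
    ordered = [primary] + [p for p in participants if p != primary]
    views: Dict[str, List[Tile]] = {}
    for receiver in participants:
        shown = [p for p in ordered if p != receiver][:max_tiles]
        views[receiver] = grid_layout(shown, width, height)
    return views
```

Each entry of the returned dictionary corresponds to one dedicated view that the MCU would render and encode separately, which is why the encoding cost grows with the number of participants, unlike the switched scenario.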