Today's video conferencing systems provide an interactive and effective means for remote users to participate in a meeting. Such systems typically involve the simultaneous, real-time transmission of audio and video streams associated with participating or active users. Furthermore, identifying the active talker from a remote location is desirable for natural communication between parties. However, video conferencing systems that allow the user to easily see and identify the active talker are often expensive and complex to provide, set up, and maintain, requiring significant user or technician effort.
Some video conferencing systems do provide an immersive video environment where the active talker can be easily identified, but these systems require a dedicated room and have high bandwidth requirements. Due to bandwidth limitations, many video conferencing systems have a single outbound audio and video stream from each endpoint. When multiple people are engaged in a live meeting in a room with a single outbound connection (as one node in a multi-party video conferencing scenario), the remote participants may see only a wide-angle view of the meeting room. Because of the bandwidth limitations, this view may not provide enough facial detail for the participants' expressions to be easily recognizable, as is needed for effective communication.