In distributed presentations, multiple video cameras capture a person, people or objects, such as a speaker or audience, across the distributed sites where the presentation is being viewed. The video streams output by these cameras are presented on display screens at the various sites. However, how these streams are shown is a challenging problem. For example, showing all video streams on a screen at a site would reduce the screen space allocated to each video, and it would waste screen space on streams not of interest to the audience at that site. Further, if only particular ones of the available video streams are presented at any one time, rather than all, there is a question of which streams to show and in what configuration on the screen. There are existing presentation systems that automatically switch camera views based on a predefined automaton. However, this output is not customizable by viewers and is typically not amenable to a distributed presentation setting.
One application of a distributed presentation scheme is a distributed classroom system. While it is not intended to limit the application of the present invention to a distributed classroom system, a description of these systems provides insight into the issues involved. Distributed classroom systems allow students to take classes remotely via computer networks, leading to wider and better educational opportunities. In distributed classroom applications, multiple classrooms are connected by a computer network. The lecturer stands in one of the classrooms (the lecturer classroom), which may also have an audience, and the other classrooms contain only an audience (audience classrooms). The typical audio/video equipment configuration for such a distributed classroom setting includes one or more video cameras at each site that capture views of the audience. In addition, the lecturer classroom has one or more video cameras for capturing views of the lecturer. To display the video streams, each lecturer or audience classroom is also equipped with one or more display screens, such as projector screens or plasma displays placed at the front of the classroom. These display screens typically show lecture slides, as well as views of the lecturer and of the audiences at the other sites.
When there are multiple participating sites as described above, multiple camera streams are available for display at each site. Hence, there is a question as to which video streams to present on the local display screen and how to arrange the screen layout. Many systems have been proposed to manage the video streams intelligently. Generally, they can be classified into three categories. First, there are human-operated systems. In these systems, a human moderator is employed to switch between the audience and lecturer video streams, typically based on well-known cinematography principles. While these types of systems can provide high-quality video switching and can allow customization of the presentation display at each site, they are expensive to deploy due to the manpower required.
There are also so-called “show-all” systems. These systems adopt the simple principle of presenting all available camera streams to the viewers at each site. The various video streams are typically presented in different sectors of a default split-screen layout. Although this approach allows viewers to decide which streams they want to watch, it has limitations. For example, if many streams are being displayed, each sector's display resolution may be quite limited, and audience members sitting at the back of the room may not be able to see the streams clearly. In addition, because audiences normally focus on only one of the video streams, such as the video of the lecturer, most of the other streams waste valuable screen real estate and can be distracting to the audience.
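The resolution limitation of a show-all layout can be illustrated with a little arithmetic. The following is only a sketch under stated assumptions: the function name tile_resolution, the near-square grid layout, and the 1920x1080 screen size are hypothetical choices made for illustration and are not taken from any particular show-all system.

```python
import math


def tile_resolution(screen_w: int, screen_h: int, n_streams: int) -> tuple[int, int]:
    """Return the (width, height) in pixels of one sector when n_streams
    videos share a screen in a near-square grid (an assumed layout)."""
    if n_streams < 1:
        raise ValueError("n_streams must be at least 1")
    # Use as many columns as the ceiling of the square root, then
    # however many rows are needed to hold every stream.
    cols = math.ceil(math.sqrt(n_streams))
    rows = math.ceil(n_streams / cols)
    return screen_w // cols, screen_h // rows


# Each added stream shrinks every sector, including the one the
# audience actually wants to watch.
for n in (1, 4, 9):
    print(n, tile_resolution(1920, 1080, n))
```

Under these assumptions, nine streams on a 1920x1080 display leave each sector only 640x360 pixels, which suggests why viewers at the back of a room may struggle to see any single stream.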
The third category involves the use of automatic camera switching systems. Such systems automatically switch between multiple camera streams based on a finite state automaton. Each camera is represented by a state of the automaton, and the switching of cameras is represented by transitions between the states. These transitions can be based on timers, such as “after showing the audience camera for 20 seconds, switch to the lecturer camera”. They may also be based on events, such as “when an audience member starts to speak, switch to the audience camera showing that audience member”. Such an approach is capable of producing high-quality video productions by applying well-known cinematography rules automatically. However, to incorporate the cinematography rules in the video production, the automaton often becomes very complicated. For example, some systems employ videography rules that are manually written as a cascade of switching rules in a script file. This is error-prone, and changing the type or number of camera views presented would require the whole script to be rewritten. These existing systems also output only a single video stream, whereas it is desirable to be able to present multiple videos on the screen at the same time, using video effects such as Picture-in-Picture or split screen. Finally, the automatic camera switching scheme used by the existing systems prevents users (i.e., viewers or the system administrator) from customizing the video output based on their preferences.
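The timer-based and event-based transitions described above can be sketched as a small finite state automaton. This is a minimal illustration, not the implementation of any existing system: the class name CameraSwitcher, the state names, the event name audience_speaks, and the rule tables are all hypothetical. Only the 20-second audience dwell time comes from the example rule quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class CameraSwitcher:
    """Finite state automaton in which each state is one camera."""

    state: str = "lecturer"  # camera currently shown (hypothetical name)
    elapsed: float = 0.0     # seconds spent in the current state
    # Hypothetical timer rules: state -> (dwell seconds, next state).
    timer_rules: dict = field(
        default_factory=lambda: {"audience": (20.0, "lecturer")}
    )
    # Hypothetical event rules: (state, event) -> next state.
    event_rules: dict = field(
        default_factory=lambda: {("lecturer", "audience_speaks"): "audience"}
    )

    def _switch(self, new_state: str) -> None:
        # A transition selects a new camera and restarts its dwell timer.
        self.state = new_state
        self.elapsed = 0.0

    def tick(self, seconds: float) -> str:
        """Advance time and apply a timer-based transition if one is due."""
        self.elapsed += seconds
        rule = self.timer_rules.get(self.state)
        if rule is not None and self.elapsed >= rule[0]:
            self._switch(rule[1])
        return self.state

    def on_event(self, event: str) -> str:
        """Apply an event-based transition if one matches the current state."""
        target = self.event_rules.get((self.state, event))
        if target is not None:
            self._switch(target)
        return self.state


switcher = CameraSwitcher()
print(switcher.on_event("audience_speaks"))  # event-based transition
print(switcher.tick(10))                     # dwell time not yet reached
print(switcher.tick(10))                     # timer-based transition
```

Even in this two-camera sketch, every added camera or cinematography rule means more entries in both rule tables, and the automaton still selects only one stream at a time, which reflects the complexity and single-output limitations noted above.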