A presentation (such as a meeting, a talk, a seminar, a lecture, or classroom instruction) is an important tool whereby knowledge transfer, teaching and learning can occur. A presentation, which can be any setting whereby an exchange of information occurs, typically includes at least one lecturer and an audience. The audience may be present at the presentation or may view the presentation from a remote location. In addition, the audience may view the presentation in real time as it occurs (“live”) or at a later time (“on demand”). In order to accommodate people's constraints in both time and space, capturing presentations for both live and on-demand viewing is becoming increasingly popular in university and corporate settings.
In order to facilitate viewing of a presentation both live and on-demand, the presentation first must be captured. Once the presentation is captured, it can be made available to viewers. For example, one popular way to view a presentation is over a computer network (“online” viewing). Online viewing of a presentation enables a person to view the presentation at a time and location that is convenient for the person. Online viewing of presentations is becoming more feasible and popular due to continuous improvements in computer network infrastructure and streaming-media technologies.
There are at least two problems, however, associated with capturing presentations. One problem is that it is expensive to outfit lecture rooms with the equipment (such as cameras) needed to capture the presentation. Equipment cost is a one-time cost and tends to become less expensive as market demand increases. A second, and bigger, problem is the high labor cost associated with having people capture the presentation. This labor cost is a recurring cost and is one of the main barriers to the capturing of presentations.
One way to breach this cost barrier is to build automated camera management systems, where little or no human intervention is needed. Even if a product of such a camera management system does not match the quality of professional videographers (who can still be used for the most important broadcasts), the camera management system allows the capture of presentations that otherwise would be available only to physically present audiences.
There are a few existing automated video systems and research prototypes. However, each of these systems has one or more of the following limitations:
Lack of a complete system: Some existing systems provide only isolated components, such as, for example, a head tracking module. These existing systems, however, lack many other components, and thus are not a complete system.
Use of invasive sensors to track lecturers: Some existing automated video systems require the use of obtrusive sensors to track the lecturer. These sensors must be worn by the lecturer and may be bothersome to the lecturer and interfere with the lecturer's freedom of movement.
Directing rules that work only in a specific situation: Some existing systems have a set of directing rules that are valid only in a specific configuration. For example, a set of directing rules may be specific to a large auditorium but fail in a small conference room. Or the directing rules may be specific to a certain number of cameras but fail if more cameras are added. One disadvantage of having these specific directing rules is that the system has little flexibility and general applicability.
In addition, various directing rules developed in the film industry and for graphics avatar systems are currently available. However, these rules are not suitable for use in automatically capturing presentations, in part for the following reasons. In the film industry or in graphics avatar systems, a director has multiple physically or virtually movable cameras that can shoot a scene from almost any angle and direction. An automated video system, on the other hand, is highly constrained in the types of camera shots it can produce. Therefore, many of the rules developed in the film industry cannot be used in automated video systems because of the highly constrained nature of the cameras in an automated video system.
Accordingly, there exists a need for an automated video system that alleviates human labor costs associated with capturing a presentation. At least two major components are needed in such a system:
1. A technology component: The hardware (cameras, microphones, and computers that control them) and software to track and frame presenters when they move around and point, and to detect and frame audience-members who ask questions.
2. An aesthetic component: The rules and idioms that human videographers follow to make the video visually engaging. Audiences have expectations based on viewing presentations captured by professional videographers. An automated video system should meet such expectations.
These components are interrelated. For example, aesthetic judgments will vary with the hardware and software available, and the resulting rules must in turn be represented in software and hardware. Further, what is needed is an automated video system and method that captures presentations in a professional and high-quality manner using rules similar to those used by professional human directors and cinematographers. Moreover, what is needed is an automated video system and method that tracks a presenter in a presentation without the need for the presenter to wear bothersome and restricting sensors. What also is needed is an automated video system and method that is a complete system and can be used in a wide variety of situations, regardless of the size of the room and the number of cameras available.