Online broadcasting of lectures and presentations, live or on demand, is increasingly popular in universities and corporations as a way of overcoming temporal and spatial constraints on live attendance. For instance, at Stanford University, lectures from over 50 courses are made available online every quarter. The University of California at Berkeley has developed online learning programs with “Internet classrooms” for a variety of courses. Columbia University provides various degree and certificate programs through its e-learning systems.
These types of online learning systems typically employ an automated lecture capturing system and a web interface for watching seminars online. FIG. 1 shows a screen shot of one such web interface 10. On the left-hand side, there is a display sector 12 showing a video stream generated by the automated lecture capturing system being employed at the lecture site. Typically, this display is an edited video switching among a speaker view, an audience view, a local display screen view and an overview of the lecture room. Presentation slides of the lecture are displayed on the right in a slide sector 14 of the interface 10.
The automated lecture capturing systems can vary greatly in their makeup. However, a typical example would include several analog cameras. For example, two cameras could be mounted in the back of the lecture room for tracking the speaker. A combined microphone array and camera could be placed on the podium for finding and capturing the audience. In some capture systems, each camera is considered a virtual cameraman (VC). These VCs send their videos to a central virtual director (VD), which controls an analog video mixer to select one of the streams as output.
Despite their success, these automated lecture capturing systems have limitations. For example, it is difficult to transport the system to another lecture room. In addition, analog cameras not only require extensive wiring, but also need multiple computers to digitize and process the captured videos. These limitations are partly due to the need for two cameras to track the speaker in many existing capture systems. One of these cameras is a static camera for tracking the lecturer's movement. It has a wide horizontal field of view (FOV) and can cover the whole frontal area of the lecture room. The other camera is a pan/tilt/zoom (PTZ) camera for capturing images of the lecturer. Tracking results generated from the first camera are used to guide the movement of the second camera so as to keep the speaker at the center of the output video. This dual camera system can work well; however, it tends to increase the cost and the wiring/hardware complexity.
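By way of illustration only, the guidance step in such a dual camera arrangement can be sketched as a mapping from the speaker's horizontal pixel position in the static camera image to a pan angle for the PTZ camera. The sketch below is not taken from any particular existing system. It assumes an ideal pinhole model with no lens distortion, assumes the two cameras are mounted close together with aligned optical axes so that parallax can be neglected, and uses hypothetical function and parameter names (focal_length_px, pan_angle_deg, speaker_x_px).

```python
import math


def focal_length_px(image_width_px: float, horizontal_fov_deg: float) -> float:
    """Focal length in pixels of an ideal pinhole camera (assumed model)."""
    half_fov_rad = math.radians(horizontal_fov_deg) / 2.0
    return (image_width_px / 2.0) / math.tan(half_fov_rad)


def pan_angle_deg(speaker_x_px: float,
                  image_width_px: float,
                  horizontal_fov_deg: float) -> float:
    """Pan angle that would center the speaker in the PTZ view.

    speaker_x_px is the tracked horizontal position in the static
    camera image (hypothetical tracker output). Positive angles are
    to the right of the static camera's optical axis. Assumes both
    cameras share the same position and zero-pan direction.
    """
    f = focal_length_px(image_width_px, horizontal_fov_deg)
    offset_px = speaker_x_px - image_width_px / 2.0
    return math.degrees(math.atan2(offset_px, f))


if __name__ == "__main__":
    # Static camera: 640 pixels wide, 90 degree horizontal FOV (example values).
    for x in (0, 160, 320, 480, 640):
        print(x, round(pan_angle_deg(x, 640, 90.0), 2))
```

In a real installation the two cameras would generally be calibrated against each other, and the tilt and zoom settings would be derived in a similar manner. The sketch also makes plain why the arrangement needs two devices, two video feeds and a processing link between them.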
It is noted that while the foregoing limitations in existing automated lecture capturing systems can be resolved by a particular implementation of a combined tracking system and process according to the present invention, this system and process is in no way limited to implementations that just solve any or all of the noted disadvantages. Rather, the present system and process has a much wider application as will become evident from the descriptions to follow.