1. Field of the Invention
This invention relates to digital video imaging systems, and more particularly to a method and apparatus for dynamically interacting with and viewing content-based video images in a multi-perspective video imaging system.
2. Description of Related Art
The traditional model of television and video uses a single video stream that is transmitted to a passive viewer. Under this model, the passive viewer has no control over the objects or viewing perspectives from which to view the event of interest. Rather, a video editor or broadcast video producer dictates the composition of the video production viewed by the passive viewer. In contrast to the wide range of viewing perspectives and object selection available to a viewer when the viewer is actually present at an event of interest, the television or video viewer is constrained to view objects that are selected by the video producer. In addition, the television viewer must view the objects selected by the television producer from the viewing perspectives dictated by the producer.
In some cases, this is an acceptable arrangement, especially when a television or video viewer has little or no interest in the event and therefore has no preference regarding the perspectives or objects under view. However, there are several applications in which greater viewer control over viewing perspectives and object selection is desirable. For example, when viewing a sporting event such as a basketball game, some viewers may wish to follow the flow of the game by watching whoever controls the basketball, while others may wish to watch the play "away from the ball". Also, even though several viewers may wish to follow the ball, some may want to view the game from the visitors' side, others from the home players' side, and still others from above the basket. Similarly, when watching a political debate, some viewers may wish to view the speaker, while others may wish to view the reactions of an opposing candidate or of the audience. In short, preferred viewing perspectives and object selections vary as widely as the personalities and viewing tastes of individual viewers.
Therefore, a need exists for a system and method that allow viewers to selectively and dynamically view video information from a variety of viewing perspectives. In addition, a need exists for a system and method that allow a user/viewer to interface with a video database and select video information based upon the content of the video. That is, it is desirable to allow users/viewers of video information to interact with a video database system in such a way that they can select video data for viewing based upon user- (or system-) specified criteria. It is therefore desirable to provide a system and method that permit viewers of video and television to select a particular viewing perspective, from which the video scene is thereafter presented. In addition, it is desirable to provide a method and apparatus that allow a viewer to alternatively select a particular object to be viewed (which may be a dynamically moving object) or an event in a real-world scene that is of particular interest. As the scene develops, its presentation to the viewer will prominently feature the selected object or the selected event. Accordingly, it is desirable to provide a multi-perspective viewer that provides "content-based" interactivity to a user/viewer. Such a viewer method and apparatus provides interactivity between the user/viewer and the scene to be viewed.
It is also desirable to provide a viewer method and apparatus that facilitates greater flexibility and interactivity between a viewer and recorded video information, and that also supports the playback and editing of the recorded video information. As noted above, in conventional video, viewers are substantially passive. All that viewers can do is control the flow of video by pressing buttons such as play, pause, fast forward, or reverse. These controls essentially give the passive viewer only one choice for a particular segment of recorded video information: the viewer can either see the video (albeit at a controllable rate) or skip it. However, because of time and bandwidth constraints (especially when the video information is transmitted over a computer network such as the well-known Internet), it is desirable to give the viewer improved and more flexible control over the video content to be viewed. For example, in a sports context, a particular viewer may only be interested in activities by a particular player, or in unusual or extraordinary plays (such as a fumble, three-point shot, goal, etc.). Such events are commonly referred to as "highlights".
By providing "content-based" interactivity to a video database, a viewer could query the system to view only those plays or events that satisfy a particular query. For example, a viewer could query such a system to view all of the home runs hit by a particular player during a particular time period. Thus, rather than sifting through (fast forwarding or reversing) a large portion of video information to find an event of interest, viewers could use a content-based video query system to find the events of interest. This not only saves the user/viewer time and energy, but could also vastly reduce the bandwidth required when transmitting video information over a bandwidth-constrained network: rather than transmitting unnecessary video content, the system transmits only the video events of interest.
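The home-run query described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the content-based database stores one annotation record per event (event type, player, game date, and the clip's start and end offsets); the names `EventAnnotation`, `query_events`, and `clip_seconds` are hypothetical and do not describe the structure actually used by the invention. The sketch also shows why bandwidth falls: only the matching clips, not the whole recording, need to be sent.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EventAnnotation:
    """Hypothetical annotation record for one event in a recorded program."""
    event_type: str   # e.g. "home_run", "fumble"
    player: str
    game_date: date
    start_s: float    # clip start, seconds from the start of the recording
    end_s: float      # clip end, seconds from the start of the recording


def query_events(index, event_type, player, start, end):
    """Return matching annotations (inclusive date range) in chronological order."""
    hits = [e for e in index
            if e.event_type == event_type
            and e.player == player
            and start <= e.game_date <= end]
    return sorted(hits, key=lambda e: (e.game_date, e.start_s))


def clip_seconds(events):
    """Total duration of the clips that would actually have to be transmitted."""
    return sum(e.end_s - e.start_s for e in events)


if __name__ == "__main__":
    index = [
        EventAnnotation("home_run", "Player A", date(1998, 5, 2), 1200.0, 1230.0),
        EventAnnotation("strikeout", "Player A", date(1998, 5, 2), 1500.0, 1520.0),
        EventAnnotation("home_run", "Player A", date(1998, 6, 1), 4000.0, 4040.0),
    ]
    season = query_events(index, "home_run", "Player A",
                          date(1998, 1, 1), date(1998, 12, 31))
    print(len(season), clip_seconds(season))
```

A real system would index these annotations rather than scan them linearly; the list scan is kept only to make the filtering criteria explicit.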
There are several prior art references that are related to the present multi-perspective viewer having content-based interactivity. For example, U.S. Pat. No. 5,109,425 to Lawton for a "Method And Apparatus For Predicting The Direction Of Movement In Machine Vision" concerns the detection of motion in and by a computer-simulated cortical network, particularly for the motion of a mobile rover. Although motion detection may be used to track objects under view and to build a video database for viewing by a user/viewer, the present invention is not limited to using the method taught by Lawton. Rather, several well-known motion detection methods can be used with the present invention without departing from the scope of the present claims. The video system adapted for use with the viewer of the present invention uses multiple two-dimensional video images from each of multiple stationary cameras, which are assembled into a three-dimensional video image database. Once the multiple images of the video system are available for object tracking, it is relatively easy to detect motion in the video system.
Similarly, U.S. Pat. No. 5,170,440 to Cox for "Perceptual Grouping By Multiple Hypothesis Probabilistic Data Association" describes the use of a computer vision algorithm. However, in contrast to the system taught by Cox, the video system adapted for use with the present viewer invention avails itself of much more a priori information than the single-point machine vision system taught by Cox. More specifically, the video system used with the present viewer uses multiple two-dimensional video images from multiple stationary cameras. These multiple two-dimensional images are assembled into a three-dimensional video image database.
Other prior art related to the present invention includes references concerning the coordinate transformation of video image data. For example, U.S. Pat. No. 5,259,037 to Plunk for "Automated Video Imagery Database Generation Using Photogrammetry" discusses the conversion of forward-looking video or motion picture imagery into a database, particularly to support image generation of a "top down" view. The present invention does not require a method as sophisticated as that taught by Plunk. In general, the necessary image transformations of the present invention are not plagued by dynamic considerations (other than camera pan and zoom). U.S. Pat. No. 5,237,648 to Cohen for an "Apparatus And Method For Editing A Video Recording By Selecting And Displaying Video Clips" shows and discusses some of the concerns, and desired displays, presented to a human video editor. These concerns are addressed by the present multi-perspective viewer having content-based interactivity.
Arguably, the closest prior art reference to the present invention is U.S. Pat. No. 5,729,471 to Jain et al. for "Machine Dynamic Selection of one Video Camera/Image of a Scene from Multiple Video Cameras/Images of the Scene in Accordance with a Particular Perspective on the Scene, an Object in the Scene, or an Event in the Scene", hereby incorporated by reference herein, hereinafter referred to as the '471 patent. The '471 patent teaches a Multiple Perspective Interactive (MPI) video system that provides a video viewer improved control over the viewing of video information. Using the MPI video system, video images of a scene are selected in response to a viewer-selected (i) spatial perspective on the scene, (ii) static or dynamic object appearing in the scene, or (iii) event depicted in the scene. In accordance with the MPI system taught by Jain in the '471 patent, multiple video cameras, each at a different spatial location, produce multiple two-dimensional video images of the real-world scene, each at a different spatial perspective. Objects of interest in the scene are identified and classified by computer in these two-dimensional images. The two-dimensional images of the scene, and accompanying information, are then combined in a computer into a three-dimensional video database, or model, of the scene. The computer also receives a user/viewer-specified criterion according to which the user/viewer wishes to view the scene.
From (i) the model and (ii) the criterion, the computer produces a particular two-dimensional image of the scene that is in "best" accordance with the user/viewer-specified criterion. This particular two-dimensional image of the scene is then displayed on a video display to be viewed by the user. From its knowledge of the scene and of the objects and events therein, the computer may also answer user/viewer-posed questions regarding the scene and its objects and events.
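To make the notion of a "best" image concrete, the following is a minimal sketch of one simple selection rule; it is an illustrative assumption, not the method taught by the '471 patent. It assumes each stationary camera has a known 3-D position, that the viewer's criterion has been reduced to a desired viewpoint, and that the object of interest has a known 3-D location in the model. The camera whose line of sight to the target is closest in direction to the desired one (largest cosine similarity) is chosen. The function and camera names are hypothetical.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _norm(a: Vec3) -> float:
    return math.sqrt(_dot(a, a))


def best_camera(cameras: Dict[str, Vec3], target: Vec3, desired_viewpoint: Vec3) -> str:
    """Return the name of the camera whose view of `target` best matches the desired one.

    Score = cosine of the angle between the camera->target direction and the
    desired-viewpoint->target direction (1.0 means the same direction).
    Assumes no camera sits exactly at the target and desired_viewpoint != target.
    """
    want = _sub(target, desired_viewpoint)

    def score(cam_pos: Vec3) -> float:
        have = _sub(target, cam_pos)
        return _dot(want, have) / (_norm(want) * _norm(have))

    return max(cameras, key=lambda name: score(cameras[name]))


if __name__ == "__main__":
    cams = {
        "home_side": (0.0, -10.0, 2.0),
        "visitor_side": (0.0, 10.0, 2.0),
        "above_basket": (12.0, 0.0, 8.0),
    }
    ball = (0.0, 0.0, 0.0)
    print(best_camera(cams, ball, (0.0, -20.0, 3.0)))
```

A fuller system would also weigh factors such as occlusion, distance, and zoom; direction alone is used here only to keep the selection rule easy to follow.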
The present invention uses systems and sub-systems that are similar in concept to those taught by the '471 patent. For example, the present viewer interacts with a multi-perspective video system and video database that is similar in concept to that taught in the '471 patent. However, the content of the video database contemplated for use with the present viewer invention is much more extensive than that of the '471 patent. Also, the present invention is adapted for use with an inventive "capturing" sub-system that automatically creates a content-based and annotated database that can be accessed by the present viewer. In addition, the present inventive multi-perspective viewer is more interactive and has much greater flexibility than the user interface taught or suggested by the '471 patent.
The system taught by the '471 patent suggests a user interface that allows a viewer/user to specify a specific perspective from which to view a scene. In addition, the user can specify that he or she wishes to view or track a particular object or person in a scene. Also, the user can request that the system display a particularly interesting video event (such as a fumble or interception when the video content being viewed is an American football game). Significantly, the user interface taught by the '471 patent contemplates interaction with a video database that uses a structure that is developed prior to the occurrence of the video event. The video database structure is static and uses a priori knowledge of the location and environment in which the video event occurs. The video database remains static throughout the video program and consequently limits the flexibility and adaptability of the viewer/user interface.
In contrast, the video database developed for use with the present invention is much more dynamic. The database is automatically constructed using multiple multi-media data types. The structure of the database is defined initially based upon a priori information about the location or video program. However, the database structure is dynamically built by parsing through the structure and updating the database as the multi-media program progresses. Because the database created for use with the present viewer invention is derived from the specific multi-media program under view, and because it uses multiple multi-media data types, the database is much richer and therefore offers much more flexible and interesting viewing opportunities. No such database is taught or suggested by the '471 patent. Rather, the '471 patent teaches a database based upon "live" video streams obtained from a plurality of cameras offering a plurality of viewing perspectives of a program. Although the '471 patent teaches an interesting multi-perspective system that is interfaced by a rudimentary user interface, it does not contemplate synchronizing multiple multi-media data types (i.e., video, audio, data and other information).
Accordingly, a need exists for an integrated multi-perspective viewing method and apparatus having content-based interactivity with a user/viewer. Such a multi-perspective viewer should allow a user/viewer to easily and flexibly interact with a fully linked video, audio, and data database in an intuitive and straightforward manner. The viewer should be usable either directly (that is, in direct communication with a multiple perspective interactive multi-media system) or remotely via the World Wide Web or some other communications network. The present invention provides such an integrated multi-perspective viewing method and apparatus.