1. Field of the Invention
This invention relates to multi-media information systems, and more particularly to a method and apparatus for dynamically interacting with and perceiving content-based multi-media data in a multi-media presentation system.
2. Description of Related Art
The traditional model of television and radio uses multiple continuous data streams or frequencies that are transmitted to a receiver. Under this model a user can only perceive one data stream at a time. To find programs of interest, a user must manually change channels. This activity is referred to as “channel surfing” in the modern vernacular. Program listings such as television or radio station guides help users find programs of interest. However, a typical program listing only contains cursory information such as the program title, the length of the program, and a brief description thereof.
In some cases, typical program listings are adequate because a user is only interested in one program. However, program listings are inadequate in cases where users are interested in several programs that run concurrently. More specifically, a user may only be interested in certain content or “events” contained within several multi-media programs. For example, a user may want to follow three “live” college basketball games, game 1, game 2, and game 3, which all start at the same time. In this example the user is primarily interested in the entire content of game 1. However, the user is also interested in some of the events that may occur during the other two games, such as whenever the lead changes. Thus, the user would like to be alerted when the lead changes in either game 2 or game 3 so that the user can change the channel and follow that game at the time of interest (i.e., when the lead changes). In the traditional model, a user would need to “channel surf” (i.e., constantly switch channels among the three games) in the hope of catching the program content of interest. Thus, the user would most likely miss a large part of the content-based multi-media events that the user wished to perceive during the three programs. These content-based multi-media events may be very specific. For example, the user may wish to view a 3-point shot attempted by player number five with one minute left in the game when player number five's team is behind by 2 points. The content-based events desired will vary depending upon the personalities and tastes of the various users.
Therefore, a need exists for a system and method that allows users to selectively and dynamically perceive multiple multi-media events based on the content of the events. It is desirable to allow users to interface with a multi-media database and to select conditions for perceiving the multi-media data types within the database based on some user (or system) specified criteria. Also, it is desirable to assist users in dynamically and flexibly varying selection conditions.
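By way of illustration only, the lead-change alert described in the basketball example above could be sketched as follows. The record layout, the function names, and the sample score feed are hypothetical and are introduced solely for this sketch; they are not taken from any particular system.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Set


@dataclass(frozen=True)
class ScoreUpdate:
    """One score report from a game feed (hypothetical record layout)."""
    game: str
    clock: str   # game clock when the score was reported
    home: int
    away: int


def leader(update: ScoreUpdate) -> Optional[str]:
    """Return 'home', 'away', or None when the score is tied."""
    if update.home > update.away:
        return "home"
    if update.away > update.home:
        return "away"
    return None


def lead_change_alerts(updates: Iterable[ScoreUpdate], watched: Set[str]) -> Iterator[str]:
    """Yield an alert whenever the leading team changes in a watched game."""
    last_leader: Dict[str, str] = {}
    for u in updates:
        if u.game not in watched:
            continue
        current = leader(u)
        if current is None:
            continue  # a tie is not treated as a lead change in this sketch
        previous = last_leader.get(u.game)
        if previous is not None and previous != current:
            yield f"{u.game}: lead changed to {current} at {u.clock}"
        last_leader[u.game] = current


# Made-up sample feed: the user follows game 1 and asks for alerts on games 2 and 3.
feed = [
    ScoreUpdate("game 2", "18:40", 2, 0),
    ScoreUpdate("game 3", "18:10", 0, 3),
    ScoreUpdate("game 2", "17:55", 2, 3),  # away team takes the lead
    ScoreUpdate("game 1", "17:30", 5, 6),  # not watched for alerts
    ScoreUpdate("game 3", "16:20", 3, 3),  # tie, no alert
    ScoreUpdate("game 3", "15:45", 5, 3),  # home team takes the lead
]
alerts = list(lead_change_alerts(feed, watched={"game 2", "game 3"}))
```

In this sketch the selection condition is fixed in code; a user-adjustable condition could be passed in as a parameter instead.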
In addition to desiring to perceive only certain specific content-based events, users may desire to perceive only certain multi-media data types from a multi-media event. A multi-media event can be represented by a set of associated and corresponding multi-media data types. Multi-media data types include video, static video images, audio, text, statistical data, graphic representations, graphic overlays, other data, or any combination of these data types. Users may want to perceive only certain multi-media data types at different points during the event. For example, suppose a user is interested in a basketball game having video data, audio data, closed-captioning text data, and various statistical data. The user may want to listen to the first half of the basketball game, and view only the closed-captioning text and statistical data of the second half. In the traditional model, a conventional media player such as a radio or television presents only limited multi-media data types in a continuous-time information stream. Thus, the user would need several media players to perceive only the selected multi-media data types. As with the content-based events, the multi-media data types desired vary depending upon the personalities and tastes of the various users.
Therefore, a need exists for an intelligent console method and apparatus that facilitates greater flexibility and interactivity with users. More specifically, it is desirable to present selected content-based multi-media events in a manner selected by a user.
Conventional methods allow for the perception of entire multi-media programs in a continuous stream of data. These continuous streams of data can be archived on any well-known devices such as videocassette recorders (VCR), digital videodiscs (DVD), laser discs, read/write compact discs, audio tape recorders, digital audiotapes (DAT), and transcription devices. These devices allow playback based mainly on time or track indices. Disadvantageously, users are only allowed to control the flow of data by pressing buttons such as play, pause, fast forward or reverse. These controls essentially provide the user only one choice for a particular segment of a recorded multi-media program: the viewer can either perceive the data (albeit at a controllable rate) or skip it. However, due to time and bandwidth constraints (especially when a video or audio data type is transmitted over a computer network such as the well-known Internet), it is desirable to provide multi-media users improved and flexible control over the multi-media content to be perceived. For example, in a sports context, a particular user may only be interested in activities performed by a particular player, or in unusual or extraordinary plays (such as a three-point shot, fumble, goal, etc.). Such events are commonly referred to as “highlights”.
By providing “content-based” interactivity to a multi-media database, users can query the system to perceive only those plays or events that satisfy a particular query. For example, a user can query such a system to view the video and statistical data of all of the home runs hit by a particular player during a particular time period. Thus, rather than sifting through (by fast forwarding or reversing, for example) a large portion of video and statistical information to find an event of interest, users can use a flexible and dynamic content-based query system to find events of interest. This not only saves the user time and energy, but it could also vastly reduce the amount of bandwidth required when transmitting multi-media data over a bandwidth-constrained network. Rather than requiring the transmission of unnecessary data content, only events of interest and their selected and associated data types are transmitted over the transmission network. For example, the invention is particularly useful when transmitting over the well-known Internet, because the amount of bandwidth available to the user is limited. The content-based multi-media database reduces the amount of bandwidth required during transmission because only the multi-media data of interest to the user is transmitted.
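By way of illustration only, the home-run query and the resulting reduction in transmitted data could be sketched as follows. The `Event` record, the `query` function, the player names, the years, and all byte counts are hypothetical values made up for this sketch; they are not measurements and do not describe any particular database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass
class Event:
    """An annotated event with the size in bytes of each associated data type."""
    player: str
    kind: str
    year: int
    media_bytes: Dict[str, int] = field(default_factory=dict)


def query(
    events: Iterable[Event],
    predicate: Callable[[Event], bool],
    data_types: Set[str],
) -> Tuple[List[Event], int]:
    """Return the matching events and the bytes needed for the selected data types only."""
    matches = [e for e in events if predicate(e)]
    payload = sum(e.media_bytes.get(t, 0) for e in matches for t in data_types)
    return matches, payload


# Made-up catalogue of annotated events.
events = [
    Event("Player A", "home run", 1998, {"video": 4_000_000, "audio": 500_000, "stats": 2_000}),
    Event("Player A", "strikeout", 1998, {"video": 3_000_000, "audio": 400_000, "stats": 1_500}),
    Event("Player B", "home run", 1998, {"video": 4_500_000, "audio": 600_000, "stats": 2_000}),
    Event("Player A", "home run", 1999, {"video": 5_000_000, "audio": 550_000, "stats": 2_500}),
]

# "Video and statistical data of all home runs hit by Player A during 1998-1999."
matches, payload = query(
    events,
    lambda e: e.player == "Player A" and e.kind == "home run" and 1998 <= e.year <= 1999,
    {"video", "stats"},
)

# Bytes that would be sent if every data type of every event were transmitted.
total = sum(sum(e.media_bytes.values()) for e in events)
```

With these made-up figures, only the two matching events and their video and statistical data would be transmitted, which is less than half of the full catalogue.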
The prior art has yet to teach or suggest such a flexible, dynamic and content-based interactive multi-media system. However, some prior art teachings are remotely related to the present invention. For example, U.S. Pat. No. 5,109,425 to Lawton for a “Method And Apparatus for Predicting the Direction of Movement in Machine Vision” teaches the detection of motion in and by a computer-simulated cortical network, particularly for the motion of a mobile rover. Although motion detection may be used to track objects under view and to build a video database for viewing by a user/viewer, the present invention is not limited to using the motion detection method taught by Lawton. Rather, a database containing multiple multi-media data types can be used with the present invention without departing from the scope of the present claims. Moreover, the video database of Lawton is limited to video images.
Similarly, U.S. Pat. No. 5,170,440 to Cox for “Perceptual Grouping by Multiple Hypothesis Probabilistic Data Association” describes the use of a computer vision algorithm. However, in contrast to the system taught by Cox, the intelligent console system adapted for use with the present invention selects content based on user desires. Also, the system taught by Cox is limited to video images. In contrast, the present invention can be used with multiple multi-media data types and multiple events within a multi-media program.
Other prior art relates to the coordinate transformation of video image data. For example, U.S. Pat. No. 5,259,037 to Plunk for “Automated Video Imagery Database Generation Using Photogrammetry” describes the conversion of forward-looking video or motion picture imagery into a database, particularly to support image generation of a “top down” view. U.S. Pat. No. 5,237,648 to Cohen for an “Apparatus And Method for Editing A Video Recording by Selecting and Displaying Video Clips” shows and describes some of the concerns, and desired displays, presented to a human video editor. Disadvantageously, the systems taught by Plunk and Cohen have rudimentary and limited data types. In contrast, the present invention can be used with multiple multi-media data types and multiple events within a multi-media program.
Arguably, the most relevant prior art to the present invention is U.S. Pat. No. 5,729,471 to Jain et al. for “Machine Dynamic Selection of one Video Camera/Image of a Scene from Multiple Video Cameras/Images of the Scene in Accordance with a Particular Perspective on the Scene, an Object in the Scene, or an Event in the Scene” (hereinafter referred to as the '471 patent, and hereby incorporated herein for its teachings on multi-media video systems). The '471 patent teaches a Multiple Perspective Interactive (MPI) video system that provides a video viewer improved control over the viewing of video information. Using the MPI video system, video images of a scene are selected in response to a viewer-selected (i) spatial perspective on the scene, (ii) static or dynamic object appearing in the scene, or (iii) event depicted in the scene. In accordance with the MPI system taught by Jain in the '471 patent, multiple video cameras, each at a different spatial location, produce multiple two-dimensional video images of the real-world scene, each at a different spatial perspective. Objects of interest in the scene are identified and classified by computer in these two-dimensional images. The two-dimensional images of the scene, and accompanying information, are then combined in a computer into a three-dimensional video database, or model, of the scene. The computer also receives a user/viewer-specified criterion according to which the user/viewer wishes to view the scene.
From the (i) model and (ii) the criterion, the computer produces a particular two-dimensional image of the scene that is in “best” accordance with the user/viewer-specified criterion. This particular two-dimensional image of the scene is then displayed on a video display to be viewed by the user. From its knowledge of the scene and of the objects and the events therein, the computer may also answer user/viewer-posed questions regarding the scene and its objects and events.
The present invention uses systems and sub-systems that are similar in concept to those taught by the '471 patent. For example, the present intelligent console interacts with a database that is similar in concept to that taught in the '471 patent. However, the content of the multi-media database contemplated for use with the present intelligent console invention is much more extensive than that of the '471 patent. Also, the present invention is adapted for use with a logical database. The database automatically creates a content-based and annotated multi-media database that is interacted with by the present intelligent console. In addition, the present inventive intelligent console is more interactive and has improved flexibility as compared to the user interface taught or suggested by the '471 patent.
The system taught by the '471 patent suggests a user interface that allows a user/viewer to specify a specific perspective from which to view a scene. In addition, the user can specify that he or she wishes to view or track a particular object or person in a scene. Also, the user can request that the system display a particularly interesting video event (such as a fumble or interception when the video content being viewed is an American football game). Significantly, the user interface taught by the '471 patent contemplates interaction with a video database that uses a structure that is developed prior to the occurrence of the video event. The video database structure is static and uses a priori knowledge of the location and environment in which the video event occurs. The video database remains static throughout the video program and consequently limits the flexibility and adaptability of the user/viewer interface.
In contrast, the multi-media database developed for use with the present invention is much more dynamic. The database is automatically constructed using multiple multi-media data types. The structure of the database is defined initially based upon a priori information about all multi-media events of interest. However, the database structure is dynamically built by parsing through the structure and updating the database as all of the multi-media events develop. Consequently, the present intelligent console invention has increased flexibility and adaptability and is richer and more diverse than the prior art user interfaces.
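By way of illustration only, a database whose structure is defined from a priori information and whose contents are then built up as events develop could be sketched as follows. The class name, method names, event types, and annotations are hypothetical and are introduced solely for this sketch; they do not describe the actual database used with the present invention or the database of the '471 patent.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple


class EventDatabase:
    """Toy event store: structure set up front, contents added as events occur."""

    def __init__(self, event_types: Iterable[str]) -> None:
        # Structure defined initially from a priori knowledge of events of interest.
        self.event_types = set(event_types)
        # Contents are empty at the start and grow as the program unfolds.
        self.records: DefaultDict[str, List[Tuple[float, str]]] = defaultdict(list)

    def ingest(self, event_type: str, timestamp: float, annotation: str) -> bool:
        """Record an annotated occurrence; ignore event types outside the structure."""
        if event_type not in self.event_types:
            return False
        self.records[event_type].append((timestamp, annotation))
        return True

    def lookup(self, event_type: str) -> List[Tuple[float, str]]:
        """Return all occurrences recorded so far for one event type."""
        return list(self.records.get(event_type, []))


db = EventDatabase(["three-point shot", "lead change"])
accepted = [
    db.ingest("three-point shot", 95.0, "player 5, made"),
    db.ingest("timeout", 120.0, "home team"),          # not an event of interest
    db.ingest("lead change", 130.5, "away team leads"),
    db.ingest("three-point shot", 410.2, "player 12, missed"),
]
shots = db.lookup("three-point shot")
```

A query issued midway through a program sees only the occurrences ingested so far, and later occurrences become available as soon as they are recorded.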
A need exists for a system and method for selectively and dynamically accessing multiple multi-media events based on the content of the events. A need exists for allowing users to interface with a multi-media database and select conditions for perceiving multi-media data types within the database based on user (or system) specified criteria. In addition, a need exists for a method and system that allows users to dynamically change the selection of multiple content-based multi-media events. Also, a need exists for providing users greater flexibility and interactivity with a content-based multi-media system.
It is therefore desirable to provide a system and method that permits users of simultaneous multi-media programs the selection of multiple content-based multi-media events and facilitates alerting users when a selected content-based multi-media event occurs. It is also desirable to provide an intelligent console method and apparatus that facilitates greater flexibility and interactivity with the user in the presentation of various multi-media data types.
Accordingly, it is desirable to provide a multi-media console that provides “content-based” interactivity to a user. Such a console method and apparatus preferably provides interactivity between the user and the multiple multi-media data types that represent various events in a multi-media program. Additionally, it is desirable to provide a method and apparatus that facilitates greater flexibility and interactivity between a user and recorded multi-media programs. The present invention provides such an intelligent console method and apparatus.