Individuals and organizations are rapidly accumulating large collections of digital content, including text, audio, graphics, animated graphics and full-motion video. This content may be presented individually or combined in a wide variety of different forms, including documents, presentations, music, still photographs, commercial videos, home movies, and metadata describing one or more associated digital content files. As these collections grow in number and diversity, individuals and organizations will increasingly require systems and methods for organizing and browsing the digital content in their collections. To meet this need, a variety of different systems and methods for browsing selected kinds of digital content have been proposed.
For example, storyboard browsing has been developed for browsing full-motion video content. In accordance with this technique, video information is condensed into meaningful representative snapshots and corresponding audio content. One known video browser of this type divides a video sequence into equal-length segments and denotes the first frame of each segment as its key frame. Another known video browser of this type stacks every frame of the sequence and provides the user with information regarding the camera and object motions.
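The equal-length segmentation described above can be illustrated with a minimal Python sketch. The function name `key_frame_indices` and its parameters are hypothetical and are not drawn from any particular browser; the sketch assumes frames are addressed by zero-based index and that a final, shorter segment still contributes a key frame.

```python
from typing import List


def key_frame_indices(total_frames: int, segment_length: int) -> List[int]:
    """Return the index of the first frame of each equal-length segment.

    Assumption: frames are numbered from 0, and a trailing segment that is
    shorter than segment_length still yields a key frame.
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    # The first frame of segment k is frame k * segment_length.
    return list(range(0, total_frames, segment_length))


# A 10-frame sequence split into segments of 4 frames.
print(key_frame_indices(10, 4))  # [0, 4, 8]
```

Because the key frames are chosen purely by position, this kind of selection is independent of what the video actually shows, which is the limitation that content-based techniques attempt to address.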
Content-based video browsing techniques also have been proposed. In these techniques, a long video sequence typically is classified into story units based on video content. In some approaches, scene change detection (also called temporal segmentation of video) is used to give an indication of when a new shot starts and ends. Scene change detection algorithms are known in the art, including scene transition detection algorithms based on DCT (Discrete Cosine Transform) coefficients of an encoded image and algorithms that are configured to identify both abrupt and gradual scene transitions using the DCT coefficients of an encoded video sequence.
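As a rough illustration of DCT-based scene change detection, the following Python sketch flags abrupt transitions only. It is not the method of any cited algorithm. It assumes that each frame has already been reduced to a list of DC coefficients (one per block) read from the encoded stream, and the name `detect_abrupt_cuts` and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence


def detect_abrupt_cuts(
    dc_frames: Sequence[Sequence[float]], threshold: float
) -> List[int]:
    """Return indices of frames that likely begin a new shot.

    Assumption: dc_frames[i] holds the DC coefficients of frame i, with the
    same number of blocks in every frame. A cut is declared when the mean
    absolute difference between consecutive frames exceeds threshold.
    """
    cuts: List[int] = []
    for i in range(1, len(dc_frames)):
        prev, curr = dc_frames[i - 1], dc_frames[i]
        if len(prev) != len(curr) or not curr:
            raise ValueError("frames must be non-empty and of equal size")
        mean_abs_diff = sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
        if mean_abs_diff > threshold:
            cuts.append(i)
    return cuts


# Four toy frames of two blocks each; the content changes sharply at frame 2.
frames = [[10, 10], [11, 10], [50, 52], [51, 52]]
print(detect_abrupt_cuts(frames, threshold=5.0))  # [2]
```

A single fixed threshold of this kind does not capture gradual transitions such as fades or dissolves, where the change is spread over many frames; detecting those would require comparing frames over a longer window.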
In one video browsing approach, Rframes (representative frames) are used to organize the visual contents of video clips. Rframes may be grouped according to various criteria to aid the user in identifying the desired material. In this approach, the user may select a key frame, and the system then uses various criteria to search for similar key frames and present them to the user as a group. The user may search representative frames from the groups, rather than the complete set of key frames, to identify scenes of interest. Language-based models have been used to match incoming video sequences with the expected grammatical elements of a news broadcast. In addition, a priori models of the expected content of a video clip have been used to parse the clip.
In another approach, U.S. Pat. No. 5,821,945 has proposed a technique for extracting a hierarchical decomposition of a complex video selection for video browsing purposes. This technique combines visual and temporal information to capture the important relations within a scene and between scenes in a video, thus allowing the analysis of the underlying story structure with no a priori knowledge of the content. A general model of a hierarchical scene transition graph is applied to an implementation for browsing. Video shots are first identified and a collection of key frames is used to represent each video segment. These collections then are classified according to gross visual information. A platform is built on which the video is presented as directed graphs to the user, with each category of video shots represented by a node and each edge denoting a temporal relationship between categories. The analysis and processing of video is carried out directly on the compressed videos.
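The directed-graph presentation described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: it assumes the shots have already been classified into categories, represented here by hypothetical string labels listed in temporal order, and the name `build_scene_transition_graph` is likewise hypothetical.

```python
from typing import Dict, List, Set


def build_scene_transition_graph(shot_labels: List[str]) -> Dict[str, Set[str]]:
    """Build a directed graph over shot categories.

    Each node is a category of video shots. An edge from category a to
    category b records that a shot of category b immediately follows a shot
    of category a somewhere in the video.
    """
    graph: Dict[str, Set[str]] = {label: set() for label in shot_labels}
    for current, following in zip(shot_labels, shot_labels[1:]):
        if current != following:  # consecutive shots of one category add no edge
            graph[current].add(following)
    return graph


# Shots alternate between two categories (e.g. a dialogue), then move on.
graph = build_scene_transition_graph(["A", "B", "A", "B", "C"])
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy example the cycle between categories A and B reflects the kind of temporal relationship within a scene that the graph is intended to expose, while the single edge into C marks a transition to different material.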
A variety of different techniques that allow media files to be searched through associated annotations also have been proposed. For example, U.S. Pat. No. 6,332,144 has proposed a technique in accordance with which audio/video media is processed to generate annotations that are stored in an index server. A user may browse through a collection of audio/video media by submitting queries to the index server. In response to such queries, the index server transmits to a librarian client each matching annotation and a media identification number associated with each matching annotation. The librarian client transmits to the user the URL (uniform resource locator) of the digital representation from which each matching annotation was generated and an object identification number associated with each matching annotation. The URL may specify the location of all or a portion of a media file.
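The query flow described above can be illustrated with a small in-memory Python sketch. It is only a schematic of the two roles and makes several assumptions not stated in the source: matching is a case-insensitive substring test, the mapping from media identification numbers to URLs is held by the librarian client, and all names (`Annotation`, `index_server_query`, `librarian_resolve`) and the example URLs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Annotation:
    text: str       # annotation generated from the media
    media_id: int   # media identification number
    object_id: int  # object identification number


def index_server_query(index: List[Annotation], term: str) -> List[Annotation]:
    """Index server role: return every annotation matching the query term."""
    needle = term.lower()
    return [a for a in index if needle in a.text.lower()]


def librarian_resolve(
    matches: List[Annotation], media_urls: Dict[int, str]
) -> List[Tuple[str, int]]:
    """Librarian client role: map each match to (URL, object id) for the user."""
    return [(media_urls[a.media_id], a.object_id) for a in matches]


index = [
    Annotation("Weather report for Tuesday", media_id=1, object_id=101),
    Annotation("Sports highlights", media_id=2, object_id=202),
]
media_urls = {
    1: "http://example.com/media/1",  # placeholder URLs for illustration
    2: "http://example.com/media/2",
}

matches = index_server_query(index, "weather")
print(librarian_resolve(matches, media_urls))
# [('http://example.com/media/1', 101)]
```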
Methods for transmitting video information over a network have been proposed. For example, in some approaches, an entire video shot (i.e., a single, complete sequence of video images) may be downloaded by a client for browsing. In another approach, one or more static images representative of a shot may be downloaded by a client for browsing. U.S. Pat. No. 5,864,366 has proposed a video information transmission scheme in accordance with which similar video shots of a video file (e.g., a commercial or a news broadcast) are grouped into one or more collections and a subset of frames from each shot is selected for transmission to a user over a network. The selected subset of frames may be downloaded by a user so that each collection may be displayed at a client terminal at the same time. In this way, a dynamic summary of the video information represented by the collections may be presented to a user while satisfying network bandwidth requirements.