The ability to browse through a large amount of video material to find relevant clips of interest is extremely important in many video applications. In interactive TV and pay-per-view systems, customers like to preview sections of programs before renting them. In a digital video library, it is important to let users quickly browse through the results returned from queries and navigate through large collections of material. The sequential nature of video does not lend itself to easy searching or to non-sequential random access, both of which are crucial to efficient and effective use of video material. In addition, while an entire video sequence can be transferred over a network to a client's computer for viewing, the inherently large data size of video means that the transfer consumes considerable network bandwidth and time.
Video browsing is thus a method of displaying and presenting video in a simple and intuitive manner so that a user can easily go through a large collection of video, as one would flip through books.
Two terms that are used frequently in this description, a video shot and a video collection, are defined here:
A shot is a single sequence of video images, like a motion picture or a television program, recorded by one video capture medium without interruption. It is the most fundamental unit of video production.
A collection is a group of similar video shots, where similarity is defined in terms of visual characteristics. For example, in a news broadcast, a collection can be all the shots of a particular news anchor person.
Prior art methods of video browsing include:
1. That of transferring the entire video, i.e., a video program, from a server to a client computer, or loading the entire video from local storage, for sequential display on the computer display. Sequential display means that one sees the video one frame after the other in a specified sequence. Some display programs also provide VCR functions such as fast-forward and fast-rewind.
2. That of using keyframes. There are two ways of doing this:
2A. Dividing the video into equal-length segments and, for each segment, choosing one frame, say the first, for display. If there are N segments, then N keyframes are displayed. Examples of this are disclosed in Mills et al. (M. Mills, J. Cohen, and Y. Y. Wong, "A magnifier tool for video data," in Proceedings of ACM Computer Human Interface (CHI), pp. 93-98, May 1992).
2B. Dividing the video into shots and, for each shot, choosing one or more keyframes for display. Example work is that of Zhang et al. (H. J. Zhang, C. Y. Low, and S. W. Smoliar, "Video Parsing and Browsing using Compressed Data," Multimedia Tools and Applications, pp. 89-111, March 1995). Different numbers of keyframes are selected for each shot based on the activity in the shot: many keyframes are selected for shots with significant temporal activity, and few for shots with static content.
3. That of using graph-based presentation, as disclosed in the work of Yeung et al. (M. M. Yeung, B. L. Yeo, W. Wolf, and B. Liu, "Video browsing using clustering and scene transitions on compressed sequences," in Multimedia Computing and Networking 1995, vol. SPIE 2417, pp. 399-413, Feb. 1995). In this presentation, an image icon represents a collection of similar video shots and a directed edge represents the flow of temporal information.
These references are incorporated by reference in their entirety.
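For illustration only, approaches 2A and 3 above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of any cited reference: the function names `uniform_keyframes` and `transition_edges` are hypothetical, frames are addressed by zero-based index, and each shot is assumed to have already been assigned a collection label by some similarity measure that is not shown here.

```python
def uniform_keyframes(num_frames, num_segments):
    """Approach 2A (sketch): split the video into N roughly equal-length
    segments and return the index of the first frame of each segment."""
    if num_frames <= 0 or num_segments <= 0:
        return []
    # Never ask for more segments than there are frames.
    num_segments = min(num_segments, num_frames)
    return [(i * num_frames) // num_segments for i in range(num_segments)]


def transition_edges(shot_labels):
    """Approach 3 (sketch): given the collection label of each shot in
    temporal order, return the distinct directed edges (src, dst) between
    different collections, in order of first occurrence."""
    edges = []
    seen = set()
    for src, dst in zip(shot_labels, shot_labels[1:]):
        # An edge records that a shot of collection `dst` directly
        # follows a shot of collection `src`.
        if src != dst and (src, dst) not in seen:
            seen.add((src, dst))
            edges.append((src, dst))
    return edges


if __name__ == "__main__":
    # A 100-frame video divided into 4 segments.
    print(uniform_keyframes(100, 4))
    # A hypothetical news broadcast: anchor shots alternate with other shots.
    print(transition_edges(["anchor", "field", "anchor", "weather", "anchor"]))
```

In the news-broadcast example, every anchor shot maps to the same node, so the graph stays small however many times the program returns to the anchor. A fixed segment count, by contrast, yields the same number of keyframes whether the content is static or active.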