Prior art systems for browsing a news video typically rely on detecting transitions of news presenters to locate different topics or news stories. If the transitions are marked in the video, then a user can quickly skip from topic to topic until a desired topic is located.
Transition detection is usually done by applying high-level heuristics to text extracted from the news video. The text can be extracted from closed caption information, embedded captions, a speech recognition system, or combinations thereof, see Hanjalic et al., “Dancers: Delft advanced news retrieval system,” IS&T/SPIE Electronic Imaging 2001: Storage and Retrieval for Media Databases, 2001, and Jasinschi et al., “Integrated multimedia processing for topic segmentation and classification,” ICIP-2001, pp. 366-369, 2001.
Presenter detection can also be done from low-level audio and visual features, such as image color, motion, and texture. For example, portions of the audio signal are first clustered and classified as speech or non-speech. The speech portions are used to train a Gaussian mixture model (GMM) for each speaker. Then, the speech portions can be segmented according to the different GMMs to detect the various presenters, see Wang et al., “Multimedia Content Analysis,” IEEE Signal Processing Magazine, November 2000. Such techniques are often computationally intensive and do not make use of domain knowledge.
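By way of illustration only, the speaker-model segmentation described above can be sketched as follows. This is a simplified sketch and not the method of Wang et al.: each speaker is modeled by a single diagonal-covariance Gaussian (that is, a one-component mixture standing in for a full GMM), and the feature values, the speaker names, and the helper functions `fit_gaussian`, `log_likelihood`, `label_frames`, and `transitions` are all hypothetical.

```python
import math

def fit_gaussian(frames):
    """Fit a diagonal-covariance Gaussian (a one-component mixture) to feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so the log-likelihood stays finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def log_likelihood(frame, model):
    """Log-density of one frame under a diagonal Gaussian model."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))

def label_frames(frames, models):
    """Assign each frame to the speaker whose model scores it highest."""
    return [max(models, key=lambda name: log_likelihood(f, models[name]))
            for f in frames]

def transitions(labels):
    """Return the frame indices at which the speaker label changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

# Toy 2-D "speech features" (assumed values, not real audio).
train = {
    "presenter_a": [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2]],
    "presenter_b": [[5.0, 5.1], [4.8, 5.2], [5.1, 4.9], [5.2, 5.0]],
}
models = {name: fit_gaussian(frames) for name, frames in train.items()}

stream = [[0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [4.9, 5.1], [0.05, 0.1]]
labels = label_frames(stream, models)
boundaries = transitions(labels)
```

In this sketch, `boundaries` holds the frame indices at which the most likely speaker changes, which correspond to candidate presenter transitions. A practical system would use multi-component mixtures trained on real acoustic features, which is part of why such techniques tend to be computationally intensive.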
Another motion-based video browsing system relies on the availability of a topic list for the news video, along with the starting and ending frame numbers of the different topics, see Divakaran et al., “Content Based Browsing System for Personal Video Recorders,” IEEE International Conference on Consumer Electronics (ICCE), June 2002. The primary advantage of that system is that it is computationally inexpensive because it operates in the compressed domain. If video segments are obtained from the topic list, then visual summaries can be generated. Otherwise, the video can be partitioned into equal-sized segments before summarization. However, the latter approach is inconsistent with the semantic segmentation of the content, and hence, inconvenient for the user.
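The difference between the two segmentation options can be illustrated with a minimal sketch. The topic names, the frame numbers, and the helper functions `segments_from_topics` and `equal_segments` are assumptions made for illustration and do not come from Divakaran et al.

```python
def segments_from_topics(topics):
    """Return (start_frame, end_frame) pairs from a topic list of
    (name, start_frame, end_frame) entries."""
    return [(start, end) for _, start, end in topics]

def equal_segments(total_frames, count):
    """Split frames 0..total_frames-1 into `count` near-equal inclusive ranges."""
    if count <= 0 or count > total_frames:
        raise ValueError("count must be between 1 and total_frames")
    base, extra = divmod(total_frames, count)
    segments, start = [], 0
    for i in range(count):
        # The first `extra` segments absorb the remainder, one frame each.
        length = base + (1 if i < extra else 0)
        segments.append((start, start + length - 1))
        start += length
    return segments

# Hypothetical topic list for a 1500-frame news video.
topics = [("weather", 0, 299), ("sports", 300, 1199), ("politics", 1200, 1499)]

semantic = segments_from_topics(topics)
uniform = equal_segments(1500, 3)
```

Here the uniform boundaries fall at frames 500 and 1000, both inside the hypothetical "sports" topic, whereas the topic-list boundaries coincide with the story changes. This is the mismatch with the semantic segmentation noted above.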
Therefore, there is a need for a system that can reliably locate topics of interest in a news video. Then, the video can be segmented and summarized to facilitate browsing.