1. Field of the Invention
The present invention relates to a multimedia data browsing system, and more particularly to a method and apparatus for controlling reproduction of video contents to allow a user to readily move a play position to a desired scene within a brief period of time and reproduce video contents from the moved play position.
2. Description of the Related Art
With the advance of mass media and the simplification of multimedia content creation, ordinary users now have access to vast amounts of media information.
As the volume of multimedia contents increases, there is a growing need for an automated system that sorts out the data desired by users, and research into schemes that meet this need is actively in progress.
With the development of digital technologies, video contents are increasingly stored and distributed in digital form. The popularization of digital broadcasting will accelerate the digitalization of such media.
A high speed browsing technique for a part of such video contents desired by a user enables sorting and reproduction of a user-desired part of the video contents.
For example, one user may desire to view only a sports associated part of news video contents and another user may desire to view only a stock market associated part of the news video contents.
Yet another user may desire to view only a specific scene of a sports program or show program where a specific character appears.
A variety of studies are in active progress to meet such various desires of users.
Owing to results of such studies, users can search/filter and browse only a desired part of desired video contents at a desired time.
The most basic techniques for nonlinear video browsing and search are shot segmentation and shot clustering, both of which are central to analyzing video contents.
In this regard, many studies to date have concentrated on the shot segmentation technique, and research results on the shot clustering technique continue to be published.
It can be seen from various studies that shot segmentation can be automated and that most of the associated algorithms can be implemented with an accuracy of 90% or more.
Also, the shot clustering technique can be automated with a high level of accuracy by applying it suitably to a program genre on the basis of a detected characteristic event or general characteristics of the shot.
In general terms, video contents are logically divided into several story units.
These story units are typically referred to as scenes or events.
For example, a gunfight scene, conversation scene, etc. may correspond to the story unit.
These scenes are composed of a connection of several sub-scenes or shots.
A shot is a sequence of video frames acquired from one camera without interruption, which is the most basic unit for video analysis or construction.
The shot segmentation signifies a technique for dividing video contents into individual shots, and the shot clustering signifies a process of reconstructing the individual shots into a logical scene unit on the basis of their characteristics to detect a logical story structure of the video contents.
FIG. 1 illustrates the shot segmentation and shot clustering processes.
Generally, most shot segmentation algorithms rely on the observation that image, motion, and audio characteristics are similar within a single shot and dissimilar between two different shots. Most shot clustering algorithms rely on the observation that shots with similar characteristics recur within a certain period of time.
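The two observations above can be illustrated with a minimal sketch. It is not the method of any particular study: it assumes, purely for illustration, that a frame is a flat sequence of 8-bit gray levels, that similarity is measured by a gray-level histogram distance, and that the names `histogram`, `distance`, `segment_shots`, and `cluster_shots` and all thresholds are hypothetical.

```python
from typing import List, Sequence

# Assumption: a frame is a flat sequence of 8-bit gray levels (0..255).
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized gray-level histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]


def distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 histogram distance scaled to [0, 1]; 0 means identical."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2.0


def segment_shots(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return the index of the first frame of each detected shot.

    A boundary is declared wherever two consecutive frames are
    dissimilar, i.e. their histogram distance exceeds the threshold.
    """
    if not frames:
        return []
    hists = [histogram(f) for f in frames]
    starts = [0]
    for i in range(1, len(hists)):
        if distance(hists[i - 1], hists[i]) > threshold:
            starts.append(i)
    return starts


def cluster_shots(shot_hists: Sequence[Sequence[float]],
                  window: int = 3, threshold: float = 0.3) -> List[int]:
    """Give each shot a group label.

    A shot joins the group of a similar shot seen within the last
    `window` shots; otherwise it starts a new group.
    """
    labels: List[int] = []
    next_label = 0
    for i, h in enumerate(shot_hists):
        label = None
        for j in range(max(0, i - window), i):
            if distance(shot_hists[j], h) <= threshold:
                label = labels[j]
                break
        if label is None:
            label = next_label
            next_label += 1
        labels.append(label)
    return labels


# Usage: two dark frames, two bright frames, then a dark frame again.
dark, bright = [10] * 16, [240] * 16
frames = [dark, dark, bright, bright, dark]
starts = segment_shots(frames)
groups = cluster_shots([histogram(frames[s]) for s in starts])
```

In this toy input the third shot resembles the first, so both receive the same group label. A practical system would typically combine several features and would go on to merge interleaved groups into a single logical scene.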
Various video indexing techniques have recently been studied for the purpose of finding a desired scene in a digital video.
For example, research has been done on developing an interface that detects shots as physical edit units and scenes as logical semantic units of a video stream, summarizes the entire contents of the video stream using key frames representative of the detected units, and allows a user to choose a play position by selecting a desired key frame.
This interface is utilized as a tool for enabling a user to move a play position to a desired position in a one-step manner.
FIG. 2 shows an example of a key frame-based video browsing interface.
Through the use of the interface shown in FIG. 2, a user can move a play position to a desired position in a one-step manner by selecting a desired key frame.
In a key frame-based video navigation, however, it is very important to control the number of key frames.
In other words, the provision of too many key frames necessitates a large number of inputs from a user during video navigation, and the provision of too few key frames makes it difficult to move a play position to a position actually desired by the user in a one-step manner.
Moreover, because this type of interface requires frequent inputs from the user, it is hard to use in an environment with a simple input unit, such as a TV rather than a computer, or on a terminal with a limited screen size, such as a PDA.
Unlike an analog video, a digital video can avoid deterioration in picture quality in fast forward/fast rewind modes.
Two methods are generally used for high-speed video reproduction: a frame rate increasing method, which increases the number of frames decoded per unit time and displays only some of the resulting frames, and a frame skipping method, which skips over a certain portion of the frames and decodes and displays only the remainder.
However, the frame rate increasing method has a disadvantage in that the maximum frame rate is limited by the performance of the terminal device and the bit rate of the original stream. For this reason, the frame skipping method is generally used in fast forward/fast rewind modes of a digital video.
The fast forward/fast rewind functions based on the frame skipping method are disadvantageous in that, because they do not utilize structural information of the video contents, a still image may be displayed for a lengthy period of time or similar scenes may be reproduced repetitively.
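The frame skipping method can be sketched as follows. This is a simplified illustration that ignores codec constraints; the function name `frame_skip_indices` and the convention that a negative `speed` means fast rewind are assumptions made here, not part of any existing player.

```python
from typing import List


def frame_skip_indices(total_frames: int, start: int, speed: int) -> List[int]:
    """Indices of the frames to decode and display.

    Every `speed`-th frame is kept and the frames in between are skipped.
    A positive speed plays forward from `start`; a negative speed rewinds.
    """
    if speed == 0:
        raise ValueError("speed must be non-zero")
    stop = total_frames if speed > 0 else -1
    return list(range(start, stop, speed))


# Usage: 4x fast forward and 4x fast rewind over a 10-frame clip.
forward = frame_skip_indices(10, 0, 4)
rewind = frame_skip_indices(10, 9, -4)
```

Because the step is fixed and unrelated to shot or scene boundaries, a long static shot contributes many nearly identical frames to the output.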
While watching a video or TV, a user may often desire to skip over specific video contents or video contents of no interest, such as advertisements.
In order to meet such desires, a time offset-based forward or reverse skipping method has been proposed by TiVo or ReplayTV.
Using this skipping method, the user can move a play position to a desired segment by skipping over an undesired segment.
However, a forward/reverse skipping method implemented in an existing set-top box does not consider semantic/structural information of video contents. For this reason, a user may be required to provide several inputs in order to move a play position to a desired position. Furthermore, because each skip covers a fixed time offset, it is difficult in practice to land the play position exactly on the desired position.
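The limitation of time offset-based skipping can be seen in a small sketch. The 30-second offset, the function names `skip` and `skip_until_past`, and the example positions are hypothetical values chosen for illustration and are not taken from any actual set-top box.

```python
from typing import Tuple


def skip(position: float, offset: float, duration: float) -> float:
    """Move the play position by a fixed time offset, clamped to the video."""
    return max(0.0, min(duration, position + offset))


def skip_until_past(position: float, target: float,
                    offset: float, duration: float) -> Tuple[int, float]:
    """Press forward skip until the play position reaches or passes target.

    Returns the number of inputs needed and the final play position.
    """
    if offset <= 0:
        raise ValueError("offset must be positive for forward skipping")
    presses = 0
    while position < target and position < duration:
        position = skip(position, offset, duration)
        presses += 1
    return presses, position


# Usage: the desired scene starts at 100 s and each skip jumps 30 s.
presses, final_position = skip_until_past(0, 100, 30, 600)
```

With these example values the user presses the skip key four times and still lands 20 seconds past the start of the desired scene, since the offset is unrelated to where scenes begin.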