1. Field of the Invention
The present invention relates to a video signal playback apparatus and method.
A growing number of video signals is available on the Internet and on a variety of storage media, e.g. VHS cassettes or DVDs (digital video discs). Furthermore, video signals are provided by a large number of television stations as analog or digital video signals.
In general, a video signal is a rich multimodal information source containing speech, audio, text, colour patterns and shape of imaged objects and motion of these objects.
In the present document, a self-contained video signal belonging to one general subject matter is referred to as “video film”. For example, each single feature film and each single documentary film is referred to as video film.
Usually, each video film contains a plurality of self-contained activities (events). In this regard, only self-contained activities (events) having a certain minimum importance are accounted for. If the video film is a certain feature film, for example, said self-contained activities might be the different scenes of the feature film.
In the following, said self-contained activities (events) which are included in a certain video film and meet a minimum importance are called “contents”.
Said contents frequently comprise a “key event” which characterises the whole content. If a content of a feature film comprises a rendezvous of two persons, for example, the key event of said content will be the moment the persons actually meet and not the separate ways the persons might have to take to reach the location where the rendezvous takes place.
Ever since video signals (comprising e.g. a movie) were first stored on a recording medium, there has been an increasing desire to decide whether a certain video signal stored on a recording medium contains interesting contents.
2. Discussion of the Background
Several solutions have been proposed that are intended to support the user in making this kind of decision:
Firstly, “trailers” are provided for a plurality of video films. Each trailer is a manually prepared summary of the contents of the respective video film. Normally, the trailer is produced by the provider or producer of the respective video film.
Since trailers are advertisements intended to attract consumers, trailers are not neutral and therefore do not reliably indicate whether a certain video film contains interesting contents or not. Furthermore, trailers have to be produced manually, which is very costly and cumbersome. Therefore, a trailer is not available for every video signal.
Secondly, a key frame based solution is disclosed by Michael G. CHRISTEL et al. in the paper “Adjustable Filmstrips and Skims as Abstractions for a Digital Video Library”, which was presented at the IEEE Advances in Digital Libraries Conference, Baltimore, Md., May 19-21, 1999.
In general, key frame based tools communicate information about every content in a video film by displaying a plurality of key frames. Said key frames are individual frames of the respective video film that represent contents like scenes or shots. Said key frames might be displayed as a slide show or a story board (which is also called “film strip”).
It is a disadvantage of key frame based tools that they map the dynamics and dramaturgy of the information comprised in a video signal to still pictures and thus to a static form. This entails a high risk of losing important information comprised in the respective video signal. If the key frame shows the face of a person, for example, it cannot be distinguished whether the person is just sitting around or is involved in an important activity.
Thirdly, to solve the aforementioned problem, an automatic production of video skims that preserve the video's frame rate is additionally proposed by the above paper “Adjustable Filmstrips and Skims as Abstractions for a Digital Video Library”. A video skim is a portion of cohesive video signals which is extracted from the original video signals.
It is a disadvantage of video skims that each video film/video film clip is characterised by only one video skim. Thus, although video skims are a dynamic form of displaying information, there is still a high risk of losing important information comprised in the original video signal. Furthermore, those parts of the video information comprised in the video signal that are selected for the video skim are played back at the original frame rate. Since the video skims need to have a certain minimum size relative to the size of the original video signal to avoid an excessive loss of information, the time required to view the video skim characterising the information comprised in a video signal is still very high. In consequence, the potential compression rate of the video signal is relatively low.
Fourthly, the traditional video tape recorder Fast-Forward and/or Fast-Backward playback mode is frequently used to decide whether a certain video signal contains interesting contents or not.
In the Fast-Forward and/or Fast-Backward playback mode a video signal is played back at a higher frame rate than the frame rate for real time playback (normally 2-, 4-, or 8-times as fast as the original frame rate) in order to go quickly through the video signal and to preserve the dynamic flow of a story comprised in the video signal.
By reference to FIGS. 11 and 12 the traditional Fast-Forward and/or Fast-Backward playback mode is briefly discussed:
This technique was developed in analogue-type video tape recorders and conventionally is realised by scanning the video tape at a speed faster than normal speed as shown in FIG. 11. Thus, the rate of displayed information is raised. Applied to the digital world of modern digital video recorders this means that the frame rate is increased.
Alternatively, the chronological distance between two successive frames of a video signal might be increased while the rate of displayed information stays constant. In other words, frames are skipped (e.g. only every n-th frame is displayed) while the original frame rate is kept, as shown in FIG. 12.
In FIGS. 11 and 12, succeeding frames of the video information comprised in a video signal are marked by different colours. In both figures, the frames of the original video signal played back in real time are shown on the left side of the respective figure whereas the information displayed in the Fast-Forward and/or Fast-Backward playback mode is shown on the right side of the respective figure.
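The two variants discussed above can be illustrated by a minimal sketch. It is not taken from any cited prior art device; the function names, the frame representation as a plain list, and the frame rate of 25 frames per second are assumptions chosen for illustration only. The sketch merely shows that raising the frame rate by a factor n (FIG. 11) and displaying only every n-th frame at the original frame rate (FIG. 12) lead to the same viewing time.

```python
def skip_frames(frames, n):
    """Variant of FIG. 12: keep only every n-th frame (hypothetical helper).

    Played back at the original frame rate, the result runs n times as fast.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return frames[::n]


def viewing_time_seconds(num_frames, frame_rate_hz):
    """Time needed to display num_frames at the given frame rate."""
    return num_frames / frame_rate_hz


# Assumed example values: 16 frames, 25 frames per second, 4-times speed.
original = list(range(16))
frame_rate = 25
n = 4

# Variant of FIG. 11: all frames, frame rate raised by the factor n.
time_raised_rate = viewing_time_seconds(len(original), frame_rate * n)

# Variant of FIG. 12: every n-th frame, original frame rate.
skipped = skip_frames(original, n)
time_skipped = viewing_time_seconds(len(skipped), frame_rate)
```

In both cases the viewer spends the same time; the variants differ only in whether the display rate or the set of displayed frames is changed.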
It is a disadvantage that this prior art method is very time consuming. For a fast playback of a typical movie comprising 100 minutes of video information at 8-times real time, for example, about 12.5 minutes are required to scan the video information comprised in the video signal.
This time cannot be further reduced since there is a natural limit for the increased frame rate caused by the limited capability of humans:
If the frame rate is continuously increased, a maximum frame rate is reached at which the pictures comprised in a video signal change too fast for a human watching said video signal. Although this maximum frame rate differs for each individual person, such a maximum frame rate exists for everybody. It is assumed that said maximum frame rate is around 32-times real time. Thus, the minimum amount of time which is theoretically necessary to scan 100 minutes of video signals by playing back said video signals at 32-times real time is 3.125 minutes.
This necessarily results in a minimum amount of time that is needed to preview a video signal by using the traditional video tape recorder Fast-Forward and/or Fast-Backward playback mode.
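The figures given above follow from dividing the duration of the video signal by the speed-up factor. The following minimal sketch reproduces this arithmetic; the function name is hypothetical and chosen for illustration only.

```python
def scan_time_minutes(duration_minutes, speed_factor):
    """Time needed to scan a video signal at speed_factor-times real time."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return duration_minutes / speed_factor


# Values from the text: a 100 minute movie at 8-times and 32-times real time.
time_at_8x = scan_time_minutes(100, 8)    # 12.5 minutes
time_at_32x = scan_time_minutes(100, 32)  # 3.125 minutes
```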
Fifthly, a fast video playback mode is provided by the “Cue video” system of the International Business Machines Corporation. This fast video playback mode is composed of shots from the original video that are played back at different speeds depending on the activity (and especially motion) comprised in each shot.
It is a disadvantage of the “Cue video” system that all parts of a video signal are played back and thus the overall compression rate is limited. Therefore, a significant amount of time is still necessary to visually scan the information comprised in the video signal. In addition, complex analysis techniques are needed to detect motion in the information comprised in the video signal.
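The limitation described above can be illustrated by a minimal sketch. It does not reproduce the actual “Cue video” system; the shot durations, the speed factors and the function name are assumptions made for illustration only. Since every shot is played back, the total viewing time is the sum of the individual shot durations divided by their respective speed factors and can never fall below the total duration divided by the highest speed factor used.

```python
def total_viewing_time(shots):
    """Sum of duration / speed over all shots (hypothetical helper).

    shots: list of (duration_seconds, speed_factor) pairs, where a higher
    speed factor is assumed for shots containing less activity.
    """
    return sum(duration / speed for duration, speed in shots)


# Assumed example: three shots with different activity levels.
shots = [
    (60, 2),   # moderate activity, played at 2-times real time
    (120, 8),  # little activity, played at 8-times real time
    (30, 1),   # high activity, played in real time
]

viewing_time = total_viewing_time(shots)

# Lower bound: all shots played at the highest speed factor used.
total_duration = sum(duration for duration, _ in shots)
max_speed = max(speed for _, speed in shots)
lower_bound = total_duration / max_speed
```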
Sixthly, video signals stored on digital video discs frequently comprise chapters which have been added to the video signals during the production of the digital video disc to allow navigation through the video signals comprised in the disc. Said chapters normally allow identification only of the story line and thus of the information comprised in the respective video signal. In particular, said chapters do not allow identification of individual contents (self-contained activities/events having a certain minimum importance) comprised in the video film that differ from the predefined chapters. Furthermore, said chapters are not neutral since they are provided by a provider or producer of the digital video disc.
The problem of the above prior art concepts is that either a visual scan through the video information comprised in a video signal still requires a lot of time, or the concepts are not neutral and therefore do not reliably indicate whether a certain video signal contains interesting contents.