When a user plays back video content to watch and listen to it, indexing the video is important for convenience. A technology that identifies the time instants of corner boundaries, or the corners themselves, is therefore required.
The following method is considered for video indexing. First, the video image is divided into segments, each referred to as a “shot”, using as break points the timings at which the scene switches to one taken by another camera (that is, the camera angle changes) and the timings at which the persons, objects, or scenes in the video change. Then, similar shots are put together into one group, and information on the kinds of shots that appear and the timings at which they appear is presented to the user.
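The two steps described above (dividing the video into shots, then grouping similar shots into kinds) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the per-frame feature vectors, the L1 distance, both thresholds, and the function names `split_into_shots` and `group_similar_shots` are hypothetical and are not taken from the method described here.

```python
from typing import List, Sequence, Tuple

Feature = Sequence[float]


def l1_distance(a: Feature, b: Feature) -> float:
    """Sum of absolute differences between two feature vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def split_into_shots(features: List[Feature], cut_threshold: float) -> List[Tuple[int, int]]:
    """Divide frames into shots, returned as (start, end) frame indices, end exclusive.

    A break point is assumed wherever consecutive frames differ by more
    than cut_threshold (hypothetical criterion).
    """
    if not features:
        return []
    shots = []
    start = 0
    for i in range(1, len(features)):
        if l1_distance(features[i - 1], features[i]) > cut_threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(features)))
    return shots


def group_similar_shots(
    features: List[Feature],
    shots: List[Tuple[int, int]],
    same_kind_threshold: float,
) -> List[int]:
    """Assign a kind label to each shot by greedy grouping.

    Each shot is represented by its first frame (an assumption); it joins
    the first existing kind whose representative is close enough,
    otherwise it starts a new kind.
    """
    representatives: List[Feature] = []
    kinds: List[int] = []
    for start, _ in shots:
        feature = features[start]
        for kind, representative in enumerate(representatives):
            if l1_distance(feature, representative) <= same_kind_threshold:
                kinds.append(kind)
                break
        else:
            representatives.append(feature)
            kinds.append(len(representatives) - 1)
    return kinds


# Toy input: six frames with two-dimensional features (made-up values).
frames = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0], [1, 0]]
shots = split_into_shots(frames, cut_threshold=1.0)
kinds = group_similar_shots(frames, shots, same_kind_threshold=0.5)
for (start, end), kind in zip(shots, kinds):
    print(f"frames {start}..{end - 1}: kind {kind}")
```

The printed list of kinds and frame ranges corresponds to the information on the kinds of shots and their timings that is presented to the user.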
Here, assuming that the kinds of shots that appear differ before and after a corner boundary, the time instants of the corner boundaries can be estimated from the presented shot information. Furthermore, similar corners can be recognized from the kinds of shots that constitute them.
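The assumption above can be illustrated with a short sketch. The text describes a user reading the boundaries off the presented shot information. The code below automates that reading under a strict, hypothetical interpretation: a boundary is placed wherever no kind of shot appears both before and after that point. The Jaccard index used for comparing corners is likewise a stand-in chosen for illustration, and the names `estimate_corner_boundaries` and `corner_similarity` are not from the source.

```python
from typing import Hashable, Iterable, List, Tuple


def estimate_corner_boundaries(shots: List[Tuple[float, Hashable]]) -> List[float]:
    """Return start times of shots at which a new corner is assumed to begin.

    shots is a time-ordered list of (start_time, kind). A boundary is
    reported before a shot when every kind seen earlier never appears
    again from that shot onward.
    """
    last_index = {}
    for i, (_, kind) in enumerate(shots):
        last_index[kind] = i

    boundaries = []
    reach = -1  # furthest index at which any kind seen so far still appears
    for i, (start_time, kind) in enumerate(shots):
        if i > 0 and reach < i:
            boundaries.append(start_time)
        reach = max(reach, last_index[kind])
    return boundaries


def corner_similarity(kinds_a: Iterable[Hashable], kinds_b: Iterable[Hashable]) -> float:
    """Jaccard index of the kinds of shots in two corners (assumed measure)."""
    a, b = set(kinds_a), set(kinds_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up example: kinds A and B form one corner, kinds C and D the next.
timeline = [(0.0, "A"), (5.0, "B"), (9.0, "A"), (14.0, "C"), (20.0, "D"), (26.0, "C")]
print(estimate_corner_boundaries(timeline))
print(corner_similarity(["A", "B"], ["A", "B", "C"]))
```

A single kind of shot that recurs across the whole program would suppress every boundary under this strict rule, which hints at why the estimate becomes harder as the number of kinds grows.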
A method of presenting together a plurality of shots included in the same corner is also considered. JP-A-2005-130416 describes a method of processing the tendency of appearance of shots in time series in order to determine the range of shots to be clustered.
In audio indexing, on the other hand, processing is carried out in units of utterances, which are divided at the moments when the speaker (sound) switches. Therefore, assuming that the kinds of utterances that appear (the speaking person, the configuration of the cast) differ before and after a corner boundary, the time instants of the corner boundaries can be estimated and similar corners recognized by using the utterance information in the same manner as with the shot information.
However, when the information on the kinds of shots that appear and the timings at which they appear is presented to the user, an increase in the number of kinds means that the user must check the disappearance of many more kinds of shots in the time series in order to estimate the time instants of the corner boundaries.
The user must also check the appearance of many more kinds of shots across the corners to determine whether the corners are similar.
When a plurality of shots included in the same corner are put together on the basis of the tendency of appearance of shots, an increase in the number of kinds means that the user must grasp the state of appearance of many more kinds of shots. As a result, the information on the possibility of appearance is hidden for some kinds of shots. The hidden shot information may contribute to the continuity of the corners, so wrong time instants may be detected as corner boundaries.
Therefore, the more kinds of shots or utterances there are, the more difficult it becomes to understand the tendency of appearance, and hence the estimation of the time instants of the corner boundaries and the determination of whether corners are similar become prone to error.