1. Field of the Invention
The present invention relates to an information processing apparatus and method, and a program. In particular, the present invention relates to an information processing apparatus and method, and a program which make it possible to easily obtain a digest in which, for example, scenes that are of interest to a user are collected as highlight scenes.
2. Description of the Related Art
Highlight scene detection technologies for detecting highlight scenes from content such as movies, television broadcast programs, or the like include technology using the experience or knowledge of experts (or designers), technology using statistical learning with learning samples, and the like.
In the technology using the experience or knowledge of experts, a detector for detecting an event that occurs in a highlight scene and a detector for detecting a scene defined from the event (i.e., a scene that generates the event) are designed on the basis of the experience or knowledge of experts. The highlight scene is then detected using the detectors.
In the technology using statistical learning with learning samples, a detector for detecting a highlight scene (i.e., a highlight detector) and a detector for detecting an event that occurs in the highlight scene (i.e., an event detector) are obtained using the learning samples. The highlight scene is then detected using the detectors.
Also, in the highlight scene detection technology, a feature amount of the video or audio of content is extracted, and the highlight scene is detected using the feature amount. The feature amount used for this purpose is generally one that is specialized to the genre of the content from which a highlight scene is to be detected.
For example, in highlight scene detection technologies such as those of “Wang” and “Dua”, high-order feature amounts for detecting events such as a “whistle” or “applause” are extracted from a soccer game video using a soccer field line, a soccer ball trace, movement of the whole screen, MFCC (Mel-Frequency Cepstrum Coefficient), or the like, and a soccer play scene, such as an attack, a foul, or the like, is detected using the feature amount obtained by combining the extracted feature amounts.
For example, “Wang” has proposed a highlight scene detection technology in which a view-type classifier using a color histogram feature amount, a play location identifier using a line detector, a replay logo detector, an announcer excitability detector, a whistle detector, and the like are designed from video of a soccer game, and a soccer highlight detector is configured by modeling the temporal before-after relations among them with a Bayesian network.
In addition, as the highlight scene detection technology, a technology of detecting a highlight scene of content using a feature amount that specifies a high tone (a shout of joy) sound has been proposed in Japanese Unexamined Patent Application Publication No. 2008-185626.
According to the above-described highlight scene detection technology, the highlight scene (or event) can be detected with respect to the content of a specified genre, but it is difficult to detect a proper scene as the highlight scene with respect to the content of other genres.
For example, according to the highlight scene detection technology described in Japanese Unexamined Patent Application Publication No. 2008-185626, the highlight scene is detected under a rule that a scene with shouts of joy is considered to be the highlight scene, but the genres of content in which a scene with shouts of joy becomes the highlight scene are limited. Consequently, with the highlight scene detection technology described in Japanese Unexamined Patent Application Publication No. 2008-185626, it is difficult to detect the highlight scene with respect to the content of a genre in which a scene with no shouts of joy becomes the highlight scene.
Accordingly, in order to perform the highlight scene detection with respect to the content of a genre other than the specified genre using the highlight scene detection technology described in Japanese Unexamined Patent Application Publication No. 2008-185626, it may be necessary to design a feature amount suitable for that genre. Further, it is necessary to design the rule for the detection of the highlight scene (or the definition of the event) using the feature amount on the basis of interviews with experts.
For example, Japanese Unexamined Patent Application Publication No. 2000-299829 proposes a method of detecting a highlight scene by designing a feature amount and a threshold value that can be used to detect a scene that generally becomes the highlight scene, and performing a threshold value process using the feature amount and the threshold value.
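As a purely illustrative aid, a threshold value process of this general kind may be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of the cited publication: the function name detect_highlights, the per-frame feature list, and the min_len parameter are hypothetical names introduced here only for illustration.

```python
from typing import List, Tuple


def detect_highlights(
    features: List[float], threshold: float, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, whose feature
    amount stays at or above the threshold for at least min_len frames.

    Hypothetical helper: the feature amount is assumed to be one scalar
    per frame (for example, a loudness-like value).
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current above-threshold run began
    for i, value in enumerate(features):
        if value >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at frame i; keep it only if it is long enough.
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(features) - start >= min_len:
        segments.append((start, len(features)))
    return segments


if __name__ == "__main__":
    per_frame_feature = [0.1, 0.9, 0.8, 0.2, 0.95]
    print(detect_highlights(per_frame_feature, threshold=0.7))
```

In such a sketch, both the feature amount and the threshold value are fixed in advance by the designer, so the same rule is applied to every piece of content regardless of its genre.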
However, as content has recently become diversified, it is very difficult to obtain a general rule, such as a rule for processing the feature amount or the threshold value, for detecting proper scenes as highlight scenes with respect to all content.
Accordingly, in order to detect a proper scene as the highlight scene, it may be necessary to design the feature amount and the rule for detecting the highlight scene so as to be suitable for each genre. Even in the case of designing such a rule, it is still difficult to detect an exceptional highlight scene, for example, a highlight scene that is an exception to the rule.