1. Technical Field
The invention relates to techniques for extracting, from a user's viewpoint, scenes of interest from video or motion-picture content, such as TV broadcast content or content from a video sharing website.
2. Description of the Related Art
In recent years, a wide variety of video content has been delivered to large audiences via TV broadcasting or the Internet. Such video content may be any type of media content, such as TV broadcast content or content from a video sharing website.
For example, TV broadcast content contains not only audio/video information representing a broadcast program but also caption or subtitle teletext information that appears in synchronization with the audio/video information. The caption teletext information is typically a short text message summarizing the broadcast program, and the message contains keywords that appropriately represent the audio/video information. The caption teletext information is provided by the content provider.
In addition, large numbers of users actively post comments via the Internet to websites such as blogs (web logs) or mini blogs (mini web logs) (e.g., Twitter (Registered Trademark)). These comments are characterized in that they share a common topic of discussion. Such shared topics include video content delivered to large audiences as described above.
For example, while many users are viewing video content, they can discuss the content being broadcast via a mini blog or the like. In recent years, a viewing habit has become increasingly popular in which users, while viewing TV broadcast content (e.g., a drama), post comments on that content via a mini blog. This makes many users feel that they are viewing a single piece of video content as shared content.
In addition, by collecting the comments that viewers of the same piece of TV broadcast content have posted to a blog site, it is possible to extract from them keywords that interest many of those viewers. In the case of Twitter, for example, such keywords are hash tags.
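As a rough illustration of keyword extraction from collected comments, the following is a minimal sketch that counts hash tags across a set of posts and returns the most frequent ones. The regular expression `#(\w+)` is a simplifying assumption and does not reproduce Twitter's actual hash tag rules; the function name `top_hashtags` and the sample comments are hypothetical.

```python
import re
from collections import Counter

# Simplified, hypothetical hash tag pattern: '#' followed by word characters.
# Twitter's real hash tag grammar is more complex than this.
HASHTAG_RE = re.compile(r"#(\w+)")

def top_hashtags(comments, n=3):
    """Count hash tags across a collection of comments; return the n most common."""
    counts = Counter()
    for text in comments:
        # Lower-case so that '#Drama' and '#drama' are counted as one keyword.
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts.most_common(n)

# Hypothetical comments posted while a drama is being broadcast.
comments = [
    "What a twist! #drama #episode5",
    "Can't believe that ending #Drama",
    "#episode5 was the best so far",
    "Watching live #drama",
]
print(top_hashtags(comments, n=2))  # [('drama', 3), ('episode5', 2)]
```

Case-folding is a design choice here; a real system might also normalize full-width characters or merge synonymous tags.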
There is a conventional technique that detects scenes of interest (peaks) based on the number of comments posted (i.e., the number of tweets, in the case of Twitter) and partitions video content into segments (see Non-patent Literature No. 1: David A. Shamma, Lyndon Kennedy, and Elizabeth F. Churchill, "Tweet the Debates," Proc. WSM '09, Oct. 23, 2009, for example). This technique allows the content of each scene segment to be estimated from its number of tweets.
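To make the idea of detecting peaks from post counts concrete, the following is a minimal sketch under stated assumptions; it is a generic thresholding approach, not the method of Non-patent Literature No. 1. Post timestamps (integer seconds from the start of the content) are binned into fixed-width windows, and runs of bins whose count exceeds the mean plus `k` population standard deviations are reported as peak segments. The bin width, the threshold rule, and the names `bin_counts` and `peak_segments` are all hypothetical.

```python
from statistics import mean, pstdev

def bin_counts(timestamps, bin_size, duration):
    """Count posts per fixed-width time bin covering [0, duration).

    Assumes integer timestamps and durations in seconds."""
    n_bins = -(-duration // bin_size)  # ceiling division
    counts = [0] * n_bins
    for t in timestamps:
        if 0 <= t < duration:  # ignore posts outside the content's time span
            counts[int(t // bin_size)] += 1
    return counts

def peak_segments(counts, k=1.0):
    """Return (start_bin, end_bin) pairs, end exclusive, for consecutive bins
    whose count exceeds mean + k * population standard deviation."""
    threshold = mean(counts) + k * pstdev(counts)
    segments, start = [], None
    for i, c in enumerate(counts):
        if c > threshold and start is None:
            start = i  # a peak segment begins
        elif c <= threshold and start is not None:
            segments.append((start, i))  # the peak segment ends
            start = None
    if start is not None:  # a peak that runs to the last bin
        segments.append((start, len(counts)))
    return segments

# Hypothetical per-minute tweet counts for a 10-minute segment of content.
per_minute = [1, 2, 1, 9, 8, 1, 0, 2, 7, 1]
print(peak_segments(per_minute))  # [(3, 5), (8, 9)]
```

A global mean-plus-deviation threshold is the simplest choice; a sliding-window baseline would adapt better to content whose overall posting rate drifts over time.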
The technique of Non-patent Literature No. 1 is applied to discussion events such as debates. Consider, for example, a hypothetical event in which, in the video content, a first leader presents an opinion and a second leader then presents hers or his. In this event, the audience continues posting comments on the first leader's opinion even while the second leader is presenting her or his opinion. Non-patent Literature No. 1 addresses this by correcting the temporal deviation (temporal difference) between when each scene of the video content appears and when the associated posts are counted.
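One generic way to picture such a temporal correction is sketched below; this is a swapped-in illustration, not the correction used in Non-patent Literature No. 1. It assumes a hypothetical per-bin reference signal `events` (1 where a scene of interest starts, 0 elsewhere), picks the non-negative lag that best aligns the post counts with that signal by maximizing their dot product, and then shifts the counts back by that lag. The function names `estimate_lag` and `shift_back` and the example data are hypothetical.

```python
def estimate_lag(events, counts, max_lag):
    """Return the lag (in bins, 0..max_lag) maximizing sum(events[i] * counts[i + lag]).

    Ties are resolved in favor of the smaller lag."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # zip pairs events[i] with counts[i + lag] and stops at the shorter list.
        score = sum(e * c for e, c in zip(events, counts[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def shift_back(counts, lag):
    """Attribute each bin's posts to the bin `lag` steps earlier; pad the tail with zeros."""
    return counts[lag:] + [0] * lag

# Hypothetical data: scenes of interest start at bins 1 and 4,
# but the burst of posts about each scene arrives two bins later.
events = [0, 1, 0, 0, 1, 0, 0, 0]
counts = [0, 0, 0, 6, 1, 0, 5, 1]
lag = estimate_lag(events, counts, max_lag=3)
print(lag)                      # 2
print(shift_back(counts, lag))  # [0, 6, 1, 0, 5, 1, 0, 0]
```

A single constant lag is a strong simplification; audience reaction delay can vary from scene to scene.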
From a different standpoint, there are also many cases in which viewers cannot watch TV broadcast content in real time, for example, and wish to view only its scenes of interest, obtained by scene extraction. Video content is intended to be viewed sequentially in time, and abbreviated content, produced by extracting only the scenes of interest from the video content, is delivered by the content provider. In contrast, there is also a technique that extracts highlights using an image-feature-based approach (see Non-patent Literature No. 2: Alan Hanjalic, "Adaptive Extraction of Highlights From a Sport Video Based on Excitement Modeling," IEEE Transactions on Multimedia, Vol. 7, No. 6, December 2005, for example). This technique extracts highlights by analyzing features of the motion pictures of the TV program themselves.