1. Field of the Invention
The present invention relates to a device, method, and computer program product for structuring digital-content programs.
2. Description of the Related Art
With the recent spread of broadband and similar technologies, the volume of digital content being distributed has been increasing. Techniques that add metadata to the digital content have been considered as a way of efficiently managing and processing this growing volume of content on a computer.
When the digital content is video, for example, a desired scene can be readily located or searched for if metadata indicating “the beginning of a subsequent scene” is attached to the time series of the video, which improves user convenience. In general, the content provider divides video content into chapters in advance by using metadata such as the delimiting information that separates a movie into scenes. However, accurately adding metadata to the entire content is burdensome for the content provider.
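As an illustration only, the following minimal sketch shows how scene-start metadata attached to the time series could be used to locate the beginning of the subsequent scene. The marker values and the function name `next_scene_start` are hypothetical and do not come from any cited technique; the sketch assumes the markers are stored as a sorted list of start times in seconds.

```python
from bisect import bisect_right

# Hypothetical scene-start markers (seconds from the start of the video), sorted.
scene_starts = [0.0, 95.0, 310.5, 742.0]

def next_scene_start(position, starts):
    """Return the start time of the scene that follows `position`.

    Returns None when `position` is already inside the last scene.
    Assumes `starts` is sorted in ascending order.
    """
    # Index of the first marker strictly greater than the current position.
    i = bisect_right(starts, position)
    return starts[i] if i < len(starts) else None

# Usage: from 100 s into the video, the next scene begins at 310.5 s.
print(next_scene_start(100.0, scene_starts))
```

A binary search is used here because the markers are ordered in time; any lookup that finds the first marker after the current position would serve equally well.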
Recently, individual users (viewers) of HDD recorders equipped with a play-list creating function have been creating play lists by adding metadata to the time series of the video content. JP-A 2004-193871 (KOKAI) teaches a technique in which a user adds metadata. According to this technique, metadata created by an individual user (viewer) is made public so that it can be shared by multiple users (viewers).
According to JP-A 2004-193871 (KOKAI), however, because metadata created by different users (viewers) are shared, the metadata may not always provide accurate chapter divisions for the content.
On the other hand, instead of having the content provider or the user divide the content into chapters, it has been suggested that metadata be extracted automatically from the information of the content itself to achieve chapter division. The following methods have been suggested:
(1) A method of extracting metadata from audio information of the video content;
(2) A method of extracting metadata from text information such as subtitles extracted from the video content or from text information included in the script of the video; and
(3) A method of extracting metadata from image information such as camera-switching information extracted from the video content.
These methods of automatically extracting metadata from the information of the content itself still have problems to be solved.
First, when audio information in the video content is used, an abstract scene such as “sensational” can be extracted based on the loudness of cheers, or a roughly divided scene can be extracted based on a discriminative keyword. At present, however, speech recognition technology is not accurate enough to extract precisely divided scenes. A further problem is that scene information cannot be extracted during a silent interval.
Secondly, when the text information of the video content is used, a scene can be extracted by estimating the topic from changes in the words that appear. There is a problem, however, in that this method is not applicable to content that does not contain text information such as subtitles or scripts. Although text information may be added to the content for the purpose of scene extraction, it is more efficient to add scene information as metadata of the content from the outset than to add text information solely for scene extraction.
Thirdly, when camera-switching information of the video content is used, such information indicates only extremely primitive intervals, and therefore cuts the content into segments that are too small. If the content is a quiz show or news program, in which typical sequences of camera switching recur, scenes of appropriate sizes can be extracted by suitably grouping the sequences. This technique is not applicable to all digital-content programs, however. If scenes are divided into chapters of inappropriate sizes, user convenience may be reduced.
More specifically, problems such as the following arise:
If a scene is divided into chapters that are too large, the user may need to fast-forward the data to locate a desired scene, or may skip past the desired scene during a skip operation.
On the other hand, if a scene is divided into chapters that are too small, the skip operation has to be repeated many times to reach the desired scene.
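The trade-off described above can be made concrete with a small sketch. It is an illustration under stated assumptions, not part of any cited technique: the function name `skips_and_seek`, the chapter layouts, and the playback positions are all hypothetical. The sketch assumes that chapter start times are sorted, that the first chapter starts at or before the current position, and that the target lies at or after the current position.

```python
from bisect import bisect_right

def skips_and_seek(chapter_starts, current, target):
    """Return (skips, residual_seconds) needed to reach `target` from `current`.

    skips: number of "next chapter" operations needed to land on the last
           chapter start that is at or before `target`.
    residual_seconds: how far the user must still fast-forward after skipping.
    Assumes chapter_starts is sorted, chapter_starts[0] <= current <= target.
    """
    cur_idx = bisect_right(chapter_starts, current) - 1  # chapter containing current
    tgt_idx = bisect_right(chapter_starts, target) - 1   # chapter containing target
    skips = tgt_idx - cur_idx
    # With no skip available, the user fast-forwards from the current position.
    base = chapter_starts[tgt_idx] if skips > 0 else current
    return skips, target - base

# Hypothetical one-hour program; the desired scene starts at 1500 s.
coarse = [0, 1800]                  # chapters that are too large
fine = list(range(0, 3600, 30))     # chapters that are too small

print(skips_and_seek(coarse, 0, 1500))  # no skip helps; long fast-forward
print(skips_and_seek(fine, 0, 1500))    # many skips; no fast-forward
```

With the coarse layout the user gets no help from skipping and must fast-forward through 1500 seconds, whereas with the fine layout the skip operation must be repeated 50 times.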
In addition, even for the same content, the size of scenes differs from user to user, depending on the viewpoint of the user watching the content. Thus, it is difficult to decide on an appropriate size for the chapters into which a scene is divided.