The present invention relates to a media content data processing device, a data processing method, a storage medium, and a program, all of which relate to the viewing, playback, and delivery of continuous audio-visual data (media content), such as a motion picture, a video program, or an audio program, and which enable playback and delivery of a synopsis or a highlight scene of the media content, or of only those scenes of the media content desired by the audience.
Media content has conventionally been played back, delivered, or stored on the basis of individual files storing the media content.
According to a method of retrieving a specific scene of a motion picture described in Japanese Patent Application Laid-open No. Hei-10-111872, a change between scenes of the motion picture (hereinafter referred to as a “scene cut”) is detected, and additional data, such as a time code of the start frame, a time code of the end frame, and a keyword of the scene, are added to each scene cut.
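By way of illustration only, the additional data described above may be pictured as one record per scene, with retrieval performed by matching a keyword. The following is a minimal sketch under stated assumptions and is not the implementation of the cited publication: the names `SceneRecord` and `find_scenes`, the `HH:MM:SS:FF` time code format, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SceneRecord:
    """Additional data attached to one detected scene (hypothetical layout)."""
    start_timecode: str  # time code of the start frame, assumed "HH:MM:SS:FF"
    end_timecode: str    # time code of the end frame, same assumed format
    keyword: str         # keyword describing the scene


def find_scenes(records: List[SceneRecord], keyword: str) -> List[SceneRecord]:
    """Return the scenes whose keyword matches, ignoring letter case."""
    wanted = keyword.lower()
    return [r for r in records if r.keyword.lower() == wanted]


# Sample data invented for this sketch.
RECORDS = [
    SceneRecord("00:00:00:00", "00:03:09:29", "opening"),
    SceneRecord("00:03:10:00", "00:07:45:12", "chase"),
    SceneRecord("00:07:45:13", "00:12:00:00", "dialogue"),
]

if __name__ == "__main__":
    for scene in find_scenes(RECORDS, "Chase"):
        print(scene.start_timecode, scene.end_timecode, scene.keyword)
```

Each record in this sketch stands alone; nothing in it links one scene to another or ranks scenes by importance.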
As an alternative method, Carnegie Mellon University (CMU) has attempted to summarize a motion picture by detecting scene cuts of a motion picture, detecting a human face or a caption, and detecting a key phrase through speech recognition.
When a motion picture is played back on a per-file basis, reviewing the synopsis of the motion picture is impossible. Further, even when a highlight scene or scenes desired by the user are to be retrieved, they must be searched for from the head of the media content. Further, in the case of delivery of a motion picture, all the data sets of a file are transmitted, thus requiring a very long transmission time.
According to the method described in Japanese Patent Application Laid-open No. Hei-10-111872, scenes can be retrieved through use of a keyword, thus facilitating retrieval of scenes desired by the user. However, the additional data do not include any relationship or connection between the scenes. For this reason, the method encounters difficulty in retrieving, e.g., one subplot of a story. Further, when scenes are retrieved on the basis of only a keyword, the user has difficulty in recognizing which scenes are contextually important. Therefore, preparation of a synopsis or highlight scenes becomes difficult.
The method developed by CMU enables summarization of a motion picture. However, summarization results in a digest of a single, fixed pattern. For this reason, summarization of a motion picture into digests of different playback times, for example, a digest having a playback time of three minutes or of five minutes, is difficult. Further, summarization of a motion picture as desired by the user, such as selection of scenes including a specific character, is also difficult.