1. Field of the Invention
The present invention relates to an apparatus, a computer program product and a system for processing content information including video images and/or audio.
2. Description of the Related Art
In recent years, the mainstream media used in devices that record audio and video images have shifted from conventional magnetic tapes in analog format to magnetic disks, semiconductor memories, and the like in digital format. Recording capacity has been increasing dramatically, particularly in video recording and reproducing devices using a large-capacity hard disk. Through use of such devices, video images of a large number of programs provided via broadcasts and transmissions can be stored. A user can select programs as desired, and view and listen to the selected programs.
To manage the stored video images, a file is created for each title, and a name and other information can be attached to it. A title is a unit corresponding to a program or the like. When the video images are listed, representative images (thumbnails), names, and the like can be arranged and displayed. A single program (title) can be divided into units called chapters (segments), and reproduction and editing can be performed in chapter units. Chapter names can be attached, and representative images (thumbnails) of the chapters can be displayed. As a result, a chapter including a desired scene can be selected from a chapter list and reproduced. In addition, a playlist or the like can be created by arranging selected chapters. The video recording (VR) mode of the digital versatile disk (DVD) standard specifies the above-described management methods.
A marker used to designate a section or a position within the program (title) includes reproduction time information corresponding to the temporal position at which the video image and audio content are reproduced. In addition to a chapter marker indicating a chapter division point, an edit marker and an index marker may be used, depending on the device. The edit marker designates a subject section on which an editing operation is performed. The index marker designates a jump destination point when a cueing operation is performed. In the present specification, “marker” refers to any of these markers.
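As an illustrative sketch only, the markers described above could be modeled as follows. The sketch is not drawn from any particular device or from the DVD VR mode. The names `MarkerKind`, `Marker`, and `chapter_at` are hypothetical, and holding the reproduction time information in seconds is an assumption.

```python
from bisect import bisect_right
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class MarkerKind(Enum):
    CHAPTER = "chapter"  # chapter division point
    EDIT = "edit"        # boundary of a section subject to an editing operation
    INDEX = "index"      # jump destination point for a cueing operation


@dataclass(frozen=True)
class Marker:
    kind: MarkerKind
    time_sec: float      # reproduction time information (assumed unit: seconds)
    name: str = ""


def chapter_at(markers: List[Marker], position_sec: float) -> Optional[Marker]:
    """Return the chapter marker that opens the chapter containing position_sec.

    Returns None when position_sec precedes every chapter marker.
    """
    chapters = sorted(
        (m for m in markers if m.kind is MarkerKind.CHAPTER),
        key=lambda m: m.time_sec,
    )
    # Index of the last chapter marker whose time is <= position_sec.
    idx = bisect_right([m.time_sec for m in chapters], position_sec) - 1
    return chapters[idx] if idx >= 0 else None
```

With chapter markers at 0, 600, and 1200 seconds, a reproduction position of 700 seconds would fall in the chapter opened by the marker at 600 seconds, regardless of any index or edit markers placed between them.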
If program information provided by an electronic program guide (EPG) or the like is used, a program name can be automatically attached to a file of a video image that has been recorded and stored. The Association of Radio Industries and Businesses (ARIB) standard STD-B10 specifies the program information provided by the EPG.
Within a single program, various data can serve as metadata that effectively supports and automates viewing, editing, and the like. Such data includes information giving the temporal positions at which the program is divided, names facilitating identification of each divided section, and the like. However, such metadata is rarely provided from an external source for general use. Therefore, in devices to be used by general viewers, the metadata must be generated on the device side, based on the recorded audio and video images.
Moving Picture Experts Group (MPEG)-7 is a general-purpose description format for metadata relating to video image and audio contents. There is a method in which the metadata is correlated with the contents and stored in an extensible markup language (XML) database. The ARIB standard STD-B38 specifies a system for transmitting the metadata during broadcasting and the like. The metadata can be recorded in compliance with the standard.
Meanwhile, as a method for retrieving video images or the like stored in a video-image recording/reproducing device, a method is known in which retrieval is performed using a feature quantity as a retrieval key. The feature quantity indicates a characteristic extracted from the information of the video image or the like. For example, JP-A 2001-134613 (KOKAI) proposes the following sound retrieval technology. A user designates a section by listening to or viewing the sound signal or video image signal of the extraction source from which the feature quantity is extracted. The user registers the extracted sound feature quantity in a retrieving unit as the retrieval key. As a result, a matching or similar sound is retrieved.
JP-A 2002-44610 (KOKAI) proposes a technology in which a retrieval key is generated using the feature quantity of an image, and a similar image is retrieved using the retrieval key. Based on the results of a sound retrieval or a video image retrieval such as those described above, for example, similar sound contents or similar video image contents can be displayed, similar areas within the content can be displayed, structuring can be performed, and metadata can be added.
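Retrieval using a feature quantity as a retrieval key can be pictured with the following minimal sketch. It is not the method of either cited publication. It assumes that each stored segment is represented by a fixed-length feature vector and that similarity is judged by an L1 distance against a threshold. The names `l1_distance` and `retrieve` are hypothetical.

```python
from typing import List, Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(
    key: Sequence[float],
    segments: Sequence[Tuple[float, Sequence[float]]],
    threshold: float,
) -> List[Tuple[float, float]]:
    """Return (start_time_sec, distance) for each segment within the threshold.

    segments holds (start_time_sec, feature_vector) pairs. Results are ordered
    from the closest match to the farthest.
    """
    scored = [(start, l1_distance(key, feat)) for start, feat in segments]
    hits = [(start, dist) for start, dist in scored if dist <= threshold]
    return sorted(hits, key=lambda hit: hit[1])
```

A distance of zero corresponds to a matching segment, and small nonzero distances correspond to similar segments. The returned start times could then drive display of similar areas or the addition of metadata.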
The information in the retrieval key used in such retrievals can be encoded by an ordinary encoding so that the original sound or video image can be decoded to a listenable or viewable degree. The information can be, for example, pulse code modulation (PCM) data, MPEG data, or Joint Photographic Experts Group (JPEG) data.
However, retrieval ordinarily must be performed at high speed, or the size of the index information must be reduced. Therefore, a retrieval key having a significantly reduced amount of information is used. Examples include image information in block units so large that the image information is visually meaningless, information in which a plurality of feature quantities are inseparably combined, information encoded using coarse quantization, information using a partial characteristic of a sound, such as the number of zero crossings, and information in which a feature quantity over a fixed period of time is expressed as a histogram. The original sound, video image, or image cannot be directly decoded from the feature quantities used in such retrievals.
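To make concrete how far such a retrieval key is reduced, the following sketch computes two of the quantities named above: the number of zero crossings of a sound, and a histogram of that quantity over a fixed period. It is an illustration under assumptions, not the feature extraction of any particular device. The function names and the parameters `frame_len`, `bin_width`, and `num_bins` are hypothetical.

```python
from typing import List, Sequence


def zero_crossings(samples: Sequence[float]) -> int:
    """Count sign changes between consecutive samples (zero counts as non-negative)."""
    count = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev < 0) != (cur < 0):
            count += 1
    return count


def zero_crossing_histogram(
    samples: Sequence[float], frame_len: int, bin_width: int, num_bins: int
) -> List[int]:
    """Histogram of per-frame zero-crossing counts over the whole sample period.

    The samples are cut into non-overlapping frames of frame_len samples; a
    trailing partial frame is ignored. Counts beyond the last bin are clamped
    into it.
    """
    hist = [0] * num_bins
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        zc = zero_crossings(samples[start:start + frame_len])
        hist[min(zc // bin_width, num_bins - 1)] += 1
    return hist
```

Many different waveforms yield the same histogram, so the sound cannot be reconstructed from it, which is the property noted in the preceding paragraph.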
When the feature quantity of audio or a video image is used as the retrieval key for retrieving video images, audio, or the like, it is generally presumed that the retrieval key is used within the device that generated it. On the other hand, if video images or the like can be retrieved using a retrieval key generated in another device, user convenience is enhanced. For example, the load attributed to retrieval key generation can be reduced. In this case, it is preferable that the video image or audio that is the extraction source of the feature quantity can be confirmed, so that a suitable retrieval key can be selected.
However, when, for example, the corresponding video image or audio is provided together with the retrieval key to another device via communication or the like, copyright problems arise. In addition, the amount of communication increases because video image data and audio data, which generally contain large amounts of information, are transmitted and received.