There is a current trend for video and speech to be recorded as digital data and for the thus prepared recordings to be widely distributed as multimedia (digital) contents. To process such multimedia (digital) contents (hereinafter referred to simply as contents), various devices have been proposed that capitalize on the ease with which digital data can be copied and edited. One such device involves the use of metadata, elements of which are correlated along the time axis of a recording with the video and speech contents and are employed to describe the contents, to explain them, and to indicate how they are formatted.
Metadata elements are used to describe the locations of correlated contents, and to provide information concerning data structures, data conversion, contents characteristics and relative definitions. As is shown in FIG. 10, metadata elements for video contents can be written for individual scenes. In FIG. 10, XML (Extensible Markup Language) is used to write the metadata elements, including scene tags, titles, scene start times and end times, and information linked to individual scenes.
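As an illustration of the kind of per-scene metadata described above, the following is a minimal sketch of how such XML might be read. The tag names (`contents`, `scene`, `title`, `start`, `end`, `link`), the time code format and all values are assumptions made for this example; they are not taken from FIG. 10.

```python
import xml.etree.ElementTree as ET

# Hypothetical scene metadata; the tag names and values are illustrative
# assumptions, not the contents of FIG. 10.
SAMPLE_METADATA = """\
<contents>
  <scene>
    <title>Opening</title>
    <start>00:00:00</start>
    <end>00:01:30</end>
    <link>http://example.com/opening</link>
  </scene>
  <scene>
    <title>Interview</title>
    <start>00:01:30</start>
    <end>00:04:00</end>
    <link>http://example.com/interview</link>
  </scene>
</contents>
"""


def to_seconds(time_code):
    """Convert an HH:MM:SS time code to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in time_code.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_scenes(xml_text):
    """Return one dict per scene element found in the metadata."""
    root = ET.fromstring(xml_text)
    scenes = []
    for scene in root.findall("scene"):
        scenes.append({
            "title": scene.findtext("title"),
            "start": to_seconds(scene.findtext("start")),
            "end": to_seconds(scene.findtext("end")),
            "link": scene.findtext("link"),
        })
    return scenes
```

Calling `parse_scenes(SAMPLE_METADATA)` yields one record per scene, each holding the title, the start and end times in seconds, and the linked information.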
That is, for the contents in FIG. 10, correlated metadata elements that provide a variety of information, including the start and the end times of scenes, are provided for the video contents and text. Then, when a player (a video and speech reproduction apparatus) is used to interpret contents for which metadata elements are provided, the metadata enables the player to process specific scenes based on the information the metadata includes.
For example, as is described above, since the start time and the end time for each scene in the contents are provided by an accompanying metadata entry, index information for each of the scenes can be generated by referring to the correlated metadata for the scene. The index information is presented to a user together with a display of the text of the title for the scene or of the first image of the scene, thereby permitting the user to employ the index information to generate a summary by reproducing or deleting an arbitrary scene.
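The generation of index information and of a summary, as described above, might be sketched as follows. The record layout, the function names `build_index` and `summarize`, and the sample values are hypothetical and serve only to illustrate the idea; times are expressed in seconds.

```python
# Hypothetical scene records, as might be read from the metadata;
# times are in seconds and all values are illustrative.
SCENES = [
    {"title": "Opening", "start": 0, "end": 90},
    {"title": "Interview", "start": 90, "end": 240},
    {"title": "Closing", "start": 240, "end": 300},
]


def build_index(scenes):
    """Create one index entry per scene: a number, the title, and the time
    at which the first image of the scene can be found."""
    return [
        {"number": n, "title": s["title"], "first_image_at": s["start"]}
        for n, s in enumerate(scenes, start=1)
    ]


def summarize(scenes, selected_numbers):
    """Return the (start, end) ranges of the scenes the user chose to keep,
    in playback order; all other scenes are dropped from the summary."""
    keep = set(selected_numbers)
    return [
        (s["start"], s["end"])
        for n, s in enumerate(scenes, start=1)
        if n in keep
    ]
```

In this sketch, a user who selects scenes 1 and 3 from the index obtains a summary made up of the time ranges of those two scenes only. Both functions depend entirely on the start and end times in the metadata being accurate for the contents at hand.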
The methods for providing metadata corresponding to contents include a method for distributing contents and metadata together, and a method for adding, to contents, a pointer to the metadata and address information for a site whereat the metadata are stored, so that the corresponding metadata can be obtained by way of the contents. When the contents and the metadata are integrally assembled, a user can obtain the contents and the metadata at the same time; however, updating only the metadata without changing the contents is difficult. Therefore, it is more convenient for the contents and the metadata to be managed separately, so that a user who obtains contents can refer to desired metadata by using pointer and address information (hereinafter this information is generally referred to as pointer information).
However, according to the conventional method for correlating contents with metadata using time codes, i.e., the start time and the end time of a scene, the correlation of contents and metadata will be destroyed when the contents are edited. This problem arises because the timing information included with the metadata describes the timing of the contents before editing, so the metadata no longer corresponds to the contents once the timing for a scene is changed by editing the contents.
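The loss of correlation can be illustrated with a small sketch. It assumes, purely for the sake of the example, that the only edit is the removal of a single time range and that the metadata is left unchanged; the scene records and the helper names `scene_at` and `original_time` are hypothetical.

```python
# Hypothetical metadata written before editing; times are in seconds.
METADATA = [
    {"title": "Opening", "start": 0, "end": 90},
    {"title": "Interview", "start": 90, "end": 240},
    {"title": "Closing", "start": 240, "end": 300},
]


def scene_at(metadata, t):
    """Return the title the metadata assigns to time t, or None."""
    for scene in metadata:
        if scene["start"] <= t < scene["end"]:
            return scene["title"]
    return None


def original_time(edited_time, cut_start, cut_end):
    """Map a time in the edited contents back to the unedited contents,
    assuming the only edit was removing the range [cut_start, cut_end)."""
    if edited_time < cut_start:
        return edited_time
    return edited_time + (cut_end - cut_start)


# Suppose the editor deletes the whole "Opening" scene (0 s to 90 s).
CUT = (0, 90)

# The scene actually shown 200 s into the edited contents.
actual = scene_at(METADATA, original_time(200, *CUT))

# The scene the unchanged metadata claims is shown at 200 s.
claimed = scene_at(METADATA, 200)
```

Here the material shown 200 seconds into the edited contents belongs to the "Closing" scene, while the unchanged metadata still attributes that time to the "Interview" scene, so any index or summary built from the metadata would point at the wrong material.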
Therefore, when contents are distributed for which correlation with timing information in the metadata is destroyed due to editing, a user who obtains the edited contents cannot generate appropriate index information by using the metadata for the contents. The user cannot perform a process, such as a search or the generation of a summary, based on the edited contents. Further, when only contents are distributed first and then edited, the pointer information for metadata that was added to the contents is lost through editing. Therefore, appropriate metadata for the contents will not be obtained.