In recent years, the dominant equipment for recording audio and video has shifted from conventional analog magnetic tape to digital magnetic disks, semiconductor memory, and the like. In particular, video recording and reproducing equipment that uses a large-capacity hard disk offers remarkably increased recordable capacity. With such equipment, videos of many programs provided by broadcast or communication can be stored, and the user can freely select and view them.
Here, in the management of the stored videos, files are formed in units of titles (programs) or the like and are given names and other information; when the files are listed, typical images (thumbnails), names, and the like can be arranged and displayed. In addition, one program (title) can be divided into units called chapters (segments), and reproduction and editing can also be performed in chapter units. When chapter names are given and typical images (thumbnails) of the chapters are displayed, a chapter including a favorite scene can be selected from a chapter list and reproduced, or selected chapters can be arranged to create a play list or the like. The VR (Video Recording) mode of the DVD (Digital Versatile Disc) is one specification that defines such management methods.
Incidentally, a marker used to specify a section or a position in a program (title) includes reproduction time information corresponding to a time position in the reproduced video and audio content. In addition to a chapter marker that expresses a chapter division point, some devices also use an edit marker to specify the target section of an editing operation, or an index marker to specify a jump destination for a cue operation. The term "marker" in the present specification is used in this sense.
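As an illustration only, the marker described above can be sketched as a small data structure that carries a kind and a reproduction time position. The names `MarkerKind`, `Marker`, and `chapter_sections` are hypothetical and do not come from any standard or from the cited documents; the sketch merely assumes that chapter markers split a title into consecutive sections.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class MarkerKind(Enum):
    CHAPTER = "chapter"  # chapter division point
    EDIT = "edit"        # boundary of a section targeted by an editing operation
    INDEX = "index"      # jump destination for a cue operation


@dataclass(frozen=True)
class Marker:
    kind: MarkerKind
    time_sec: float      # reproduction time position within the title, in seconds
    name: str = ""       # optional name given to the marked point


def chapter_sections(markers: List[Marker],
                     duration_sec: float) -> List[Tuple[float, float]]:
    """Return the (start, end) sections that the chapter markers delimit."""
    # Only chapter markers strictly inside the title create a division.
    cuts = sorted({m.time_sec for m in markers
                   if m.kind is MarkerKind.CHAPTER
                   and 0.0 < m.time_sec < duration_sec})
    bounds = [0.0, *cuts, duration_sec]
    return list(zip(bounds[:-1], bounds[1:]))


# Hypothetical 15-minute title with two chapter markers and one index marker.
markers = [
    Marker(MarkerKind.CHAPTER, 600.0, "Part 2"),
    Marker(MarkerKind.INDEX, 125.0),
    Marker(MarkerKind.CHAPTER, 300.0, "Part 1"),
]
sections = chapter_sections(markers, 900.0)
```

In this sketch the index marker does not affect the division; it would only serve as a jump destination.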
With respect to a program name, when program information provided by an EPG (Electronic Program Guide) or the like is used, the name can be given automatically to a recorded and stored file. Program information provided by the EPG is specified in the ARIB (Association of Radio Industries and Businesses) standard STD-B10.
However, with respect to the inside of one program, various kinds of metadata are conceivable that would be useful in supporting and automating viewing, editing, and the like, such as division time positions and names that make each divided part easy to identify, but such metadata is hardly ever provided from outside in a general-purpose form. Thus, in equipment for a general viewer, the apparatus itself must create the metadata based on the recorded audio and video.
MPEG-7 is a general-purpose description format for metadata relating to video and audio content, and there is a method in which metadata is associated with content and stored in an XML (Extensible Markup Language) database. In addition, the ARIB standard STD-B38 specifies a transmission system for metadata in broadcasting, and the metadata can also be recorded in accordance with these.
Some apparatuses automatically divide a program into chapters by detecting a silent portion, a switch (cut) of video, or a switch of the audio-multiplex mode (mono, stereo, or dual mono for bilingual broadcast) (see, for example, patent document 1 (JP-A-2003-36653)). However, the division is not necessarily performed suitably, and the user must still carry out considerable manual work, including giving a meaning and a name to each of the divided chapters.
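For reference, chapter division by detection of a silent portion can be sketched roughly as follows. This is not the method of patent document 1; it is a minimal illustration that assumes the audio has already been reduced to one level value per frame, and the function name `silence_cut_points`, the threshold, and the minimum silence length are all hypothetical.

```python
from typing import List, Optional, Sequence


def silence_cut_points(levels: Sequence[float], frame_sec: float,
                       threshold: float = 0.01,
                       min_silence_sec: float = 0.5) -> List[float]:
    """Return candidate chapter division times, one per sufficiently long silence."""
    cuts: List[float] = []
    run_start: Optional[int] = None  # index of the first frame of the current silent run
    # A trailing loud sentinel closes a silent run that reaches the end of the input.
    for i, level in enumerate(list(levels) + [float("inf")]):
        if level < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The silent run covers frames run_start .. i-1.
            if (i - run_start) * frame_sec >= min_silence_sec:
                # Place the division point at the middle of the silent run.
                cuts.append((run_start + i) / 2 * frame_sec)
            run_start = None
    return cuts


# Hypothetical levels at 0.1 s per frame: a 0.6 s silence and a 0.2 s silence.
levels = [0.5] * 10 + [0.0] * 6 + [0.4] * 10 + [0.0] * 2 + [0.3] * 5
cuts = silence_cut_points(levels, frame_sec=0.1)
```

Only the longer silence produces a division point here, which also shows why such a division is not necessarily suitable: the result depends entirely on the chosen threshold and minimum length, and it carries no meaning or name for the resulting chapters.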
In addition, metadata creation such as automatic keyword extraction using language information obtained by telop (on-screen caption) image recognition and speech recognition makes full-text retrieval possible (see, for example, patent document 2 (JP-A-8-249343)). However, it is difficult under present circumstances to apply such techniques across the board to tasks such as chapter division and naming.
On the other hand, although methods of acoustic retrieval or robust audio matching that retrieve matching or similar sounds have been conceived, most of them are used to retrieve and reproduce a piece of music or the like that the user wishes to listen to, and their structure is not suitable for creating metadata for video (see, for example, patent document 3 (JP-A-2000-312343)).
As stated above, the related art has a problem in the management of a large amount of stored video, especially in the division of one program: division suitable for viewing and listening, determination of control points, and giving of relevant information cannot be performed easily.
The present invention has been made in view of the above circumstances, and has an object to provide a data processing apparatus that can perform, on video to be recorded and stored, division suitable for viewing and listening, determination of control points, and giving of relevant information, without requiring a manual operation each time.