As a means of automatically imparting an index to a broadcast program, the following approach is generally adopted: a program performer or a speaker in a specific scene is identified by using a face image recognition technology, a telop character recognition technology, a voice recognition technology, or a speaker recognition technology; the result is transformed into metadata; and the metadata is imparted as an index. For example, JP-A No. 167583/1999 discloses a means of recognizing a character string of a telop and imparting the result to a picture as metadata. Here, the term “metadata” means data describing the identifiers, names, performers, subjects, and the like of the contents of a broadcast program or similar material.
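The flow described above (recognition result, then metadata, then index) can be illustrated by the following minimal sketch. It is not taken from the cited publication; the class `SceneMetadata`, the function `build_index`, all field names, and the sample values are hypothetical and serve only to show what such metadata and an index built from it might look like. No recognizer is actually run; the records are hand-written stand-ins for recognizer output.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SceneMetadata:
    """Hypothetical metadata record for one scene of a broadcast program."""
    program_id: str          # identifier of the program
    program_name: str        # name of the program
    start_sec: float         # scene start, in seconds from program start
    end_sec: float           # scene end, in seconds from program start
    performers: List[str] = field(default_factory=list)
    subject: str = ""
    source: str = ""         # which recognizer produced the record (assumed label)


def build_index(records: List[SceneMetadata]) -> Dict[str, List[SceneMetadata]]:
    """Map each performer name to the scenes in which it was recognized."""
    index: Dict[str, List[SceneMetadata]] = {}
    for rec in records:
        for name in rec.performers:
            index.setdefault(name, []).append(rec)
    return index


# Hand-written stand-ins for recognizer output (assumed values).
records = [
    SceneMetadata("P001", "Evening News", 0.0, 42.5,
                  ["Speaker A"], "headline", "telop"),
    SceneMetadata("P001", "Evening News", 42.5, 90.0,
                  ["Speaker B", "Speaker A"], "interview", "face"),
]

index = build_index(records)

# List the scene intervals in which "Speaker A" appears.
for rec in index["Speaker A"]:
    print(rec.program_name, rec.start_sec, rec.end_sec, rec.source)
```

In this sketch the index is simply a lookup from performer name to scenes, so any recognition error in a record propagates directly into the index.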
In the case of a face image recognition technology, the recognition rate has varied with the position on the screen, the background picture, the simultaneous display of plural objects, and the like, and it has been difficult to obtain metadata of a high degree of accuracy. In the case of a telop character recognition technology too, recognition has been constrained by the display position of the telop characters, the background picture, the character font, and the like, and it has been difficult to realize highly accurate recognition that is not affected by the environment. Further, in the cases of a voice recognition technology and a speaker recognition technology too, false recognition has occurred frequently in a program wherein plural speakers appear in a mixed manner or an indefinite number of speakers appear frequently, and manual reconditioning work has likewise been required.
In consequence, when any one of the above technologies is used, manual intervention (reconditioning work) has been necessary in order to impart accurate metadata, and the reality has been far from an automatic imparting of metadata that minimizes manual reconditioning work.