A problem with multimedia data formed of a sequence of digitized images lies in automatically recognizing what is represented in the individual images so that the multimedia data can be annotated on the basis of its content. The aim in particular is to annotate the multimedia data so as to indicate which persons (that is, their identities) are represented therein. Annotation can also relate to a context associated with the objects represented, for example when the multimedia data was produced, in what connection it was produced, and whether indoor or outdoor recordings are shown.
In this connection it is known first to divide the multimedia data into individual scenes, meaning into images that are logically or semantically connected. Persons are detected automatically by face-recognition software. Persons can also be distinguished by analyzing their speech, so that they can be differentiated on the basis of different speech profiles. Persons who have been recognized are usually identified by comparison with information stored in a database in which reference information on phonetic and/or visual features of the respective persons has been filed. If, however, a person represented in the multimedia data is not in the database, identification in that way is not possible.
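The database comparison described above can be illustrated by the following minimal sketch. It is not taken from any particular known system: the feature vectors, the cosine-similarity measure, the threshold value, and the names `identify` and `reference_db` are all assumptions chosen for illustration. The sketch shows in particular that a person who has no reference entry in the database remains unidentified.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(
    features: List[float],
    reference_db: Dict[str, List[float]],
    threshold: float = 0.9,  # assumed value, purely illustrative
) -> Optional[str]:
    """Return the name of the best-matching reference entry, or None.

    `features` stands for visual and/or phonetic features extracted for a
    detected person; `reference_db` maps a person's name to the reference
    features filed for that person (hypothetical structure).
    """
    best_name: Optional[str] = None
    best_score = threshold
    for name, reference in reference_db.items():
        score = cosine_similarity(features, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical reference database with two filed persons.
reference_db = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
}

known_person = identify([0.9, 0.1, 0.0], reference_db)
unknown_person = identify([0.0, 0.0, 1.0], reference_db)
```

In this sketch `known_person` resolves to a filed name, whereas `unknown_person` is `None` because no sufficiently similar reference entry exists, which corresponds to the limitation noted above.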
Other methods employ an internet search for annotating the multimedia data. That entails first searching for manually annotated images of the persons to be annotated. The characteristics ascertained for the images found are then compared with those of the persons represented in the multimedia data. If they match, the multimedia data can be annotated with a specific person by transferring the manual annotation from the image.
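The transfer of a manual annotation described above can be sketched as follows. The sketch assumes that the internet search has already returned a list of manually annotated images and performs no network access itself. The `AnnotatedImage` type, the Euclidean distance measure, the `max_distance` value, and the track identifiers are hypothetical and serve only to illustrate the comparison step.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AnnotatedImage:
    """A search result: a manually assigned label plus extracted features."""
    label: str
    features: List[float]


def transfer_annotations(
    detected: Dict[str, List[float]],
    found_images: List[AnnotatedImage],
    max_distance: float = 0.1,  # assumed matching tolerance
) -> Dict[str, str]:
    """Map each detected person to the label of the closest found image.

    `detected` maps an identifier of a person in the multimedia data to that
    person's features. A label is transferred only if the closest found image
    lies within `max_distance`; otherwise the person stays unannotated.
    """
    annotations: Dict[str, str] = {}
    for person_id, features in detected.items():
        best_label = None
        best_distance = max_distance
        for image in found_images:
            distance = math.dist(features, image.features)
            if distance <= best_distance:
                best_label, best_distance = image.label, distance
        if best_label is not None:
            annotations[person_id] = best_label
    return annotations


# Hypothetical inputs: two persons in the multimedia data, two search results.
detected = {"track_1": [0.1, 0.2], "track_2": [5.0, 5.0]}
found_images = [
    AnnotatedImage("Jane Doe", [0.1, 0.25]),
    AnnotatedImage("John Roe", [1.0, 1.0]),
]

annotations = transfer_annotations(detected, found_images)
```

Here only `track_1` receives a label, since no found image is close enough to `track_2`. The manually annotated images themselves still have to be produced by hand, which this sketch does not remove.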
The processes known from the related art require manual intervention in practically all cases, so that the annotation of multimedia data cannot be automated.