Sound and image files, as well as other files featuring multimedia content, may be indexed by their titles. Unfortunately, if a multimedia file is simply embedded in or linked from a Web page, there may be no additional information about it. A multimedia file may include some descriptive information, such as its source. Other metadata can be included in multimedia files, but such inclusion requires more effort on the part of the content producer and, as in the case of images, the resulting metadata may be incomplete or insufficient.
Full indexing of the content of sound files generally requires having a transcript of the session in a computer-readable text format to enable text-indexing. With voice recognition software, some automated indexing of audio files is possible and has been successfully used. However, it is widely known that such transcripts rarely match exactly what was spoken. The difficulty is compounded if the words are sung and the search is for a song with a specific tune, or for a tune regardless of the words. Analysis of audio signals is desirable for a wide variety of reasons, such as speaker recognition, voice command recognition, dictation, instrument or song identification, and the like.
Similarly, video analysis is a growing field alongside image recognition. One application within the field of video analysis is performing a search on a plurality of videos, thereby enabling a user to find a video containing a specific scene or action that the user wishes to view. For example, a user may wish to see a video of a person slipping on a banana peel. However, existing solutions typically permit a user to find such video content only if the video is associated with metadata identifying its content. Metadata associated with video clips typically describes attributes of the clip, such as length, format type, source, and so on. Such metadata does not describe the contents of the clip and, in particular, does not describe the contents of each scene.
It would therefore be advantageous to have a system capable of identifying multimedia content elements according to the content contained therein.