Sound files, like images, may be indexed by their titles. Unfortunately, if a sound file is simply an embedded or linked audio file on a Web page, there may be no additional information about it. The audio files may have some descriptive information included, such as the source. Other metadata can be included in audio files, but such inclusion requires more effort on the part of the content producer and, as in the case of images, the metadata may be incomplete or insufficient.
Fully indexing the content of audio files generally requires a transcript of the session in a computer-readable text format that enables text indexing. With voice recognition software, some automated indexing of audio files is possible and has been used successfully. However, it is widely known that such transcripts rarely match exactly what was spoken. The difficulty is compounded if the spoken words are sung and the search is for a song set to a specific tune, or for a tune regardless of the words.
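As an illustration of the transcript-based text indexing described above, the following is a minimal sketch of an inverted index that maps each word to the audio files whose transcripts contain it. The file names, the transcript text, and the helper names (`tokenize`, `build_index`, `search`) are hypothetical and chosen only for this example. The sketch assumes transcripts are already available as plain text, and it does nothing to address transcription errors.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase the text and split it into simple word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(transcripts):
    # Map each word to the set of audio file identifiers whose transcript contains it.
    index = defaultdict(set)
    for audio_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(audio_id)
    return index

def search(index, query):
    # Return the audio files whose transcripts contain every word of the query.
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical transcripts keyed by hypothetical audio file names.
transcripts = {
    "lecture01.wav": "Welcome to the lecture on signal processing",
    "interview.mp3": "The speaker discusses speech recognition and signal analysis",
}
index = build_index(transcripts)
```

With this index, a query such as `search(index, "speech recognition")` returns only the files whose transcripts contain both words. A word that the recognizer transcribed incorrectly is simply absent from the index, which is the limitation noted above.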
Analysis of audio signals is desirable for a wide variety of reasons, such as speaker recognition, voice command recognition, dictation, instrument or song identification, and the like. In some instances, it may be desirable to convert human speech from one language to one or more other languages, either in real time or at a later time. In particular, a user listening to an audio signal may wish to hear its contents in another language. Currently, real-time speech translation is largely performed by human translators, as machine-based translation algorithms do not provide reliable results.
It would therefore be advantageous to provide a solution that would overcome the challenges noted above.