Several patents and patent applications deal with audio indexing and searching of audio data, e.g., U.S. Pat. No. 5,649,060 issued to Ellozy et al. on Jul. 15, 1997; U.S. Pat. No. 5,794,249 issued to Orsolini et al. on Aug. 11, 1998; and U.S. patent application identified by Ser. No. 09/108,544, entitled: “Audio-Video Archive and Method for Automatic Indexing and Searching,” filed on Jul. 1, 1998, the disclosures of which are incorporated by reference herein. All of the approaches taken in these patents and the patent application use a word as the basic unit for indexing and search. Typically in these methods, audio data is transcribed (via automatic speech recognition or manually), time-stamped, and indexed via words.
In a word-based system, a vocabulary and a language model based on known words must be prepared before searching can begin. Thus, by definition, there are always words that are unknown to the system. Unfortunately, the search mechanism can work only with words that yield a good language model score, i.e., known words.
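By way of illustration only, the word-based approach and its limitation with unknown words can be sketched as follows. The transcript, the vocabulary, and the names `build_word_index` and `search` are hypothetical and are not taken from the referenced patents; the sketch merely assumes a time-stamped transcript and a fixed vocabulary prepared in advance.

```python
from collections import defaultdict

# Hypothetical time-stamped transcript: (word, start time in seconds).
transcript = [("stock", 0.0), ("prices", 0.4), ("rose", 0.9), ("stock", 5.2)]

# Hypothetical fixed vocabulary prepared before any search is run.
VOCABULARY = {"stock", "prices", "rose", "fell"}

def build_word_index(transcript):
    """Map each transcribed word to the list of times at which it occurs."""
    index = defaultdict(list)
    for word, start in transcript:
        index[word].append(start)
    return dict(index)

def search(index, query):
    """Return time stamps for a known word, or None for an unknown word."""
    if query not in VOCABULARY:
        return None  # out-of-vocabulary: the system cannot search for it
    return index.get(query, [])

word_index = build_word_index(transcript)
```

In this sketch, a query for a known word returns its time stamps, a known word that was never spoken returns an empty list, and a word outside the vocabulary cannot be searched for at all.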
In an attempt to create a system capable of searching using an entry which is unknown to the system, phone-based indexing methods have been proposed. These methods include generating an acoustic transcription for words and indexing speech segments via acoustic phones. However, these phone-based indexing methods are not very efficient, since there can be different phonetic descriptions for the same word and the phonetic recognition accuracy can be low, e.g., lower than word recognition accuracy.
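The difficulty with differing phonetic descriptions can be illustrated with a minimal sketch. The phone symbols, the segment names, and the functions `contains` and `phone_search` are assumptions made for illustration and do not represent any particular proposed method; exact subsequence matching is used here only because it is the simplest way to show the mismatch.

```python
# Hypothetical phone strings recognized for two speech segments.
segments = {
    "seg1": ["t", "ah", "m", "ey", "t", "ow"],  # one pronunciation of "tomato"
    "seg2": ["p", "ah", "t", "ey", "t", "ow"],  # "potato"
}

def contains(phones, query):
    """Return True if the query phone sequence occurs contiguously in phones."""
    n = len(query)
    return any(phones[i:i + n] == query for i in range(len(phones) - n + 1))

def phone_search(segments, query_phones):
    """Return the names of segments whose phone string contains the query."""
    return sorted(
        name for name, phones in segments.items() if contains(phones, query_phones)
    )

# Two phonetic descriptions of the same word, as assumed for this sketch.
tomato_a = ["t", "ah", "m", "ey", "t", "ow"]
tomato_b = ["t", "ah", "m", "aa", "t", "ow"]
```

Under these assumptions, searching with `tomato_a` finds `seg1`, whereas searching with `tomato_b` finds nothing, even though both describe the same word. A single misrecognized phone would produce the same kind of miss.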
These difficulties are even more apparent in a system operating in a language for which the unit “word” in speech and text may be ambiguous, e.g., the Chinese language, or in a language that has a very large number of word forms, e.g., Slavic languages.
For most European languages, word boundaries exist in printed text, as well as in computer text files. These boundaries are represented as blank spaces between words. However, for most of the Asian languages, including, e.g., Chinese, Japanese, Korean, Thai, and Vietnamese, such word boundaries exist neither in printed form nor in computer text files. Thus, word-based indexing and searching methods cannot be applied to these languages. Phone-based indexing and searching methods for these languages have problems similar to those mentioned above.
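The role of blank spaces as word boundaries can be shown with a short sketch. The function name `whitespace_tokens` and the two sample strings are hypothetical and serve only as an example; splitting on whitespace stands in for whatever word segmentation a word-based indexer would rely on.

```python
def whitespace_tokens(text):
    """Split text into candidate words at blank spaces."""
    return text.split()

# Hypothetical sample sentences chosen for illustration.
english = "audio data is indexed via words"
chinese = "音频数据按词索引"

english_tokens = whitespace_tokens(english)
chinese_tokens = whitespace_tokens(chinese)
```

For the English sample, the split yields one token per word. For the Chinese sample, which contains no blank spaces, the entire sentence is returned as a single token, so no word units are available for indexing.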
Thus, a need exists for methods and apparatus for indexing and searching audio data, and the like, which minimize and/or eliminate these and other deficiencies and limitations, and which may be used with a greater number of languages.