1. Field of the Invention
The invention relates to speech recognition. More particularly, it relates to a method and apparatus for vocabulary independent wordspotting in speech.
2. Prior Art
In practical applications of wordspotting in speech recognition, such as audio-indexing, voice mail retrieval, spoken message retrieval and audio-browsing, it is necessary to process large amounts of speech at speeds many times faster than real time. Present wordspotting techniques generally fall into three types:
1. A large vocabulary continuous speech recognizer is used to produce the N best transcriptions of the speech. From the N-best lists, the a posteriori probability of the word in question is estimated. If this probability exceeds a user-defined threshold, the given word is deemed present. M. Weintraub, "LVCSR Log-Likelihood Ratio Scoring For Keyword Spotting", ICASSP 1995, Vol 1, pp 297-300.
2. A detailed acoustic model is built for background speech and used in parallel with a detailed model for the given word to compute the a posteriori probability of the word. If this probability exceeds a user-defined threshold, the given word is deemed present. J. R. Rohlicek, W. Russel, S. Roukos, H. Gish, "Word Spotting", ICASSP 1989, pp 627-630.
3. Speech is pre-processed and stored as a phone lattice by running a modified Viterbi decoder on a null-grammar phone network. The presence of a given word is determined by conducting a dynamic programming search on the phone lattice. D. A. James, S. J. Young, "A Fast Lattice-Based Approach To Vocabulary Independent Wordspotting", ICASSP 1994, pp 377-380.
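As an illustration of the first type, the a posteriori probability of a word can be approximated from an N-best list by normalizing the hypothesis scores and summing the mass of the hypotheses that contain the word. The sketch below is a minimal version under stated assumptions, not the cited author's exact scoring: each hypothesis is assumed to carry a single total log-likelihood score, and the `scale` factor and the function names are hypothetical.

```python
import math
from typing import List, Tuple

# Each N-best entry: (list of words in the transcription, total log-likelihood score).
NBest = List[Tuple[List[str], float]]


def nbest_word_posterior(nbest: NBest, word: str, scale: float = 1.0) -> float:
    """Approximate P(word | speech) from an N-best list.

    `scale` is a hypothetical factor that flattens (scale < 1) or sharpens
    (scale > 1) the score distribution before normalization.
    """
    if not nbest:
        return 0.0
    scaled = [scale * score for _, score in nbest]
    top = max(scaled)  # subtract the maximum so exp() cannot overflow
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    # Posterior mass of all hypotheses whose transcription contains the word.
    hit = sum(w for (words, _), w in zip(nbest, weights) if word in words)
    return hit / total


def word_present(nbest: NBest, word: str, threshold: float = 0.5,
                 scale: float = 1.0) -> bool:
    """Deem the word present if its posterior exceeds a user-defined threshold."""
    return nbest_word_posterior(nbest, word, scale) > threshold
```

For example, with the hypothetical list `[(["call", "home"], -100.0), (["all", "home"], -101.0), (["call", "phone"], -103.0)]`, the word "call" appears in the first and third hypotheses, so its posterior is (1 + e^-3) / (1 + e^-1 + e^-3), roughly 0.74, and it is deemed present at a threshold of 0.5. Because the search runs only over stored transcriptions, a word outside the recognizer's vocabulary never appears in any hypothesis and always receives a posterior of zero.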
All of the above methods have shortcomings. In the first method, wordspotting essentially reduces to searching through text, so retrieval is fast. However, words that do not appear in the vocabulary of the speech recognizer cannot be spotted. The second method places no limitations on the words that can be searched for, but it is very slow because the wordspotter must be re-run every time a new word is specified. The third method offers both the flexibility to search for any word and fast retrieval. However, it relies heavily on phone recognition accuracy, which is often very poor.
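The dependence of the third type on phone recognition accuracy can be seen in a simplified dynamic programming search. The sketch below assumes, for brevity, that the stored phone lattice has been reduced to a single best phone string, and it uses unit substitution, insertion and deletion costs; the function names, costs and phone symbols are hypothetical and are not taken from the cited work. The keyword's phone sequence may align with any contiguous span of the stored phones.

```python
from typing import Sequence


def best_match_cost(keyword: Sequence[str], phones: Sequence[str],
                    sub: float = 1.0, ins: float = 1.0, dele: float = 1.0) -> float:
    """Minimum edit cost of aligning `keyword` to any contiguous span of `phones`.

    Simplification: `phones` is a single best phone string, not a full lattice.
    """
    n = len(phones)
    # prev[j]: cost of aligning the keyword prefix processed so far so that it
    # ends right after phones[:j]. Row 0 is all zeros: the match may start anywhere.
    prev = [0.0] * (n + 1)
    for kw_phone in keyword:
        # Column 0: keyword phones consumed with no stored phones (deletions).
        cur = [prev[0] + dele] + [0.0] * n
        for j in range(1, n + 1):
            match = prev[j - 1] + (0.0 if kw_phone == phones[j - 1] else sub)
            deletion = prev[j] + dele      # keyword phone missing from the string
            insertion = cur[j - 1] + ins   # extra phone in the stored string
            cur[j] = min(match, deletion, insertion)
        prev = cur
    return min(prev)  # the match may end anywhere


def spot(keyword: Sequence[str], phones: Sequence[str], max_cost: float = 1.0) -> bool:
    """Deem the keyword present if its best alignment cost is within `max_cost`."""
    return best_match_cost(keyword, phones) <= max_cost
```

A keyword such as `["k", "ae", "t"]` matches the stored string `["dh", "ax", "k", "ae", "t"]` at cost 0, while a single misrecognized vowel (`["dh", "ax", "k", "eh", "t"]`) raises the cost to 1. Every phone recognition error adds cost, so when phone accuracy is poor the threshold must either stay tight and miss true occurrences or be loosened and admit false alarms.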