The technology disclosed relates to a system and method for fast, accurate and parallelizable speech search, a so-called “Crystal Decoder”. It is particularly useful for search applications, as opposed to dictation. It achieves both speed and accuracy without sacrificing one for the other. It can search different variations of records in a reference database without a significant increase in elapsed processing time. Even the main decoding stage can be parallelized as the number of words increases, so that response time stays fast.
Speech interfaces have been an area of research for many years, and a number of speech recognition engines have been created that work well. However, these systems have been optimized for dictation or automatic transcription: their main task is to convert a speech waveform into a text transcription. Knowledge of context and the use of statistical language models usually improve their accuracy.
There have been many attempts to apply transcription engines to speech search. In such efforts, speech is first converted to text, and the text is then sent to a search engine to retrieve results. This approach suffers from a number of weaknesses, mainly because search and dictation each pose their own challenges, and a system designed for dictation is not necessarily optimized for search. For example, lack of knowledge about the search engine's context can reduce the accuracy of the transcription stage, and transcription errors in turn reduce the accuracy of the search results. Another major problem is that search engines usually have large vocabularies, which make the decoder slow and inaccurate. To maintain high speed, the decoder then performs pruning, which introduces additional errors.
An opportunity arises to deliver components of a catalog search engine that responds to spoken search requests. These components can be used separately or in combination. Better, readily parallelized and more versatile voice analysis systems may result.