The present invention relates to speech recognition systems and, more particularly, to apparatus and methods for performing fast word acceptance or rejection using decoding history caches.
Speech recognition is an emerging technology that is increasingly replacing classical data entry and order taking, which typically require filling out forms, typing, or interacting with human operators. Typically, an initial step in a computerized speech recognition system involves the computation of a set of acoustic features (a feature vector) from sampled speech. The sampled speech may be provided by a user of the system via an audio-to-electrical transducer, such as a microphone, and converted from an analog representation to a digital representation before sampling. Typically, a classical acoustic front-end (processor) is employed to compute the acoustic features from the sampled speech. The acoustic features are then submitted to a speech recognition engine where the utterances are recognized, thereby generating a decoded or recognized script that is representative of the sampled input speech.
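By way of illustration only, the front-end step described above may be sketched as follows. The sketch splits sampled speech into overlapping frames and computes a toy two-element feature vector (log energy and zero-crossing rate) per frame. The function names, the frame length and hop size, and the choice of features are assumptions made for this example; practical front-ends typically compute richer features, such as cepstral coefficients.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of frame_len samples."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

def feature_vector(frame):
    """Return a toy feature vector: [log energy, zero-crossing rate]."""
    energy = sum(x * x for x in frame) / len(frame)
    log_energy = math.log(energy + 1e-10)  # small floor avoids log(0) on silence
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (len(frame) - 1)
    return [log_energy, zcr]

def front_end(samples, frame_len=400, hop=160):
    """Hypothetical front-end: one feature vector per frame of sampled speech."""
    return [feature_vector(f) for f in frame_signal(samples, frame_len, hop)]

# Synthetic input standing in for sampled speech: 0.1 s of a 440 Hz tone
# at an assumed 16 kHz sampling rate.
sample_rate = 16000
samples = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
features = front_end(samples)
```

In this sketch, each entry of `features` is the feature vector for one frame; a sequence of such vectors is what would be submitted to the speech recognition engine.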
Classical speech recognition systems typically compare the likelihood of all possible word hypotheses or sequences of word hypotheses and select the most probable hypotheses as the recognized script based on acoustic and language modeling scores. This process is referred to as a detailed match search. When a comparison of all possible hypotheses is impractical, which is often the case, the set of hypotheses to be compared is first limited by a process known as the fast match search, which quickly scores the hypotheses and eliminates those falling too far behind the top-ranking hypotheses.
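The two-stage search described above may be sketched, purely for illustration, as follows. A fast match keeps only the hypotheses whose quick score lies within a beam of the best quick score, and a detailed match then selects the surviving hypothesis with the highest combined acoustic and language modeling score. The function names, the beam width, the vocabulary, and all score values are hypothetical and are not taken from any particular system; scores are assumed to be log-likelihoods, so that larger values are better.

```python
def fast_match(candidates, quick_score, beam):
    """Keep hypotheses whose quick score is within `beam` of the best one."""
    scored = [(quick_score(w), w) for w in candidates]
    if not scored:
        return []
    best = max(s for s, _ in scored)
    return [w for s, w in scored if best - s <= beam]

def detailed_match(candidates, acoustic_score, language_score):
    """Return the hypothesis with the highest combined log score."""
    return max(candidates, key=lambda w: acoustic_score(w) + language_score(w))

# Hypothetical vocabulary and made-up log scores for a single utterance.
vocabulary = ["call", "tall", "ball", "dial", "mail"]
quick = {"call": -2.0, "tall": -2.5, "ball": -3.0, "dial": -9.0, "mail": -8.0}
acoustic = {"call": -4.0, "tall": -3.8, "ball": -5.0, "dial": -12.0, "mail": -11.0}
language = {"call": -1.0, "tall": -3.0, "ball": -2.5, "dial": -1.5, "mail": -2.0}

# Fast match: cheap scoring prunes hypotheses far behind the top-ranking one.
shortlist = fast_match(vocabulary, quick.__getitem__, beam=2.0)

# Detailed match: the expensive scoring runs only on the shortlist.
recognized = detailed_match(shortlist, acoustic.__getitem__, language.__getitem__)
```

With these made-up scores, the fast match discards "dial" and "mail", so the detailed match evaluates three hypotheses rather than five. The cost of the detailed match still grows with the size of the shortlist, which is the limitation discussed next.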
Unfortunately, for high-volume speech recognition applications, such as a corporate name voice dialer, the number of hypotheses to consider in known detailed match and fast match searches is still prohibitively large.