Devices and methods for automatically recognizing continuously spoken speech have recently become increasingly important. In wide areas of service, such as information services, customer support or the like, a substantial amount of personnel-related cost could be avoided by utilizing devices which respond automatically to the customer's inquiries.
The most important condition to be fulfilled by apparatuses and methods for automatic speech recognition is that they reliably recognize and understand the speech input given by the customer independently of the particular speaking conditions, such as speaking rate, voice intonation, articulation, background noise or the like.
There are many devices, such as automatic telephone services, time schedule information services or the like, which work reliably only when applied to a well-defined and narrow subset of all possible utterances made by the customer. These methods and devices are generally designed to handle only a very narrow scope of vocabulary and vocal situations.
In the field of large-vocabulary speech recognition, most methods and devices work as follows:
Upon receipt of a speech phrase, a signal is generated which is representative of the received speech phrase. The signal is then pre-processed according to a predetermined set of rules, which may include digitizing, Fourier analysis and similar signal evaluation techniques. The result of pre-processing the signal is stored.
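By way of illustration only, the pre-processing step described above may be sketched as follows. The sketch assumes a signal already sampled into values between -1.0 and 1.0; the function names, the frame size of 64 samples, the step of 32 samples and the 256 quantization levels are hypothetical choices made for this example and are not taken from any particular device.

```python
import cmath
import math


def digitize(samples, levels=256):
    """Quantize samples in [-1.0, 1.0] to integers (a stand-in for A/D conversion)."""
    half = levels // 2
    return [max(-half, min(half - 1, round(s * half))) for s in samples]


def split_into_frames(signal, size, step):
    """Split the signal into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def dft_magnitudes(frame):
    """Magnitude spectrum of one frame via a direct DFT (O(n^2), adequate for a sketch)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def preprocess(samples, size=64, step=32):
    """Digitize the signal, then Fourier-analyze it frame by frame."""
    digital = digitize(samples)
    return [dft_magnitudes(frame) for frame in split_into_frames(digital, size, step)]


# Hypothetical input: a pure tone with 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]

# The stored result of pre-processing: one magnitude spectrum per frame.
features = preprocess(tone)
```

Each entry of `features` is the spectrum of one frame; for the test tone, the largest magnitude of every frame falls in frequency bin 8.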
On the basis of the pre-processed signal, at least one series of hypothetical speech elements is generated, which serves as a basis for determining at least one series of words that is a probable candidate to correspond to said received speech phrase. To determine the series of words, a predefined language model has to be applied, in particular to at least said series of hypothetical speech elements.
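The application of a language model to candidate word series may be illustrated by the following minimal sketch. It assumes a bigram language model, which is only one possible form of the predefined language model; the probabilities, the acoustic scores and the names `BIGRAM_LOGP`, `lm_logprob` and `best_word_series` are invented for this example.

```python
import math

# Hypothetical bigram log-probabilities log P(word | previous word).
BIGRAM_LOGP = {
    ("<s>", "recognize"): math.log(0.20),
    ("recognize", "speech"): math.log(0.30),
    ("<s>", "wreck"): math.log(0.01),
    ("wreck", "a"): math.log(0.10),
    ("a", "nice"): math.log(0.05),
    ("nice", "beach"): math.log(0.02),
}
# Assumed floor value for word pairs the model has never seen.
UNSEEN_LOGP = math.log(1e-4)


def lm_logprob(words):
    """Log-probability of a word series under the bigram model."""
    total = 0.0
    previous = "<s>"  # sentence-start marker
    for word in words:
        total += BIGRAM_LOGP.get((previous, word), UNSEEN_LOGP)
        previous = word
    return total


def best_word_series(candidates, lm_weight=1.0):
    """Return the (words, acoustic_score) pair with the highest combined score."""
    return max(candidates, key=lambda c: c[1] + lm_weight * lm_logprob(c[0]))


# Each candidate pairs a word series with a hypothetical acoustic log-score.
candidates = [
    (["recognize", "speech"], -12.0),
    (["wreck", "a", "nice", "beach"], -11.5),
]
best_words, best_acoustic = best_word_series(candidates)
```

In this example the second candidate has the slightly better acoustic score, but the language model makes the first series the more probable candidate overall.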
One major drawback of conventional methods and devices for large-vocabulary speech recognition is their high complexity and the large number of possible candidate speech fragments or elements to be searched for and tested. Unless the scope of subject matter, and therefore the scope of vocabulary, is limited, all possible candidates for speech elements or speech fragments have to be evaluated by distinct searching techniques.
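The growth in the number of candidates may be made concrete by a short calculation. The sketch below assumes, for simplicity, that any word of the vocabulary may follow any other, so that a vocabulary of V words admits V to the power n word series of length n; the vocabulary sizes of 50 and 20,000 words and the function name `candidate_count` are hypothetical.

```python
def candidate_count(vocabulary_size, max_words):
    """Number of distinct word series of length 1..max_words over the vocabulary."""
    return sum(vocabulary_size ** n for n in range(1, max_words + 1))


# A narrow, well-defined application versus an unrestricted large vocabulary,
# both limited to phrases of at most five words.
narrow = candidate_count(50, 5)
large = candidate_count(20_000, 5)
```

Under these assumptions the unrestricted vocabulary yields many orders of magnitude more candidates than the narrow one, which is why an exhaustive evaluation quickly becomes impractical.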