This invention relates to recognition of voice messages and, in particular, to recognition of voice messages carried in pulse code modulated (PCM) form.
Voice recognizers and voice recognition systems and methods are known and have been used in a variety of applications. One such application is in state-of-the-art telephone systems which offer services based upon speech interaction between the telephone subscriber and the system.
In systems of this type, speech responses by the subscriber are used to directly invoke system operations which previously required key or dial entry. An example of such a service is speech invoked auto-dialing.
In this type of dialing, the subscriber is able to access a speech server coupled to the central office switch of the system. The speech server is, in turn, able to recognize telephone numbers to be dialed based upon speech entries by the subscriber. A recognized telephone number is then transmitted by the speech server to the central office switch. The switch then proceeds to complete the connection to that number as if it had been keyed in or dialed in conventional fashion by the subscriber.
In the above voice activated systems, there may be a number of interactions between the subscriber and the speech server. For example, voice prompts generated by the speech server may be needed to elicit speech responses from the subscriber, which must then be recognized by the server. An integral part of the server is the voice recognition equipment used to recognize the speech input of the subscriber.
Conventional speech recognizers have been proposed for this type of recognition. These recognizers operate on PCM digital signals which are formed from voice samples derived from the voice message.
In this type of application, a recognizer is usually required to recognize only a limited number of voice messages from each subscriber. In most cases, the recognizer is first trained on repeated entries, by the subscriber, of the set of voice messages that are later to be recognized. PCM digital signals representing the samples of these voice messages are then processed in accordance with statistical algorithms or functions to develop so-called "templates" which are indicative of a given voice message. These templates are then stored for use during the recognition procedure.
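The training step can be illustrated with a minimal sketch. The statistical processing is not specified here, so the sketch assumes a hypothetical feature extractor (`features`, the mean absolute PCM sample value in each frame of a hypothetical length `FRAME`) and takes the template to be the frame-by-frame average of the feature vectors from the subscriber's repeated entries. These names and choices are illustrative assumptions, not the recognizer's actual algorithm.

```python
from statistics import fmean

FRAME = 4  # hypothetical frame length, in PCM samples


def features(pcm: list[int]) -> list[float]:
    """Hypothetical feature extractor: mean absolute sample value per frame."""
    return [fmean(abs(s) for s in pcm[i:i + FRAME])
            for i in range(0, len(pcm) - FRAME + 1, FRAME)]


def train_template(entries: list[list[int]]) -> list[float]:
    """Average the feature vectors of repeated entries of one voice message.

    Averaging stands in for the unspecified statistical function (an assumption).
    """
    vectors = [features(entry) for entry in entries]
    n = min(len(v) for v in vectors)  # truncate to the shortest entry
    return [fmean(v[k] for v in vectors) for k in range(n)]


# Stored templates, keyed by the voice message they represent.
templates: dict[str, list[float]] = {}
templates["call home"] = train_template([
    [10, -12, 8, -9, 40, -38, 42, -41],  # first spoken entry (toy PCM values)
    [12, -10, 9, -8, 38, -40, 41, -42],  # repeated entry of the same message
])
print(templates["call home"])  # prints: [9.75, 40.25]
```

A practical recognizer would use richer spectral features and time alignment between entries. The sketch only shows the idea of condensing several training entries into one stored template.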
During recognition, PCM digital signals representative of a voice message to be recognized are first formed. These signals are then applied to the voice recognizer, which processes the signals following the same statistical algorithms or functions used during training. This results in so-called "tokens" being developed from the PCM digital signals. These tokens are then compared with the stored templates and, when a sufficient match is found, the voice message is recognized as that indicated by the matched template. This completes the recognition process.
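The comparison of a token with the stored templates can be sketched as a nearest-template search with an acceptance threshold. The Euclidean distance, the `threshold` value, and the `features`, `distance`, and `recognize` names are hypothetical choices made for illustration; the description above only requires that the token be derived with the same processing used in training and that a "sufficient match" be found.

```python
import math
from statistics import fmean
from typing import Optional

FRAME = 4  # hypothetical frame length, in PCM samples


def features(pcm: list[int]) -> list[float]:
    """Hypothetical token extractor: mean absolute sample value per frame.

    It must be the same processing that was used to build the templates.
    """
    return [fmean(abs(s) for s in pcm[i:i + FRAME])
            for i in range(0, len(pcm) - FRAME + 1, FRAME)]


def distance(token: list[float], template: list[float]) -> float:
    """Euclidean distance over the frames both vectors share (an assumed metric)."""
    n = min(len(token), len(template))
    return math.dist(token[:n], template[:n])


def recognize(pcm: list[int], templates: dict[str, list[float]],
              threshold: float) -> Optional[str]:
    """Return the best-matching message, or None if no match is close enough."""
    token = features(pcm)
    best_name, best_d = None, math.inf
    for name, template in templates.items():
        d = distance(token, template)
        if d < best_d:
            best_name, best_d = name, d
    # Accept the closest template only if it is a sufficient match.
    return best_name if best_d <= threshold else None


templates = {"call home": [9.75, 40.25], "call office": [40.0, 10.0]}
print(recognize([11, -11, 9, -9, 39, -39, 41, -41], templates, threshold=5.0))  # prints: call home
print(recognize([100, -100] * 4, templates, threshold=5.0))  # prints: None
```

Rejecting the closest template when its distance exceeds the threshold is what lets the recognizer report "not recognized" rather than always returning some message.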
While present day recognizers can successfully perform recognition in this way, they have certain limitations which detract from their overall usefulness, particularly their usefulness in the above-mentioned telephone system application. One limitation of these known recognizers is their perceived failure rate. Typical recognizers might be perceived to provide a wrong or invalid recognition a relatively high percentage of the time. This reduces user confidence, which is particularly undesirable in telephone system applications. Also, with present day recognizers, if a voice message is to be recorded as well as recognized, a separate recording system must be used in parallel with the recognizer. This increases the overall cost of the system, again making the system less attractive for telephone applications.
Finally, present day voice recognizers are susceptible to errors based upon changes in the amplitude of the incoming voice message. Thus, a voice message spoken at one level might be recognized, but the same voice message spoken at another amplitude level might not be recognized. This limitation makes the recognizers less satisfactory for telephone and other applications, where different voice levels may occur frequently.
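The amplitude sensitivity described above can be demonstrated with a toy example. Assuming a hypothetical token extractor that depends directly on sample amplitude (`features`, the mean absolute value per frame) and a hypothetical matching threshold, the same message spoken at half the level produces a token far from its stored template and is therefore rejected.

```python
import math
from statistics import fmean

FRAME = 4          # hypothetical frame length, in PCM samples
THRESHOLD = 5.0    # hypothetical "sufficient match" distance


def features(pcm: list[int]) -> list[float]:
    """Hypothetical amplitude-dependent token: mean absolute value per frame."""
    return [fmean(abs(s) for s in pcm[i:i + FRAME])
            for i in range(0, len(pcm) - FRAME + 1, FRAME)]


loud = [20, -20, 20, -20, 80, -80, 80, -80]  # message as spoken during training
quiet = [s // 2 for s in loud]               # same message at half the amplitude

template = features(loud)  # [20.0, 80.0]


def matches(pcm: list[int]) -> bool:
    """True if the token lies within THRESHOLD of the stored template."""
    return math.dist(features(pcm), template) <= THRESHOLD


print(matches(loud))   # prints: True
print(matches(quiet))  # prints: False (token [10.0, 40.0] is about 41.2 away)
```

Because every feature scales with the input level, halving the amplitude halves the token, and the distance to the template grows well past the threshold even though the spoken message is unchanged.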
It is therefore a primary object of the present invention to provide a voice recognition device and method which overcome one or more of the disadvantages of the above prior recognition devices.
It is a further object of the present invention to provide a voice recognition device and method which has a lower perceived failure rate.
It is still another object of the present invention to provide a voice recognition device and method in which recognition errors due to changes in amplitude level of the voice messages are reduced.
It is also an object of the present invention to provide a voice recognition device and method in which recognition and recording of the voice message can be carried out simultaneously.