The present invention relates to speech/speaker recognition and telephone mail messaging and, more particularly, to apparatuses and methods for improved digit recognition and/or caller identification utilizing speech/speaker recognition in telephone mail messaging.
Typically, in a telephone system having a voice mail feature, a caller leaves a telephone message which may include his name, telephone number and/or a brief request or message on a receiving party's voice mail equipment. As is known, the telephone number that is left usually informs the voice mail user where the caller may be reached over the telephone. Conventional automatic speech recognition (ASR) decoding may provide the user with a decoded text representation of the phone message. However, an error in decoding even one digit of the telephone number of the caller can make an entire telephone message useless since a user may not be able to return a call (unless, of course, the user plays back a recorded representation of the phone message).
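By way of illustration only, the sensitivity of a decoded telephone number to single-digit errors may be estimated as follows. This is a minimal sketch under a stated assumption that is not part of the foregoing description: digit errors are treated as statistically independent, with one fixed per-digit accuracy. The function name `number_accuracy` and the accuracy values are hypothetical.

```python
def number_accuracy(per_digit_accuracy: float, num_digits: int = 10) -> float:
    """Probability that every digit of a number is decoded correctly.

    Assumption (illustrative only): digit errors are independent and each
    digit is decoded correctly with the same probability.
    """
    if not 0.0 <= per_digit_accuracy <= 1.0:
        raise ValueError("per_digit_accuracy must lie in [0, 1]")
    if num_digits < 0:
        raise ValueError("num_digits must be non-negative")
    # All digits must be correct, so the per-digit probabilities multiply.
    return per_digit_accuracy ** num_digits


if __name__ == "__main__":
    # Hypothetical per-digit accuracies, chosen only to show the trend.
    for p in (0.99, 0.95, 0.90):
        print(f"per-digit {p:.2f} -> whole number {number_accuracy(p):.3f}")
```

Under this assumption, a per-digit accuracy of 0.95 yields a fully correct ten-digit number only about 60% of the time, which is consistent with the observation above that even one misdecoded digit may render the message useless.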
There exist telephone devices (and services) that allow a receiving party to trace back or record the telephone number of the telephone set from which a caller placed a call. However, this is not always useful since a caller may have called from some temporary location (e.g., a street phone) or may have left a call-back telephone number that is different from the telephone number at his current location. Furthermore, identification of the caller alone does not help to identify the phone number to call back since the caller may have many phone numbers where he can be reached, e.g., home, office, hotels during his travels, etc.
In addition to voice mail messaging systems, a fast-growing area in the consumer communications market is text-independent speaker recognition, as disclosed in U.S. Ser. No. 08/788,471, filed on Jan. 28, 1997, entitled "Text-independent Speaker Recognition for Command Disambiguity and Continuous Access Control". It is known that a problem with text-independent speaker recognition is that textual context is, in general, difficult to use to improve the accuracy of speaker recognition. Also, with regard to telephone applications, since the limited bandwidth of a typical telephone line tends to reduce the accuracy of ASR, continuous speaker-independent recognition decoding over the telephone has been considered a challenging task. The task is made more difficult by microphone mismatch (e.g., speaker phones, cellular phones, carbon and/or electric microphones) and channel variability (e.g., from one phone call to another, the path through the telephone network can vary dramatically, which in turn has a severe effect on the distortions and signature introduced by the channel).
It would be highly desirable and advantageous to provide apparatuses and methods which overcome the drawbacks and limitations described above with respect to ASR decoding of telephone voice mail messaging as well as telephone continuous speaker-independent recognition decoding.