The present embodiments relate to speech recognition, and are more particularly directed to a system that uses shared speech models to improve efficiency over prior speech recognition systems.
Over the past decade, speech recognition in computers has dramatically improved. This improvement has led to various applications of speech recognition, such as in telephony operations. Voice activated dialing (VAD) is one such application. In VAD, a computer maintains a directory which includes numbers frequently called by a caller, and for each such number the directory further includes one or more representations of the caller's voice speaking a name corresponding to that telephone number. The caller may then call a number identified in the directory merely by speaking the corresponding identifier or name into the phone, assuming that the spoken utterance matches an entry in the directory. For example, a call can be placed by saying "call the boss" or some other utterance into the phone microphone, in response to which the phone system will dial the number corresponding to the uttered name. As another example, the caller may invoke some type of telephony operation by speaking words which specify that operation, such as by stating "call waiting" or "call forwarding." Numerous other examples of speech recognition, both within and outside of telephony, will be appreciated by one skilled in the art.
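By way of illustration only, the directory lookup described above may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any actual VAD system: the names `DIRECTORY`, `distance`, and `lookup`, the three-element feature vectors, the Euclidean distance measure, and the acceptance threshold are all hypothetical, and a practical recognizer would compare utterances against speech models rather than short fixed-length vectors.

```python
import math

# Hypothetical directory: each entry pairs a telephone number with one or more
# stored representations of the caller speaking the corresponding name.
# The representations here are toy feature vectors chosen for illustration.
DIRECTORY = [
    {"name": "the boss", "number": "555-0100",
     "templates": [[0.9, 0.1, 0.3], [0.8, 0.2, 0.3]]},
    {"name": "home", "number": "555-0199",
     "templates": [[0.1, 0.9, 0.7]]},
]


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lookup(utterance, directory=DIRECTORY, threshold=0.25):
    """Return the number whose stored template is closest to the utterance,
    or None when no template is within the (assumed) acceptance threshold."""
    best_number, best_score = None, float("inf")
    for entry in directory:
        for template in entry["templates"]:
            score = distance(utterance, template)
            if score < best_score:
                best_number, best_score = entry["number"], score
    return best_number if best_score <= threshold else None
```

In this sketch, an utterance close to one of the stored templates for "the boss" yields that entry's number, while an utterance that matches no entry yields `None`, which corresponds to the case in which the spoken utterance does not match an entry in the directory.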
Considerations in the design and implementation of speech recognition often depend on the type of speech modeling at issue. Two types of speech modeling are speaker dependent modeling and speaker independent modeling. Speaker dependent modeling is based on a fixed vocabulary from a single speaker and includes templates based on what the system expects to receive as voice signals. Additionally, speaker dependent recognition is not designed to provide secure access like speaker verification, and generally allows 50% or higher impostor acceptance. An example of such speaker dependent modeling in connection with VAD may be found in U.S. patent application Ser. No. 60/064,204 (Attorney docket number DSCC.615-00), entitled "System For Enhanced Spoken Name Dialing," filed Nov. 4, 1997, having the same inventors as the present document, and which is hereby incorporated herein by reference. In contrast, speaker independent modeling is not tightly constrained to a single speaker, and operates to identify a speech pattern, independent of the speaker, based on modeling which typically is derived from hundreds if not thousands of speech samples. This approach, therefore, necessarily gives rise to an amount of modeling data which is relatively large as compared to that of speaker dependent modeling.
By way of further background to the present embodiments, for either speaker dependent or speaker independent modeling, efficiency must be considered in implementing the overall system which operates per the modeling. Specifically, the type of model chosen or implemented imposes a corresponding burden on items such as storage (e.g., memory size) and processing capability (e.g., speed or number of processors). Under the current state of the art for speaker dependent analyses, a typical single digital signal processor ("DSP") has been found to accomplish a relatively small number (e.g., one or two) of speaker dependent processes at a time. Such a "process" is generally a task to review one utterance during a period of time and analyze it according to the appropriate speaker dependent (and possibly other) model(s). Additionally, implementing a speaker independent process as opposed to a speaker dependent process increases the model size and complexity, where the increase may be at least an order of magnitude in the case of phonetic speaker independent models. Thus, this increase, if not adequately and efficiently considered, may pose a strict or severe limitation on overall system operation.
Given the above, one skilled in the art will appreciate that there arises a need to provide a system for adequately and efficiently implementing speech recognition while considering the demands imposed by a particular modeling technique. One approach, in view of the above-described considerations of phonetic speaker independent modeling, may be to increase memory size. However, such an approach necessarily increases overall cost and, therefore, may not be acceptable given other considerations. Instead, the preferred embodiment described below sets forth an alternative approach which avoids such a corresponding cost increase and therefore improves upon the options available under the current state of the art.