Speech recognition is one type of voice technology that provides a way for people to interact verbally with a computer. Speech recognition is an especially challenging technology because speech inherently varies from one person to another. Two approaches to speech recognition have evolved: speaker dependent and speaker independent.
Speaker dependent speech recognition uses a computer that has been "trained" to respond to the manner in which a particular person speaks. In general, training involves one person speaking a sound to generate an analog speech input, converting that input into signal data, generating a template that represents the sound, and indexing the template to the appropriate response data, such as a computer instruction to perform an action. During real-time operation, input data is compared with the user's set of templates, and the best match determines the response.
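The training and matching steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular system: each utterance is assumed to arrive already converted into a sequence of numeric feature frames (feature extraction is not shown), and dynamic time warping (DTW), a common way to compare utterances spoken at different speeds, is swapped in as the matching technique because the text does not name one. The names `SpeakerDependentRecognizer`, `train`, `recognize`, and `dtw_distance` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: one feature vector per short analysis frame.
Frame = Sequence[float]


def dtw_distance(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Dynamic time warping distance between two frame sequences.

    Lets a spoken word match its template even if it was said faster or slower.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of x with the first j of y
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(x[i - 1], y[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class SpeakerDependentRecognizer:
    """Stores one user's templates, each indexed to response data."""

    def __init__(self) -> None:
        self.templates: Dict[str, List[Frame]] = {}
        self.responses: Dict[str, str] = {}

    def train(self, label: str, frames: Sequence[Frame], response: str) -> None:
        # The user's training utterance becomes the template for this label,
        # and the template is indexed to its response (e.g. an instruction).
        self.templates[label] = list(frames)
        self.responses[label] = response

    def recognize(self, frames: Sequence[Frame]) -> str:
        # Compare the input with every stored template; the best match wins.
        best = min(self.templates, key=lambda label: dtw_distance(frames, self.templates[label]))
        return self.responses[best]


if __name__ == "__main__":
    rec = SpeakerDependentRecognizer()
    rec.train("call", [[1.0, 0.0], [0.9, 0.1], [0.2, 0.8]], "DIAL")
    rec.train("stop", [[0.0, 1.0], [0.1, 0.9]], "HANG_UP")
    # The same word spoken more slowly (first frame repeated) still matches "call".
    print(rec.recognize([[1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [0.2, 0.8]]))  # DIAL
```

Because every template in this sketch comes from a single speaker, each new user would have to repeat the training step to get a template set of his or her own.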
Speaker independent speech recognition uses a computer that stores a composite template or cluster of templates representing the same sound spoken by a number of different persons. The templates are derived from numerous samples of signal data to represent a wide range of pronunciations. In addition, during real-time operation the matching process is more difficult, because the computer must interact with persons on whom it has not been trained and must accommodate different accents and inflections.
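A speaker independent system can be sketched the same way, again under assumptions not stated in the text: each utterance is reduced to a single fixed-length feature vector, a composite template is formed by averaging several speakers' vectors component by component, and, as the alternative, the whole cluster of per-speaker vectors is kept and the input is matched to its nearest member. The names `SpeakerIndependentRecognizer`, `composite_template`, and `cluster_distance` are hypothetical.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence

# Hypothetical representation: one fixed-length feature vector per utterance.
Vector = Sequence[float]


def composite_template(samples: Sequence[Vector]) -> List[float]:
    """Average several speakers' vectors, component by component."""
    return [fmean(component) for component in zip(*samples)]


def cluster_distance(x: Vector, cluster: Sequence[Vector]) -> float:
    """Distance from the input to the nearest template in a cluster."""
    return min(math.dist(x, t) for t in cluster)


class SpeakerIndependentRecognizer:
    """Matches any caller against templates built from many speakers."""

    def __init__(self, use_composite: bool = True) -> None:
        self.use_composite = use_composite
        self.clusters: Dict[str, List[List[float]]] = {}

    def add_word(self, word: str, speaker_samples: Sequence[Vector]) -> None:
        samples = [list(s) for s in speaker_samples]
        # Either collapse the samples into one composite template
        # or keep the whole cluster of per-speaker templates.
        self.clusters[word] = [composite_template(samples)] if self.use_composite else samples

    def recognize(self, x: Vector) -> str:
        # The word whose composite (or nearest cluster member) is closest wins.
        return min(self.clusters, key=lambda w: cluster_distance(x, self.clusters[w]))


if __name__ == "__main__":
    for mode in (True, False):
        rec = SpeakerIndependentRecognizer(use_composite=mode)
        rec.add_word("one", [[1.0, 0.2], [0.8, 0.0], [1.2, 0.1]])  # three speakers
        rec.add_word("two", [[0.0, 1.0], [0.2, 0.9], [0.1, 1.1]])
        print(rec.recognize([0.9, 0.2]), rec.recognize([0.15, 0.95]))  # one two
```

In this sketch, averaging keeps one comparison per word but blurs the differences among speakers, while keeping the full cluster preserves them at the cost of one comparison per stored speaker sample.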
One application of speech recognition is in telephone systems. People may communicate directly with computers to perform simple tasks that would otherwise be done manually or with operator intervention. For example, speech recognition can be used for dialing, so that the user need not remember, look up, or ask for a telephone number. The user's hands also remain free.
Some telephone applications use speaker independent speech recognition for dialing. These applications are practical when the vocabulary is limited, such as when the user will simply vocalize numbers or select a command from a menu. However, such systems do not permit the caller to identify the called party with an identifier that is common to more than one destination, such as "my home," which refers to a different telephone number for each caller. In such situations, the caller must use a unique identifier, such as the number of the called party. Speaker independent processing is also expensive in terms of processing overhead.
On the other hand, a speaker dependent speech recognition system can accommodate a variety of destinations only by being trained to recognize the set of telephone numbers that each user will call. This requires separate resources for each user, which makes the physical device requirements expensive. The training process is also prone to human error.
A need exists for a voice recognition method for dialing that minimizes processing complexity as well as training requirements.