An emerging area of technology involving terminal devices, such as handheld devices, mobile phones, laptops, PDAs, Internet appliances, desktop computers, or other suitable devices, is the transfer of information in a plurality of input and output formats. A terminal device typically includes an input system that allows a user to enter information, such as a specific information request. For example, a user may use the terminal device to access a weather database to obtain weather information for a specific city. Typically, the user enters a voice command asking for weather information for a specific location, such as “Weather in Chicago.” Because of processing limitations associated with the terminal device, the voice command may be forwarded over a communication link to a network element, which is one of a plurality of network elements within a network. The network element contains a speech recognition engine that recognizes the voice command, then executes the request and retrieves the user-requested information. Alternatively, the speech recognition engine may be disposed within the network and operably coupled to the network element rather than being resident within it, so that the speech recognition engine can be accessed by multiple network elements.
With the advancement of wireless technology, there has been an increase in user applications for wireless devices. Many of these devices have become more interactive, giving the user the ability to enter command requests and access information. Concurrently, the number of forms in which a user may submit a specific information request has also increased. Typically, a user can enter a command request via a keypad, and the terminal device encodes the input and provides it to the network element. A common example of this system is a telephone banking system, in which a user enters an account number and personal identification number (PIN) to access account information. Upon receiving input via the keypad, the terminal device or a network element converts the input to a dual-tone multi-frequency (DTMF) signal and provides the DTMF signal to the banking server.
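The keypad-to-DTMF conversion described above can be illustrated with a minimal sketch. It assumes the standard DTMF tone grid (ITU-T Q.23), where each key is sent as the sum of one low "row" tone and one high "column" tone. The function names, the 8 kHz sample rate, and the 100 ms tone duration are illustrative assumptions and are not taken from the text above.

```python
import math

# Standard DTMF grid: each key is the sum of one row (low) and one column (high) tone, in Hz.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key: str) -> tuple[int, int]:
    """Return the (low_hz, high_hz) tone pair for a single keypad character."""
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if c != -1:
            return ROW_FREQS[r], COL_FREQS[c]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_s: float = 0.1, rate_hz: int = 8000) -> list[float]:
    """Synthesize one tone burst (hypothetical helper): two sinusoids, amplitude 0.5 each."""
    low, high = dtmf_pair(key)
    n = round(duration_s * rate_hz)
    return [
        0.5 * math.sin(2 * math.pi * low * t / rate_hz)
        + 0.5 * math.sin(2 * math.pi * high * t / rate_hz)
        for t in range(n)
    ]


def encode_keypad_input(digits: str) -> list[tuple[int, int]]:
    """Map keyed input, such as an account number or PIN, to its sequence of tone pairs."""
    return [dtmf_pair(d) for d in digits]
```

For example, `encode_keypad_input("1234")` yields `[(697, 1209), (697, 1336), (697, 1477), (770, 1209)]`. A real terminal would also insert silent gaps between bursts so the receiver can separate repeated digits. That detail is omitted here.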
Furthermore, a user may enter a command, such as an information request, using voice input. Even with improvements in speech recognition technology, numerous processing and memory storage requirements limit the speech recognition capabilities of the terminal device. A speech recognition engine typically includes a library of speech models against which input speech commands are matched. Reliable speech recognition often requires a large library, and therefore a significant amount of memory. Moreover, as speech recognition capabilities increase, power consumption also increases, thereby shortening the life of the terminal device's battery.
The terminal speech recognition engine may be an adaptive system. Although it has a smaller library of recognized commands, it is more adaptive and better able to learn the user's distinctive speech patterns, such as tone, inflection, and accent. The limited speech recognition library within the terminal is therefore offset by a higher probability of correct recognition. Such a system is typically limited to only the most common voice commands, such as voice-activated dialing, in which a user speaks a name and the system automatically dials the associated number previously programmed into the terminal.
Another approach to voice recognition is to provide the full voice command to the network element. A network speech recognition engine may offer more efficient speech recognition because it has a large amount of available memory and fewer constraints on power consumption. However, because a speech recognition engine on a network element must be accessible to multiple users accessing multiple network elements, it cannot adapt to an individual user's distinctive speech patterns, such as an accent. As a result, network speech recognition engines may provide a larger vocabulary of recognized voice commands, but with a lower probability of correct recognition, because they cannot account for the speech patterns of individual users.
Recent developments also provide for multi-level distributed speech recognition, in which a terminal device attempts to recognize a voice command. If the command is not recognized within the terminal, it is encoded and provided to a network speech recognition engine for a second recognition attempt. U.S. Pat. No. 6,185,535 B1, issued to Hedin et al., discloses a system and method for voice control of a user interface to service applications. This system provides step-wise speech recognition in which at least one network speech recognition engine is used only if the terminal device cannot recognize the voice command. U.S. Pat. No. 6,185,535 therefore provides only a single level of assurance that the audio command is correctly recognized, by either the terminal speech recognition engine or the network speech recognition engine.
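The step-wise fallback described above can be sketched as a simple control flow. The sketch assumes that "recognized" means the terminal engine returns a result at or above a confidence threshold. The `Result` type, the `threshold` value, the `encode` hook, and the toy recognizers are all hypothetical stand-ins and do not come from the cited patent. Real engines would operate on encoded speech features, not on text bytes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    text: str
    confidence: float  # assumed to lie in 0.0 to 1.0
    source: str        # "terminal" or "network"


Recognizer = Callable[[bytes], Optional[Result]]


def _identity(audio: bytes) -> bytes:
    return audio


def stepwise_recognize(audio: bytes, terminal: Recognizer, network: Recognizer,
                       threshold: float = 0.8,
                       encode: Callable[[bytes], bytes] = _identity) -> Optional[Result]:
    """Try the terminal engine first; use the network engine only as a fallback."""
    local = terminal(audio)
    if local is not None and local.confidence >= threshold:
        return local                 # recognized on the terminal; network never consulted
    return network(encode(audio))    # encode the command and send it for a second attempt


# Toy stand-ins (hypothetical): "audio" is plain text bytes for illustration only.
def toy_terminal(audio: bytes) -> Optional[Result]:
    vocab = {b"call home": 0.95, b"call mom": 0.5}  # small local vocabulary, e.g. voice dialing
    return Result(audio.decode(), vocab[audio], "terminal") if audio in vocab else None


def toy_network(audio: bytes) -> Optional[Result]:
    return Result(audio.decode(), 0.7, "network")   # large vocabulary, lower per-user accuracy
```

With these stand-ins, `b"call home"` is resolved on the terminal, while `b"weather in chicago"` and the low-confidence `b"call mom"` fall through to the network. In every case the returned result comes from exactly one engine and is never cross-checked against the other, which reflects the single level of assurance noted above.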
As such, there is a need for improved communication devices that employ speech recognition engines.