1. Field of Invention
This invention relates to a method and apparatus for automatic speech recognition.
2. Description of Related Art
Mobile device usage has increased as mobile devices can store more information and more information can be accessed over networks. However, conventional input methods for mobile devices such as web-enabled phones, personal communication systems, handheld personal digital assistants and other mobile devices are limited. For example, the size of keyboards on mobile devices is limited due to the need to make the mobile device as small and compact as possible.
Conventional limited size keyboards typically use multi-function keys to further reduce size and space requirements. Multi-function keys are keys whose function depends on the previously selected key sequence. Multi-function keys can be used to perform many different functions. However, as the number of additional functions increases, multi-function keyboards become difficult to use and the input method becomes error prone. Decreasing the size of keyboards with multi-function keys further increases the likelihood of mis-keying due to the smaller key size. Thus, decreased size multi-function keys are also error prone and difficult to use. Some manufacturers have attempted to address these problems with the use of predictive text entry input methods. For example, the T-9® predictive text entry system used in many web-enabled phones attempts to predict complete words as the keystrokes for each word are entered. However, the T-9® predictive text entry system mis-identifies words, is not easily adapted to words in different languages, requires the use of a keyboard and is not easy to use.
Some manufacturers of mobile devices have attempted to address keyboard input problems by increasing the size of the mobile device keyboard. For example, the Ericsson model R380 and R380s web-enabled phones are equipped with a flip-up keypad that reveals a larger touch sensitive screen for input functions. However, these touch sensitive screens are expensive, increase the likelihood of damage to the device, increase power requirements and therefore battery size, and fail to provide the user with an input method that is easy to use.
Some personal digital assistant device manufacturers such as Palm and Handspring have attempted to address these limitations of conventional input methods by adding handwriting recognition software to their devices. However, handwriting recognition software is also error prone, requires that the user be trained to write in ways easily recognizable by the software and fails to provide an input method that is easy to use.
Automatic speech recognition provides an easy to use input method for mobile devices. However, conventional speech recognition systems for mobile devices provide speech recognition on a specific device and require intervention by a user such as training. If the user must replace a lost or damaged device with a new device, the new device must be retrained before use or the accuracy of the device is lessened. Also, as the user's usage environment deviates from the training environment, the accuracy of the speech recognition is reduced.
Other conventional speech recognition systems use speaker independent models either in the device or in the network. However, these conventional speaker independent speech recognition systems do not automatically compensate for changing environments and/or differing transducer response characteristics.
For example, each phone or web-enabled phone is likely to use a transducer having different response characteristics. The response characteristics associated with a head mounted transducer or microphone used in Internet telephony applications are likely to differ from those of a Jabra hands-free EarSet® microphone used by a hands-free mobile phone user. Conventional speech recognition systems assume each mobile device has the same response characteristics, with the result that the accuracy of the speech recognition is reduced.
Similarly, with respect to background noise, a user of an Internet telephony application will experience a quiet and predictable background noise environment, while a user of a mobile phone will experience a constantly changing background noise environment. Conventional speech recognition systems assume each mobile device experiences the same background noise, resulting in reduced accuracy of the speech recognition system.