Cellular telephones and motor vehicles have long offered speech recognition systems for hands-free operation, for navigation, and for controlling entertainment systems. These systems have struggled to understand multiple languages, dialects, vocabularies, and pronunciation styles. Poor diction and background noise make speech recognition even more difficult. Some devices operate well by recognizing only a few statements. In some cases, the list of possible statements is displayed on a screen or recited audibly to the user by the system. The user makes one of the statements, and the device then repeats the statement for confirmation.
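The limited-statement approach described above can be illustrated with a minimal sketch. It assumes the spoken audio has already been turned into rough text and uses Python's standard `difflib` module to pick the closest entry in a small list. The command list, the function names, and the 0.6 similarity cutoff are hypothetical choices made for illustration, not details of any particular device.

```python
import difflib

# Hypothetical fixed list of statements the device accepts.
COMMANDS = ["call home", "play music", "navigate home", "stop"]


def match_command(heard, commands=COMMANDS, cutoff=0.6):
    """Return the listed command closest to the heard text, or None."""
    normalized = heard.strip().lower()
    matches = difflib.get_close_matches(normalized, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def confirmation_prompt(heard):
    """Build the phrase the device would repeat back for confirmation."""
    command = match_command(heard)
    if command is None:
        # Nothing in the list is close enough, so recite the options again.
        return "Sorry, please say one of: " + ", ".join(COMMANDS)
    return f"Did you say '{command}'?"
```

Restricting the search to a handful of statements is what lets such a device tolerate a slightly misheard input: the closest listed statement is chosen and then repeated so the user can confirm or reject it.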
With the advent of data connectivity for smartphones and the Internet of Things, large and powerful servers coupled to substantial databases are available to connected devices. This allows much better recognition of more languages and more words. Newer products allow users to speak not only to their smartphones, televisions, and gaming consoles, but also to watches, fitness sensors, glasses, and other portable and wearable devices.
With the increased use and variety of handheld and wearable communications devices, speech understanding and audio quality have become increasingly important. Many handheld and wearable devices receive speech and other audio and send the captured audio to a remote server. The remote server converts the speech to text or commands and sends the result back to the connected device. This allows the speech to be used for voice commands. Cloud speech recognition systems are designed to provide good accuracy independent of the acoustics, vocabulary, and grammar of the user. Systems implemented in the cloud can use very large acoustic and language models to meet this goal and can update models frequently.
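The round trip between a connected device and a remote server can be sketched as follows. This is a simplified illustration that runs entirely in one process. The "audio" is plain UTF-8 text standing in for captured sound, and `fake_server_recognize`, `handle_utterance`, `RecognitionResult`, and `COMMAND_TABLE` are hypothetical names introduced here. A real server would apply acoustic and language models and would be reached over a network connection.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RecognitionResult:
    text: str                # transcript produced by the server
    command: Optional[str]   # device command, if the transcript maps to one


# Hypothetical mapping from transcripts to device commands.
COMMAND_TABLE = {
    "turn on the lights": "LIGHTS_ON",
    "what time is it": "SHOW_CLOCK",
}


def fake_server_recognize(audio: bytes) -> RecognitionResult:
    """Stand-in for the remote server: decoding text replaces real recognition."""
    text = audio.decode("utf-8").strip().lower()
    return RecognitionResult(text=text, command=COMMAND_TABLE.get(text))


def handle_utterance(audio: bytes,
                     send: Callable[[bytes], RecognitionResult]) -> str:
    """Device side: forward the captured audio, then act on the reply."""
    result = send(audio)
    if result.command is not None:
        return f"execute {result.command}"
    return f"display text: {result.text}"
```

Passing the transport in as the `send` argument keeps the device logic independent of how the server is reached, so the in-process stand-in could be swapped for a network call without changing `handle_utterance`.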