This invention relates generally to speech recognition systems, and more particularly to automated centralized updating of such systems.
Speech recognition has become an increasingly popular application for computers and computerized devices. It affords users an alternative manner by which to accomplish input, in lieu of or in addition to standard manners of input such as keyboard entry and pointing device input. Thus, users who cannot type, or prefer not to type, are still able to interact with their computers. Speech recognition can be used for sending commands to the computer, such as pointer movement and pointer clicking, as well as for applications such as text dictation into a word processing program.
A common problem with speech recognition is known as the out-of-vocabulary (OOV) problem. If a word is not in the lexicon, or dictionary, of the speech recognition system that a user is using, the system is unable to recognize the word correctly when spoken by the user. The OOV problem can occur both when the user uses words that are very uncommon and therefore not in the dictionary, and when the user uses words that have been newly introduced into common usage. As an example of the former, a zoologist may use technical words that are uncommon to the population as a whole, and therefore not found in the dictionary. As an example of the latter, a speech recognition system developed prior to the widespread acceptance of the Internet may not have the word "Internet" in its vocabulary.
For this and other reasons, there is a need for the present invention.
The invention relates to automated centralized updating of speech recognition systems. In one embodiment, a speech recognition program at a client, such as a computer or a computerized device like a personal-digital-assistant (PDA) device or a wireless phone, receives data that is unrecognized. The data can in varying embodiments represent one or more of an unrecognized word, an unrecognized pronunciation of a known word, an unrecognized dialect of a known word, and a substantially new word frequency usage. The client transmits the data to a provider. The provider processes the data into known data, and transmits the known data back to a number of clients, possibly including the client that initially sent the unrecognized data. For privacy and/or other concerns, the unrecognized data may be sent from the client to the provider via a trusted third party, to anonymize the data.
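The flow described above, from client to trusted third party to provider and back to a number of clients, can be illustrated with a minimal Python sketch. It is not the implementation of any embodiment. The names Client, Provider, and anonymize, the sample identifier "clip-001", and the lookup table standing in for the provider's processing are all hypothetical and are assumed here only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Client:
    client_id: str
    vocabulary: Set[str] = field(default_factory=set)

    def incorporate(self, known_word: str) -> None:
        # Merge known data received from the provider into the local vocabulary.
        self.vocabulary.add(known_word)


def anonymize(client_id: str, sample: str) -> Dict[str, str]:
    # Trusted third party (assumed behavior): forward the unrecognized
    # sample while dropping the identity of the client that sent it.
    return {"sample": sample}


class Provider:
    def __init__(self, clients: List[Client], transcriptions: Dict[str, str]):
        self.clients = clients
        # Hypothetical stand-in for however the provider turns
        # unrecognized data into known data.
        self.transcriptions = transcriptions

    def receive(self, payload: Dict[str, str]) -> None:
        known_word = self.transcriptions.get(payload["sample"])
        if known_word is None:
            return  # sample could not be resolved; nothing to distribute
        # Transmit the known data back to every client, which may include
        # the client that originally sent the unrecognized data.
        for client in self.clients:
            client.incorporate(known_word)


alice, bob = Client("alice"), Client("bob")
provider = Provider([alice, bob], {"clip-001": "Internet"})
provider.receive(anonymize(alice.client_id, "clip-001"))
```

In this sketch only one client reports the sample, yet both vocabularies end up containing the new word, and the payload reaching the provider carries no client identifier.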
Embodiments of the invention provide for advantages not found within the prior art. Rather than have users individually endure training of their speech recognition systems with new words, dialects, word frequency usages, etc., embodiments of the invention leverage the users' collective encountering of new words, dialects, word frequency usages, etc. For example, if the word "Internet" is not known to the speech recognition program of a number of users, generally the first user encountering this word will cause his or her speech recognition program to send the unrecognized data representing the word to the provider. The provider can then process the unrecognized data into known data representing the word, and have the known data transmitted back to all users, eliminating the need for every user to individually train his or her speech recognition program with the new word. Thus, the vocabularies of the speech recognition programs of users collectively grow as any user encounters new words. Furthermore, words, dialects, etc., that are particular to a specific group or region of people, such as a group of zoologists, or the region of people living in Mississippi, etc., can be collected and transmitted only among that specific group or region of people.
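The group-specific or region-specific distribution just described can likewise be sketched in Python. The class GroupedProvider, its register and publish methods, and the example word "pangolin" are hypothetical names assumed for illustration; how an actual embodiment assigns users to groups or regions is not specified here.

```python
from typing import Dict, Iterable, List, Optional, Set


class GroupedProvider:
    def __init__(self) -> None:
        self.groups: Dict[str, Set[str]] = {}        # group name -> client ids
        self.vocabularies: Dict[str, Set[str]] = {}  # client id -> known words

    def register(self, client_id: str, groups: Iterable[str] = ()) -> None:
        # Record the client and the groups or regions it belongs to.
        self.vocabularies.setdefault(client_id, set())
        for group in groups:
            self.groups.setdefault(group, set()).add(client_id)

    def publish(self, word: str, group: Optional[str] = None) -> List[str]:
        # With no group given, the word goes to all clients; otherwise it
        # goes only to members of the named group or region.
        if group is None:
            recipients = set(self.vocabularies)
        else:
            recipients = set(self.groups.get(group, set()))
        for client_id in recipients:
            self.vocabularies[client_id].add(word)
        return sorted(recipients)


grouped = GroupedProvider()
grouped.register("alice", ["zoologists"])
grouped.register("bob", ["mississippi"])
grouped.register("carol", ["zoologists", "mississippi"])
zoology_recipients = grouped.publish("pangolin", group="zoologists")
everyone = grouped.publish("Internet")
```

Here the technical word reaches only the two clients registered as zoologists, while the generally useful word reaches all three.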
It is noted that the invention can be implemented in different manners as to clients and servers. For example, in some embodiments, the speech recognition program and the vocabulary therefor are maintained at the client level, such that the server exists only to render improvements to the vocabulary for transmission back to the clients, which then incorporate the improvements into their speech recognition program vocabularies. This is most apt for applications where users have one primary client on which they use speech recognition, such as a desktop computer. While the invention described in the detailed description is largely specific to this embodiment, the invention itself is not so limited.
For example, in other embodiments, the speech recognition program runs on clients, but the vocabulary is stored and maintained at the server level. The clients therefore still perform the speech recognition process, but this process utilizes data from the vocabulary as stored on the server, such that the clients access the server as necessary. This is most apt for applications where users have many clients on which they use speech recognition, where all the clients have sufficient processing power to perform speech recognition.
In still other embodiments, the speech recognition program runs at the server level, and the vocabulary is stored and maintained at the server level. Thus, the clients act solely to detect speech, and pass the speech on as detected to the server, which itself conducts the speech recognition and passes any recognized speech back to the clients. This is most apt for applications where the clients do not have sufficient processing power to perform speech recognition, such as wireless phones and some personal digital assistants (PDAs). These embodiments are, furthermore, example embodiments of the invention; the invention itself is not limited to splitting the speech recognition and vocabulary improvement process among the clients and the server in any of these recited manners.
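The three example arrangements recited above differ only in where speech recognition runs and where the vocabulary is kept. The following Python sketch summarizes them. The names Deployment, Placement, PLACEMENT, and suggest_deployment are hypothetical, and the selection rule merely restates the circumstances for which each arrangement is described as most apt; it is an assumption for illustration, not a requirement of the invention.

```python
from enum import Enum
from typing import Dict, NamedTuple


class Placement(NamedTuple):
    recognition: str  # where the speech recognition program runs
    vocabulary: str   # where the vocabulary is stored and maintained


class Deployment(Enum):
    CLIENT_VOCABULARY = "client_vocabulary"
    SERVER_VOCABULARY = "server_vocabulary"
    SERVER_RECOGNITION = "server_recognition"


PLACEMENT: Dict[Deployment, Placement] = {
    Deployment.CLIENT_VOCABULARY: Placement("client", "client"),
    Deployment.SERVER_VOCABULARY: Placement("client", "server"),
    Deployment.SERVER_RECOGNITION: Placement("server", "server"),
}


def suggest_deployment(client_can_recognize: bool, client_count: int) -> Deployment:
    # Clients lacking sufficient processing power only detect speech.
    if not client_can_recognize:
        return Deployment.SERVER_RECOGNITION
    # Many capable clients per user share one server-side vocabulary.
    if client_count > 1:
        return Deployment.SERVER_VOCABULARY
    # One primary client keeps both the program and the vocabulary locally.
    return Deployment.CLIENT_VOCABULARY


desktop = suggest_deployment(client_can_recognize=True, client_count=1)
phone = suggest_deployment(client_can_recognize=False, client_count=1)
```

Other divisions of the work between clients and server remain possible; the table above covers only the three arrangements described.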
The invention includes computer-implemented methods, machine-readable media, computerized systems, and computers of varying scopes. Other aspects, embodiments and advantages of the invention, beyond those described here, will become apparent by reading the detailed description and with reference to the drawings.