1. Field of the Invention
This invention relates to a system and method for training Automatic Speech Recognition (ASR) systems and, in particular, to a system and method that maintains a desired level of accuracy in the user's ASR system during the training period.
2. Description of the Related Art
Presently, many of the most common ASR systems consist of software that runs on an IBM-compatible PC (e.g., IBM's ViaVoice™, Dragon's Naturally Speaking™), although an ASR system can be any combination of hardware and software that can recognize spoken words. Typically, ASR systems compare samples of one or more spoken words to samples stored in memory, where the samples are acoustic recordings of pieces of speech.
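The comparison of spoken samples against stored samples can be illustrated with dynamic time warping (DTW), a classic template-matching technique. The sketch below is a swapped-in illustration and is not the method used by any product named above. It assumes each sample has already been converted into a sequence of numeric feature frames. The names `dtw_distance` and `recognize` and the toy templates are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # hypothetical: one feature vector per short slice of audio


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Dynamic time warping distance between two frame sequences.

    Lets a slow and a fast utterance of the same word align frame-to-frame.
    """
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            # extend the cheapest of: stretch a, stretch b, or advance both
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(sample: Sequence[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the word whose stored template is closest to the spoken sample."""
    return min(templates, key=lambda word: dtw_distance(sample, templates[word]))
```

Under this sketch, "training" amounts to storing more, and more representative, templates per word for a given speaker, which is why accuracy improves as more of that speaker's samples accumulate.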
One of the problems with ASR is that it must be trained for each user in order to perform well. In other words, ASR works best, i.e., produces the greatest percentage of correctly recognized words and sentences, when it is allowed to store a large range of acoustic samples for each speaker. This process of recording samples together with the associated phonemes that make up words is called "training." Examples of training are contained in U.S. Pat. No. 5,963,903 to Hon et al., U.S. Pat. No. 6,076,056 to Huang et al., and U.S. Pat. No. 6,125,341 to Raud et al., all of which are hereby incorporated by reference.
Training is inconvenient for the new user, who must sit with the ASR system for a period of time so that the system can "learn" the user's voice. This poses a dilemma for the designers of ASR systems: a training period long enough to guarantee good results may be a considerable nuisance to the user, whereas a quick and easy training period may be insufficient, leaving the user with an unacceptable level of interpretation errors.
Other problems related to training involve its two essential parts: adaptation of the acoustic model and adaptation of the language model. The acoustic model relates to the sound samples and to learning the speaker's range of pronunciation. The language model relates to the vocabulary and grammar used by the speaker and to learning the speaker's more common words and phrases. Both adaptations require time to accumulate the necessary amount of data. In addition, conditions may vary during training. For example, the speaker may have a cold during part of the training period, thus affecting the acoustic model; or the speaker may be a writer who was dictating an essay on medicine during part of the training period, thus affecting the language model. Speaker-independent ASR systems, by definition, do not require training on any one speaker's voice. However, speaker-independent ASR systems have an unacceptably high error rate in their transcriptions.
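Language-model adaptation, and the way a single dictation topic can skew it, can be sketched with a unigram model that interpolates a fixed background distribution with counts learned from the user's own transcripts. This is an illustrative assumption, not the adaptation scheme of any particular ASR product. The class name `AdaptiveUnigramModel` and the `weight` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict


class AdaptiveUnigramModel:
    """Hypothetical unigram language model adapted to one speaker.

    P(word) = weight * P_user(word) + (1 - weight) * P_background(word)
    """

    def __init__(self, background: Dict[str, float], weight: float = 0.5):
        self.background = background  # speaker-independent word probabilities
        self.weight = weight          # how much to trust the user's own data
        self.counts = Counter()       # word counts from the user's transcripts
        self.total = 0

    def adapt(self, transcript: str) -> None:
        """Accumulate word counts from a (correct) transcript of the user's speech."""
        words = transcript.lower().split()
        self.counts.update(words)
        self.total += len(words)

    def prob(self, word: str) -> float:
        user = self.counts[word] / self.total if self.total else 0.0
        return self.weight * user + (1 - self.weight) * self.background.get(word, 0.0)
```

If most of the adaptation data comes from one essay on medicine, medical terms receive inflated probabilities; this is the kind of skew the example above describes, and one reason adaptation needs enough varied data.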
Therefore, there is a need for a speaker-dependent ASR system that does not burden the user with an extensive training period, yet retains a high level of accuracy in its transcriptions.
One aspect of this invention is to provide a speaker-dependent ASR system and method that does not burden the user with an extensive training period.
Another aspect of the invention is to provide a speaker-dependent ASR system and method that retains a high level of accuracy, while not requiring an extensive period of training.
Yet another aspect of the invention is to provide a speaker-dependent ASR system and method that allows the user to set an arbitrary level of accuracy.
A further aspect of the invention is to provide a system and method by which a consumer, who already owns an ASR system, pays for the training of the ASR system.
Yet a further aspect of the invention is to provide a system and method by which a consumer, who already owns an ASR system, pays for an arbitrary level of accuracy in resulting transcriptions.
To fulfill the above and other aspects, a system and method are provided for training a speaker-dependent Automatic Speech Recognition (ASR) system to a desired level of accuracy. In one aspect of the system and method, a user requests an ASR Training Center to train his or her ASR system within certain service parameters. During training, a given level of accuracy is maintained, even during the very first session, by having a stenographer transcribe the audio material. When the user uses his or her ASR system, the stenographic transcription, rather than the ASR system's transcription, is output to the user until the ASR Training Center determines that the user's ASR system has achieved the desired level of accuracy. The stenographic transcription is also used by the ASR Training Center to train the user's ASR system in a manner appropriate to that system.
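The switch-over logic summarized above can be sketched as follows. The text does not specify how accuracy is measured, so this sketch assumes word accuracy (1 minus word error rate) computed against the stenographic transcription and accumulated across sessions. It also assumes a hypothetical minimum amount of evidence (`min_words`) before the switch is allowed. `TrainingMonitor`, `word_errors`, and their parameters are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,            # deletion
                          curr[j - 1] + 1,        # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


@dataclass
class TrainingMonitor:
    desired_accuracy: float  # set by the user, e.g. 0.95
    min_words: int = 1000    # hypothetical: minimum evidence before switching
    errors: int = 0
    words: int = 0
    trained: bool = False

    def accuracy(self) -> float:
        return 1.0 - self.errors / self.words if self.words else 0.0

    def process(self, asr_text: str, steno_text: Optional[str] = None) -> str:
        """Return the transcript to show the user for one segment of audio."""
        if self.trained:
            return asr_text  # target reached: the ASR output is used directly
        if steno_text is None:
            raise ValueError("stenographic transcript required during training")
        # Score the ASR output against the stenographer's reference text.
        self.errors += word_errors(steno_text, asr_text)
        self.words += len(steno_text.split())
        if self.words >= self.min_words and self.accuracy() >= self.desired_accuracy:
            self.trained = True
        # The user still sees the stenographer's text for this segment.
        return steno_text
```

Because the user always receives either the stenographic text or ASR output that has already met the target, the user-selected accuracy is maintained from the first session onward. The same stenographic text would also serve as the training data sent to the user's ASR system.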