1. Technical Field
This invention relates to the field of speech recognition software and more particularly to off site voice enrollment on a transcription device for speech recognition.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a transducive element, such as a microphone, is converted to a set of text words by a computer. These recognized words may then be used in a variety of computer software applications for purposes such as document preparation, data entry, and command and control. Improvements to speech dictation systems provide an important way to enhance user productivity.
Currently within the art, a user must train a speech recognition system with the user's voice patterns in order to achieve accurate speech recognition. This process is called enrollment. The enrollment process involves user dictation of a body of text into the speech recognition system. This body of text, referred to as an enrollment script, is known to the speech recognition system and may be a short story or a collection of sentences containing particular phonemes. Using acoustic models representing phonemes and a language model containing word groupings and word frequency data, the speech recognition system decodes the user dictation of the enrollment script into text. The system can then analyze the user's dictation of the enrollment script in relation to the known text of the enrollment script. After decoding the user dictation, the speech recognition system can be trained by generating a personal voice model for a particular user. The personal voice model consists of acoustic models that are tailored to a particular user's manner of speaking. The user specific acoustic models can be compared to unknown speech to find a text match for speech analyzed by the speech recognition system. Through this training process, the speech recognition system can thereafter better respond to the particular user's known voice patterns. After a speech recognition system is trained to a particular user's voice patterns, such a system is said to be speaker dependent rather than speaker independent.
The enrollment process works not only to train the speech recognition system to recognize a particular user's voice patterns, but also to train the system to compensate for audio properties of the system's transducive element and a local audio environment in which the speech recognition system is operated. Because most speech recognition systems operate in an unchanging audio environment with the same transducive element, a user need go through the enrollment process only one time. For example, a speech recognition system implemented on a high speed multimedia computer will most likely be used only in the room in which the system is operated. Such systems are not portable and once set up, the audio environment remains constant. Consequently, when the user trains the system through the enrollment process, the system accounts for the audio properties of the microphone and the local audio environment in which the system is operated.
With the widespread use of portable transcription devices, particularly digital transcription devices, it is desirable for speech recognition systems to transcribe a user's dictation from a recording made by such portable devices. The problem arises that the user's transcription device and the unknown audio environment in which dictation is recorded effectively become part of the speech recognition system input path. However, the transcription device and the audio environment have not been characterized. For example, a transcription device has a built in microphone. Such a microphone has unique audio properties that differ from the transducive element which is part of the speech recognition system. Therefore, if a recording is received by a speech recognition system from a transcription device having a microphone with unknown audio properties, then the speech recognition system may not be able to accurately perform speech recognition on the recording.
Presently, some speech recognition systems allow a user to connect a transcription device to the system for use as a microphone in place of the system microphone. Then the user can be enrolled into the system using the transcription device microphone instead of the system microphone. However, this method does not work with all transcription devices, especially devices that cannot function independently as a microphone.
Another problem inherent to using a portable transcription device is that the properties of the distinct audio environment in which such a device is used differ from the properties of the local audio environment where the speech recognition system is operated. The very nature of a portable transcription device ensures that it will be used in an audio environment distinct from the local audio environment of the speech recognition system. Consequently, the differing properties of the two audio environments may prevent accurate transcription of a recording. Thus, there has arisen a need for a method of off site or batch voice enrollment of a user using a transcription device.
The invention concerns a method and a system for enrollment of a user in a speech recognition system. The method of the invention, also referred to as off site or batch enrollment, involves a plurality of steps. The invention provides a user with an enrollment script. Then the system receives a recording made with a transcription device of a dictation session in which the user has dictated at least a portion of the enrollment script. Next, the invention enrolls the user in the speech recognition system by decoding the recording and training the speech recognition system.
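By way of illustration only, the sequence of steps described above can be sketched in Python. This is a minimal sketch and not the implementation of the invention. The names batch_enroll, EnrollmentResult, decode and train are hypothetical, and the decoder and trainer are passed in as callables because the specification does not define their interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class EnrollmentResult:
    """Outcome of one off site enrollment (hypothetical container)."""
    user_id: str
    decoded_words: List[str]
    voice_model: object


def batch_enroll(
    user_id: str,
    script_words: Sequence[str],
    recording: bytes,
    decode: Callable[[bytes, Sequence[str]], List[str]],
    train: Callable[[bytes, List[str]], object],
) -> EnrollmentResult:
    """Enroll a user from a recording made with a transcription device.

    The enrollment script is assumed to have been provided to the user
    beforehand; script_words is the known text of that script.
    """
    # Receive the recording of the dictation session.
    if not recording:
        raise ValueError("recording is empty")

    # Decode the recording into text, using the known script as a guide.
    decoded = decode(recording, script_words)

    # Train the system, producing a personal voice model for this user.
    model = train(recording, decoded)

    return EnrollmentResult(user_id, decoded, model)
```

In this sketch the decoder and the trainer are supplied by the caller, so the same flow applies whether the recording comes from a digital or an analog voice recorder, provided it has been converted to a form the decoder accepts.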
According to one aspect of the invention, the dictation session occurs in an audio environment which is distinct from a local audio environment in which the speech recognition system is operated. Further, the recording can be made with a digital voice recorder or an analog voice recorder. Additionally, the method can include the step of comparing the recording to the enrollment script to determine if the recording contains a minimum predetermined percentage of words contained in the enrollment script. Also, the system can enable a user activatable icon for initiating the training process using the recording if the recording contains the minimum predetermined percentage of words.
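The comparing step can be illustrated with a short Python sketch. It is one possible reading of that step and rests on several assumptions: the comparison is made on the decoded text of the recording, words are matched without regard to order, and the 80 percent default threshold is an arbitrary example value, since the specification does not fix the predetermined percentage. The function names tokenize, script_coverage and training_enabled are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into words (simple assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def script_coverage(decoded_text: str, script_text: str) -> float:
    """Return the percentage of script words found in the decoded text.

    Repeated words are counted with multiplicity, so a word that appears
    twice in the script must be decoded twice to count fully.
    """
    script_counts = Counter(tokenize(script_text))
    decoded_counts = Counter(tokenize(decoded_text))
    total = sum(script_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(n, decoded_counts[w]) for w, n in script_counts.items())
    return 100.0 * matched / total


def training_enabled(decoded_text: str, script_text: str,
                     min_percent: float = 80.0) -> bool:
    """Decide whether the control that starts training should be enabled.

    min_percent stands in for the predetermined minimum percentage; the
    default of 80.0 is an example value only.
    """
    return script_coverage(decoded_text, script_text) >= min_percent
```

For example, if the script is "the cat sat on the mat" and the decoded recording is "the cat sat", three of the six script words are matched, giving a coverage of 50 percent, so training would remain disabled under the example threshold.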
According to a second aspect, the invention can be a computer speech recognition system for enrollment of a user. In that case, the system includes programming for providing a user with an enrollment script. The system also includes programming for receiving a recording made with a transcription device of a dictation session in which the user has dictated at least a portion of the enrollment script. Further, the system can include programming for enrolling the user in the speech recognition system by decoding the recording and training the speech recognition system.
The dictation session can occur in an audio environment which is distinct from a local audio environment in which the speech recognition system is operated. Additionally, the recording can be made with a digital voice recorder or an analog voice recorder.
Similar to the previously described method, the system can include additional programming for comparing the recording to the enrollment script to determine if the recording contains a minimum predetermined percentage of words contained in the enrollment script. Also, the system can include programming for enabling a user activatable icon for initiating the training process using the recording if the recording contains the minimum predetermined percentage of words.
According to a third aspect, the invention may comprise a machine readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform a series of steps. The machine readable storage can cause the machine to perform the step of providing a user with an enrollment script. Further, the machine readable storage can cause the machine to perform the steps of receiving a recording made with a transcription device of a dictation session in which the user has dictated at least a portion of the enrollment script, and enrolling the user in the speech recognition system by decoding the recording and training the speech recognition system.
The machine readable storage also can be programmed so that the dictation session occurs in an audio environment which is distinct from a local audio environment in which the speech recognition system is operated. Further, the recording can be made with a digital voice recorder or an analog voice recorder.
The machine readable storage can include programming for causing the machine to perform the further step of comparing the recording to the enrollment script to determine if the recording contains a minimum predetermined percentage of words contained in the enrollment script. Also, the machine readable storage can include programming for causing the machine to perform the step of enabling a user activatable icon for initiating the training process using the recording if the recording contains the minimum predetermined percentage of words.