1. Field of the Invention
The present invention relates in general to speech recognition software and, in particular, to a method and apparatus for directing the output of a pre-recorded audio file into a speech recognition program.
2. Background Art
Speech recognition programs are well known in the art. While these programs are ultimately useful in automatically converting speech into text, many users are dissuaded from using them because each user must spend a significant amount of time training the system. Usually this training begins by having each user read a series of pre-selected materials for approximately 20 minutes. Then, as the user continues to use the program and words are improperly transcribed, the user is expected to stop and train the program as to the intended word, thus advancing the ultimate accuracy of the speech file. Unfortunately, most professionals (doctors, dentists, veterinarians, lawyers) and business executives are unwilling to spend the time developing the necessary speech files to truly benefit from the automated transcription.
In a previously filed, co-pending patent application, the assignee of the present application teaches a system and method for quickly improving the accuracy of a speech recognition program. That system is based on a speech recognition program that automatically converts a pre-recorded audio file into a written text. That system parses the written text into segments, each of which is corrected by the system and saved in an individually retrievable manner in association with the computer. In that system, the speech recognition program saves the standard speech files to improve accuracy in speech-to-text conversion. That system further includes facilities to repetitively establish an independent instance of the written text from the pre-recorded audio file using the speech recognition program. That independent instance can then be broken into segments. Each segment in the independent instance is replaced with an individually retrievable saved corrected segment, which is associated with that segment. In that manner, applicant's prior application teaches a method and apparatus for repetitive instruction of a speech recognition program.
In another previously filed, co-pending patent application, the assignee of the present application discloses a system for further automating transcription services in which a voice file is automatically converted into first and second written texts based on first and second sets of speech recognition conversion variables, respectively. For instance, this prior application discloses that the first and second sets of conversion variables have at least one difference, such as different speech recognition programs, different vocabularies, and the like.
As noted in this second co-pending patent application, certain speech recognition programs do not facilitate speech-to-text conversion of pre-recorded speech. One such program is the commercially successful ViaVoice™ product sold by IBM Corporation of Armonk, N.Y. Yet the receipt of pre-recorded speech is integral to the automation of transcription services. Consequently, it is an object of the present invention to direct the output of a pre-recorded audio file into a speech recognition program that does not normally provide for such functionality.
This and other objects will be apparent to those of ordinary skill in the art having the present drawings, specification and claims before them.
The present invention discloses, in part, a method for directing a pre-recorded audio file to a speech recognition program that does not normally accept such files, such as IBM Corporation's ViaVoice™ speech recognition software. The method includes: (a) launching the speech recognition program to accept speech as if the speech recognition program were receiving live audio from a microphone; (b) finding a mixer utility associated with the sound card; (c) opening the mixer utility, the mixer utility having settings that determine an input source and an output path; (d) changing the settings of the mixer utility to specify a line-in input source and a wave-out output path; (e) activating a microphone input of the speech recognition software; and (f) initiating a media player associated with the computer to play the pre-recorded audio file into the line-in input source.
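The ordering of steps (a) through (f) can be illustrated with a minimal Python sketch. It is only a simulation under stated assumptions: `Mixer`, `Recognizer`, and `direct_file_to_recognizer` are hypothetical stand-ins that record state, and the file name in the usage note is a placeholder. No real sound card, mixer utility, media player, or ViaVoice interface is invoked, and the specification does not prescribe this code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mixer:
    """Hypothetical stand-in for the sound card's mixer utility."""
    input_source: str = "microphone"
    output_path: str = "speakers"
    is_open: bool = False


@dataclass
class Recognizer:
    """Hypothetical stand-in for the speech recognition program."""
    running: bool = False
    microphone_active: bool = False
    received: List[str] = field(default_factory=list)


def direct_file_to_recognizer(audio_file: str, mixer: Mixer,
                              recognizer: Recognizer) -> List[str]:
    """Walk through steps (a)-(f) in order and return the steps performed."""
    steps: List[str] = []

    # (a) Launch the recognizer as if it were receiving live microphone audio.
    recognizer.running = True
    steps.append("a")

    # (b) Find the mixer utility; here it is simply passed in by the caller.
    steps.append("b")

    # (c) Open the mixer utility so its settings can be changed.
    mixer.is_open = True
    steps.append("c")

    # (d) Select a line-in input source and a wave-out output path.
    mixer.input_source = "line-in"
    mixer.output_path = "wave-out"
    steps.append("d")

    # (e) Activate the recognizer's microphone input.
    recognizer.microphone_active = True
    steps.append("e")

    # (f) "Play" the file: in this simulation the recognizer receives it only
    # if the routing from step (d) and the activation from step (e) are in place.
    if (recognizer.running and recognizer.microphone_active
            and mixer.input_source == "line-in"
            and mixer.output_path == "wave-out"):
        recognizer.received.append(audio_file)
    steps.append("f")

    return steps
```

Calling `direct_file_to_recognizer("dictation.wav", Mixer(), Recognizer())` with a placeholder file name walks the six steps in order; in this model, skipping step (d) or (e) leaves the recognizer's `received` list empty.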
In a preferred embodiment, the method may further include changing the mixer utility settings to mute audio output to speakers associated with the computer. Similarly, the method would preferably include saving the settings of the mixer utility before they are changed to reroute the audio stream and restoring the saved settings after the media player finishes playing the pre-recorded audio file.
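The save, reroute, and restore behavior of this preferred embodiment maps naturally onto a scoped resource. The sketch below is one possible way to express it in Python, assuming a hypothetical in-memory `MixerSettings` record and `Mixer` holder; the field names and the `rerouted` helper are illustrative assumptions, not an actual mixer API.

```python
from contextlib import contextmanager
from dataclasses import dataclass, replace


@dataclass
class MixerSettings:
    """Hypothetical snapshot of the mixer utility's configuration."""
    input_source: str = "microphone"
    output_path: str = "speakers"
    speakers_muted: bool = False


class Mixer:
    """Hypothetical holder for the current mixer settings."""

    def __init__(self, settings: MixerSettings) -> None:
        self.settings = settings


@contextmanager
def rerouted(mixer: Mixer):
    """Save the settings, reroute the audio stream, and restore on exit."""
    saved = replace(mixer.settings)  # copy of the original configuration
    mixer.settings = MixerSettings(
        input_source="line-in",
        output_path="wave-out",
        speakers_muted=True,  # mute speaker output during playback
    )
    try:
        yield mixer
    finally:
        # Restore the original configuration even if playback fails.
        mixer.settings = saved
```

In this sketch, playback of the pre-recorded audio file would take place inside a `with rerouted(mixer):` block, so the original configuration is restored once the block exits, whether or not playback completed normally.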
The present invention also includes, in part, a system for directing a pre-recorded audio file to a speech recognition program that does not accept such files. The system includes a computer having a sound card with an associated mixer utility and an associated media player (capable of playing the pre-recorded audio file). The system further includes means for changing settings of the associated mixer utility, such that the mixer utility receives an audio stream from the media player and outputs a resulting audio stream to the speech recognition program as a microphone input stream.
In one preferred embodiment, the system further includes means for automatically opening the speech recognition program and activating the changing means. The system also preferably includes means for saving and restoring an original configuration of the mixer utility.