1. Field of the Invention
This invention relates to the field of speech recognition applications, and in particular, to a method and apparatus for controllably varying audio playback speed in a speech recognition proofreader.
2. Description of Related Art
The detection of errors in a document dictated via speech recognition software is facilitated by a proofreading program that plays the originally dictated audio while simultaneously displaying and/or highlighting the text interpreted by the speech system. Proofreading programs operating in a speech recognition system can play dictated audio synchronized with the display and/or highlighting of the recognized text. Playback facilitates the detection of misrecognized words. As each recognized utterance is played, its corresponding text is also xe2x80x9cplayedxe2x80x9d, that is, displayed. Such a mechanism helps the user detect incongruities more easily than by visual inspection alone. In addition, the proofreader provides a xe2x80x9cmarkingxe2x80x9d capability, allowing the user to mark such errors for later correction. The proofreader stores the marks and allows the user to review them and correct the corresponding text at a later time. However, some speakers dictate so rapidly that during playback the errors are not easily seen, or even if seen, the playback is too rapid for the user the user to accurately mark the error, since the next word may already be playing by the time the user has acted. However, by automatically pausing between each dictated utterance the pace of the playback can be controlled and the user can be afforded the time required to accurately mark the errors.
A typical speech recognition system provides the ability to play the dictated audio for any recognized spoken word. In accordance with this capability, a typical speech recognition system will embody the following features. A first feature is to provide a client with a number (xe2x80x9ctagxe2x80x9d) that uniquely identifies an individual spoken word or phrase as defined by the speech recognition system. A second feature is that the speech recognition system can be loaded with a memory address pointing to an array of tags and can be directed to play a specific number or range of those tags. A third feature is that the speech recognition system notifies the caller whenever the system has begun playing an individual tag and provides the tag associated with the current spoken word or phrase. The notification occurs asynchronously through the use of a callback function specified by the proofreader and executed by the speech engine. A fourth feature is that the speech recognition system notifies the caller when all the tags have been played. The notification occurs asynchronously through the use of a callback function specified by the proofreader and executed by the speech engine. Such notifications will be generically referred to as xe2x80x9cAudioDonexe2x80x9d notifications.
There is a long-felt need for methods and apparatus to slow, and even variably control, the pace of playback to overcome this difficulty. There is a further long-felt need to control the pace of playback during proofreading by utilizing the features and capabilities of typical speech recognition systems, as described above.
In accordance with the inventive arrangements, the capabilities and features of speech recognition systems can be advantageously used in a novel and nonobvious manner to provide the fastest possible playback, to slow the playback and to adjust the speed of playback while playback is in progress.
A single call mode is provided for the fastest possible playback, in accordance with which the speech system is loaded with an array of tags and is then directed to play the entire array as one unit.
A multiple call mode is provided for playing each tag individually at slower and variable speeds, one at a time. A range of tags is played by making multiple calls to the speech system to load and play each tag individually, inserting a delay between each call. The delay can be variable.
A method for inserting a delay between the playback of individual words or phrases as recognized by a speech recognition system, in accordance with the inventive arrangements, comprises the steps of: (A) waiting for a playback command; (B) measuring a delay upon occurrence of the playback command; (C) initiating playback of only one of the individual words or phrases upon expiration of the delay; (D) waiting for a subsequent playback command; and, (E) upon occurrence of the subsequent playback command, repeating the steps (B), (C) and (D) for playing subsequent ones of the individual words or phrases, one at a time.
The method can further comprise the steps of: (F) generating a user interface for detecting the playback command and playing back the individual words and phrases; and, (G) executing the steps (A), (B), (C), (D) and (E) in an independent thread of execution.
The method can also further comprise the steps of: (F) tracking the playback of the individual words and phrases according to an ordered index; (G) issuing a notification each time a playback of one of the individual words or phrases is completed; (H) automatically repeating the steps (B), (C) and (D) for playing subsequent ones of the individual words or phrases responsive to each notification; and, (I) continuing the playing back until all unplayed ones of the individual word or phrases in the ordered index are played back.
In the basic method, and in each of the alternatives, the method can further comprise the step of varying the delay responsive to a user requested delay.
When user requested delays are made, the method can further comprise the steps of: comparing the user requested delay to a predetermined delay; repeating the step (E) if the user requested delay is greater than the predetermined delay; and, terminating the step (E) if the user requested delay is not greater than the predetermined delay. The method can further comprising the step of initiating playback of the individual or words or phrases as a continuous stream responsive to the terminating step.
When user requested delays are made, the method can also further comprise the steps of: comparing the user requested delay to a predetermined delay; changing from playing back the individual words or phrases one at a time to playing back the individual words or phrases as a continuous stream whenever the user requested delay is not greater than the predetermined delay; and, changing from playing back the individual words or phrases as a continuous stream to playing back the individual words or phrases one at a time whenever the user requested delay is greater than the predetermined delay.