1. Technical Field
This invention relates to the field of audio devices, and more particularly, to characterizing audio devices for controlling audio signal levels.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a transducive element, such as a microphone, is converted to a set of text words, numbers, or symbols by a computer. These recognized words may then be used in a variety of computer software applications for purposes such as document preparation, data entry, and command and control. Improvements to speech recognition systems provide an important way to enhance user productivity.
For accurate conversion of a user spoken utterance to recognized text, the audio signal representing the user spoken utterance should have an adequate signal level. A speech recognition system can misrecognize user spoken utterances if the audio signal level is too low or too high. One factor that can substantially affect the level of an audio signal is the distance between the speaker and the microphone. Typically, during a speech dictation session, the distance between a user and the microphone changes constantly. More particularly, although the microphone initially can be positioned close to the speaker, the speaker can unknowingly shift position while dictating, or otherwise move, such that the distance between the speaker and the microphone changes. Accordingly, the level of the audio signal received by the speech recognition system also changes. For example, as the speaker draws closer to the microphone, the audio signal level can increase. Conversely, as the speaker pulls away from the microphone, the distance between the speaker and the microphone increases, which can result in a decreased audio signal level.
To ensure that an optimal audio signal level is received by a speech recognition system, automatic gain controls (AGCs) have been implemented in conventional audio device control software which can cooperate with a speech recognition system in order to monitor incoming audio signal levels. Based upon whether a received audio signal is too weak or too strong, conventional AGCs can dynamically adjust, i.e., raise or lower, the input signal level accordingly. Thus, by incorporating a software-based AGC within a speech dictation system, conventional speech dictation systems can dynamically adjust actual audio signal levels during a speech dictation session, thereby increasing speech recognition accuracy.
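The raise-or-lower behavior of a conventional AGC can be sketched as a single decision step. This is a minimal illustration, not the implementation of any particular AGC: the function name `agc_adjust`, the `low`/`high` thresholds, the unit `step`, and the assumption that the measured level is normalized to the range 0.0 to 1.0 are all hypothetical.

```python
def agc_adjust(input_setting: int, measured_level: float,
               low: float, high: float, step: int = 1,
               min_setting: int = 0, max_setting: int = 100) -> int:
    """Hypothetical AGC step: return a new input level setting.

    Raises the setting when the measured signal is too weak (below `low`),
    lowers it when the signal is too strong (above `high`), and otherwise
    leaves it unchanged. The result is clamped to [min_setting, max_setting].
    """
    if measured_level < low:
        return min(input_setting + step, max_setting)
    if measured_level > high:
        return max(input_setting - step, min_setting)
    return input_setting


# A weak signal at setting 50 nudges the setting up by one step.
new_setting = agc_adjust(50, measured_level=0.1, low=0.3, high=0.7)
```

Called repeatedly as new audio arrives, this step moves the input level toward the acceptable band; the next paragraphs explain why a fixed step like this behaves differently on different audio devices.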
Presently, however, conventional speech recognition systems incorporating AGC software are deployed across a variety of computing platforms, which in turn can contain a variety of audio devices. Audio device characteristics generally vary from one audio device to another. For example, although an audio input device, such as a sound card or an audio preamplifier, can offer input level settings ranging from zero to one hundred, such an audio device can actually have a more limited resolution of possible input signal levels.
For example, adjusting the input signal level from a setting of zero to a setting of ten may not change the actual audio signal level of an incoming audio signal. However, a change in the input signal level from ten to eleven can result in an increase in the actual audio signal level. Similarly, the actual audio signal level can remain constant for input level settings ranging from eleven to twenty, followed by a sudden step-like increase in the actual audio signal level when the input level setting is adjusted from twenty to twenty-one. In this manner, an audio device can have ten ranges of input level settings, with transitions in the actual audio signal level occurring at increments of ten on the input level control. Consequently, actual audio signal levels can be mapped to particular input level settings for a particular audio device.
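The step-like behavior described above, and the resulting map from input level settings to actual audio signal levels, can be illustrated with a small model. In this sketch, assumed for illustration only, the hypothetical function `device_a_level` stands in for measuring the actual level of a test signal at each setting, and its return value is an abstract step index rather than a physical signal level. The hypothetical helper `map_ranges` sweeps every setting and groups consecutive settings that yield the same actual level.

```python
from typing import Callable, List, Tuple


def device_a_level(setting: int) -> int:
    """Hypothetical device: actual level is flat over 0-10, 11-20, ..., 91-100."""
    return max(0, (setting - 1) // 10)


def map_ranges(level_of: Callable[[int], int],
               max_setting: int = 100) -> List[Tuple[int, int, int]]:
    """Sweep all input level settings and return (first, last, level) tuples,
    one per run of consecutive settings that produce the same actual level."""
    ranges: List[Tuple[int, int, int]] = []
    start = 0
    current = level_of(0)
    for setting in range(1, max_setting + 1):
        level = level_of(setting)
        if level != current:
            # A transition: close the previous range and open a new one.
            ranges.append((start, setting - 1, current))
            start, current = setting, level
    ranges.append((start, max_setting, current))
    return ranges


device_a_map = map_ranges(device_a_level)
# [(0, 10, 0), (11, 20, 1), (21, 30, 2), ..., (91, 100, 9)]
```

The resulting list has one entry per flat range, ten in this model, which corresponds to the ten ranges described for such an audio device.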
Notably, an audio device in a different computing platform can have audio signal transitions which map to different input signal levels. For example, a different audio device can have a map indicating that the actual audio signal level responds to only five actual adjustments of the input signal level, remaining constant over the input level ranges of zero to twenty, twenty-one to forty, and forty-one to sixty. Moreover, audio device driver specifications typically omit the actual ranges over which an actual audio signal level changes in response to changes in the input level. Similarly, audio device driver specifications typically do not disclose the magnitude by which an actual audio signal level can change in response to a change from one input level setting to another. Thus, input signal level adjustments performed by an AGC in one computing platform can result in unexpected actual audio signal level changes when applied to a different computing platform.
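The cross-platform problem can be made concrete by applying the same input level adjustment to two modeled devices. Both models are hypothetical: `device_a_level` uses ranges of ten settings, and `device_b_level` assumes the ranges of twenty settings described above continue through one hundred (zero to twenty, twenty-one to forty, and so on). The helper `level_change` is likewise introduced only for illustration.

```python
from typing import Callable


def device_a_level(setting: int) -> int:
    # Hypothetical device A: flat over 0-10, 11-20, ..., 91-100 (ten ranges).
    return max(0, (setting - 1) // 10)


def device_b_level(setting: int) -> int:
    # Hypothetical device B: flat over 0-20, 21-40, ..., 81-100 (five ranges),
    # assuming the pattern of twenty-setting ranges continues to 100.
    return max(0, (setting - 1) // 20)


def level_change(level_of: Callable[[int], int],
                 old_setting: int, new_setting: int) -> int:
    """Change in actual level caused by moving between two input settings."""
    return level_of(new_setting) - level_of(old_setting)


# The same +10 adjustment, issued by an AGC on two different platforms:
change_a = level_change(device_a_level, 25, 35)  # one step up on device A
change_b = level_change(device_b_level, 25, 35)  # no change on device B

# A +20 adjustment changes the actual level by different magnitudes:
big_a = level_change(device_a_level, 25, 45)  # two steps on device A
big_b = level_change(device_b_level, 25, 45)  # one step on device B
```

Because an AGC without such a map cannot predict these differences, an adjustment tuned for one device can do nothing, or too much, on another.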
In view of the cross-platform requirements of modern speech recognition systems, to accommodate multiple, differing computing platforms having different audio devices, an AGC must dynamically adjust input signal levels in small increments until the desired actual audio signal level is reached. More particularly, without knowing the audio characteristics of a particular audio device, the AGC cannot determine when the actual audio level will change in response to an adjustment to the input signal level. Furthermore, when the input signal level changes, the AGC cannot determine the magnitude of a corresponding change in the actual audio signal level until after the change occurs. Thus, if an AGC changes the input signal level too quickly, the AGC may overshoot the desired actual audio signal level. The result can be inaccurate speech recognition.
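The trade-off between small increments and overshoot can be sketched with a simple search loop that has no knowledge of the device's map. Everything here is an illustrative assumption: `device_a_level` is the hypothetical ten-range device model, and `incremental_agc` only raises the level (a real AGC would also lower it) and treats each loop iteration as one adjustment.

```python
from typing import Callable, Tuple


def device_a_level(setting: int) -> int:
    # Hypothetical device: flat over 0-10, 11-20, ..., 91-100.
    return max(0, (setting - 1) // 10)


def incremental_agc(level_of: Callable[[int], int], target_level: int,
                    start_setting: int = 0, step: int = 1,
                    max_setting: int = 100) -> Tuple[int, int, int]:
    """Raise the input setting by `step` until the actual level reaches
    `target_level` or the setting hits `max_setting`.

    Returns (final_setting, final_level, number_of_adjustments).
    """
    setting = start_setting
    adjustments = 0
    while level_of(setting) < target_level and setting < max_setting:
        setting = min(setting + step, max_setting)
        adjustments += 1
    return setting, level_of(setting), adjustments


# Small increments reach the target exactly, but only after many adjustments.
fine = incremental_agc(device_a_level, target_level=5, step=1)     # (51, 5, 51)
# Large increments finish quickly but overshoot the target level.
coarse = incremental_agc(device_a_level, target_level=5, step=25)  # (75, 7, 3)
```

In this model the fine search needs fifty-one adjustments because the first fifty settings past zero include long flat runs, while the coarse search skips past the target level entirely.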
There can be other disadvantages to the fine adjustment of the input level of an audio signal. Specifically, while an AGC fine-adjusts the input signal level, the AGC consumes computer system resources which become unavailable to other application programs, including the speech recognition system. In addition, incrementally changing the input signal level can be disadvantageous because, during the time period consumed by the incremental adjustments, the actual audio signal level remains improperly adjusted, which increases the risk of misrecognition by the speech recognition system. Significantly, present AGC implementations can require between ten and twenty seconds to properly adjust the actual audio signal level.