Voice recognition is the process by which spoken words are interpreted and "understood" by a computer. Voice recognition systems thus become another means for entering data and controlling a computer, similar in function to a keyboard or a pointing device (e.g., a mouse).
In a typical voice recognition system, a user speaks into an input device such as a microphone, which converts the audible sound waves of voice into an analog electrical signal. This analog electrical signal has a characteristic waveform defined by several factors including the volume at which the words are spoken. The volume component of the spoken word translates into the amplitude of the waveform.
Voice recognition involves pattern matching to compare the electrical signal associated with a spoken word against a reference signal associated with a "known" word. A "known" word is stored in a computer by a user. In a typical system, the user speaks a word into a microphone and the electrical signal of this spoken word is associated with a typed word. Instead of a typed word, the word can also be called up from a database, for example. After a word is "known," voice recognition can take place.
Thus, if the electrical signal of a spoken word matches the waveform of the reference signal of the "known" word, within an acceptable range of error, the system "recognizes" the spoken word as the "known" word (which has previously been associated with the reference signal). A software application which uses voice recognition could then use the voice input for entering data or controlling the application (similar to the way a keyboard would be used). For example, in a word processor or dictation system using voice recognition, text could be entered audibly into the body of a document via a microphone instead of being typed on a keyboard.
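The matching step described above can be illustrated with a minimal Python sketch. It is a simplified illustration only, not the method of any particular voice recognition engine: it compares raw, equal-length sample lists, whereas practical engines generally compare derived features and compensate for differences in timing. The function names `match_confidence` and `recognize`, the scoring formula, and the `min_confidence` threshold of 0.8 are all assumptions introduced for this example.

```python
import math


def match_confidence(signal, reference):
    """Return a similarity score in [0, 1] for two equal-length sample lists."""
    if len(signal) != len(reference) or not signal:
        raise ValueError("signals must be non-empty and the same length")
    # Root-mean-square error between the spoken word and the reference.
    rms_error = math.sqrt(
        sum((s - r) ** 2 for s, r in zip(signal, reference)) / len(signal)
    )
    # Scale by the RMS level of the reference so the score is unit-free.
    ref_rms = math.sqrt(sum(r * r for r in reference) / len(reference))
    if ref_rms == 0:
        return 0.0
    return max(0.0, 1.0 - rms_error / ref_rms)


def recognize(signal, known_words, min_confidence=0.8):
    """Return (word, score) for the best match, or (None, score) if the
    best score falls outside the acceptable range (hypothetical threshold)."""
    best_word, best_score = None, 0.0
    for word, reference in known_words.items():
        score = match_confidence(signal, reference)
        if score > best_score:
            best_word, best_score = word, score
    if best_score < min_confidence:
        return None, best_score
    return best_word, best_score


# "Known" words: each typed word is associated with a stored reference signal.
known = {
    "yes": [0.0, 0.5, 1.0, 0.5],
    "no": [1.0, -1.0, 1.0, -1.0],
}
word, score = recognize([0.0, 0.5, 0.9, 0.5], known)
```

Here `word` is the "known" word whose reference signal lies closest to the spoken input, and `score` plays the role of the confidence level of the match.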
Digital signal processing can be used to provide an accurate comparison between the waveform of the voice audio input and that of the reference signal. Digital signal processing requires that the waveform of the voice audio input, as well as the waveform of the reference signal, be represented as digital signals. Having a sufficient amplitude level for the voice audio input provides a better signal for conversion to a digital signal and thus a better reference signal for voice recognition. If the amplitude level is too low, there may not be enough range in the electrical signal of either the reference signal or the spoken word to provide a high enough level of confidence that the electrical signal of a spoken word matches that of the "known" word. If the amplitude level is too high, certain attributes of the electrical signals may be "clipped." This, too, may lower the confidence level of the pattern matching. In more extreme cases, the amplitude may be so low or so high that no match results at all. The sufficiency of the amplitude level is determined for a particular voice recognition "engine." The voice recognition engine is software or hardware which carries out the interpretation and analysis of the voice audio input (or its digital representation) to determine whether a match has occurred and the confidence level of the match. The Dragon Recognizer by Dragon Systems, Inc. of Newton, Mass. is an example of a voice recognition engine which can be run on a personal computer.
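The effect of amplitude on a digitized signal can be sketched as a simple level check. The following Python fragment is an illustration under stated assumptions and does not reflect the criteria of any actual engine: it assumes signed 16-bit samples, and the function name `assess_level` and the `low_fraction` and `clip_fraction` limits are hypothetical values chosen for the example.

```python
def assess_level(samples, full_scale=32767, low_fraction=0.1, clip_fraction=0.01):
    """Classify the amplitude level of digitized audio as 'too_low',
    'too_high' (clipped), or 'ok'. All limits are illustrative assumptions."""
    if not samples:
        raise ValueError("samples must be non-empty")
    # Samples sitting at full scale have most likely been clipped.
    clipped = sum(1 for s in samples if abs(s) >= full_scale)
    if clipped / len(samples) > clip_fraction:
        return "too_high"
    # A peak far below full scale leaves too little range in the signal.
    peak = max(abs(s) for s in samples)
    if peak < low_fraction * full_scale:
        return "too_low"
    return "ok"


quiet = assess_level([100, -200, 150])
loud = assess_level([32767, -32768, 32767, 1000])
good = assess_level([10000, -12000, 8000])
```

In this sketch the quiet input is reported as too low, the input pinned at full scale as too high, and the mid-range input as acceptable.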
Because different users speak at different sound levels, and because background sound levels differ, both of which can affect the reception of a user's speech by a voice recognition system, it is likely that in many situations a voice recognition engine or system will not function at an optimal level. The audio input may not be within the acceptable range for the voice recognition engine being used.
Computers equipped for voice recognition may typically have a sound card in addition to an input device such as a microphone. A sound card typically includes a coder/decoder or CODEC. The Microsoft Sound System Sound Card uses the Analog Devices AD1848 Parallel-Port SoundPort Stereo CODEC. Among other functions, the CODEC contains an input volume control which can be used to adjust the amplitude level of an analog input signal from the microphone. The CODEC also converts an analog signal (representative of a voice input) into a digital signal. The digital signal can then be transmitted from the sound card through the computer bus for processing (such as pattern matching) by the computer.
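The two CODEC functions mentioned above, input volume control and analog-to-digital conversion, can be modeled together in a few lines of Python. This is a generic sketch and not a description of the AD1848 or of any sound card interface: it assumes analog samples normalized to the range -1.0 to 1.0, a linear gain factor, and signed 16-bit output codes, and the function name `digitize` is hypothetical.

```python
def digitize(analog_samples, gain=1.0, bits=16):
    """Apply an input gain, then quantize normalized samples to signed
    integer codes. Values beyond the code range are clipped."""
    max_code = 2 ** (bits - 1) - 1   # 32767 for 16 bits
    min_code = -(2 ** (bits - 1))    # -32768 for 16 bits
    codes = []
    for x in analog_samples:
        code = int(round(x * gain * max_code))
        # Clipping: the converter cannot represent values outside its range.
        codes.append(max(min_code, min(max_code, code)))
    return codes


unity = digitize([0.0, 1.0, -1.0])          # gain of 1.0, no clipping
overdriven = digitize([0.5, -0.5], gain=4)  # gain too high, output clips
```

With too much gain, distinct input amplitudes collapse onto the same extreme codes, which is the "clipping" that degrades pattern matching.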
One widely sold operating system program which helps control a computer is WINDOWS™ version 3.1 ("WINDOWS") of Microsoft Corporation. Among other features, WINDOWS provides a graphical user interface allowing the user the option of using a pointing device, such as a mouse, to control the operation of the computer without the need to memorize the text commands usually required in DOS-based applications. WINDOWS also provides application programmers with tools so that applications have a common look in structure as well as execution of common operations. A WINDOWS application programmer is thus provided with a variety of tools to assist in controlling various computer functions as well as designing "user friendly" applications.
A software program written for WINDOWS operation uses dynamic link libraries (DLLs) which contain a plurality of application programming interfaces (APIs). Examples of such DLLs are USER.EXE, KRNL386.EXE, and GDI.EXE, which contain the core functionality APIs that make up Microsoft Windows 3.1. Although each of these three DLLs has the .EXE extension (usually representing an executable application), each is a DLL. The APIs are used to carry out various WINDOWS functions. For example, if a software program requires a dialog box displayed on a computer monitor to prompt a user for a command or data entry, the software program would make a call to the DialogBox API, which brings up a dialog box on the computer monitor. The contents of the dialog box are local to or associated with the particular application which made the call. Another example of a WINDOWS API is the SetWindowLong API. This API associates data with a particular window, allowing a user who has switched applications to return to the point in the original application where processing had been taking place prior to the switch to the other application. WINDOWS operation and WINDOWS programming, including the use of DLLs and APIs, are well known by those skilled in the art. The Microsoft WINDOWS Software Development Kit, Guide to Programming, Volumes 1-3, 1992, is incorporated by reference herein. It is available to and used by WINDOWS programmers and provides reference information for many of the DLLs and APIs which are available to WINDOWS programmers.
WINDOWS, while providing ease of use for running applications, may serve as a platform for a voice recognition system. WINDOWS lacks, however, a system for adjusting input levels to optimize voice recognition for a particular user at a given location.
A speech detection and recognition apparatus for use with background noise of varying levels is described in U.S. Pat. No. 4,829,578, to Roberts. The apparatus compares the amplitude of an audio signal during successive time periods with certain speech detection thresholds and generates an indication of whether the signal contains speech. To improve speech detection, the amplitude of the audio signal is altered relative to the speech detection thresholds as a function of detected background noise signals.
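The general idea of comparing amplitude against a noise-dependent threshold over successive time periods can be sketched as follows. This Python fragment is a loose illustration and is not the method of the Roberts patent: it scales the threshold by a running estimate of the background noise rather than altering the signal amplitude, and the function name `detect_speech`, the `threshold_ratio`, and the `noise_smoothing` factor are assumptions made for the example.

```python
def detect_speech(frames, threshold_ratio=3.0, noise_smoothing=0.9):
    """Return one boolean per frame: True where the frame's average
    amplitude exceeds a threshold tied to the estimated noise level.
    Each frame is assumed to be a non-empty list of samples."""
    decisions = []
    noise_level = None
    for frame in frames:
        level = sum(abs(s) for s in frame) / len(frame)
        if noise_level is None:
            # Assumption: the first frame contains only background noise.
            noise_level = level
        is_speech = level > threshold_ratio * noise_level
        if not is_speech:
            # Track slow changes in background noise during non-speech frames.
            noise_level = (
                noise_smoothing * noise_level + (1 - noise_smoothing) * level
            )
        decisions.append(is_speech)
    return decisions


quiet_room = detect_speech([[10, -10], [10, -10], [100, -100], [10, -10]])
noisy_room = detect_speech([[50, -50], [100, -100]])
```

The same 100-unit amplitude is flagged as speech against a quiet background but not against a noisier one, because the threshold rises with the noise estimate.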
Roberts and other systems which relate to speech detection do not address adjusting the input amplitude level to assist in and improve voice recognition. Still further, there is a lack of a system that makes systematic adjustments to input amplitude levels by sampling a user's speech, analyzing it in a controlled fashion, and then adjusting an input device based on that sampling and analysis.