1. Field of the Invention
This invention relates generally to the field of connecting audio input devices to audio systems in computers, as well as determining the proper audio settings for the audio input devices to achieve optimal results. In particular, this invention relates to selecting, connecting and optimizing all audio input devices, not just microphones.
2. Description of Related Art
Typically, audio input devices for computers have been limited to microphones. The use of alternative input devices with personal computers has grown in popularity with the advent of inexpensive multimedia computers. Alternative audio input devices can include, for example, personal transcription devices, that is, hand-held recorders used for dictation. In some cases, these devices are analog and use recording tape. More recent arrivals in the marketplace are digital recorders using flash memory cards. Examples of such digital input devices are the Olympus® D1000 and the Sony® ICD-70. Such audio input devices, and others, can be used in addition to microphones to provide audio inputs to speech recognition applications in computers. Inasmuch as microphones are considered the fundamental audio input device, the alternative audio input devices need to be connected to other input plugs and ports, generally designated as line-in or auxiliary plugs and ports. The term input device as used herein is intended to cover all sources of audio input signals, particularly sources other than microphones, for example, line-in or auxiliary devices.
Audio input devices are generally connected to sound cards installed within a computer, for example a personal computer. The sound card receives and digitizes the analog signals generated by the input device. The digital signals are processed by the processor of the computer for functions such as storage of the audio file in memory or other audio-related operations. The audio levels, as measured by the amplitude of the audio input waveform, at which the analog audio signals are recorded prior to being digitized are critical to any application that subsequently uses this data. It will be appreciated that each of the alternative audio input devices can have characteristically different output signals, requiring different kinds of jacks or ports and different parameters for setting up the sound card. These differences can be manifested from manufacturer to manufacturer, and from model to model of each manufacturer. Moreover, sound cards from different manufacturers, as well as different sound cards produced by the same manufacturer, will have different characteristic responses to input signals. Despite the vast potential for variation among alternative input devices and sound cards, each speech application has optimum signal parameter requirements, independent of the audio input sources, which must be satisfied to maximize the efficiency of the speech recognition engine in the speech application. Improper settings can adversely affect any application that requires audio signals to function properly.
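By way of illustration only, the significance of the recording level can be sketched as a simple amplitude check on a block of digitized samples. The function name, thresholds and the assumption of signed 16-bit PCM samples are hypothetical, not taken from the source.

```python
# Sketch: classify the peak amplitude of a block of signed 16-bit PCM
# samples. Thresholds are illustrative assumptions, not from the source.

def check_audio_level(samples, low=0.05, high=0.95):
    """Return a rough verdict on whether the recording level is usable."""
    peak = max(abs(s) for s in samples) / 32768.0  # normalize to 0..1
    if peak < low:
        return "too quiet"   # gain likely set too low, or wrong input jack
    if peak > high:
        return "clipping"    # gain too high; the waveform will be distorted
    return "ok"

print(check_audio_level([100, -200, 150]))        # near-silent input
print(check_audio_level([32000, -32700, 31000]))  # near full scale
```

A setup procedure could run such a check after a short test recording and direct the user to adjust the gain before any training or dictation begins.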
There are many ways in which audio input devices can be improperly connected and audio systems can be misconfigured. These include, for example, selection of the wrong sound card, selection of the wrong audio input device, loose plugs, selection of the wrong jack, improper setting of mute switches, battery problems in microphones and adapters, environments with high background noise, improper adjustment of audio parameters and the presence of disruptive audio feedback.
The present approach to this problem, to the extent there is any approach at all, is the use of a manual procedure. Manual procedures require considerable user intervention, which is by nature problematic at best. Accordingly, a significant need exists for a method and apparatus that facilitates the proper connection of the input device or devices and the configuration of the respective audio settings, no matter the variation of the input devices and sound cards and speech applications. The method and apparatus should be user-friendly, insofar as sophisticated knowledge of computer operation should not be required. The method or apparatus should address all of the problems which can be encountered, and in so doing, should display diagnostic information and clear instructions to the user for correcting problems.
Certain characteristics of the digitized signals can be used to enhance algorithms that process the signals. In other words, a cleaner, less noisy audio input signal enhances the performance of the speech recognition engine. One such class of algorithms that process digitized signals is the class that performs speech recognition. Some speech recognition systems allow the user to train the speech recognition engine with digitized samples of the user's own speech. This training produces a set of training data that is associated with a particular user and/or enrollment. This generally increases the performance and accuracy of the system because the speech application takes into account the specific voice and speaking characteristics of the user gathered during training, the environment and the corresponding audio input device. The system can be trained, for example, for different input devices, such as microphones and line-in devices. The system can also be trained, for example, for a low or high background noise environment. Such training gives the user the flexibility to optimally tailor the system to the user's particular needs. However, performance of the system using the data, for example the accuracy of speech recognition, can suffer severely if the speech recognition engine is using a particular set of training data that does not correspond correctly to the current digitized signals coming in from the sound card. This can easily occur if the user accidentally mismatches the input device or environment with the selected training data.
Accordingly, a need exists for a method and apparatus that programmatically maintains the correct association between a user's training data and the corresponding audio input device.
Audio feedback is a problem caused by having an open microphone in the acoustic field of the corresponding output speaker. Before the audio settings can be correctly set for a particular device, that device must be correctly connected to the computer and correctly set up for use with the computer. However, an incorrectly configured audio mixer has the potential to cause audio feedback that is not only annoying, but in severe cases can cause hearing problems. This problem usually manifests itself as a loud high-pitched sound, often described as a squeal or whistle. The user will generally have no idea of the potential for audio feedback, and if this is the case, the user probably will not recognize the need to correct the problem proactively.
Accordingly, a need exists to identify the likelihood or potential for audio feedback, so that the user can take the steps necessary to prevent such audio feedback before the feedback occurs.
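By way of illustration only, the likelihood of feedback can be inferred from the mixer configuration before any sound is produced. The sketch below uses hypothetical field names; it does not represent an actual sound-card or mixer API.

```python
# Sketch: flag a mixer configuration that could produce audio feedback.
# All field names are hypothetical placeholders, not a real mixer API.

def feedback_possible(mixer):
    """True if an open microphone could be amplified back out the speakers."""
    mic_open = not mixer["mic_muted"] and mixer["mic_gain"] > 0
    mic_monitored = mixer["mic_to_speaker"] and not mixer["master_muted"]
    return mic_open and mic_monitored

risky = {"mic_muted": False, "mic_gain": 0.8,
         "mic_to_speaker": True, "master_muted": False}
print(feedback_possible(risky))  # configuration warrants a warning
```

A setup procedure could run such a test and warn the user, or mute the monitoring path, before the feedback ever occurs.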
Speech recognition programs use standard microphone input to obtain data to convert to text. However, other kinds of audio input devices must now be accommodated, and users must be able to select from multiple audio input sources. These sources can even include recorded data files in multiple formats. The ability to convert standard Windows audio files (.WAV) to text has been demonstrated. Even so, the user must still manually convert the recorded data from the input device into the .WAV format before using another software application to convert the audio data into text. There is a need for a method and apparatus for simplifying the selection of many audio input devices and the processing of their respective output signals in many file formats, not just the .WAV format.
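By way of illustration only, handling multiple recorded file formats amounts to dispatching each file to a format-specific decoder, sparing the user the manual conversion to .WAV. The decoder names and the .dss extension below are hypothetical placeholders.

```python
import os

# Sketch: select a decoder by file extension so recordings need not be
# hand-converted to .WAV first. Decoder names are hypothetical.

DECODERS = {
    ".wav": "decode_wav",
    ".dss": "decode_dss",  # e.g. a digital dictation format
}

def decoder_for(path):
    """Return the name of the decoder for a recorded audio file."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in DECODERS:
        raise ValueError(f"unsupported audio format: {ext}")
    return DECODERS[ext]

print(decoder_for("dictation.WAV"))
```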
Use of the wrong audio device, or source, in the first instance, is another potential problem occasioned by the availability of multiple audio input devices. Connecting the wrong audio input device can cause the set up procedure to fail. In such a case, there is a need for a method and apparatus to guide a user through the procedure for changing the audio input device when it appears that such a wrong connection is the cause of the set up failure.
The need for a method and apparatus for simplifying the selection of many audio input devices and the processing of their respective output signals in many file formats is satisfied by the inventive arrangements taught herein.
A method for enabling user selectable input devices for dictation or transcription in a speech application, in accordance with the inventive arrangements, comprises the steps of: establishing a registry of dictation and transcription device descriptions, each of the descriptions including a device specific image, a device specific set of device-connecting instructions and a device specific list of audio configuration parameters; building dynamic tables containing information retrieved from the registry; establishing and storing a plurality of enrollments, each of the enrollments representing a speech file of user specific training data corresponding to at least one of a specific audio input device and a specific audio environment; and, generating a graphical user interface (GUI) display screen using the information in at least one of the dynamic tables to enable user selection of any input device in the registry for which one of the enrollments is available, for use as a dictation or transcription input to the speech application.
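By way of illustration only, the registry, dynamic tables and enrollments described above can be sketched as simple data structures. All field names, device names and values below are illustrative assumptions, not taken from the source.

```python
# Sketch of the structures the method describes: a registry of device
# descriptions, enrollments of user training data, and a dynamic table
# derived from both. All names and values are illustrative assumptions.

REGISTRY = {
    "handheld_recorder": {
        "image": "recorder.bmp",
        "instructions": "Connect the recorder to the line-in jack.",
        "audio_params": {"input": "line-in", "gain": 0.6},
    },
    "desktop_mic": {
        "image": "mic.bmp",
        "instructions": "Connect the microphone to the mic jack.",
        "audio_params": {"input": "mic", "gain": 0.8},
    },
}

# Enrollments: user-specific training data tied to a device and environment.
ENROLLMENTS = [
    {"user": "alice", "device": "desktop_mic", "environment": "quiet"},
]

def selectable_devices(user):
    """Dynamic table: only devices with an enrollment for this user."""
    enrolled = {e["device"] for e in ENROLLMENTS if e["user"] == user}
    return [d for d in REGISTRY if d in enrolled]

print(selectable_devices("alice"))  # only the enrolled device is offered
```

The GUI display screen would then be populated from such a table, so the user can only select devices for which an enrollment exists.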
The method can comprise the step of including in the GUI display screen a plurality of nested menus for identifying each of the enrollments for each of the devices and enabling the user selection.
The method can further comprise the step of identifying sets of compatible devices in the registry and in the dynamic tables.
The method can further comprise the steps of: searching for an enrollment with a compatible device when a user selects a device without a corresponding enrollment; and, automatically using information from the compatible device to enable initiation of a dictation or transcription session without a training session.
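By way of illustration only, the fallback described above can be sketched as a search over a compatibility map. The map, the enrollment keys and the device names are hypothetical assumptions.

```python
# Sketch: when the selected device has no enrollment, search compatible
# devices for one that does, so dictation can begin without retraining.
# The compatibility map and enrollment entries are illustrative assumptions.

COMPATIBLE = {
    "handheld_recorder": ["desktop_recorder"],
    "desktop_recorder": ["handheld_recorder"],
}

ENROLLMENT_FILES = {("alice", "desktop_recorder"): "speech_file_17"}

def find_enrollment(user, device):
    """Return an enrollment for the device, or for a compatible device."""
    if (user, device) in ENROLLMENT_FILES:
        return ENROLLMENT_FILES[(user, device)]
    for other in COMPATIBLE.get(device, []):
        if (user, other) in ENROLLMENT_FILES:
            return ENROLLMENT_FILES[(user, other)]  # reuse without retraining
    return None

print(find_enrollment("alice", "handheld_recorder"))
```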
The method can further comprise the steps of: identifying sets of compatible devices in the registry and in the dynamic tables; and, displaying each set of the compatible devices in the GUI display screen menus as a single selectable item.
The method can comprise the step of storing text from recognized speech in a single document irrespective of the recognized speech being dictated directly to the speech application or played back from a transcription device.
A computer apparatus programmed with a routine set of instructions stored in a fixed medium, for enabling user selectable input devices for dictation or transcription in a speech application, in accordance with the inventive arrangements, comprises: means for establishing a registry of dictation and transcription device descriptions, each of the descriptions including a device specific image, a device specific set of device-connecting instructions and a device specific list of audio configuration parameters; means for building dynamic tables containing information retrieved from the registry; means for establishing and storing a plurality of enrollments, each of the enrollments representing a speech file of user specific training data corresponding to at least one of a specific audio input device and a specific audio environment; and, means for generating a graphical user interface (GUI) display screen using the information in at least one of the dynamic tables to enable user selection of any input device in the registry for which one of the enrollments is available, for use as a dictation or transcription input to the speech application.
The GUI display screen includes a plurality of nested menus for identifying each of the enrollments for each of the devices and enabling the user selection.
The apparatus can further comprise means for identifying sets of compatible devices in the registry and in the dynamic tables.
The apparatus can further comprise: means for searching for an enrollment with a compatible device when a user selects a device without a corresponding enrollment; and, means for automatically using information from the compatible device to enable initiation of a dictation or transcription session without a training session.
The apparatus can further comprise: means for identifying sets of compatible devices in the registry and in the dynamic tables; and, means for displaying each set of the compatible devices in the GUI display screen menus as a single selectable item.
The apparatus can further comprise means for storing text from recognized speech in a single document irrespective of the recognized speech being dictated directly to the speech application or played back from a transcription device.