1. Field of the Invention
The present invention relates to training a speech recognition system and, more particularly, to computer-implemented speech recognition system training to associate a user selected vocalization to a concept represented by an icon.
2. Description of Related Art
In the field of speech recognition, many methods have been implemented to correlate an utterance or vocalization of a user to a reference vocalization pattern. The reference vocalization patterns are typically created during a “training session” prior to use of the speech recognition system in the intended environment. In the training session, existing speech recognition training systems prompt a user to utter into a microphone a vocalization corresponding to a specific word displayed on a screen. The vocalization is converted by an analog-to-digital converter and appropriate electronics, such as filters and amplifiers, into a signal which is processed by the software into a representative waveform or vector, as understood by those skilled in the art. For example, the vocalization could be transformed into multidimensional vectors by utilizing Fourier transforms to develop a series of frames representing digital values of the spectral features of the vocalization over specific units of time.
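The frame-based spectral representation described above can be illustrated with a minimal sketch. It is not taken from any particular recognizer: the function name `spectral_frames`, the frame size, the hop length, the Hann window, and the 8 kHz sample rate are all assumptions chosen for illustration, and a direct discrete Fourier transform is used only to keep the example self-contained.

```python
import cmath
import math


def spectral_frames(samples, frame_size=64, hop=32):
    """Split samples into overlapping frames; return one magnitude spectrum per frame.

    Hypothetical helper: frame_size and hop are illustrative values only.
    """
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Taper the frame with a Hann window to reduce spectral leakage.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        # Direct DFT over the non-negative frequency bins (slow but dependency-free).
        spectrum = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Example input: a synthetic 1 kHz tone sampled at an assumed 8 kHz rate.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(256)]
frames = spectral_frames(tone)
```

Each entry of `frames` is one multidimensional vector of spectral magnitudes for a short unit of time. A practical system would likely use a fast Fourier transform and further feature processing in place of the direct transform shown here.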
Speech recognition systems are employed, for example, in manufacturing, repair, avionics, and medical applications where it is important for a user to have his or her hands free to perform a manual task while simultaneously performing a second task, which may be carried out by a computer-controlled machine or device. Using a microphone, the user performing a first task can control one or more designated systems without diverting time and/or attention to separately performing each of the additional tasks. This type of system is used in automotive applications to allow a user to control devices such as a lift or jack connected to a computer. This type of system is also used in a vehicle wheel alignment process, providing feedback and sensor data to guide a user to make adjustments necessary to bring the vehicle into conformance with specified alignment values.
Conventional software applications, including speech recognition applications, increasingly employ icons as a graphic shorthand for a concept or predefined set of program instructions. Thus, a user would know that when an icon is clicked or selected, a predefined event or sequence of events will occur. Frequently, a tag or text box is disposed immediately adjacent an icon to provide an additional clue to the meaning of the concept represented by the icon. The tag or text box is particularly relevant in speech recognition applications, wherein the software and computer must be trained to relate a spoken command of the user to the desired icon. Conventionally, the user repeats the specific word or words displayed adjacent the icon, whether actually required by the software or required simply to avoid confusion. For example, for an icon having the appearance of a floppy disk disposed adjacent a tag reading “save”, the user will train the software to recognize the user's utterance of “save”. Thus, the user is restrained from using other vocalizations or words that might carry greater significance or meaning to that user, a problem which becomes more relevant as the nature of the concept represented by an icon becomes more abstract and less easily defined.
Further, adaptation of these speech recognition systems to foreign languages can be troublesome, requiring modification to both the software display and the relational language database, adding cost and complexity to the system. In the event a separate language database (e.g., Japanese) is not available, persons not fully proficient in the base language (e.g., English) may experience difficulty comprehending and/or pronouncing the reference words and suffer a loss of productivity. The reference words may also pose a mnemonic challenge to non-native speakers, further compromising productivity, particularly as the associated phrases and tasks become more complex.
In extreme cases, individuals may not be proficient in reading their native language or may suffer from speech impediments, presenting additional obstacles to training and implementing a speech recognition control system based solely on a correspondence between a specified word and corresponding actions. Even barring such difficulties, it is not always easy to relate a desired effect or outcome to an externally imposed definition of the desired effect or outcome. In other words, a software designer's definition or shorthand concept of an action or sequence of actions may not precisely correspond to a user's internal definition of the same action or sequence of actions, based on the user's own experience base. Thus, the potential exists for mnemonic inconsistencies manifested in an inability of the user to recall specific software imposed relationships, requiring undesirable diversion of the user's attention from a task at hand.
Thus, a need exists for training a speech recognition system that is substantially language insensitive and conformable to individual users.
The invention provides, in various aspects and embodiments, computer-implemented training of a speech recognition system to associate a user selected vocalization to a concept represented by an icon to fill the needs identified above.
In one aspect, a method for training a computer-implemented speech recognition system includes displaying an icon representing a concept and prompting a user to generate a vocalization comprising any sound determined by the user to associate to the icon. The method also includes confirming association of the vocalization with the icon and saving the association of the vocalization with the icon to a computer-readable medium.
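The four steps of this method (display, prompt, confirm, save) can be sketched as follows. This is only an illustration under stated assumptions: the function `train_icon`, the `capture` and `confirm` callables, the feature list standing in for a processed vocalization, and the JSON file used as the computer-readable medium are all hypothetical and are not specified by the description.

```python
import json
import os
import tempfile


def train_icon(icon_id, capture, confirm, store_path):
    """Associate a user-chosen vocalization with an icon and persist the association.

    capture:  hypothetical callable returning features of the user's sound.
    confirm:  hypothetical callable returning True if the user accepts the pairing.
    """
    # Display the icon and prompt for any sound the user wants to associate with it.
    print(f"[icon: {icon_id}] Make any sound you want to associate with this icon.")
    features = capture()

    # Confirm the association before anything is stored.
    if not confirm(icon_id, features):
        return False

    # Save the association; a JSON file stands in for the computer-readable medium.
    associations = {}
    if os.path.exists(store_path):
        with open(store_path, "r", encoding="utf-8") as fh:
            associations = json.load(fh)
    associations[icon_id] = features
    with open(store_path, "w", encoding="utf-8") as fh:
        json.dump(associations, fh)
    return True


# Example run with stand-in callables in place of a microphone and a dialog box.
store = os.path.join(tempfile.mkdtemp(), "associations.json")
saved = train_icon(
    "save",
    capture=lambda: [0.1, 0.7, 0.2],
    confirm=lambda icon, feats: True,
    store_path=store,
)
```

Passing the capture and confirmation steps in as callables keeps the sketch independent of any particular audio hardware or user interface.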
In another aspect, a computer-readable medium bears instructions enabling a computer to associate a sound made by a user to a concept associated with an icon selected by the user, where the sound may include any sound or combination of sounds the user wants to relate to the icon. The instructions relate the sound made by the user to a concept or instruction set associated with a selected icon. Then, the sound and the relationship between the sound and the icon's concept are stored. Further instructions may compare a subsequent user's sound to the stored sound to determine if the subsequent sound corresponds to the stored sound. If a correspondence exists, the relationship between the stored sound and the icon's concept is used to execute an instruction set corresponding to the identified icon.
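The comparison and execution steps of this aspect might look like the sketch below. The description does not specify how correspondence is determined, so the Euclidean distance, the threshold value, the fixed-length feature vectors, and the names `recognize`, `dispatch`, `stored`, and `actions` are assumptions made for illustration only. A real recognizer would typically compare variable-length frame sequences.

```python
import math


def recognize(features, stored, threshold=0.25):
    """Return the icon whose stored sound is closest to the new sound, or None.

    Correspondence is assumed to mean a Euclidean distance within the threshold.
    """
    best_icon, best_dist = None, float("inf")
    for icon_id, reference in stored.items():
        if len(reference) != len(features):
            continue  # skip references that cannot be compared directly
        dist = math.dist(features, reference)
        if dist < best_dist:
            best_icon, best_dist = icon_id, dist
    return best_icon if best_dist <= threshold else None


def dispatch(features, stored, actions, threshold=0.25):
    """Execute the instruction set of the matched icon; return None if nothing matches."""
    icon_id = recognize(features, stored, threshold)
    if icon_id is None:
        return None
    return actions[icon_id]()


# Stored sounds and instruction sets; the values are made up for the example.
stored = {"save": [0.1, 0.7, 0.2], "lift_up": [0.9, 0.1, 0.3]}
actions = {"save": lambda: "saved", "lift_up": lambda: "raising lift"}

matched = dispatch([0.12, 0.68, 0.22], stored, actions)
unmatched = dispatch([0.5, 0.5, 0.9], stored, actions)
```

In this run, `matched` holds the result of the instruction set tied to the "save" icon, while `unmatched` is `None` because no stored sound lies within the threshold.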
In still another aspect, a computer-based vehicle diagnostics system includes a speech recognition program product configured to process, in combination with a computer processor, a signal provided to the processor by a sound-to-signal transducer, such as a microphone, and relate a concept represented by an icon displayed on a display to any sound determined by the user.
These and other aspects and advantages of the invention will be apparent to those skilled in the art from the following detailed description and accompanying drawings.