1. Field of the Invention
The present invention relates generally to control systems. More specifically, the present invention relates to a system, program product, and related methods to provide speech-activated control of vehicle components.
2. Description of the Related Art
Systems capable of responding to or obeying human commands have begun to appear over the last decade or so. Such systems have attempted to increase the speed and ease with which humans can communicate with machines. Due to recent developments in computer hardware and software technology, as well as recent advances in the development of algorithms for the recognition of speech, speech recognition systems have become more powerful and, therefore, more useful for interfacing a user with complex equipment having multiple functions, allowing the user to control the equipment by spoken commands. Speech recognition systems have also been used in control systems for controlling remotely piloted vehicles. In such systems, spoken commands are converted into a machine-compatible control signal used to open or close solid-state switches. The control signal is transmitted to the aircraft to manipulate a switch which activates a servo that drives a selected control surface or manipulates a throttle setting.
Speech recognition systems generally operate by matching an acoustic signature of a word to be recognized against the acoustic signatures of words previously stored in a vocabulary database. A microphone first converts the acoustic signature of the uttered word into an electrical signal. An A/D converter converts the electrical signal into a digital representation of the successive amplitudes of the audio signal created by the utterance. The signal is then converted from the time domain to the frequency domain, which gives the amplitude of the signal at each of a plurality of frequencies over time. Such an acoustic signature can be visualized through display on a spectrogram, a three-dimensional graph which plots frequency along the vertical axis, time along the horizontal axis, and the intensity of the sound at any given frequency and time by degree of coloration. Generally, as part of the speech recognition process, the unknown word is broken down into its spectral components, and the amplitude or intensity of the acoustic signature at various frequencies and temporal locations is then compared to that of the acoustic model of each word previously stored in a vocabulary database.
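By way of illustration only, the time-to-frequency conversion described above can be sketched as a short-time discrete Fourier transform. The sketch below is not drawn from any particular prior system; the function names, the 64-sample frame length, the 32-sample hop, and the 8000 Hz sample rate are assumptions chosen for the example, and a rectangular window and a naive transform are used for brevity.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Return the magnitude at each non-negative frequency bin of one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_len=64, hop=32):
    """Slice the digitized signal into overlapping frames and transform each.

    The result is indexed as spec[time_frame][frequency_bin], i.e. the
    intensity at each frequency and temporal location.
    """
    return [dft_magnitudes(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


# Illustrative input: a 1000 Hz tone sampled at an assumed 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone)

# Locate the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

With these assumed values each bin spans 125 Hz, so the tone falls in bin 8 of every frame. A practical system would typically apply a tapered window and a fast Fourier transform rather than the direct summation shown here.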
Speech recognition systems use various types of algorithms or speech engines to perform the speech recognition process. Pattern matching algorithms can include, for example, an asymmetric dynamic time warping algorithm and a Hidden Semi-Markov Model (HSMM) algorithm, which can use dynamic time warping templates and Markov models, respectively, for each word stored in an associated vocabulary as a result of speech recognition pre-training. A Neural Net algorithm, e.g., a single or multi-layer perceptron model algorithm, can also be used. Neural Net algorithms are typically arranged to learn features of each word which discriminate that word from the other words in the vocabulary, the vocabulary typically having been previously established by multiple training repetitions of the same word. That is, programming of the speech recognition system is achieved during a training or learning phase by uttering a list of words or phrases to be parameterized or otherwise broken down into spectral components and stored as spectral-temporal word models or templates in a vocabulary database. Such speech recognition systems can use pattern recognition, performing a parameterization followed by calculating a distance between the spectral parameters resulting from the parameterization and the parameters associated with the words stored in the vocabulary database.
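The distance calculation against stored templates can be illustrated with a minimal sketch. Note that the sketch uses the classic symmetric form of dynamic time warping, not the asymmetric variant mentioned above, and that the two-element feature vectors, the template words, and the function names are hypothetical values assumed purely for the example.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two spectral parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dtw_distance(a, b):
    """Accumulated cost of the best time alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            # Extend the cheapest of the three permitted predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the vocabulary word whose template is nearest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical templates stored during a training phase.
templates = {
    "gear": [[1, 0], [1, 0], [0, 1]],
    "flaps": [[0, 1], [0, 1], [1, 0]],
}

# The same pattern as "gear", spoken more slowly (one extra leading frame).
utterance = [[1, 0], [1, 0], [1, 0], [0, 1]]
best = recognize(utterance, templates)
```

Because the warping absorbs the difference in speaking rate, the slower utterance aligns with the "gear" template at zero accumulated cost, while its alignment with "flaps" necessarily incurs a positive cost.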
The performance of speech recognition systems tends to deteriorate significantly as the size of the vocabulary database to be searched to perform the speech recognition increases. As the size of the vocabulary database grows, there is an increased probability that a word from the vocabulary will be misrecognized as another similar sounding word. In some speech recognition systems, in order to limit the size of the vocabulary database to be searched, the speech engine can limit its search to a subset of the words stored in the vocabulary database. Such systems can include provisions for the user to provide a spoken transitional command to select a working syntax on the basis of the type of and alterations in the operational profile of the vehicle. Such systems, however, can be problematic in that misinterpretation of the spoken transitional command by the speech recognition engine or delivery of an incorrect transitional command by a user can result in attempting to recognize an utterance using an incorrect vocabulary database subset.
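The subset-limited search, and the failure that arises when the wrong subset is selected, can be sketched as follows. The syntax names, command words, and scalar "signatures" are hypothetical stand-ins assumed for the example; an actual system would compare full spectral-temporal models rather than single numbers.

```python
# Hypothetical vocabulary database partitioned by working syntax.
VOCABULARY = {
    "takeoff": {"gear up": 1.0, "flaps ten": 2.0},
    "cruise": {"autopilot on": 1.1, "heading hold": 3.0},
}


def recognize(signature, active_syntax):
    """Search only the words belonging to the currently selected syntax."""
    candidates = VOCABULARY[active_syntax]
    return min(candidates, key=lambda word: abs(candidates[word] - signature))


# The same utterance yields different results depending on the subset searched.
heard = 1.05
with_correct_subset = recognize(heard, "takeoff")
with_wrong_subset = recognize(heard, "cruise")
```

In this sketch the utterance is matched to "gear up" when the takeoff subset is active, but to "autopilot on" when the cruise subset has been selected in error, since the intended word is simply not among the candidates searched.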
Determining exactly when an utterance has begun can also be problematic, especially when the acoustic signal includes high background noise content. One such system which can determine the temporal location of the beginning of the word to be recognized compares parameters of the acoustic signal to an acoustic model of the background noise to locate the beginning of the word.
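One simple way to compare the acoustic signal against a model of the background noise is sketched below. It is an assumption-laden illustration, not the method of the system referred to above: the noise model is taken to be the mean and standard deviation of the energy in a few leading frames presumed to contain only noise, and the frame length, the number of noise frames, and the multiplier k are hypothetical parameters.

```python
import statistics


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each consecutive, non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def find_onset(samples, frame_len=10, noise_frames=5, k=3.0):
    """Return the sample index where the utterance appears to begin, or None.

    Assumes the first `noise_frames` frames contain background noise only.
    """
    energies = frame_energies(samples, frame_len)
    noise = energies[:noise_frames]
    threshold = statistics.mean(noise) + k * statistics.pstdev(noise)
    for idx in range(noise_frames, len(energies)):
        if energies[idx] > threshold:
            return idx * frame_len
    return None


# Illustrative signal: 80 samples of low-level noise followed by a louder burst.
background = [0.01 * (-1) ** i for i in range(80)]
burst = [0.5 * (-1) ** i for i in range(40)]
onset = find_onset(background + burst)
```

Here the onset is reported at sample 80, the first frame whose energy rises above the noise threshold. A signal consisting of background noise alone produces no onset.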
Prior speech recognition systems, nevertheless, have generally not met user expectations. Such speech recognition systems require the speech engine or engines to be pre-trained for specific vocabulary and syntax sequences which are embedded in the speech engine for correlation with a pre-determined aircraft control function. Likewise, traditional uses of speech recognition with operational aircraft or at control station interfaces require a pre-determination of selected functions and their associated speech command vocabulary and/or syntax. That is, when implemented for use with aircraft or with control station interfaces for unmanned aerial vehicles, predetermined functions must be selected and associated with specific speech command vocabulary words and/or syntax prior to installation or operational use. Thus, such systems are not adaptable in real time to an ever-changing operational environment.
When used with aircraft in operation or with control station interfaces, it is desirable, for example, to provide the user feedback as to whether or not the attempted recognition is correct and to provide the user the ability to readily correct the command if the attempted recognition is incorrect. Some systems provide the user a visual display or audio “repeat back” of the system's understanding of the word or words which have been spoken. Such systems can also require the user to confirm that the commands recognized are correct, either by saying an acceptance word, such as, for example, the word “yes,” or by pressing a keyboard key or other switch. Such systems place a considerable burden on the user by requiring him or her to confirm the system recognition, whether or not correct.
Recognized by the Applicants is the need for a speech actuated control system that, within a pre-approved domain of cockpit or control station command functionality, can enable the user, i.e., pilot or control station operator, in real time during flight, to select and record one or more command functions or system states of choice; select, record and command-associate an annunciation of choice; train the speech engine to recognize the selected annunciation; and execute the selected function via a speech command using its associated annunciation. Also recognized is the need for an interface between the speech engine and the aircraft or control station which can enable the functionality associated with a speech command to be totally transparent to the speech engine, and which includes the capability to enable user selection of speech command functionality in real time. Also recognized by the Applicants is the need for a speech actuated control system that can provide language independence, i.e., is not tied to any specific language.